I have two 7204VXR routers (IOS 12.2-46f SERVICE-PROVIDER feature set), each connected to a different service provider, with BGP running over them in a multihomed inbound config.
HSRP is implemented on the internal interfaces, tracking the external interfaces to change the priority.
Five days ago, the primary router suddenly restarted. After it restarted, all inbound traffic to our web servers failed and kept failing. I troubleshot the router over a couple of days (changing all of its hardware modules; it has a POS STM external interface), and I finally had to shut its internal interface so that all traffic went through the second router.
I noticed that while it was BGP peering with its corresponding ISP neighbor, it used about 81% of memory (it has 256 MB of RAM) with about 250000 BGP entries. Cisco Output Interpreter flags this as a dangerous threshold. Late-night tests revealed that if I put the primary router back online and disable the secondary router, inbound traffic flows successfully. But when both routers are live, memory utilization on the primary jumps to 85%, and traffic sometimes flows and sometimes suffers extreme loss. (I actually had to downgrade the IOS on the primary from the initial IP PLUS IPSEC 3DES image to the SERVICE PROVIDER set to free up a little memory so that traffic could flow, even if with some loss.)
I have been told that I need a memory upgrade to at least 512 MB of RAM.
My specific question is: is 80% memory utilization, primarily from the BGP Router process and its entries, overloading the router to the point that it cannot function properly? If so, will a memory upgrade fix it?
And is it possible for a router that operated successfully nonstop for over 5 years to suddenly crash, reboot, and stop the flow of traffic because its BGP entries increased, within one day, to a level it cannot handle?
I will be doing a hardware upgrade, but I would like to know for sure what caused the problem.
I think the memory load can be reduced by using AS_PATH filters to accept only routes originated by an ISP and its directly connected autonomous systems, instead of receiving the full BGP routing table from the ISP. The packet loss occurs only once every 60 seconds, at exactly the time that the BGP Scanner process utilization goes up.
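As a rough sketch of the AS_PATH filtering idea above, the inbound filter might look something like the following. The local AS (65000), ISP AS (65001), neighbor address (192.0.2.1), and access-list number (10) are all placeholder values chosen for illustration, not taken from the original setup.

```
! Placeholder values: local AS 65000, ISP AS 65001, neighbor 192.0.2.1
! Permit routes originated by the ISP itself
ip as-path access-list 10 permit ^65001$
! Permit routes originated by an AS directly connected to the ISP
ip as-path access-list 10 permit ^65001_[0-9]+$
!
router bgp 65000
 neighbor 192.0.2.1 remote-as 65001
 ! Apply the filter to routes received from the ISP
 neighbor 192.0.2.1 filter-list 10 in
```

Anything that does not match is dropped by the implicit deny at the end of the list, so a default route (static or learned from the ISP) would still be needed to reach destinations outside those paths.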
The full BGP table grows steadily (usually not suddenly), day by day, reflecting the increase in announced IPv4 address space out there. The BGP table is not expected to stop growing, as more and more enterprises get multihomed using PI addresses. So one day you will hit the maximum a router can handle. In previous years the BGP table was well below 200000 entries, or even below 100000, depending on how far back you go.
You might ask your ISPs to do some aggregation for you and send a default route. This will, however, only solve the issue if both ISPs aggregate in a similar way.
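If the ISP does send a default route, one way to keep only that route on your side is an inbound prefix-list. This is a minimal sketch: the local AS (65000), neighbor address (192.0.2.1), and the prefix-list name DEFAULT-ONLY are placeholders, and it assumes the ISP is already advertising 0.0.0.0/0 to you.

```
! Placeholder values: local AS 65000, neighbor 192.0.2.1
! Accept only the default route; everything else hits the implicit deny
ip prefix-list DEFAULT-ONLY seq 5 permit 0.0.0.0/0
!
router bgp 65000
 neighbor 192.0.2.1 prefix-list DEFAULT-ONLY in
```

The same filter would need to be applied toward the second ISP on the other router for both paths to behave alike.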
Tweaking memory consumption would only help for a short time, as BGP is the largest contributor in your case.
So the best approach would be a memory upgrade, IMHO.