We use Nagios to monitor our 4 core routers, 2 of which are our main BGP routers. Periodically we get an alarm in Nagios stating that a device is not responding, and within a few minutes another alert clears the previous one and states that all is OK. Every time I log into the device I don't see anything going on. It's up and appears to be operational. The only thing I find is that the CPU seems to spike around the same time, driven by the BGP Router process. We are receiving a total of 623763 prefixes from our BGP neighbors. Has anyone seen anything like this before? Could this be a problem with Nagios or the BGP process? It doesn't appear to be a memory issue, as we appear to have plenty of free memory. Any help would be great. Thanks!
BGP table version is 33649554, main routing table version 33649554
328279 network entries using 38408643 bytes of memory
623763 path entries using 32435676 bytes of memory
1136755/71241 BGP path/bestpath attribute entries using 181880800 bytes of memory
677951 BGP AS-PATH entries using 32045892 bytes of memory
4091 BGP community entries using 186952 bytes of memory
1 BGP extended community entries using 24 bytes of memory
0 BGP route-map cache entries using 0 bytes of memory
0 BGP filter-list cache entries using 0 bytes of memory
BGP using 284957987 total bytes of memory
12064 received paths for inbound soft reconfiguration
BGP activity 924004/593348 prefixes, 5688171/5062032 paths, scan interval 60 secs
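As a quick sanity check on the memory figures, the per-category byte counts in the output can be summed and compared with the reported total. This is only a rough sketch: the numbers are copied from the output above, and the dictionary keys and variable names are made up for illustration.

```python
# Byte counts copied from the BGP summary output above.
# Key names are illustrative labels, not Cisco field names.
bgp_memory_bytes = {
    "network entries": 38408643,
    "path entries": 32435676,
    "path/bestpath attribute entries": 181880800,
    "AS-PATH entries": 32045892,
    "community entries": 186952,
    "extended community entries": 24,
    "route-map cache entries": 0,
    "filter-list cache entries": 0,
}
reported_total = 284957987  # "BGP using ... total bytes of memory"

computed_total = sum(bgp_memory_bytes.values())
largest = max(bgp_memory_bytes, key=bgp_memory_bytes.get)
share = bgp_memory_bytes[largest] / computed_total

print("categories add up to reported total:", computed_total == reported_total)
print(f"total: {computed_total / 2**20:.1f} MiB")
print(f"largest consumer: {largest} ({share:.0%} of BGP memory)")
```

The categories add up to the reported total, and the attribute entries are by far the largest consumer. Whether that footprint is comfortable depends on how much memory the route processor has, which the output does not show.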
If your BGP neighborships are not dropping and you have no problems with intermittent connectivity, then I wouldn't worry about it, to be honest. If the CPU is spiking, it could just be normal BGP behaviour, i.e. the BGP scanner runs routinely.
If the CPU is handling the BGP scanner at the time your network monitoring software is polling the device, it may just be that the 6500 is temporarily too busy to answer.
If your CPU were spiking all the time, you were low on memory, and traffic forwarding or routing peerings/information were suffering, then that would be something to worry about.
It could be that the BGP process has all the CPU at the time Nagios sends a ping and the switch is too busy to respond. If this occurs regularly, it would indicate an underlying problem. I suppose it may be possible to make the monitoring rule more tolerant of packet loss.
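To illustrate what "more tolerant" could look like: one common approach is to raise an alert only after several consecutive failed checks, so a single poll missed during a CPU spike is ignored. Nagios exposes this idea through the max_check_attempts directive on host and service definitions. The Python below is just a minimal sketch of that logic, not Nagios code; the function name and the threshold of 3 are assumptions for the example.

```python
def hard_alerts(check_results, max_attempts=3):
    """Return the indices at which a DOWN alert would fire.

    check_results: iterable of booleans, True meaning the device replied.
    An alert fires only when max_attempts checks in a row have failed,
    and fires once per outage (hypothetical helper, for illustration).
    """
    alerts = []
    consecutive_failures = 0
    for index, replied in enumerate(check_results):
        if replied:
            consecutive_failures = 0  # any success resets the counter
        else:
            consecutive_failures += 1
            if consecutive_failures == max_attempts:
                alerts.append(index)
    return alerts


# Isolated missed polls, e.g. during a brief CPU spike: no alert.
print(hard_alerts([True, False, True, True, False, True]))   # []

# A sustained outage: one alert, raised on the third failed check.
print(hard_alerts([True, False, False, False, False, True]))  # [3]
```

The trade-off is slower detection of a real outage, since the alert waits for the extra retries before firing.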
Take a close look at spanning-tree topology changes and IGP routing between the monitoring server and the target network device. If there is an L2 network involved, enable spanning-tree syslog traps to gain situational awareness. If there is an L3 network involved, take a close look at the 'age' of the routes for both the monitoring server and the target network device.
Also consider possible congestion on key links between the monitoring server and the target network device. A queuing strategy may be helpful here.
Thank you for the responses. I am also afraid that it's hinting at an underlying problem that I'm not seeing. We have another core BGP router that has even more BGP routes from peers and less free memory, but it is not having this problem. So I'm thinking that something else is going on that is causing this problem. It might not even be related to BGP; I just happen to see the BGP Router process spiking when I log in. Thanks for the information. I will come up with a strategy to address this based on your input and hopefully resolve it. If not... I'll be back. Thanks!