Yesterday we encountered a very strange problem with a small multihomed AS of one of our customers:
The AS consists of our two routers:
Cisco 7206VXR NPE-400, c7200-ik9s-mz.124-25c.bin
Cisco 2811, c2800nm-spservicesk9-mz.124-18.bin
Each router is connected to a different upstream in a different AS, with a corresponding eBGP session. The two upstreams are completely separate and share no infrastructure.
Internally, the routers have an iBGP session between them and are connected via two switches to the customer's LAN, with an HSRP configuration.
The whole setup handles loss of connectivity to either upstream, as well as hardware failures, very well.
But yesterday we had the following situation:
Both of the routers lost their eBGP-sessions almost simultaneously and never brought them up again.
Further investigation showed that both upstream links had lost layer 3 connectivity completely. While layer 2 was perfectly OK and CDP showed all the neighbors correctly, we were not able to send a single ping across the upstream links.
Apart from this, both routers appeared to be in a perfectly normal state, with no CPU or memory issues and no clues in the logs.
Shutting down and re-enabling the affected interfaces did nothing to resolve the situation.
Only after rebooting both machines did the situation return to normal.
Do any of the experts here have any idea what might have been going on here?
Re: Strange loss of eBGP sessions and IP connectivity
Since the routers have already been rebooted, it is quite difficult to pinpoint the exact cause of your problem, so we can only hypothesize.
My first thought is an inconsistency between the routing table and CEF. I have sometimes observed CEF becoming de-synced from the routing table (the FIB essentially contained different information than the RIB). This resulted in packets being forwarded onto incorrect paths, creating reachability issues. The problem could be corrected either by clearing the CEF FIB and adjacency databases, or by deactivating and reactivating CEF.
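As a rough sketch of what checking for and clearing such a RIB/CEF mismatch might look like on IOS 12.4: the address 192.0.2.1 below is only a placeholder for the upstream neighbor, not a value from the original post, and the available clear commands differ between IOS releases, so treat this as an illustration rather than a verified procedure.

```
! Compare what the RIB and the CEF FIB say about the upstream neighbor
! (192.0.2.1 is a placeholder address)
show ip route 192.0.2.1
show ip cef 192.0.2.1 detail
show adjacency detail

! Flush the adjacency table so that it is rebuilt
clear adjacency

! Alternatively, toggle CEF globally (disruptive; forwarding falls back
! to slower switching paths while CEF is disabled)
configure terminal
 no ip cef
 ip cef
 end
```

If the next hop or outgoing interface reported by show ip cef disagrees with show ip route, that would support the de-sync hypothesis.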
My second thought is to gather all possible data about the state of the routers at the moment the problem occurs. At the very least, show ip route, show ip interface brief, show ip arp, show ip bgp neighbors and similar commands displaying the IP control plane are of utmost importance (ideally, the full show tech-support output). It would also be interesting to run debug ip packet with an appropriate ACL to see the origination and processing of TCP segments carrying BGP messages - whether they are originated at all, whether they are forwarded through the correct interfaces, and so on.
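A minimal sketch of the ACL-restricted debug mentioned above. The ACL number 199 and the choice to log to the buffer rather than the console are my assumptions, not something from the original setup. Keep in mind that debug ip packet only shows process-switched packets, which should include BGP traffic originated by or destined to the router itself.

```
configure terminal
 ! Match BGP (TCP port 179) in both directions; 199 is an arbitrary ACL number
 access-list 199 permit tcp any any eq bgp
 access-list 199 permit tcp any eq bgp any
 ! Send debug output to the buffer instead of the console
 logging buffered 64000 debugging
 no logging console
 end

debug ip packet 199 detail
show logging

! Switch all debugging off again when finished
undebug all
```

The ACL could be narrowed further to the specific upstream neighbor addresses to keep the output small.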
As there are many possibilities for what could have gone wrong, and probably no further data to support or refute any hypothesis, I suggest simply being prepared to capture the output of show tech-support as soon as the problem occurs again, and then proceeding from there.