Strange loss of eBGP sessions and IP connectivity

grischast

Hi all

Yesterday we encountered a very strange problem with a small multihomed AS belonging to one of our customers.

The AS includes our two routers:

Cisco 7206VXR NPE-400, c7200-ik9s-mz.124-25c.bin

Cisco 2811, c2800nm-spservicesk9-mz.124-18.bin

Each router is connected to a different upstream provider in a different AS, with the corresponding eBGP session, of course. The two upstreams are completely separate and share no infrastructure.

Internally, the two routers run an iBGP session between them and are connected to the customer's LAN via two switches, with HSRP configured on the LAN side.

The whole setup handles the loss of either upstream connection, as well as hardware failures, very well.

But yesterday we had the following situation:

Both routers lost their eBGP sessions almost simultaneously, and the sessions never came back up.

Further investigation showed that both upstream links had completely lost layer 3 connectivity. Layer 2 was perfectly OK and CDP showed all neighbors correctly, yet we could not get a single ping across either upstream link.

Apart from this, both routers appeared to be in a perfectly normal state, with no CPU or memory issues and no clue in the logs.

Shutting down and re-enabling the affected interfaces did not resolve the situation.

Only after rebooting both routers did the situation return to normal.

Does anyone here have an idea what might have been going on?

Thanks in advance for any hint or idea.

Regards,

Grischa


Peter Paluch
Cisco Employee

Grischa,

As the routers have since been rebooted, it is quite difficult to pinpoint the exact cause of your problem, so we can only hypothesize.

My first thought is an inconsistency between the routing table (RIB) and CEF. I have occasionally seen CEF become de-synchronized from the routing table, essentially containing different information than the RIB. This resulted in packets being forwarded along incorrect paths, causing reachability issues. The problem could be corrected either by clearing the CEF FIB and adjacency (ADJ) tables, or by disabling and re-enabling CEF.
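A minimal sketch of how the checks and recovery steps described above might look in IOS CLI, assuming a hypothetical upstream next hop of 192.0.2.1 (substitute the real address). The show commands compare the RIB entry with the CEF entry and adjacency for the same next hop; a mismatch between them would support the de-sync theory. The exact command for clearing the FIB itself differs between IOS releases, so only clear adjacency is shown; please verify the syntax against the command reference for your 12.4 image. Disabling CEF on a busy router may push traffic onto slower switching paths and raise CPU load.

```
! Compare the RIB and CEF views of the hypothetical upstream next hop 192.0.2.1
show ip route 192.0.2.1
show ip cef 192.0.2.1 detail
show adjacency detail
!
! Option 1: clear the CEF adjacency table (privileged EXEC mode)
clear adjacency
!
! Option 2: disable and re-enable CEF (global configuration mode)
configure terminal
no ip cef
ip cef
end
```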

My second thought is about gathering as much data as possible about the state of the routers while the problem is occurring. At a minimum, the output of show ip route, show ip interface brief, show ip arp, show ip bgp neighbor and similar commands displaying the IP control plane is of utmost importance (ideally, show tech-support would cover this). It would also be interesting to run debug ip packet with an appropriate ACL to observe how the TCP segments carrying BGP messages are originated and processed: whether they are originated at all, whether they are forwarded through the correct interfaces, and so on.
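A sketch of the data collection and debugging described above, in IOS CLI. ACL number 150 is an arbitrary, hypothetical choice; it matches TCP port 179 (the bgp keyword) in both directions so that the debug is limited to BGP traffic. Sending debug output to the log buffer instead of the console reduces the risk of overwhelming the router. Note that debug ip packet shows only process-switched packets; BGP segments originated by or destined to the router itself fall into that category, so they should be visible.

```
! Global configuration: hypothetical ACL 150 matching BGP (TCP/179) both ways
configure terminal
access-list 150 permit tcp any any eq bgp
access-list 150 permit tcp any eq bgp any
! Send debug output to the log buffer rather than the console
logging buffered 64000 debugging
no logging console
end
!
! Privileged EXEC: capture control-plane state while the problem is occurring
terminal length 0
show ip interface brief
show ip route
show ip arp
show ip bgp summary
show ip bgp neighbors
show tech-support
!
! Debug only the BGP packets matched by ACL 150, then review the buffer
debug ip packet 150 detail
show logging
! Stop all debugging when done
undebug all
```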

Since many things could have gone wrong and there is probably no further data to support or refute any hypothesis, I suggest simply being prepared to capture the output of show tech-support as soon as the problem recurs, and then proceeding from there.

Best regards,

Peter

Hi Peter

Thanks for your suggestions. Of course, we hope this never happens again, but if we encounter the problem once more, we will save the show tech-support output before restarting the routers.

Regards,

Grischa
