I have two core routers (6509 MSFC) that keep flip-flopping between the HSRP active and standby roles. My MRTG is showing that the EIGRP hold time expired; all the VLAN interfaces swap to the standby router, and after a few minutes the original active router takes over again. This has been happening quite frequently since this morning. Please advise.
Both the EIGRP and HSRP hold times are expiring. Make sure that both MSFCs are set to the same timer values for HSRP and for EIGRP. I would also check for problems on the trunk between the two switches. See if either switch reports the trunk going down in its logs. Also, see if the trunk is being over-utilized, which would keep the hellos/keepalives from getting across.
I am assuming that nothing in the configuration of the MSFCs has changed. Since you are saying that the problem started this morning, you might be suffering a Denial of Service attack. Symptoms would indeed be routing peer loss due to hello packet drops and HSRP peer loss due to hello packet drops. Check out this document:
Configuring Denial of Service Protection/CISCO CATALYST 6500 SERIES SWITCHES
Normally HSRP doesn't rely on EIGRP, because the interfaces belonging to an HSRP group reside in one subnet. But if you are also seeing EIGRP hold time expirations, that may be a sign of a layer 2 connectivity issue between the two routers' interfaces. Can you send sample configs from the MSFC interfaces, and what you've got in the logging buffers?
You are definitely having a connectivity issue rather than an EIGRP issue. EIGRP and HSRP are completely unrelated, other than sending hellos and other information over the same link. There are several things to check here. Start with the last interface state change: how long has it been since it last flapped? Does it happen to coincide with the last EIGRP neighbor adjacency failure? What are the input queue drops and output queue drops? Are you dropping a lot of packets on either side, on the input or the output?
The problem is a layer 2 problem someplace; now it's just a matter of finding it.
If the two 6500s are directly connected, you can check the health of this link using the sh interface command (you'd better "clear counters" first and wait for the problem to occur a few times, so that old counts don't mislead you). I think in the end you'll find a faulty cable/connector or port.
It's kind of hard to say without knowing the topology, but I agree it looks like a layer 2 issue. Since it sounds like multiple VLANs are affected, I'm thinking the issue is most likely occurring on a core trunk connection (depending on the topology and how the VLANs/trunking are configured). I would check any trunk connections between the core switches for errors and also check for spanning tree topology changes when the problem occurs (sh spantree stat).
The input queue drops are about 9000, and the output drops are 0. After checking the trunking info on all the switches, I found one switch that does not include VLAN 100 in the allowed list on its trunk, while the other switches do have VLAN 100 on the trunk.
I will clear the counters on the core and see what happens after adding that VLAN to the trunk.
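For anyone who wants to repeat the allowed-VLAN comparison across many switches, here is a rough Python sketch of the idea. It assumes you have already copied each trunk's allowed-VLAN list out of the switch by hand; the switch names and every VLAN ID except 100 are made up, and the helpers parse_vlan_list and find_missing_vlans are hypothetical, not part of any Cisco tool.

```python
# Hypothetical data: allowed-VLAN lists per trunk, typed in by hand from
# each switch's trunk output. Switch names and all VLAN IDs other than
# 100 are invented for illustration.

def parse_vlan_list(spec):
    """Expand a VLAN list such as "1-5,100,200-202" into a set of ints."""
    vlans = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-", 1)
            vlans.update(range(int(low), int(high) + 1))
        else:
            vlans.add(int(part))
    return vlans


def find_missing_vlans(trunks):
    """Map each trunk name to the VLANs other trunks allow but it does not."""
    allowed = {name: parse_vlan_list(spec) for name, spec in trunks.items()}
    union = set().union(*allowed.values())
    return {
        name: sorted(union - vlans)
        for name, vlans in allowed.items()
        if union - vlans
    }


trunks = {
    "core1": "1,10,20,100",
    "core2": "1,10,20,100",
    "access3": "1,10,20",
}

for name, missing in find_missing_vlans(trunks).items():
    print(f"{name} is missing VLANs: {missing}")
# prints: access3 is missing VLANs: [100]
```

A trunk that is flagged here is only a candidate; a VLAN may be left off a trunk on purpose, so check against the intended design before changing anything.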