HSRP routers occasionally becoming active-active

I am having a problem with HSRP routers occasionally becoming active-active.  

The site has two Cisco 1921 routers, each with an HWIC-4ESW, connected to Catalyst 2960 access switches. Router1 is normally the active router and Router2 the standby router.

Occasionally Router1 loses connectivity to the access switch, so Router2 stops receiving HSRP hellos from Router1 and also becomes active, creating an active-active scenario.
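To catch the active-active condition before users report it, one option is to compare "show standby brief" output from both routers. Below is a minimal Python sketch, assuming the usual column layout of that command (Interface, Grp, Pri, optional P for preempt, State, ...) and that both routers run the group on the same interface name. The function names, sample captures and addresses are hypothetical:

```python
import re

# One data row of "show standby brief", e.g.
#   "Vl1   1   110 P Active  local   10.0.0.2   10.0.0.1"
# Assumed columns: Interface, Grp, Pri, optional "P" (preempt), State, ...
ROW = re.compile(r"^(\S+)\s+(\d+)\s+(\d+)\s+(?:P\s+)?(\w+)\s")


def hsrp_states(brief_output: str) -> dict[tuple[str, int], str]:
    """Map (interface, group) -> HSRP state parsed from 'show standby brief'."""
    states = {}
    for line in brief_output.splitlines():
        m = ROW.match(line.strip())
        if m:  # header and legend lines do not match (no numeric group)
            iface, grp, _priority, state = m.groups()
            states[(iface, int(grp))] = state
    return states


def active_active(r1_output: str, r2_output: str) -> list[tuple[str, int]]:
    """Return the (interface, group) pairs that are Active on BOTH routers."""
    r1, r2 = hsrp_states(r1_output), hsrp_states(r2_output)
    return sorted(k for k in r1.keys() & r2.keys()
                  if r1[k] == "Active" and r2[k] == "Active")


# Hypothetical captures taken while the problem is occurring.
ROUTER1 = """
Interface   Grp  Pri P State   Active          Standby         Virtual IP
Vl1         1    110 P Active  local           unknown         10.0.0.1
"""
ROUTER2 = """
Interface   Grp  Pri P State   Active          Standby         Virtual IP
Vl1         1    100   Active  local           unknown         10.0.0.1
"""

print(active_active(ROUTER1, ROUTER2))  # [('Vl1', 1)]
```

In normal operation one router reports Standby, so the list comes back empty.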

Router1 port Fa0/0/0 stays in the forwarding state even though it cannot reach access switch 1. When I run "show cdp neighbors", the router does not see Switch1 (CDP is running on both the router and the switch).

Port F0/1 on access Switch1 is also forwarding but cannot ping Router1. However, Switch1 does see Router1 when I run "show cdp neighbors".

If I bounce the ports, they see each other for about 45 seconds and then lose the connection again.

The only way to restore the connection and fix the HSRP active-active condition is to reboot Router1.

This happens randomly, once or twice a month, at several branches with the same topology. The configuration had worked for almost two years without this issue. I am not sure whether BGP has anything to do with it, but the problem only started after we recently implemented BGP routing with our ISP.


Recently we had a problem with our HWIC-4ESW caused by an IOS bug: the HWIC was losing all its VLANs even though they were all still in the configuration.

The only fix was to recreate the VLANs (the L2 VLANs, not the SVIs) or to reload (unacceptable in my case).

So, could you check "sh ip int" during the issue and also check your log?

PS: which IOS version are you running?

I am running version 15.1(4)M3. I managed to log my session before rebooting the router, and the interface was up/up:

FastEthernet0/0/0 is up, line protocol is up 

  Hardware is Fast Ethernet, address is e8b7.48e1.10be (bia e8b7.48e1.10be)
  Description: **** Trunk to XX-SW1 F0/1 ****
  MTU 1500 bytes, BW 100000 Kbit/sec, DLY 100 usec, 
     reliability 255/255, txload 1/255, rxload 1/255
  Encapsulation ARPA, loopback not set
  Keepalive set (10 sec)
  Full-duplex, 100Mb/s
  ARP type: ARPA, ARP Timeout 04:00:00
  Last input 09:04:20, output never, output hang never
  Last clearing of "show interface" counters never
  Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 0
  Queueing strategy: fifo
  Output queue: 0/40 (size/max)
  5 minute input rate 11000 bits/sec, 13 packets/sec
  5 minute output rate 3000 bits/sec, 5 packets/sec
     383627 packets input, 39462582 bytes, 0 no buffer
     Received 155314 broadcasts (216419 multicasts)
     0 runts, 0 giants, 0 throttles
     0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
     0 watchdog, 0 multicast, 0 pause input
     0 input packets with dribble condition detected
     163681 packets output, 11787414 bytes, 0 underruns
     0 output errors, 0 collisions, 2 interface resets
     1 unknown protocol drops
     0 babbles, 0 late collision, 0 deferred
     0 lost carrier, 0 no carrier, 0 pause output
     0 output buffer failures, 0 output buffers swapped out
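The capture above shows "Last input 09:04:20, output never" on Fa0/0/0. When collecting the same output from several branches, a small parser makes those ages easy to compare side by side. Below is a minimal Python sketch. It only understands the hh:mm:ss and "never" forms (longer ages that IOS prints differently raise an error), and it does not try to interpret what the values mean on an HWIC-4ESW port. The function names are hypothetical:

```python
import re

# Matches e.g. "Last input 09:04:20, output never, output hang never"
LAST_IO = re.compile(r"Last input ([^,]+), output ([^,]+),")


def age_seconds(value: str):
    """Convert an IOS 'hh:mm:ss' age to seconds; 'never' becomes None.

    Longer ages printed in other forms (e.g. '1d02h') are not handled by
    this sketch and raise ValueError.
    """
    if value == "never":
        return None
    if value.count(":") != 2:
        raise ValueError(f"unsupported age format: {value!r}")
    h, m, s = (int(part) for part in value.split(":"))
    return h * 3600 + m * 60 + s


def last_io(show_interfaces: str):
    """Return (last_input_age_s, last_output_age_s) from 'show interfaces'."""
    m = LAST_IO.search(show_interfaces)
    if m is None:
        raise ValueError("no 'Last input' line found")
    return age_seconds(m.group(1)), age_seconds(m.group(2))


line = "  Last input 09:04:20, output never, output hang never"
print(last_io(line))  # (32660, None)
```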



Check the status of your SVI interfaces (not the physical ones).

If you could, please post the output of "sh ip int br", "sh cdp ne" and "sh span" collected during the issue.

Attached is the output of "sh ip int", "sh cdp ne" and "sh span" from a different router that had the same problem.

To summarize: Router1 loses its connection to the access switch. When I check the CDP neighbors, the router does not detect the switch. The router port facing the switch is up/up and its spanning-tree state is forwarding, but the router cannot ping the access switch.
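The symptom described here (port up/up, yet the switch missing from the CDP table) can be checked mechanically from "sh ip int br" and "sh cdp ne" captures. Below is a minimal Python sketch, assuming the standard column layout of both commands. The device ID XX-SW1 is taken from the interface description in the capture above; the sample outputs and function names are hypothetical:

```python
def interface_status(ip_int_brief: str) -> dict[str, tuple[str, str]]:
    """Map interface -> (Status, Protocol) from 'show ip interface brief'."""
    result = {}
    for line in ip_int_brief.splitlines():
        fields = line.split()
        # Data rows: Interface IP-Address OK? Method Status... Protocol
        if len(fields) < 6 or fields[0] == "Interface":
            continue
        # Status can be two words ("administratively down").
        result[fields[0]] = (" ".join(fields[4:-1]), fields[-1])
    return result


def cdp_sees(cdp_output: str, device_id: str) -> bool:
    """True if any 'show cdp neighbors' row starts with the given Device ID."""
    return any(line.split()[0].startswith(device_id)
               for line in cdp_output.splitlines() if line.strip())


def silent_neighbor(ip_int_brief: str, cdp_output: str,
                    interface: str, device_id: str) -> bool:
    """Flag the reported symptom: port up/up but the neighbor missing from CDP."""
    status = interface_status(ip_int_brief).get(interface)
    return status == ("up", "up") and not cdp_sees(cdp_output, device_id)


# Hypothetical captures from Router1 during the problem.
IP_INT_BRIEF = """
Interface              IP-Address      OK? Method Status                Protocol
FastEthernet0/0/0      unassigned      YES unset  up                    up
Vlan1                  10.0.0.2        YES NVRAM  up                    up
"""
CDP_NEIGHBORS = """
Device ID        Local Intrfce     Holdtme    Capability  Platform  Port ID
"""

print(silent_neighbor(IP_INT_BRIEF, CDP_NEIGHBORS, "FastEthernet0/0/0", "XX-SW1"))  # True
```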

While the issue is happening, the access switch can sometimes see the router in its CDP neighbors, its interface to the router is up/up, and the spanning-tree state of that port is forwarding.

Sometimes when I reset the port on the router, the link is re-established but goes down again after about 45 seconds.

Reloading Router1 fixes the problem, but it appears that the HWIC-4ESW goes unresponsive every few weeks.

If I run "show arp" on the router, the MAC address entry for the switch shows as "incomplete".
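When collecting "show ip arp" from several affected branches, the incomplete entries can be pulled out automatically. Below is a minimal Python sketch, assuming the usual IOS column layout (Protocol, Address, Age, Hardware Addr, Type, Interface) in which an unresolved entry shows "Incomplete" in the hardware-address column. The sample capture, its addresses and the function name are hypothetical:

```python
def incomplete_arp(arp_output: str) -> list[str]:
    """Return the IP addresses whose ARP entry shows 'Incomplete'."""
    addresses = []
    for line in arp_output.splitlines():
        fields = line.split()
        # Data rows: Protocol Address Age Hardware-Addr Type [Interface]
        if (len(fields) >= 4 and fields[0] == "Internet"
                and fields[3].lower() == "incomplete"):
            addresses.append(fields[1])
    return addresses


# Hypothetical "show ip arp" capture from Router1 during the problem.
ARP = """
Protocol  Address          Age (min)  Hardware Addr   Type   Interface
Internet  10.0.0.2                -   e8b7.48e1.10be  ARPA   Vlan1
Internet  10.0.0.10               0   Incomplete      ARPA
"""

print(incomplete_arp(ARP))  # ['10.0.0.10']
```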

All SVIs were up/up.

In the log I see "BPDU: sent 74, received 0", which means that for some reason the counters were cleared recently.
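One way to confirm that the counters were cleared is to compare two captures of the spanning-tree detail output taken some time apart: if a counter is lower in the later capture, it was reset in between. Below is a minimal Python sketch, assuming the counter line has the form quoted above and that both captures list the same ports in the same order. The function names and sample values are hypothetical:

```python
import re

# Matches the per-port counter line, e.g. 'BPDU: sent 74, received 0'.
BPDU_LINE = re.compile(r"BPDU: sent (\d+), received (\d+)")


def bpdu_counters(text: str) -> list[tuple[int, int]]:
    """Return (sent, received) for every BPDU counter line in the text."""
    return [(int(sent), int(recv)) for sent, recv in BPDU_LINE.findall(text)]


def counters_went_backwards(earlier: str, later: str) -> bool:
    """True if any counter in the later capture is lower than in the earlier
    one, i.e. the counters were cleared at some point in between.
    Assumes both captures list the same ports in the same order."""
    pairs = zip(bpdu_counters(earlier), bpdu_counters(later))
    return any(l_s < e_s or l_r < e_r for (e_s, e_r), (l_s, l_r) in pairs)


print(bpdu_counters("BPDU: sent 74, received 0"))  # [(74, 0)]
print(counters_went_backwards("BPDU: sent 5000, received 10",
                              "BPDU: sent 74, received 0"))  # True
```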

So I would analyze the logs (could you post them here?) to see what happened at that time.

PS: as a workaround, I would suggest configuring the internal VLANs on L3 subinterfaces.
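With several branches to change, the subinterface configuration can be generated rather than typed by hand. Below is a minimal Python sketch that renders standard IOS dot1Q subinterface and HSRP commands. The parent interface name, VLAN ID, addresses and priority are hypothetical, and using the VLAN ID as the HSRP group number is only a convention chosen here; review the output against your own design before applying it:

```python
from dataclasses import dataclass


@dataclass
class Vlan:
    vlan_id: int
    ip: str          # this router's own address on the VLAN
    mask: str
    virtual_ip: str  # HSRP virtual gateway address
    priority: int = 100


def subinterface_config(parent: str, vlans: list[Vlan]) -> str:
    """Render dot1Q subinterface + HSRP config for each VLAN.

    The HSRP group number is set equal to the VLAN ID purely as a convention.
    """
    lines = []
    for v in vlans:
        lines += [
            f"interface {parent}.{v.vlan_id}",
            f" encapsulation dot1Q {v.vlan_id}",
            f" ip address {v.ip} {v.mask}",
            f" standby {v.vlan_id} ip {v.virtual_ip}",
            f" standby {v.vlan_id} priority {v.priority}",
            f" standby {v.vlan_id} preempt",
            "!",
        ]
    return "\n".join(lines)


# Hypothetical values for Router1 (the normally active router).
print(subinterface_config("GigabitEthernet0/1",
                          [Vlan(10, "10.0.10.2", "255.255.255.0", "10.0.10.1", 110)]))
```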

PS2: Please upgrade IOS to 15.2(4)M5.

