Do you guys also experience this? We are having HSRP flaps every day and
we don't know the cause. Our core is a 4506 switch running Supervisor
Engine IV, and the trunk between the two core switches is a port-channel using
Gi1/1 and Gi1/2. We also reduced the HSRP timers to a hello time of 250 ms and
a hold time of 750 ms. I just do not know why I keep getting this. Thanks.
Dec 12 11:58:37 10.196.0.2 347762: Dec 12 03:46:13.247: %HSRP-6-STATECHANGE: Vlan25 Grp 25 state Standby -> Active
Dec 12 11:58:37 10.196.0.2 347763: Dec 12 03:46:13.251: %HSRP-6-STATECHANGE: Vlan36 Grp 36 state Standby -> Active
Dec 12 11:58:37 10.196.0.2 347764: Dec 12 03:46:13.251: %HSRP-6-STATECHANGE: Vlan25 Grp 25 state Active -> Speak
Dec 12 11:58:37 10.196.0.2 347765: Dec 12 03:46:13.251: %HSRP-6-STATECHANGE: Vlan36 Grp 36 state Active -> Speak
Dec 12 11:58:37 10.196.0.2 347766: Dec 12 03:46:13.899: %HSRP-6-STATECHANGE: Vlan40 Grp 40 state Standby -> Active
Dec 12 11:58:37 10.196.0.2 347767: Dec 12 03:46:13.899: %HSRP-6-STATECHANGE: Vlan33 Grp 33 state Standby -> Active
Dec 12 11:58:37 10.196.0.2 347768: Dec 12 03:46:13.903: %HSRP-6-STATECHANGE: Vlan47 Grp 46 state Standby -> Active
Dec 12 11:58:37 10.196.0.2 347769: Dec 12 03:46:13.903: %HSRP-6-STATECHANGE: Vlan40 Grp 40 state Active -> Speak
Dec 12 11:58:38 10.196.0.2 347770: Dec 12 03:46:13.903: %HSRP-6-STATECHANGE: Vlan33 Grp 33 state Active -> Speak
Dec 12 11:58:38 10.196.0.2 347771: Dec 12 03:46:13.907: %HSRP-6-STATECHANGE: Vlan47 Grp 46 state Active -> Speak
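To see how long each group actually stayed Active before dropping back to Speak, a small script can pair up the state changes. This is a minimal sketch, assuming each syslog message sits on a single line (as the syslog server writes it) and that the device timestamp (the second one, e.g. 03:46:13.247) is the one that matters; the function names are hypothetical. On the sample entries each flap lasts only about 4 ms, which seems consistent with a brief loss of hellos rather than a sustained outage.

```python
import re
from collections import defaultdict

# Matches the device timestamp and the HSRP state-change message, e.g.
# "03:46:13.247: %HSRP-6-STATECHANGE: Vlan25 Grp 25 state Standby -> Active"
PATTERN = re.compile(
    r"(?P<ts>\d{2}:\d{2}:\d{2}\.\d{3}): %HSRP-6-STATECHANGE: "
    r"(?P<iface>\S+) Grp (?P<grp>\d+) state (?P<old>\w+) -> (?P<new>\w+)"
)


def to_ms(ts: str) -> int:
    """Convert 'HH:MM:SS.mmm' to milliseconds since midnight."""
    hms, ms = ts.split(".")
    h, m, s = (int(x) for x in hms.split(":"))
    return ((h * 60 + m) * 60 + s) * 1000 + int(ms)


def active_durations(lines):
    """Map (interface, group) -> list of ms spent Active before leaving Active."""
    became_active = {}
    durations = defaultdict(list)
    for line in lines:
        match = PATTERN.search(line)
        if match is None:
            continue  # not an HSRP state-change message
        key = (match["iface"], int(match["grp"]))
        t = to_ms(match["ts"])
        if match["new"] == "Active":
            became_active[key] = t
        elif match["old"] == "Active" and key in became_active:
            durations[key].append(t - became_active.pop(key))
    return dict(durations)


sample = [
    "Dec 12 11:58:37 10.196.0.2 347762: Dec 12 03:46:13.247: "
    "%HSRP-6-STATECHANGE: Vlan25 Grp 25 state Standby -> Active",
    "Dec 12 11:58:37 10.196.0.2 347764: Dec 12 03:46:13.251: "
    "%HSRP-6-STATECHANGE: Vlan25 Grp 25 state Active -> Speak",
    "Dec 12 11:58:37 10.196.0.2 347768: Dec 12 03:46:13.903: "
    "%HSRP-6-STATECHANGE: Vlan47 Grp 46 state Standby -> Active",
    "Dec 12 11:58:38 10.196.0.2 347771: Dec 12 03:46:13.907: "
    "%HSRP-6-STATECHANGE: Vlan47 Grp 46 state Active -> Speak",
]
print(active_durations(sample))  # {('Vlan25', 25): [4], ('Vlan47', 46): [4]}
```

Running it over a full day of logs would show whether all groups flap together (pointing at the shared trunk or the CPU) or only some of them.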
I would say your HSRP timers are too low, and that is causing the frequent HSRP state changes. Do you see any errors on the interfaces that are members of the port-channel?
Yes, that's what I'm thinking too right now. I'm not seeing any errors on either of the Gig ports or on the EtherChannel, but with a steady 70% CPU load, I think the switch could easily fail to process an HSRP packet in time. We use the same kind of setup with a 6500 switch, and our other 4500 switches use the default timers. Only these switches have this issue.
Maybe I will try increasing the HSRP timers on one VLAN and see whether it keeps flapping.
I think that is a good idea to test, and to implement if it does help. My guess is that with timers that low, if three consecutive HSRP hellos are missed, the hold time expires and the state changes. Please keep us informed of the results of your tests.
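The "three missed hellos" reasoning can be checked with an idealized hold-timer model. This is a sketch under stated assumptions, not how IOS implements HSRP internally: hellos are perfectly periodic, a good hello arrived at t=0, each received hello restarts the hold timer, and a hello arriving exactly at expiry still counts. The helper names are hypothetical.

```python
def takeover_time(hello_rx_ms, hold_ms, end_ms):
    """Return when (in ms) the standby's hold timer expires, or None if it
    does not expire by end_ms.

    hello_rx_ms lists the times of every hello the standby actually received
    after t=0; lost hellos are simply absent from the list.
    """
    deadline = hold_ms  # timer armed by the good hello at t=0
    for t in sorted(hello_rx_ms):
        if t > deadline:
            break  # timer expired before this hello showed up
        deadline = t + hold_ms  # received hello restarts the hold timer
    return deadline if deadline <= end_ms else None


def drop(times, lost):
    """Remove the lost hellos from the list of sent hellos."""
    return [t for t in times if t not in lost]


HELLO_MS, END_MS = 250, 3000
sent = list(range(HELLO_MS, END_MS + 1, HELLO_MS))  # 250, 500, ..., 3000

# 250/750: losing two hellos in a row is survivable, losing three is not.
print(takeover_time(drop(sent, {1000, 1250}), 750, END_MS))         # None
print(takeover_time(drop(sent, {1000, 1250, 1500}), 750, END_MS))   # 1500
# 250/1000: the same three lost hellos no longer cause a takeover.
print(takeover_time(drop(sent, {1000, 1250, 1500}), 1000, END_MS))  # None
```

In a real network hellos jitter a little, so a hello landing right at the expiry boundary can go either way.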
Also, in addition to Mark's comment: these error messages signify the loss of HSRP hellos between the peers. When you troubleshoot this issue, you must verify the communication between the HSRP peers. A random, momentary loss of data communication between the peers is the most common problem that results in these messages. HSRP state changes are often due to high CPU utilization. If the error messages are due to high CPU utilization, put a sniffer on the network and trace the system that causes the high CPU utilization.
There are several possible causes for the loss of HSRP packets between the peers. The most common problems are physical layer problems or excessive network traffic caused by spanning tree issues.
Thank you very much for your input, guys. I will report back in a few days on whether increasing the timers solves the issue. For now, I have increased the HSRP hello time on one VLAN to 500 ms and the hold time to 1500 ms. I can't really raise the timers much further, especially the hold time above 3 seconds, because of the voice traffic. There are hundreds of VoIP users, and network changes should be detected in less than 3 seconds. So I'm budgeting less than 2 seconds for HSRP to fail over, plus possible routing protocol convergence time.
Hopefully this is not an STP issue. We use RSTP, by the way, and I think we configured it correctly, with core1 as the root and core2 as the secondary root.
A few days back somebody suggested the HSRP BFD feature, which is useful when the hello and hold times are in milliseconds.
Check this link; it talks about it and may be useful in your case. There was another link that covers it in more detail. If I find it, I will post it.
Thanks for the link. This is really informative. Unfortunately my 4500 doesn't support this. :( But this is good. Thanks again.
For Cisco IOS Release 12.4(11)T, the Cisco implementation of BFD introduced support for the Hot Standby Router Protocol (HSRP). BFD support is not available for all platforms and interfaces. In Cisco IOS Release 12.4(11)T, this feature was introduced on Cisco 7200 series, Cisco 7600 series, and Cisco 12000 series routers.
Something else you might try: instead of the "traditional" 1:3 ratio, perhaps 250/1000 or 250/1250 ms. That keeps the failover time down a bit (compared with the 500/1500 you plan to try), yet tolerates more missed hellos.
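The trade-off between these timer pairs is simple arithmetic. The sketch below assumes perfectly periodic hellos (a hello arriving exactly at expiry still counts) and uses the poster's under-2-second HSRP budget; `tolerated_misses` is a hypothetical helper, not an IOS feature.

```python
def tolerated_misses(hello_ms: int, hold_ms: int) -> int:
    """Consecutive hellos that can be lost without the hold timer expiring.

    After the last good hello at t=0, the first hello that gets through after
    m losses arrives at (m + 1) * hello_ms and must arrive by hold_ms, so the
    largest survivable m is hold_ms // hello_ms - 1.
    """
    return hold_ms // hello_ms - 1


# (hello, hold) pairs in ms mentioned in this thread
OPTIONS = [(250, 750), (500, 1500), (250, 1000), (250, 1250)]
HSRP_BUDGET_MS = 2000  # the poster's target: HSRP failover in under 2 seconds

for hello, hold in OPTIONS:
    # Worst case, the standby takes over about one hold time after the
    # active router fails.
    print(
        f"{hello}/{hold} ms: tolerates {tolerated_misses(hello, hold)} lost "
        f"hellos, takeover within ~{hold} ms, under budget: {hold < HSRP_BUDGET_MS}"
    )
```

Under these assumptions 500/1500 tolerates the same two lost hellos as 250/750 but doubles the failover time, while 250/1250 tolerates four lost hellos and still fails over faster than 500/1500.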
Thanks, man. This is a pretty good idea. Maybe I will try this one. I just don't know whether there will be a difference in CPU utilization if I increase the hello time from 250 ms to 500 ms.
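One way to frame the CPU question is the HSRP packet rate. This sketch assumes two routers per group (Active and Standby, both sending hellos) and a hypothetical five groups, the ones visible in the log excerpt; the real group count was not given. It only shows that doubling the hello interval halves the hello rate. Whether that is noticeable against a 70% CPU load is something only the switch's own CPU statistics can tell.

```python
def hsrp_hellos_per_second(groups: int, hello_ms: int) -> float:
    """HSRP hellos one core switch handles per second (sent plus received).

    Assumes each group has exactly two routers, each sending one hello per
    hello interval, so each switch sends `groups` hellos and receives
    `groups` hellos per interval.
    """
    per_interval = 2 * groups
    return per_interval * 1000 / hello_ms


GROUPS = 5  # hypothetical: only the groups visible in the log excerpt
for hello_ms in (250, 500):
    rate = hsrp_hellos_per_second(GROUPS, hello_ms)
    print(f"hello {hello_ms} ms: {rate:.0f} hellos/s")
```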
But this is a good one. Thanks.