12-12-2008 08:15 AM - edited 03-06-2019 02:56 AM
Hi,
Do you guys also experience this? We are having HSRP flaps every day and
we don't know the cause. Our core is a 4506 switch running Supervisor
Engine IV, and the trunk between the two core switches is a port-channel
using Gi1/1 and Gi1/2. The HSRP timers were also reduced to a 250 ms hello
time and a 750 ms hold time. I just do not know why I always get this. Thanks.
Dec 12 11:58:37 10.196.0.2 347762: Dec 12 03:46:13.247:
%HSRP-6-STATECHANGE: Vlan25 Grp 25 state Standby -> Active
Dec 12 11:58:37 10.196.0.2 347763: Dec 12 03:46:13.251:
%HSRP-6-STATECHANGE: Vlan36 Grp 36 state Standby -> Active
Dec 12 11:58:37 10.196.0.2 347764: Dec 12 03:46:13.251:
%HSRP-6-STATECHANGE: Vlan25 Grp 25 state Active -> Speak
Dec 12 11:58:37 10.196.0.2 347765: Dec 12 03:46:13.251:
%HSRP-6-STATECHANGE: Vlan36 Grp 36 state Active -> Speak
Dec 12 11:58:37 10.196.0.2 347766: Dec 12 03:46:13.899:
%HSRP-6-STATECHANGE: Vlan40 Grp 40 state Standby -> Active
Dec 12 11:58:37 10.196.0.2 347767: Dec 12 03:46:13.899:
%HSRP-6-STATECHANGE: Vlan33 Grp 33 state Standby -> Active
Dec 12 11:58:37 10.196.0.2 347768: Dec 12 03:46:13.903:
%HSRP-6-STATECHANGE: Vlan47 Grp 46 state Standby -> Active
Dec 12 11:58:37 10.196.0.2 347769: Dec 12 03:46:13.903:
%HSRP-6-STATECHANGE: Vlan40 Grp 40 state Active -> Speak
Dec 12 11:58:38 10.196.0.2 347770: Dec 12 03:46:13.903:
%HSRP-6-STATECHANGE: Vlan33 Grp 33 state Active -> Speak
Dec 12 11:58:38 10.196.0.2 347771: Dec 12 03:46:13.907:
%HSRP-6-STATECHANGE: Vlan47 Grp 46 state Active -> Speak
John
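As a rough aid for reading output like the above, here is a minimal Python sketch that pulls the HSRP state changes out of the syslog text and counts them per VLAN. The function names and the regular expression are illustrative assumptions, written only against the line format shown in this post (including the wrap between the timestamp and the message); other IOS versions or syslog relays may format lines differently.

```python
import re
from collections import Counter

# Matches the device timestamp followed by the HSRP message. \s* lets the
# pattern span the line wrap seen in the pasted output.
HSRP_RE = re.compile(
    r"(\d{2}:\d{2}:\d{2}\.\d{3}):\s*"
    r"%HSRP-6-STATECHANGE: (Vlan\d+) Grp (\d+) state (\w+) -> (\w+)"
)


def parse_state_changes(log_text):
    """Return one dict per HSRP state change found in log_text."""
    return [
        {"time": t, "vlan": v, "group": int(g), "old": o, "new": n}
        for t, v, g, o, n in HSRP_RE.findall(log_text)
    ]


def count_by_vlan(events):
    """Count state changes per VLAN interface."""
    return Counter(e["vlan"] for e in events)


# Three messages copied from the output above.
SAMPLE = """\
Dec 12 11:58:37 10.196.0.2 347762: Dec 12 03:46:13.247:
%HSRP-6-STATECHANGE: Vlan25 Grp 25 state Standby -> Active
Dec 12 11:58:37 10.196.0.2 347764: Dec 12 03:46:13.251:
%HSRP-6-STATECHANGE: Vlan25 Grp 25 state Active -> Speak
Dec 12 11:58:37 10.196.0.2 347766: Dec 12 03:46:13.899:
%HSRP-6-STATECHANGE: Vlan40 Grp 40 state Standby -> Active
"""

events = parse_state_changes(SAMPLE)
flaps = count_by_vlan(events)
```

Run against a full day of logs, the per-VLAN counts and timestamps make it easier to see whether all groups flap together (pointing at the link or the CPU) or only some of them.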
12-12-2008 08:39 AM
John,
I would say that your HSRP timers are too low, which is causing the frequent HSRP state changes. Do you have any errors on any of the interfaces that are members of the port-channel?
HTH,
Mark
12-12-2008 08:46 AM
Yes, that's what I'm thinking too right now. I'm not seeing any errors on either of the Gig ports or on the EtherChannel, but I think that with a steady 70% CPU load, the switch can definitely mis-process an HSRP packet. We also use this kind of setup on a 6500 switch, and our other 4500 switches are configured with the default timers. Only these switches have this issue.
Maybe I will try to increase the HSRP timers of one VLAN and see if it keeps flapping.
John
12-12-2008 08:54 AM
John,
I think that is a good idea to test, and to implement if it does help. I'm guessing that with the timers that low, three HSRP hellos in a row get missed and the state changes. Please keep us informed of the results of your tests.
Mark
12-12-2008 08:59 AM
Also, just in addition to Mark's comment, the error messages signify the loss of HSRP hellos between the peers. When you troubleshoot this issue, you must verify the communication between the HSRP peers. A random, momentary loss of data communication between the peers is the most common problem that results in these messages. HSRP state changes are often due to high CPU utilization. If the error message is due to high CPU utilization, put a sniffer on the network and trace the system that causes the high CPU utilization.
There are several possible causes for the loss of HSRP packets between the peers. The most common problems are physical layer problems or excessive network traffic caused by spanning tree issues.
Francisco
12-12-2008 09:08 AM
Thank you very much for your input, guys. I will give feedback maybe after a few days and see if the increase in timers solves the issue. Currently, I have increased the HSRP timers of one VLAN to a 500 ms hello time and a 1500 ms hold time. I really cannot increase the timers much further, especially the hold time beyond 3 seconds, because of the voice traffic. There are hundreds of VoIP users, and network changes should be detected in less than 3 seconds. So I'm giving HSRP less than 2 seconds to switch over, plus the possible routing protocol convergence.
Hopefully this one is not an STP issue. We use RSTP, by the way, and I think we configured it correctly by setting core1 as the root and core2 as the secondary.
John
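The budget reasoning in this post can be written down as a small Python sketch. It assumes, as a simplification, that the worst-case outage is roughly the HSRP hold time plus whatever the routing protocol needs to converge; the convergence figures in the example are hypothetical placeholders, not measurements from this network.

```python
def worst_case_outage_ms(hold_ms, convergence_ms=0):
    """Rough upper bound: peer failure is detected when the hold time
    expires, then routing convergence (an assumed figure) is added."""
    return hold_ms + convergence_ms


def within_budget(hold_ms, convergence_ms, budget_ms=3000):
    """True if the estimated outage stays under the failover budget."""
    return worst_case_outage_ms(hold_ms, convergence_ms) < budget_ms


# 1500 ms hold time from the post; 400 ms convergence is a made-up value.
ok_with_fast_convergence = within_budget(1500, 400)
# The same hold time with a hypothetical 1500 ms convergence hits the limit.
ok_with_slow_convergence = within_budget(1500, 1500)
```

With a 3-second target, a 1500 ms hold time leaves about 1.5 seconds for everything else, which is why the hold time cannot be raised much further here.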
12-12-2008 09:27 AM
A few days back somebody suggested the HSRP BFD feature, which is useful when the hello and hold times are in milliseconds.
Check this link; it talks about it and can be useful in your case. There was another link that covers it in detail. If I find it, I will post it.
12-12-2008 12:44 PM
Hey man,
Thanks for the link. This is really informative. Unfortunately my 4500 doesn't support this. But this is good. Thanks again. :(
For Cisco IOS Release 12.4(11)T, the Cisco implementation of BFD introduced support for the Hot Standby Router Protocol (HSRP). BFD support is not available for all platforms and interfaces. In Cisco IOS Release 12.4(11)T, this feature was introduced on Cisco 7200 series, Cisco 7600 series, and Cisco 12000 series routers.
John
12-12-2008 01:19 PM
What you might also try, instead of the "traditional" 1:3 ratio, is perhaps 250/1000 or 250/1250 ms. That keeps the lost time down a bit (compared with the 500/1500 you plan to try), yet tolerates more missed hellos.
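To compare the timer pairs mentioned in this thread, here is a small Python sketch. It counts how many hello intervals fit inside the hold time and how many hellos are sent per second per group; as a rough guide, a peer can lose about one fewer hello in a row than the first number before the hold time expires. The function names are illustrative, and the model ignores jitter and processing delay.

```python
def hello_intervals_per_hold(hello_ms, hold_ms):
    """Number of whole hello intervals that fit inside the hold time."""
    if hello_ms <= 0 or hold_ms <= hello_ms:
        raise ValueError("hold time must be greater than a positive hello time")
    return hold_ms // hello_ms


def compare(candidates):
    """Return (hello, hold, intervals per hold, hellos per second) rows."""
    return [
        (hello, hold, hello_intervals_per_hold(hello, hold), 1000 // hello)
        for hello, hold in candidates
    ]


# Timer pairs discussed in the thread, in milliseconds (hello, hold).
CANDIDATES = [(250, 750), (500, 1500), (250, 1000), (250, 1250)]
rows = compare(CANDIDATES)

for hello, hold, intervals, per_second in rows:
    print(f"{hello}/{hold} ms: {intervals} hello intervals per hold time, "
          f"{per_second} hellos per second")
```

The 250/1000 and 250/1250 pairs keep the detection time at or below 1.25 seconds while allowing more hello intervals per hold time than 250/750 or 500/1500. They do not reduce the number of hellos per second, though, so any CPU relief would have to come from the longer hello time.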
12-13-2008 11:45 AM
Thanks, man. This is a pretty good idea. Maybe I can try this one. I just do not know if there will be a difference with regard to CPU utilization if I increase the hello time from 250 ms to 500 ms.
But this is a good one. Thanks.
11-20-2009 01:47 AM
John,
You might want to have a look at this
Troubleshooting HSRP state changes
Thanks,
Sunil
11-20-2009 02:29 AM
Hello Sunil,
After one year, John should have solved this issue.
Best Regards
Giuseppe