Re: HSRP State Change

jpl861 · ‎12-12-2008

Hi,

Do you guys also experience this? We are having HSRP flaps every day and we don't know the cause. Our core is a 4506 switch running Supervisor Engine IV, and the trunk between the two core switches is a port-channel using Gi1/1 and Gi1/2. The HSRP timers were also reduced to a 250 ms hello and a 750 ms dead time. I just do not know why I keep getting this. Thanks.

Dec 12 11:58:37 10.196.0.2 347762: Dec 12 03:46:13.247: %HSRP-6-STATECHANGE: Vlan25 Grp 25 state Standby -> Active
Dec 12 11:58:37 10.196.0.2 347763: Dec 12 03:46:13.251: %HSRP-6-STATECHANGE: Vlan36 Grp 36 state Standby -> Active
Dec 12 11:58:37 10.196.0.2 347764: Dec 12 03:46:13.251: %HSRP-6-STATECHANGE: Vlan25 Grp 25 state Active -> Speak
Dec 12 11:58:37 10.196.0.2 347765: Dec 12 03:46:13.251: %HSRP-6-STATECHANGE: Vlan36 Grp 36 state Active -> Speak
Dec 12 11:58:37 10.196.0.2 347766: Dec 12 03:46:13.899: %HSRP-6-STATECHANGE: Vlan40 Grp 40 state Standby -> Active
Dec 12 11:58:37 10.196.0.2 347767: Dec 12 03:46:13.899: %HSRP-6-STATECHANGE: Vlan33 Grp 33 state Standby -> Active
Dec 12 11:58:37 10.196.0.2 347768: Dec 12 03:46:13.903: %HSRP-6-STATECHANGE: Vlan47 Grp 46 state Standby -> Active
Dec 12 11:58:37 10.196.0.2 347769: Dec 12 03:46:13.903: %HSRP-6-STATECHANGE: Vlan40 Grp 40 state Active -> Speak
Dec 12 11:58:38 10.196.0.2 347770: Dec 12 03:46:13.903: %HSRP-6-STATECHANGE: Vlan33 Grp 33 state Active -> Speak
Dec 12 11:58:38 10.196.0.2 347771: Dec 12 03:46:13.907: %HSRP-6-STATECHANGE: Vlan47 Grp 46 state Active -> Speak

root@PHMNL6LINUX:/var/log#
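
To see which groups are bouncing, a syslog capture like this one can be tallied with a short script. What follows is only a minimal Python sketch, not something posted in the thread: the function names and the SAMPLE list are made up for illustration, and the regex assumes the %HSRP-6-STATECHANGE message format shown in the log above. It matches just the message part, so it does not matter how the syslog prefix is wrapped.

```python
import re
from collections import Counter

# Matches the message part of an HSRP state-change entry, e.g.
# "%HSRP-6-STATECHANGE: Vlan25 Grp 25 state Standby -> Active"
STATECHANGE = re.compile(
    r"%HSRP-\d-STATECHANGE: (?P<intf>\S+) Grp (?P<grp>\d+) "
    r"state (?P<old>\w+) -> (?P<new>\w+)"
)


def count_transitions(lines):
    """Count (interface, group, old_state, new_state) transitions in log lines."""
    counts = Counter()
    for line in lines:
        m = STATECHANGE.search(line)
        if m:
            key = (m["intf"], int(m["grp"]), m["old"], m["new"])
            counts[key] += 1
    return counts


def flapping_groups(counts):
    """Return (interface, group) pairs that both entered and left Active."""
    became_active = {(i, g) for (i, g, _old, new) in counts if new == "Active"}
    left_active = {(i, g) for (i, g, old, _new) in counts if old == "Active"}
    return sorted(became_active & left_active)


# Hypothetical sample: four message bodies copied from the log above.
SAMPLE = [
    "%HSRP-6-STATECHANGE: Vlan25 Grp 25 state Standby -> Active",
    "%HSRP-6-STATECHANGE: Vlan36 Grp 36 state Standby -> Active",
    "%HSRP-6-STATECHANGE: Vlan25 Grp 25 state Active -> Speak",
    "%HSRP-6-STATECHANGE: Vlan36 Grp 36 state Active -> Speak",
]

if __name__ == "__main__":
    for intf, grp in flapping_groups(count_transitions(SAMPLE)):
        print(f"{intf} group {grp} went Active and then left Active")
```

Feeding it the whole log file instead of SAMPLE (for example, the lines of an open file object) would show whether every group flaps at the same moments, which points at a shared cause such as CPU load or the port-channel rather than a single VLAN.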

John

Mark Yeates · ‎12-12-2008

John,

I would say that your HSRP timers are too low, which is causing the frequent HSRP state changes. Do you have any errors on any of the interfaces that are members of the port-channel?

HTH,

Mark

jpl861 · ‎12-12-2008

Yes, that's what I'm thinking too right now. I'm not seeing any errors on either of the Gig ports or on the EtherChannel, but I think with a steady 70% CPU load, it can definitely mis-process an HSRP packet. We are also using this kind of setup but with a 6500 switch, and our other 4500 switches were configured to default. Only these switches have this issue.

Maybe I will try to increase the HSRP timers of one VLAN and see if it keeps flapping.

John

Mark Yeates · ‎12-12-2008

John,

I think that is a good idea to test, and to implement if it does help. I'm guessing that with the timers that low, if an HSRP hello is missed three times, the group changes state. Please keep us informed of the results of your tests.

Mark

francisco_1 · ‎12-12-2008

Also, just in addition to Mark's comment: the error messages signify the loss of HSRP hellos between the peers. When you troubleshoot this issue, you must verify the communication between the HSRP peers. A random, momentary loss of data communication between the peers is the most common problem that results in these messages. HSRP state changes are often due to high CPU utilization. If the error message is due to high CPU utilization, put a sniffer on the network and trace the system that causes the high CPU utilization.

There are several possible causes for the loss of HSRP packets between the peers. The most common problems are physical layer problems or excessive network traffic caused by spanning tree issues.

Francisco

jpl861 · ‎12-12-2008

Thank you very much for your input, guys. I will give feedback maybe after a few days and see if the increase in timers solves the issue. Currently, I have increased the HSRP timers of one VLAN to a 500 ms hello and a 1500 ms hold time. I really cannot increase the timers any higher, especially the hold time beyond 3 sec, because of the voice traffic. There are hundreds of VoIP users, and network changes should be detected in less than 3 seconds. So I'm giving HSRP lead time to switch in less than 2 seconds, plus the possible routing protocol convergence.

Hopefully this one is not an STP issue. We use RSTP by the way and I think we configured it correctly by setting the core1 as the root and core2 as secondary.

John

viyuan700 · ‎12-12-2008

A few days back somebody suggested the HSRP BFD feature, which is useful when the hello and hold times are in milliseconds.

Check this link; it talks about it and could be useful in your case. There was another link that talks about it in detail. If I find it, I will post it.

http://www.cisco.com/en/US/docs/ios/ipapp/configuration/guide/ipapp_hsrp_ps6350_TSD_Products_Configuration_Guide_Chapter.html#wp1054668

jpl861 · ‎12-12-2008

Hey man,

Thanks for the link. This is really informative. Unfortunately my 4500 doesn't support this. But this is good. Thanks again. :(

For Cisco IOS Release 12.4(11)T, the Cisco implementation of BFD introduced support for the Hot Standby Router Protocol (HSRP). BFD support is not available for all platforms and interfaces. In Cisco IOS Release 12.4(11)T, this feature was introduced on Cisco 7200 series, Cisco 7600 series, and Cisco 12000 series routers.

John

Joseph W. Doherty · ‎12-12-2008

What you might also try, instead of the "traditional" 1:3 ratio, is perhaps 250/1000 or 250/1250 ms. That keeps the lost time down a bit (compared with the 500/1500 you plan to try), yet tolerates more missed hellos.
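
The trade-off between these timer pairs is simple arithmetic, and a rough Python sketch makes it easier to compare them side by side. The function and variable names here are invented for illustration. The sketch assumes hellos are sent exactly every hello interval and treats the hold time as the approximate worst-case time to notice a dead peer; it says nothing about CPU load.

```python
def timer_summary(hello_ms, hold_ms):
    """Summarize an HSRP hello/hold pair (both in milliseconds).

    Assumption: hellos go out exactly once per hello interval.
    """
    return {
        "hello_ms": hello_ms,
        "hold_ms": hold_ms,
        # How many hello intervals fit into one hold time; roughly the
        # number of consecutive hellos that must be lost before failover.
        "intervals_per_hold": hold_ms / hello_ms,
        # Hellos sent per second for each HSRP group.
        "hellos_per_sec": 1000 / hello_ms,
    }


# Timer pairs mentioned in the thread: the current 250/750, the planned
# 500/1500, and the suggested 250/1000 and 250/1250.
CANDIDATES = [(250, 750), (500, 1500), (250, 1000), (250, 1250)]

if __name__ == "__main__":
    for hello, hold in CANDIDATES:
        s = timer_summary(hello, hold)
        print(
            f"{hello}/{hold} ms: {s['intervals_per_hold']:.0f} hello intervals "
            f"per hold time, {s['hellos_per_sec']:.0f} hellos/sec per group, "
            f"peer loss noticed after about {hold} ms"
        )
```

Under these assumptions, 250/1000 and 250/1250 give four and five hello intervals per hold time instead of three, while still noticing a dead peer sooner than 500/1500 would.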

jpl861 · ‎12-13-2008

Thanks man. This is a pretty good idea. Maybe I can try this one. I just do not know if there will be a difference with regard to CPU utilization if I increase the hello time from 250ms to 500ms.

But this is a good one. Thanks.

SunilKhanna · ‎11-20-2009

John,

You might want to have a look at this

Troubleshooting HSRP state changes

Thanks,

Sunil

Giuseppe Larosa · ‎11-20-2009

Hello Sunil,

After one year, John should have solved this issue.

Best Regards

Giuseppe