Cisco Support Community
Community Member

tuned HSRP-timers - "state-flapping" on the standby router

hi,

together with one of our customers we've implemented hsrp with tuned timers for "faster-than-default" hsrp convergence.

the hsrp pair consists of two cisco 7600 routers, each equipped with a sup720. both routers have the full internet routing table loaded (they are provider-edge routers).

we use following timer settings:

standby 31 timers msec 50 msec 500

Initially everything seems to work fine, but after running this configuration for some time we can see a "state flapping" on the standby router.

Oct 2 21:10:10.655: %STANDBY-6-STATECHANGE: Vlan351 Group 51 state Standby -> Active

Oct 2 21:10:10.655: %STANDBY-6-STATECHANGE: Vlan357 Group 57 state Standby -> Active

Oct 2 21:10:10.687: %STANDBY-6-STATECHANGE: Vlan357 Group 57 state Active -> Speak

Oct 2 21:10:10.695: %STANDBY-6-STATECHANGE: Vlan351 Group 51 state Active -> Speak

it seems that the standby router doesn't receive a hello packet within the holdtime of 500 msec, assumes that the hsrp active router is down, and changes to active. roughly 30-40 msec later (see the log output: .655 to .687 / .695) the former standby again receives a hello from the "true" active router and leaves the active state. (the active router itself never changes its state to standby.)

so from my point of view the standby router doesn't receive hellos from the active router for roughly 530-540 msec (500 msec holdtime + 30-40 msec from the log output). we can rule out that all the hellos within this window are dropped on the cross-link between the c7600s, because the utilization of this link is only about 5%. to me it looks as if the active router doesn't send hsrp hellos for roughly 530-540 msec.

has anyone experience with tuning hsrp timers? what about our setting with hello 50 / hold 500 msecs? are these timers too aggressive?

i've read in the documentation that if the holdtime value is less than 250 milliseconds, a cisco 7200 platform or better should be used. as we use a cisco 7600, this recommendation should be more than fulfilled (?)

as i already wrote, these routers have the full internet routing table loaded. could this be the problem in our case?

interestingly, our customer runs the same hsrp configuration as shown above on other c7600s which don't have the full routing table loaded, and there the state-flapping does not occur.

i would be thankful for any advice,

kind regards,

Bernhard

4 REPLIES
Community Member

Re: tuned HSRP-timers - "state-flapping" on the standby router

Hi,

These error messages describe a situation in which a standby HSRP router did not receive three successive HSRP hello packets from its HSRP peer. The output shows that the standby router moves from the standby state to the active state. Shortly thereafter, the router returns to the standby state. Unless this error message occurs during the initial installation, an HSRP issue probably does not cause the error message. The error messages signify the loss of HSRP hellos between the peers. When you troubleshoot this issue, you must verify the communication between the HSRP peers. A random, momentary loss of data communication between the peers is the most common problem that results in these messages.

There are several possible causes for the loss of HSRP packets between the peers. The most common problems are physical layer problems or excessive network traffic caused by spanning tree issues. You can use an access list in order to prevent the active router from receiving its own multicast hello packet.

Community Member

Re: tuned HSRP-timers - "state-flapping" on the standby router

hi daniel,

thanks for your explanation. as i wrote, we know that the standby router doesn't receive a hello message within the holdtime. layer-1 problems can be ruled out.

maybe i forgot to say explicitly that the state-flapping is not a one-time problem.

since enabling the tuned timers the problem occurs intermittently (on the routers with the full internet routing table).

so my question is rather whether any of you has experience with hsrp timers in the msec range, ideally implemented on routers with a load similar to that of a provider-edge router.

are there some good "hsrp-tuning" references from cisco?

kind regards,

Re: tuned HSRP-timers - "state-flapping" on the standby router

Hi,

What does the CPU load look like? The BGP Scanner and BGP Router processes use quite a lot of CPU, and a burst of BGP updates to process may delay the sending of HSRP hellos.

BGP and HSRP are handled by the same CPU, and afaik BGP Router has precedence over some other processes.

Regards, Martin

Community Member

Re: tuned HSRP-timers - "state-flapping" on the standby router

hi,

i just double-checked the cpu-load during the state-flapping intervals.

in the "show proc cpu hist" output i can't see a higher-than-average utilisation of the processor.

the question is whether it's actually possible to see cpu spikes that last for such a short time as we find here:

Oct 2 17:20:30.741: %STANDBY-6-STATECHANGE: Vlan351 Group 51 state Standby -> Active

Oct 2 17:20:30.741: %STANDBY-6-STATECHANGE: Vlan357 Group 57 state Standby -> Active

Oct 2 17:20:30.773: %STANDBY-6-STATECHANGE: Vlan351 Group 51 state Active -> Speak

Oct 2 17:20:30.773: %STANDBY-6-STATECHANGE: Vlan357 Group 57 state Active -> Speak

-> DELTA: 500 msec + (.773 - .741) sec = 500 + 32 = 532 msec

Oct 2 17:20:32.869: %STANDBY-6-STATECHANGE: Vlan357 Group 57 state Standby -> Active

Oct 2 17:20:32.869: %STANDBY-6-STATECHANGE: Vlan351 Group 51 state Standby -> Active

Oct 2 17:20:32.905: %STANDBY-6-STATECHANGE: Vlan357 Group 57 state Active -> Speak

Oct 2 17:20:32.905: %STANDBY-6-STATECHANGE: Vlan351 Group 51 state Active -> Speak

-> DELTA: 500 + 36 = 536 msec
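The two DELTA values above can be recomputed directly from the log lines. The following is a minimal sketch, not a Cisco tool: the function and variable names are made up for illustration, the year 2000 is an arbitrary placeholder (the log carries no year), and it assumes, as the post does, that the last hello arrived exactly one holdtime before the Standby -> Active transition.

```python
import re
from datetime import datetime, timedelta

# Second log excerpt from this post, copied verbatim.
LOG = """\
Oct 2 17:20:30.741: %STANDBY-6-STATECHANGE: Vlan351 Group 51 state Standby -> Active
Oct 2 17:20:30.741: %STANDBY-6-STATECHANGE: Vlan357 Group 57 state Standby -> Active
Oct 2 17:20:30.773: %STANDBY-6-STATECHANGE: Vlan351 Group 51 state Active -> Speak
Oct 2 17:20:30.773: %STANDBY-6-STATECHANGE: Vlan357 Group 57 state Active -> Speak
Oct 2 17:20:32.869: %STANDBY-6-STATECHANGE: Vlan357 Group 57 state Standby -> Active
Oct 2 17:20:32.869: %STANDBY-6-STATECHANGE: Vlan351 Group 51 state Standby -> Active
Oct 2 17:20:32.905: %STANDBY-6-STATECHANGE: Vlan357 Group 57 state Active -> Speak
Oct 2 17:20:32.905: %STANDBY-6-STATECHANGE: Vlan351 Group 51 state Active -> Speak
"""

HELLO_MS = 50   # from the configured "timers msec 50 msec 500"
HOLD_MS = 500

LINE = re.compile(
    r"^(\w{3}\s+\d+\s+[\d:.]+): %STANDBY-6-STATECHANGE: "
    r"(\S+) Group (\d+) state (\w+) -> (\w+)$"
)


def parse_stamp(stamp):
    # The log has no year; 2000 is an arbitrary placeholder so strptime
    # gets a complete date. Only time differences are used afterwards.
    return datetime.strptime("2000 " + stamp, "%Y %b %d %H:%M:%S.%f")


def flap_gaps(log_text, hold_ms=HOLD_MS):
    """Return (group key, msec spent active, estimated msec without hellos)."""
    went_active = {}  # (interface, group) -> time of Standby -> Active
    gaps = []
    for line in log_text.splitlines():
        m = LINE.match(line.strip())
        if not m:
            continue
        stamp, intf, group, old, new = m.groups()
        key = (intf, int(group))
        when = parse_stamp(stamp)
        if (old, new) == ("Standby", "Active"):
            went_active[key] = when
        elif old == "Active" and key in went_active:
            active_ms = (when - went_active.pop(key)) // timedelta(milliseconds=1)
            # Assumption: the silence started one full holdtime before
            # the router declared itself active.
            gaps.append((key, active_ms, hold_ms + active_ms))
    return gaps


for (intf, group), active_ms, silence_ms in flap_gaps(LOG):
    missed = silence_ms // HELLO_MS
    print(f"{intf} group {group}: active for {active_ms} msec, "
          f"about {silence_ms} msec without hellos (~{missed} hello intervals)")
```

Run against the excerpt, this reproduces the 532 msec and 536 msec figures, which correspond to roughly ten consecutive 50 msec hello intervals without a received hello.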

at which interval does the "show proc cpu" polling occur? if the interval were 1 sec, should i see a high cpu usage that lasts for only about 530 msec?
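How visible such a short burst is depends on the averaging window. The sketch below is plain arithmetic under loudly stated assumptions: the 100% burst load, the 10% baseline, and the window lengths are illustrative values, not measurements from these routers and not a statement about how IOS samples CPU usage.

```python
def windowed_average(burst_ms, burst_load, base_load, window_ms):
    """Average load (%) over one window that contains burst_ms of burst."""
    burst_ms = min(burst_ms, window_ms)
    idle_ms = window_ms - burst_ms
    return (burst_ms * burst_load + idle_ms * base_load) / window_ms


# Hypothetical numbers: a 530 msec burst at 100% on top of a 10% baseline.
BURST_MS, BURST_LOAD, BASE_LOAD = 530, 100.0, 10.0

for window_ms in (1000, 5000, 60000):
    avg = windowed_average(BURST_MS, BURST_LOAD, BASE_LOAD, window_ms)
    print(f"{window_ms:>6} msec window: {avg:.1f}% average")

# If the burst straddles two 1-second windows evenly, each window only
# contains half of it.
half = windowed_average(BURST_MS / 2, BURST_LOAD, BASE_LOAD, 1000)
print(f"burst split across two 1000 msec windows: {half:.1f}% each")
```

Under these assumptions a 530 msec burst would still stand out in a 1-second sample, but it shrinks quickly as the window grows, and it is halved again if it happens to fall across a sample boundary.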

apart from these values, i read the recommendation to configure the holdtime (in our case 500 msec) greater than or equal to three times the value of the hellotime (50 msec).

as you can see we have a factor of 10 between these values.

as far as i can assess, a c7600 should have enough horsepower to handle a scenario like ours?

what do you think about that?

thanks a lot &

kind regards,

Bernhard
