
tuned HSRP-timers - "state-flapping" on the standby router

Unanswered Question
Oct 4th, 2006

hi,


together with one of our customers, we've implemented hsrp with tuned timers for "faster-than-default" hsrp convergence.


The hsrp pair consists of two cisco 7600 routers, each equipped with a sup720. Both routers have the full internet routing table loaded (they are provider-edge routers).


we use the following timer settings:


standby 31 timers msec 50 msec 500

Initially everything seems to work fine, but after running this configuration for some time we see "state flapping" on the standby router.


Oct 2 21:10:10.655: %STANDBY-6-STATECHANGE: Vlan351 Group 51 state Standby -> Active

Oct 2 21:10:10.655: %STANDBY-6-STATECHANGE: Vlan357 Group 57 state Standby -> Active

Oct 2 21:10:10.687: %STANDBY-6-STATECHANGE: Vlan357 Group 57 state Active -> Speak

Oct 2 21:10:10.695: %STANDBY-6-STATECHANGE: Vlan351 Group 51 state Active -> Speak


it seems that the standby router doesn't get a hello packet within the holdtime of 500 msec, assumes that the hsrp active is down, and then changes to active. after an additional ~30-40 msec (see log output) the former standby again receives a hello from the "true" active router.

(-> the active router never changes its state to standby)


so as i see it, the standby router doesn't receive hellos from the active router for roughly 530-540 msec (500 msec holdtime + 32-40 msec from the log output). we can rule out that all the hellos within that window are dropped on the cross-link between the c76s, because the utilization of this link is about 5%. to me it looks as if the active router doesn't send hsrp hellos for that whole period.


does anyone have experience with tuning hsrp timers? what about our setting of hello 50 / hold 500 msec? are these timers too aggressive?


i've read in the documentation that if the holdtime value is less than 250 milliseconds, a Cisco 7200 platform or better should be used. As we use a Cisco 7600, this recommendation is more than met (?)


as i already wrote, these routers have the full internet routing table loaded. could this be the problem in our case?


interestingly, our customer has the same hsrp configuration as shown above on other c7600s which don't have the full routing table loaded, and there the state flapping doesn't occur.


i would be thankful for any advice you can give,


kind regards,


Bernhard



danielmassey Wed, 10/04/2006 - 06:48

Hi,


These error messages describe a situation in which a standby HSRP router did not receive three successive HSRP hello packets from its HSRP peer. The output shows that the standby router moves from the standby state to the active state. Shortly thereafter, the router returns to the standby state. Unless this error message occurs during the initial installation, an HSRP issue probably does not cause the error message. The error messages signify the loss of HSRP hellos between the peers. When you troubleshoot this issue, you must verify the communication between the HSRP peers. A random, momentary loss of data communication between the peers is the most common problem that results in these messages.


There are several possible causes for the loss of HSRP packets between the peers. The most common problems are physical layer problems or excessive network traffic caused by spanning tree issues. You can use an access list in order to prevent the active router from receiving its own multicast hello packet.



mogli Thu, 10/05/2006 - 00:21

hi daniel,


thanks for your explanation. as i wrote, we know that the standby router doesn't receive a hello message within the holdtime. layer-1 problems can be ruled out.


maybe i forgot to say explicitly that the state flapping is not a one-time problem.

since enabling the tuned timers the problem occurs intermittently (on the routers with the full internet routing table).


so my question is really whether any of you has experience using hsrp timers in the msec range, ideally implemented on routers with a load similar to that of a provider-edge router.


are there some good "hsrp-tuning" references from cisco?


kind regards,

mheusinger Thu, 10/05/2006 - 00:27

Hi,


what does the CPU load look like? BGP Scanner and BGP Router use quite a lot of CPU, and a burst of BGP updates to process may delay the sending of HSRP hellos.

BGP and HSRP are both handled by the same CPU, and afaik BGP Router has precedence over some other processes.


Regards, Martin

mogli Thu, 10/05/2006 - 02:18

hi,


i just double-checked the cpu-load during the state-flapping intervals.


in the "show proc cpu hist" output i can't see a higher-than-average utilisation of the processor.


the question is whether it's actually possible to see cpu spikes that last for such a short time as we find here:


Oct 2 17:20:30.741: %STANDBY-6-STATECHANGE: Vlan351 Group 51 state Standby -> Active

Oct 2 17:20:30.741: %STANDBY-6-STATECHANGE: Vlan357 Group 57 state Standby -> Active

Oct 2 17:20:30.773: %STANDBY-6-STATECHANGE: Vlan351 Group 51 state Active -> Speak

Oct 2 17:20:30.773: %STANDBY-6-STATECHANGE: Vlan357 Group 57 state Active -> Speak

-> DELTA: 500 msec + (30.773 - 30.741) sec = 500 + 32 = 532 msec


Oct 2 17:20:32.869: %STANDBY-6-STATECHANGE: Vlan357 Group 57 state Standby -> Active

Oct 2 17:20:32.869: %STANDBY-6-STATECHANGE: Vlan351 Group 51 state Standby -> Active

Oct 2 17:20:32.905: %STANDBY-6-STATECHANGE: Vlan357 Group 57 state Active -> Speak

Oct 2 17:20:32.905: %STANDBY-6-STATECHANGE: Vlan351 Group 51 state Active -> Speak

-> DELTA: 500 + 36 = 536 msec
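
The DELTA arithmetic above can be reproduced mechanically. The following is only a minimal Python sketch, not a Cisco tool: it assumes the syslog line format shown in this thread, a 3-digit millisecond field, a holdtime of 500 msec, and that a flap does not cross midnight. The names `flap_gaps`, `PATTERN` and `LOG` are made up for illustration.

```python
import re

HOLDTIME_MS = 500  # assumption: the "msec 500" holdtime quoted in this thread

LOG = """\
Oct 2 17:20:30.741: %STANDBY-6-STATECHANGE: Vlan351 Group 51 state Standby -> Active
Oct 2 17:20:30.741: %STANDBY-6-STATECHANGE: Vlan357 Group 57 state Standby -> Active
Oct 2 17:20:30.773: %STANDBY-6-STATECHANGE: Vlan351 Group 51 state Active -> Speak
Oct 2 17:20:30.773: %STANDBY-6-STATECHANGE: Vlan357 Group 57 state Active -> Speak
Oct 2 17:20:32.869: %STANDBY-6-STATECHANGE: Vlan357 Group 57 state Standby -> Active
Oct 2 17:20:32.869: %STANDBY-6-STATECHANGE: Vlan351 Group 51 state Standby -> Active
Oct 2 17:20:32.905: %STANDBY-6-STATECHANGE: Vlan357 Group 57 state Active -> Speak
Oct 2 17:20:32.905: %STANDBY-6-STATECHANGE: Vlan351 Group 51 state Active -> Speak
"""

# Time of day (hh:mm:ss.mmm), interface, group, old state, new state.
PATTERN = re.compile(
    r"(\d{1,2}):(\d{2}):(\d{2})\.(\d{3}): %STANDBY-6-STATECHANGE: "
    r"(\S+) Group (\d+) state (\w+) -> (\w+)"
)


def flap_gaps(log_text, holdtime_ms=HOLDTIME_MS):
    """Return (interface, group, gap_ms) for every Standby->Active->Speak flap.

    gap_ms = holdtime + time spent in the Active state, i.e. the estimated
    time during which the standby router saw no hello from the active one.
    """
    went_active = {}  # (interface, group) -> msec of day when it became Active
    gaps = []
    for line in log_text.splitlines():
        m = PATTERN.search(line)
        if not m:
            continue
        hh, mm, ss, ms, intf, group, old, new = m.groups()
        now_ms = ((int(hh) * 60 + int(mm)) * 60 + int(ss)) * 1000 + int(ms)
        key = (intf, int(group))
        if new == "Active":
            went_active[key] = now_ms
        elif old == "Active" and key in went_active:
            active_ms = now_ms - went_active.pop(key)
            gaps.append((intf, int(group), holdtime_ms + active_ms))
    return gaps


if __name__ == "__main__":
    for intf, group, gap in flap_gaps(LOG):
        print(f"{intf} group {group}: no hello seen for about {gap} msec")
```

Run against the eight log lines above, this gives 532 msec for the first flap and 536 msec for the second, matching the manual calculation.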


At which interval does the "show proc cpu" polling occur? if the interval were 1 sec, would i see a high cpu usage that lasts for only about 530 msec?
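
To put rough numbers on that question, here is a small Python sketch of how an averaging window smears a short burst. It is purely illustrative: the window lengths (1 sec and 5 sec), the 100% burst level and the idle baseline are assumptions, not documented IOS sampling behaviour, and `average_utilisation` is a hypothetical helper.

```python
def average_utilisation(burst_ms, window_ms, burst_pct=100.0, baseline_pct=0.0):
    """Average CPU % reported for one window that contains a single burst.

    Assumes the whole burst falls inside the window; a burst that straddles
    two windows would show up even lower in each of them.
    """
    if window_ms <= 0:
        raise ValueError("window_ms must be positive")
    burst_ms = min(burst_ms, window_ms)
    idle_ms = window_ms - burst_ms
    return (burst_ms * burst_pct + idle_ms * baseline_pct) / window_ms


if __name__ == "__main__":
    # Hypothetical 530 msec burst at 100% CPU on an otherwise idle processor.
    for window_ms in (1000, 5000):
        pct = average_utilisation(530, window_ms)
        print(f"{window_ms} msec window: about {pct:.1f}% average")
```

Under these assumptions a 530 msec burst would appear as roughly 53% in a 1 sec sample and about 11% in a 5 sec average, so a short stall could easily hide behind a normal-looking average.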


apart from these values, i read the recommendation to configure the holdtime (in our case 500 msec) greater than or equal to three times the value of the hellotime (50 msec).

as you can see, we have a factor of 10 between these values.
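
The ratio check is simple enough to write down. This Python sketch only restates the arithmetic from the recommendation quoted above; the function names are made up for illustration.

```python
def hold_to_hello_ratio(hello_ms, hold_ms):
    """How many hello intervals fit into one holdtime."""
    if hello_ms <= 0 or hold_ms <= 0:
        raise ValueError("timers must be positive")
    return hold_ms / hello_ms


def meets_three_times_rule(hello_ms, hold_ms):
    """True if holdtime >= 3 x hellotime, as in the recommendation above."""
    return hold_ms >= 3 * hello_ms


if __name__ == "__main__":
    hello_ms, hold_ms = 50, 500  # the timers used in this thread
    print(hold_to_hello_ratio(hello_ms, hold_ms))     # 10.0
    print(meets_three_times_rule(hello_ms, hold_ms))  # True
```

With hello 50 / hold 500 the standby router should have seen about ten hellos in a row before the holdtime expired, so the ratio itself does not look like the problem.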


as far as i can judge, a c7600 should have enough horsepower to handle a scenario like ours?


what do you think about that?


thanks a lot &


kind regards,


Bernhard


