03-31-2009 05:44 AM - edited 03-04-2019 04:10 AM
I have had two 1800 series routers in production for the last two years.
Each router has two interfaces, LAN and WAN. HSRP is enabled on the LAN interface, with link monitoring for the WAN interface. The routers were tested for failover before being put into production and worked fine.
Today the active HSRP router suddenly hung and I could not connect to it remotely. Surprisingly, the standby router did not become active. Since this was at a remote location, I asked the remote staff to shut down the faulty router. Only then did the HSRP switchover take place. After 10 minutes the faulty router was powered on and became active again.
I have no logs on the syslog server to identify the issue. How can I pinpoint it? It seems that when the router hung, it did not give up its HSRP priority.
03-31-2009 05:55 AM
Avil,
It seems the routers were still sending and receiving hello packets while one of them was hung.
Maybe this link will help you.
http://www.cisco.com/en/US/tech/tk648/tk362/technologies_tech_note09186a0080094afd.shtml
Take a look at the HSRP timers section.
I just want to ask a few questions. When the router hung, it could not forward anything, right? You could not telnet to it. Could you ping it?
HTH,
Toshi
03-31-2009 05:59 AM
When the router hung, I could not telnet to it. I have one question about HSRP priority.
I have set the priority to 80 on the standby router. When the other (active) router hangs, by what value does it decrement its priority?
03-31-2009 06:06 AM
Avil,
Well, you have set the priority to 80 on the standby router. It takes effect when the active router is gone. Say the standby router misses all hello messages for the hold time you configured; it then promotes itself to active router.
So in your case you don't have any tracking applied on the active router. The priority you mention comes into play when the routers first elect an active router, and again when the failed router comes back to life after something on it has been refreshed. (grin)
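To make the priority arithmetic concrete, here is a minimal Python sketch, not a model of the actual routers. It assumes the HSRP defaults (priority 100, tracking decrement 10) and uses the priority of 80 mentioned above; the decrement of 30 is a hypothetical value chosen only for illustration. A router lowers its priority only when a tracked interface goes down, so a hang by itself decrements nothing.

```python
DEFAULT_PRIORITY = 100   # HSRP priority when none is configured
DEFAULT_DECREMENT = 10   # decrement applied by interface tracking when none is given


def effective_priority(configured, tracked_up, decrement=DEFAULT_DECREMENT):
    """Priority a router advertises, given the state of its tracked interface."""
    if tracked_up:
        return configured
    return max(configured - decrement, 0)  # clamped at 0 for this sketch


def standby_preempts(active_priority, standby_priority):
    """With preempt enabled, take over only on a strictly higher priority.

    Equal priorities are treated as "no takeover" here; real HSRP breaks
    ties by interface IP address, which this sketch does not model.
    """
    return standby_priority > active_priority


# Active router at the default 100, WAN link down, default decrement.
print(effective_priority(DEFAULT_PRIORITY, tracked_up=False))       # 90
print(standby_preempts(90, 80))                                     # False

# Same failure with a hypothetical decrement of 30: 70 is below 80.
print(standby_preempts(effective_priority(100, False, 30), 80))     # True

# Hung router whose tracked interface still shows up: nothing is decremented.
print(effective_priority(DEFAULT_PRIORITY, tracked_up=True))        # 100
```

Under these assumptions the decrement has to exceed the gap between the two priorities before a WAN failure moves the active role.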
HTH,
Toshi
03-31-2009 06:14 AM
So how do I correct this problem? Shall I set the priority on the standby router to 95?
03-31-2009 06:42 AM
Avil,
I'm afraid that changing the priority will not solve the problem. The first thing to do is log on to the standby router while the active router is hung and use the "show standby" command to see what is going on. While the active router is hung, can you ping it from the standby router? I mean, to test connectivity on the segment. I'm not sure whether the router you cannot telnet to has actually stopped forwarding everything, or whether the two routers are still sending and receiving HSRP packets.
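The point about hello packets can be sketched in a few lines of Python. This is only an illustration under assumptions: it uses the HSRP default timers (hello 3 seconds, hold 10 seconds), and the arrival times are made-up values, not data from these routers. The standby router takes over only when no hello from the active router has arrived within the hold time.

```python
HELLO_TIME = 3.0   # seconds, HSRP default hello interval
HOLD_TIME = 10.0   # seconds, HSRP default hold time


def standby_takes_over(hello_arrivals, now, hold_time=HOLD_TIME):
    """True if no hello from the active router arrived within hold_time of now."""
    seen = [t for t in hello_arrivals if t <= now]
    if not seen:
        return True
    return (now - max(seen)) >= hold_time


# Router powered off at t=30: the last hello was received at t=30.
powered_off = [i * HELLO_TIME for i in range(11)]   # 0, 3, ..., 30
print(standby_takes_over(powered_off, now=45.0))    # True

# Router hung for management access but still sending hellos up to t=45.
hung_but_chatty = [i * HELLO_TIME for i in range(16)]  # 0, 3, ..., 45
print(standby_takes_over(hung_but_chatty, now=45.0))   # False
```

The second case is the one to rule out with "show standby": if hellos keep arriving, the hold timer never expires and the standby router stays in standby.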
Note: Don't forget the link I provided.
Toshi
03-31-2009 07:11 AM
From the standby router I could ping the hung router's LAN interface, but I could not telnet to it. I do not have the output of the show standby command. I feel it's safe to set the priority of the standby router to 90 and leave the priority on the active router at the default.
03-31-2009 08:37 AM
What may have happened is that part of the hung router was still functioning and part wasn't. This happens very rarely, but when it does, you can encounter strange situations.
What sometimes helps to avoid this is running a later release of the same version, e.g. 12.2(18) rather than 12.2(4).
Later IOS versions have additional features for defining some self-monitoring, although it can quickly become complex and I don't think it guarantees 100% problem avoidance.
04-07-2009 09:46 PM
Today the active router hung again, and the standby router did not take over. So I telnetted into the standby router, and from there I could telnet into the hung router and reboot it. During the reboot the standby router became active. I also captured the log before the reboot. Kindly find the attached file. I could not find any useful information in the log.
04-07-2009 09:55 PM
Both have "Preemption enabled"?
04-07-2009 10:25 PM
Yes, the command is enabled on both routers. Is it causing the problem?
04-07-2009 10:54 PM
"%PQ3_TSEC-5-LATECOLL: PQ3/FE(1), Late collision" --
Error Message: %PQ3_FE-5-LATECOLL : PQ3/FE([dec]/[dec]), Late collision
Explanation: Late collisions occurred on the Fast Ethernet interface.
Recommended Action: If the interface is Fast Ethernet, verify that both peers are in the same duplex mode. Otherwise, no action is required.
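For anyone sifting through a longer log, here is a small Python sketch that splits an IOS message of the form %FACILITY-SEVERITY-MNEMONIC: text into its parts. The function name and the regular expression are my own assumptions for illustration, not a Cisco tool; the sample line is the one quoted above.

```python
import re

# IOS system messages follow the pattern %FACILITY-SEVERITY-MNEMONIC: text
MSG_RE = re.compile(
    r"%(?P<facility>[A-Z0-9_]+)-(?P<severity>[0-7])-(?P<mnemonic>[A-Z0-9_]+)"
    r"\s*:\s*(?P<text>.*)"
)

# Standard syslog severity names, indexed by severity level 0-7.
SEVERITY_NAMES = [
    "emergency", "alert", "critical", "error",
    "warning", "notification", "informational", "debugging",
]


def parse_ios_message(line):
    """Return the message fields as a dict, or None if the line does not match."""
    match = MSG_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["severity"] = int(fields["severity"])
    fields["severity_name"] = SEVERITY_NAMES[fields["severity"]]
    return fields


msg = parse_ios_message("%PQ3_TSEC-5-LATECOLL: PQ3/FE(1), Late collision")
print(msg["facility"], msg["severity_name"], msg["mnemonic"])
# PQ3_TSEC notification LATECOLL
```

Severity 5 is only a notification, but repeated late collisions are still worth checking against the duplex settings on both ends of the link.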
04-08-2009 03:46 AM
I have been seeing this log message for a very long time, and it's just a notice.
04-08-2009 03:40 PM
I also found that when the router hangs and I run the command show ip interface brief, the output shows both the LAN and WAN interfaces as up, but I am not able to ping the WAN interface from the hung router itself.
My current image is Cisco IOS Software, C181X Software (C181X-ADVIPSERVICESK9-M), Version 12.4(11)T2, RELEASE SOFTWARE (fc4).
Can I upgrade it to c181x-advipservicesk9-mz.124-24.T.bin?
04-08-2009 07:38 PM