Watchdog timer expired

Nov 2nd, 2006

On Sunday morning at 1:30am, up to 100 routers on our estate restarted with a "watchdog timer expired" fault. Every router has the same hardware build, IOS, cards and base config, apart from IP addressing. No changes were made to the hardware or software on these routers, and apart from the clock change (BST to GMT) on Sunday morning on the NTP server, there were no config changes. Has anybody ever experienced this?

Arklow>sh ver

Cisco Internetwork Operating System Software

IOS (tm) C2600 Software (C2600-IS-M), Version 12.0(11), RELEASE SOFTWARE (fc1)

Copyright (c) 1986-2000 by cisco Systems, Inc.

Compiled Sat 20-May-00 10:46 by htseng

Image text-base: 0x80008088, data-base: 0x808F19F4


ROM: System Bootstrap, Version 11.3(2)XA4, RELEASE SOFTWARE (fc1)


Arklow uptime is 4 days, 11 hours, 18 minutes

System restarted by watchdog timer expired at 04:21:59 GMT Sun Oct 29 2006

System image file is "flash:c2600-is-mz_120-11.bin"


cisco 2610 (MPC860) processor (revision 0x203) with 29696K/3072K bytes of memory.

Processor board ID JAD04200MRS (4097643662)

M860 processor: part number 0, mask 49

Bridging software.

X.25 software, Version 3.0.0.

Basic Rate ISDN software, Version 1.1.

1 Ethernet/IEEE 802.3 interface(s)

2 Serial(sync/async) network interface(s)

8 Low-speed serial(sync/async) network interface(s)

1 ISDN Basic Rate interface(s)

32K bytes of non-volatile configuration memory.

16384K bytes of processor board System flash (Read/Write)


Configuration register is 0x2102


Arklow>sh diag

Slot 0:

C2610 1E Mainboard port adapter, 4 ports

Port adapter is analyzed

Port adapter insertion time unknown

EEPROM contents at hardware discovery:

Hardware revision 2.3 Board revision C0

Serial number 4097643662 Part number 73-2839-13

Test history 0x0 RMA number 00-00-00

EEPROM format version 1

EEPROM contents (hex):

0x20: 01 91 02 03 F4 3D 14 8E 49 0B 17 0D 00 00 00 00

0x30: 60 57 92 06 00 00 00 00 00 00 00 00 00 00 00 00


WIC Slot 0:

Serial 2T (12in1) WAN daughter card

Hardware revision 1.0 Board revision C0

Serial number 19892278 Part number 800-03181-01

Test history 0x0 RMA number 00-00-00

Connector type PCI

EEPROM format version 1

EEPROM contents (hex):

0x20: 01 12 01 00 01 2F 88 36 50 0C 6D 01 00 00 00 00

0x30: 60 00 00 00 00 05 25 00 FF FF FF FF FF FF FF FF


WIC Slot 1:

BRI S/T - 2186 WAN daughter card

Hardware revision 1.3 Board revision A0

Serial number 19912911 Part number 800-01833-03

Test history 0x0 RMA number 00-00-00

Connector type Wan Module

EEPROM format version 1

EEPROM contents (hex):

0x20: 01 07 01 03 01 2F D8 CF 50 07 29 03 00 00 00 00

0x30: 50 00 00 00 00 04 27 01 FF FF FF FF FF FF FF FF


Slot 1:

Sync/Async port adapter, 8 ports

Port adapter is analyzed

Port adapter insertion time unknown

EEPROM contents at hardware discovery:

Hardware revision 1.0 Board revision H0

Serial number 19706768 Part number 800-01225-02

Test history 0x0 RMA number 00-00-00

EEPROM format version 1

EEPROM contents (hex):

0x20: 01 25 01 00 01 2C B3 90 50 04 C9 02 00 00 00 00

0x30: 88 52 69 00 00 05 02 17 FF FF FF FF FF FF FF FF


Arklow>en

Password:
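
With around 100 routers involved, it may help to pull the restart reason and time out of each `show version` capture and compare them across the estate. The following is a minimal sketch, assuming the captures have already been collected as text; the function names and the `captures` dictionary are hypothetical, and the pattern is based only on the restart line shown in the output above.

```python
import re
from collections import Counter

# Pattern taken from the restart line in the `show version` output above.
RESTART_RE = re.compile(
    r"^System restarted by (?P<reason>.+?) at (?P<when>.+)$", re.MULTILINE
)

def restart_info(show_version_text):
    """Return (reason, timestamp string) from one capture, or None if absent."""
    m = RESTART_RE.search(show_version_text)
    if m is None:
        return None
    return m.group("reason").strip(), m.group("when").strip()

def tally_reasons(captures):
    """captures: hypothetical dict of hostname -> `show version` text."""
    counts = Counter()
    for text in captures.values():
        info = restart_info(text)
        if info is not None:
            counts[info[0]] += 1
    return counts

sample = """Arklow uptime is 4 days, 11 hours, 18 minutes
System restarted by watchdog timer expired at 04:21:59 GMT Sun Oct 29 2006
System image file is "flash:c2600-is-mz_120-11.bin"
"""
print(restart_info(sample))
```

Sorting the extracted timestamps per router would show whether the restarts cluster around a single moment or are spread over the morning.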


gpulos Thu, 11/02/2006 - 08:01

one of the first things you'll want to do is to determine which type of watchdog timeout was encountered.


there are two types of watchdog timeouts:

1) software watchdog timeout

http://cisco.com/en/US/products/sw/iosswrel/ps1835/products_tech_note09186a00800f67db.shtml#sw_watchdog


2) process watchdog timeout

http://cisco.com/en/US/products/sw/iosswrel/ps1835/products_tech_note09186a00800f67db.shtml#process_watchdog


please see the following link for more info on troubleshooting watchdog timeouts:

http://cisco.com/en/US/products/sw/iosswrel/ps1835/products_tech_note09186a00800f67db.shtml

maryodriscoll Thu, 11/02/2006 - 08:14

Apparently, it's a software watchdog timeout issue. I have checked all the field notices. These routers are all 2610s and have a WIC-2T, a BRI card and an 8-port async card. The only two field notices I could find were one on clock and timing problems, which is based on the Cisco 2620, and one on watchdog timeouts, which applies only to 2600/3700 routers with an AIM module, so neither is relevant and I'm a bit stumped....


One of my colleagues here has questioned whether it could have anything to do with NTP, i.e. the clock changed on the NTP server on Sunday morning when we moved from BST to GMT. However, only about 100 routers were affected, not the entire estate. Note: the entire estate is all Cisco 2610s with the same IOS.
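
For reference, the timing of the BST to GMT change can be worked out from the rule that summer time ends at 01:00 UTC on the last Sunday of October. The sketch below uses only the Python standard library with fixed offsets; the helper names are hypothetical, and it says nothing about how the routers or the NTP server handled the change.

```python
from datetime import datetime, timedelta, timezone

BST = timezone(timedelta(hours=1), "BST")
GMT = timezone(timedelta(0), "GMT")

def last_sunday_of_october(year):
    """Return midnight UTC on the last Sunday in October of the given year."""
    d = datetime(year, 10, 31, tzinfo=timezone.utc)
    # weekday(): Monday=0 ... Sunday=6; step back to the preceding Sunday.
    return d - timedelta(days=(d.weekday() + 1) % 7)

# Summer time ends at 01:00 UTC on that Sunday.
switch_utc = last_sunday_of_october(2006).replace(hour=1)

def local_time(t_utc):
    """Convert a UTC datetime near the switch to BST before it, GMT after it."""
    return t_utc.astimezone(BST if t_utc < switch_utc else GMT)

for minutes in (-30, 30):
    t = switch_utc + timedelta(minutes=minutes)
    print(t.strftime("%H:%M UTC"), "->", local_time(t).strftime("%d %b %Y %H:%M %Z"))
```

Both sample instants map to 01:30 local time on 29 Oct 2006, once in BST and once in GMT, so a wall-clock time of 1:30am on that morning is ambiguous by an hour.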


smothuku Thu, 11/02/2006 - 08:02


Hi ,


Can you send the show stacks output?


Thanks,

Satish

maryodriscoll Thu, 11/02/2006 - 08:07

Arklow>sh stacks

Minimum process stacks:

Free/Size Name

5624/6000 CDP Protocol

5096/6000 Router Init

9572/12000 Init

5364/6000 RADIUS INITCONFIG

5384/6000 DHCP Client

9208/12000 Virtual Exec


Interrupt level stacks:

Level Called Unused/Size Name

1 44568131 8188/9000 Network interfaces

2 0 9000/9000 Timebase Reference Interrupt

3 0 9000/9000 PA Management Int Handler

6 602 8924/9000 16552 Con/Aux Interrupt

7 96985592 8928/9000 MPC860 TIMER INTERRUPT


System was restarted by watchdog timer expired

C2600 Software (C2600-IS-M), Version 12.0(11), RELEASE SOFTWARE (fc1)

Compiled Sat 20-May-00 10:46 by htseng (current version)

Image text-base: 0x80008088, data-base: 0x808F19F4



Stack trace from system failure:

FP: 0x80FE2B80, RA: 0xF0010000

FP: 0x80FE2BE8, RA: 0x8020E7B0

FP: 0x80FE2BF8, RA: 0x80D20000

FP: 0x80FE2C30, RA: 0x8068B808

FP: 0x80FE2CF8, RA: 0x8067B6A4

FP: 0x80FE2D20, RA: 0x802252D8
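
The frame pointer and return address pairs in the trace above can be extracted programmatically before decoding. This is a minimal sketch, assuming the `show stacks` output is available as plain text; `parse_frames` is a hypothetical helper name.

```python
import re

# Matches lines such as "FP: 0x80FE2B80, RA: 0xF0010000".
FRAME_RE = re.compile(r"FP:\s*(0x[0-9A-Fa-f]+),\s*RA:\s*(0x[0-9A-Fa-f]+)")

def parse_frames(show_stacks_text):
    """Return a list of (frame_pointer, return_address) integers, in order."""
    return [(int(fp, 16), int(ra, 16))
            for fp, ra in FRAME_RE.findall(show_stacks_text)]

trace = """Stack trace from system failure:
FP: 0x80FE2B80, RA: 0xF0010000
FP: 0x80FE2BE8, RA: 0x8020E7B0
FP: 0x80FE2BF8, RA: 0x80D20000
FP: 0x80FE2C30, RA: 0x8068B808
FP: 0x80FE2CF8, RA: 0x8067B6A4
FP: 0x80FE2D20, RA: 0x802252D8
"""
for fp, ra in parse_frames(trace):
    print(f"FP=0x{fp:08X} RA=0x{ra:08X}")
```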


smothuku Thu, 11/02/2006 - 08:08


Hi ,


Can you check whether there is a crashinfo file, using show dir bootflash: ?


Thanks,

Satish


smothuku Thu, 11/02/2006 - 08:40


Hi Mary ,


I have decoded the stack trace and the decode is below. We are still missing something. Is it possible to get the sh tech output from the router? Do you have any console logs?


0xF0010000:etext(0x808f19f4)+0x6f71e60c

0x8020E7B0:free(0x8020e4e0)+0x2d0

0x80D20000:etext(0x808f19f4)+0x42e60c

0x8068B808:csm_house_keeping(0x8068b7cc)+0x3c

0x8067B6A4:dlsw_background(0x8067b600)+0xa4

0x802252D8:process_execute(0x80225194)+0x144
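
The arithmetic behind a decode like the one above is a lookup of the nearest symbol at or below each return address, followed by a subtraction. The sketch below is only an illustration, assuming a tiny symbol table built from the start addresses that appear in the decode itself; a real decode uses the full symbol file for the image, and the `decode` helper is hypothetical.

```python
from bisect import bisect_right

# Start addresses copied from the decode above; not a complete symbol table.
SYMBOLS = sorted([
    (0x8020E4E0, "free"),
    (0x80225194, "process_execute"),
    (0x8067B600, "dlsw_background"),
    (0x8068B7CC, "csm_house_keeping"),
    (0x808F19F4, "etext"),
])
STARTS = [addr for addr, _ in SYMBOLS]

def decode(ra):
    """Format a return address as symbol(start)+offset using the nearest lower symbol."""
    i = bisect_right(STARTS, ra) - 1
    if i < 0:
        return f"0x{ra:08X}:<below first symbol>"
    start, name = SYMBOLS[i]
    return f"0x{ra:08X}:{name}(0x{start:08x})+0x{ra - start:x}"

for ra in (0xF0010000, 0x8020E7B0, 0x80D20000, 0x8068B808, 0x8067B6A4, 0x802252D8):
    print(decode(ra))
```

An address that resolves to etext plus a large offset lies past the last symbol in the table, so it may not correspond to a real function; the frames that land inside named functions are the ones that carry information.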


I also checked for known issues, but all the known issues say that the system is restarted by a SegV error. If you can get the show tech output from the router, it would be helpful.


One more thing: have you configured anything related to DLSw?


Thanks,

Satish



smothuku Fri, 11/03/2006 - 03:10


Hi Mary ,


Thanks for your info. After decoding the stack trace, I understand that a Cisco 2600 series router may reload when running data-link switching (DLSw). I suspect this might be a problem with DLSw. If you have the sh log output from the router, can you paste it?


One more thing: did you see any traceback in the log file when you encountered this problem?


Thanks,

satish

maryodriscoll Fri, 11/03/2006 - 04:14

Satish

Thanks. Unfortunately, the router reloaded so all I can give you is the log file I captured today. We do log all routers to a Syslog file but the Syslog server is currently off-line. So, do you think DLSW has something to do with this ?

Regards

Mary



maryodriscoll Fri, 11/03/2006 - 04:15

Satish


One other thing, every router in this estate is configured with DLSW and peers with 2/3 different DLSW routers in Dublin.


Mary

smothuku Fri, 11/03/2006 - 04:31


Hi Mary ,


Thanks for your info and reply. I am checking for known issues. If you can get me the logs from the syslog server from before the router rebooted, it would be helpful. We can then see what was happening on the router before the reboot.


As soon as the syslog server comes back online, please send me the log.


Thanks,

Satish
