06-10-2012 08:11 PM - edited 03-07-2019 07:11 AM
Hello,
Our monitoring team reported that one of the Cisco modules was down, but when I checked, it was working fine. I then found the log entry below:
Jun 11 02:21:47.270 GMT: %OIR-SP-3-PWRCYCLE: Card in module 2, is being power-cycled off (Module not responding to Keep Alive polling)
Can someone suggest what the cause of this error is and what the appropriate solution would be?
(Note: this device is in a production network.)
Thanks.
06-10-2012 09:08 PM
Hi Ray,
The message "(Module not responding to Keep Alive polling)" indicates that the linecard was reloaded due to loss of keepalives between the module and the supervisor. Since the sup did not receive replies to the keepalives it sent to the module, it reset the card to recover from the problem condition.
Such a failure could be caused by SW or HW issues, or by loose seating of the card in the chassis slot. Could you collect and provide the following outputs, please?
show module
show diagn event
check for crashinfo files in "dir /all" and get their content with the "more ..." command (or copy them off using FTP/TFTP)
Kind Regards,
Ivan
**Please grade this post if you find it useful.
06-10-2012 08:29 PM
#sh version
Cisco Internetwork Operating System Software
IOS (tm) s3223_rp Software (s3223_rp-ENTSERVICESK9_WAN-M), Version 12.2(18)SXF10, RELEASE SOFTWARE (fc1)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2007 by cisco Systems, Inc.
Compiled Fri 13-Jul-07 03:27 by
Image text-base: 0x40101040, data-base: 0x42D54780
ROM: System Bootstrap, Version 12.2(17r)SX3, RELEASE SOFTWARE (fc1)
BOOTLDR: s3223_rp Software (s3223_rp-ENTSERVICESK9_WAN-M), Version 12.2(18)SXF10, RELEASE SOFTWARE (fc1)
device_name_changed uptime is 8 weeks, 16 hours, 32 minutes
Time since device_name_changed switched to active is 8 weeks, 16 hours, 31 minutes
System returned to ROM by power cycle (SP by power on)
System restarted at 10:35:28 GMT Sun Apr 15 2012
System image file is "sup-bootdisk:s3223-entservicesk9_wan-mz.122-18.SXF10.bin"
A summary of U.S. laws governing Cisco cryptographic products may be found at:
http://www.cisco.com/wwl/export/crypto/tool/stqrg.html
If you require further assistance please contact us by sending email to
cisco WS-C6506-E (R7000) processor (revision 1.1) with 458752K/65536K bytes of memory.
Processor board ID SAL1133XZY5
R7000 CPU at 300Mhz, Implementation 0x27, Rev 3.3, 256KB L2, 1024KB L3 Cache
Last reset from power-on
SuperLAT software (copyright 1990 by Meridian Technology Corp).
X.25 software, Version 3.0.0.
Bridging software.
TN3270 Emulation software.
3 Virtual Ethernet/IEEE 802.3 interfaces
240 FastEthernet/IEEE 802.3 interfaces
9 Gigabit Ethernet/IEEE 802.3 interfaces
1915K bytes of non-volatile configuration memory.
65536K bytes of Flash internal SIMM (Sector size 512K).
Configuration register is 0x2102
06-10-2012 09:12 PM
Thanks Ivan for your instant support.
Please confirm that the commands you are suggesting won't impact production services or cause any outage.
I'm being very cautious here; I don't want anything to go wrong.
Thanks.
06-10-2012 09:24 PM
Welcome, Ray. As for the commands, they are all short "show"-type commands, so they don't have any impact on services.
Looking at the outputs below, it looks like the module is working fine and diagnostics did not detect any HW test failure for it.
Could you check for crashinfo files in the switch file systems, e.g. bootflash? Use "show file sys" to get the full list of attached file systems.
06-10-2012 09:29 PM
See below:
device_name_changed#sh file systems
File Systems:
     Size(b)     Free(b)      Type  Flags   Prefixes
           -           -    opaque     rw   system:
           -           -    opaque     rw   tmpsys:
           -           -    opaque     ro   flexwan-fpd:
    65536000    65259164     flash     rw   bootflash:
*  255877120   255877120      disk     rw   disk0:
   255938560   206774272      disk     rw   sup-bootdisk:
    46267964           0    opaque     ro   sup-microcode:
           0   164072736    opaque     wo   sup-image:
      126956      126280     nvram     rw   const_nvram:
     1961976     1850885     nvram     rw   nvram:
           -           -    opaque     rw   null:
           -           -    opaque     ro   tar:
           -           -   network     rw   tftp:
           -           -    opaque     wo   syslog:
           -           -   network     rw   rcp:
           -           -   network     rw   ftp:
           -           -   network     rw   scp:
           -           -    opaque     ro   cns:
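For anyone repeating this check on a box with many file systems, here is a minimal Python sketch that picks the storage file systems out of saved "show file systems" text and prints a "dir /all" command for each one. It assumes that crashinfo files would sit on file systems of type flash or disk (here bootflash:, disk0: and sup-bootdisk:); that filter and the function name storage_prefixes are my own illustration, not anything built into IOS. The sample rows are copied from the output above.

```python
def storage_prefixes(show_file_systems_output, wanted_types=("flash", "disk")):
    """Return the prefixes of file systems whose Type column is in wanted_types.

    Assumes each data row has five whitespace-separated columns
    (Size, Free, Type, Flags, Prefixes), optionally preceded by a "*"
    that marks the default file system.
    """
    prefixes = []
    for line in show_file_systems_output.splitlines():
        fields = line.split()
        if fields and fields[0] == "*":
            fields = fields[1:]  # drop the default-file-system marker
        if len(fields) == 5 and fields[2] in wanted_types:
            prefixes.append(fields[4])
    return prefixes


# Rows copied from the "sh file systems" output posted in this thread.
SAMPLE = """\
File Systems:
Size(b) Free(b) Type Flags Prefixes
- - opaque rw system:
65536000 65259164 flash rw bootflash:
* 255877120 255877120 disk rw disk0:
255938560 206774272 disk rw sup-bootdisk:
1961976 1850885 nvram rw nvram:
- - network rw tftp:
"""

if __name__ == "__main__":
    for prefix in storage_prefixes(SAMPLE):
        # "dir /all <prefix>" is the same read-only command used in this thread.
        print("dir /all " + prefix)
```

Run against the sample, this lists bootflash:, disk0: and sup-bootdisk:, so a plain "dir /all" that only shows the current default file system does not cover everything.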
06-10-2012 09:32 PM
device_name_changed#dir /all bootflash:crashinfo_20071126-204837
Directory of bootflash:/crashinfo_20071126-204837
1 -rw- 276706 Nov 26 2007 20:48:37 +00:00 crashinfo_20071126-204837
06-10-2012 09:45 PM
Is that the only crashinfo file in the "show bootflash:" output? If so, it is irrelevant - the file is from 2007.
In that case we have only one symptom - keepalives loss.
Such issues are caused by one of the following reasons:
- module HW failure (loses or corrupts part of the packets or goes offline)
- loose connection to the backplane of the chassis
- supervisor failure (can't keep the keepalive process running)
HW issues could be transient, e.g. data parity errors on the ASICs.
I suggest monitoring the module and physically reseating it (pull it out, then push it firmly back in) should the issue recur. If that does not help, replace the module.
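To make "monitor the module" a bit more concrete, here is a minimal Python sketch that counts power-cycle events per module in collected syslog lines, so a recurrence is easy to spot. The regular expression is modelled only on the single %OIR-SP-3-PWRCYCLE line quoted in this thread, so other IOS releases may word the message differently. The function name count_power_cycles is hypothetical, and the sketch assumes each log message is on one line (wrapped terminal output would need to be joined first).

```python
import re
from collections import Counter

# Pattern based on the one log line quoted in this thread (assumption:
# the message wording is the same on the release being monitored).
PWRCYCLE_RE = re.compile(
    r"%OIR-SP-3-PWRCYCLE: Card in module (?P<module>\d+), "
    r"is being power-cycled off \((?P<reason>[^)]*)\)"
)


def count_power_cycles(lines):
    """Return a Counter mapping module number to number of PWRCYCLE events."""
    counts = Counter()
    for line in lines:
        match = PWRCYCLE_RE.search(line)
        if match:
            counts[int(match.group("module"))] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Jun 11 02:21:47.270 GMT: %OIR-SP-3-PWRCYCLE: Card in module 2, "
        "is being power-cycled off (Module not responding to Keep Alive polling)",
    ]
    for module, events in sorted(count_power_cycles(sample).items()):
        print(f"module {module}: {events} power-cycle event(s)")
```

If the count for a module keeps growing over time, that is the point to reseat the card, and to replace it if the resets continue.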
06-10-2012 09:20 PM
Device_Name_Changed#sh diagnostic events module 2
Diagnostic events (storage for 500 events, 13 events recorded)
Event Type (ET): I - Info, W - Warning, E - Error
Time Stamp ET [Card] Event Message
------------------ -- ------ --------------------------------------------------
04/15 10:37:04.536 I [2] Diagnostics Passed
05/10 06:23:39.861 I [2] Diagnostics Passed
05/11 02:43:17.606 I [2] Diagnostics Passed
05/13 10:21:35.382 I [2] Diagnostics Passed
06/06 11:13:20.442 I [2] Diagnostics Passed
06/09 04:20:26.523 I [2] Diagnostics Passed
06/10 01:58:47.753 I [2] Diagnostics Passed
06/11 02:22:54.210 I [2] Diagnostics Passed
Device_Name_Changed#sh module 2
Mod Ports Card Type                              Model              Serial No.
--- ----- -------------------------------------- ------------------ -----------
  2    48 48 port 10/100 mb RJ21 ethernet        WS-X6148-21AF      SAL1126T2VM
Mod MAC addresses                      Hw     Fw           Sw           Status
--- ---------------------------------- ------ ------------ ------------ -------
  2 001b.d4e4.2830 to 001b.d4e4.285f   1.2    5.4(2)       8.5(0.46)RFW Ok
Mod  Sub-Module                  Model              Serial      Hw      Status
---- --------------------------- ------------------ ----------- ------- -------
   2 IEEE Voice Daughter Card    WS-F6K-FE48-AF     SAL1126T096 1.8     Ok
Mod  Online Diag Status
---- -------------------
   2 Pass
Device_Name_Changed#dir /all
Directory of disk0:/
No files in directory
255877120 bytes total (255877120 bytes free)
Device_Name_Changed#