6506E w/Sup720 unexplained reboot

Answered Question
Apr 27th, 2010

Hi there,


I have a 6506E, running 12.2(33)SXI2, with the following hardware configuration:


Mod Ports Card Type                              Model            
--- ----- -------------------------------------- ------------------ -----------
  1    6  Firewall Module                        WS-SVC-FWM-1      
  2    4  SLB Application Processor Complex      WS-X6066-SLB-APC  
  3   24  CEF720 24 port 1000mb SFP              WS-X6724-SFP     
  4   48  CEF720 48 port 10/100/1000mb Ethernet  WS-X6748-GE-TX    
  5    8  Network Analysis Module                WS-SVC-NAM-2      
  6    2  Supervisor Engine 720 (Active)         WS-SUP720-3B    


This last weekend this switch rebooted, and I found this in the logs:


Apr 25 04:58:13 10.9.0.2 995: 001013: Apr 25 03:58:10.083 UTC: %SYS-SP-3-CPUHOG: Task is running for (2000)msecs, more than (2000)msecs (17/15),process = SP Error Detection Process.

Apr 25 04:58:14 10.9.0.2 996: 001014: Apr 25 03:58:10.331 UTC: %ERR_DET-SP-5-ERR_DET_LOW_MEM: Very low memory, dump debuginfo, local 692056832, io 32616

Apr 25 05:43:05 10.9.0.2 1001: 001021: Apr 25 04:43:03.812 UTC: %SYS-SP-2-MALLOCFAIL: Memory allocation of 468 bytes failed from 0x40C83B60, alignment 32
... (thousands of these messages)

Apr 25 22:19:42 10.9.0.2 2995: 003015: Apr 25 21:19:30.619 UTC: %SYS-SP-2-MALLOCFAIL: Memory allocation of 468 bytes failed from 0x40C83B60, alignment 32
Apr 25 22:20:30 10.9.0.2 2996: 003028: Apr 25 21:20:28.120 UTC: %RPC-2-FAILED_USERHANDLE: Failed to send RPC request sw_vlan_sp:sw_vlansp_get_4k_vlan_info
Apr 25 22:20:30 10.9.0.2 2997: -Traceback= 422CEDF4 41DB587C 41C93204 41C942B0 41C95C18 4332D94C 40A4173C 4332F324 40A360A8 40A33F48 414913BC 414670A8 41458428 41481A30 4162F150 4162F13C
Apr 25 22:20:39 10.9.0.2 2998: 003029: Apr 25 21:20:37.068 UTC: %RPC-2-RETRY: Recovered from RPC send failure for request c6k_sp_environmental:env_get_sensor_value_sp.  Resending request.
Apr 25 22:20:39 10.9.0.2 2999: -Traceback= 41F663FC 4227CC94 4227D580 4227E880 4227C29C 4162F150 4162F13C
Apr 25 22:21:51 10.9.0.2 3000: 003030: Apr 25 21:21:43.788 UTC: %RPC-SP-2-FAILED_USERHANDLE: Failed to send RPC request online_diag_sp_request:get_rp_cpu_info
Apr 25 22:22:57 10.9.0.2 3001: 003034: Apr 25 21:22:55.189 UTC: %RPC-2-FAILED_USERHANDLE: Failed to send RPC request sw_vlan_sp:sw_vlansp_get_4k_vlan_info
Apr 25 22:22:57 10.9.0.2 3002: -Traceback= 422CEDF4 41DB587C 41C93204 41C942B0 41C95C18 4332D94C 40A4173C 4332F324 40A360A8 40A33F48 414913BC 414670A8 41458428 41481A30 4162F150 4162F13C
Apr 25 22:23:01 10.9.0.2 3003: 003038: Apr 25 21:22:59.189 UTC: %RPC-2-FAILED_USERHANDLE: Failed to send RPC request sea_console_rp_cli_request:sea_console_send_rp_buffer_rpc
Apr 25 22:23:01 10.9.0.2 3004: -Traceback= 422EE128 428A9450 4162F150 4162F13C



Can anyone offer some guidance on what I can look at for troubleshooting this?


Regards,


Brandon

spremkumar Tue, 04/27/2010 - 07:25

Hi Brandon


This reset looks memory related: the logs clearly show the switch was low on memory and, as a result, could not allocate memory.

The main cause might be faulty memory or high memory usage.

What is the status now? Is it operating normally after the reset?


I would also suggest raising a TAC case so that Cisco can check for any new bug in this particular IOS train.


Meanwhile, can you also check the known bugs for this IOS code?


regds

neovestit Tue, 04/27/2010 - 08:15

The device seems to be operating properly at this point. I found bug CSCsy03587, but I don't think it applies to my situation.


Since the memory allocation always seems to be failing from the same address, you would tend to think the memory is bad - is that right?
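
One way to test the "same address" observation across thousands of log lines is to tally MALLOCFAIL events by address and requested size. The sketch below is a minimal illustration in Python, not something the posters ran; the function name `summarize_mallocfail` is hypothetical, and the regular expression assumes the message layout shown in the excerpt above. As far as the message format goes, the "failed from" value appears to be the address of the calling code rather than a RAM location, so a constant value may point at one caller failing repeatedly rather than at a bad memory cell; treat that reading as an assumption to confirm with TAC.

```python
import re
from collections import Counter

# Assumed layout, copied from the syslog excerpt in this thread.
MALLOCFAIL_RE = re.compile(
    r"%SYS-(?:SP-)?2-MALLOCFAIL: Memory allocation of (\d+) bytes "
    r"failed from (0x[0-9A-Fa-f]+)"
)


def summarize_mallocfail(lines):
    """Count MALLOCFAIL events per (address, requested size) pair."""
    counts = Counter()
    for line in lines:
        match = MALLOCFAIL_RE.search(line)
        if match:
            size = int(match.group(1))
            address = match.group(2)
            counts[(address, size)] += 1
    return counts


# Three lines taken from the log in the original post.
sample = [
    "Apr 25 05:43:05 10.9.0.2 1001: 001021: Apr 25 04:43:03.812 UTC: "
    "%SYS-SP-2-MALLOCFAIL: Memory allocation of 468 bytes failed from 0x40C83B60, alignment 32",
    "Apr 25 22:19:42 10.9.0.2 2995: 003015: Apr 25 21:19:30.619 UTC: "
    "%SYS-SP-2-MALLOCFAIL: Memory allocation of 468 bytes failed from 0x40C83B60, alignment 32",
    "Apr 25 22:20:30 10.9.0.2 2996: 003028: Apr 25 21:20:28.120 UTC: "
    "%RPC-2-FAILED_USERHANDLE: Failed to send RPC request sw_vlan_sp:sw_vlansp_get_4k_vlan_info",
]

for (address, size), count in summarize_mallocfail(sample).items():
    print(f"{count} failures of {size} bytes from {address}")
```

If every failure shares one address and one size, that is at least consistent with a single code path leaking or exhausting memory, which fits a software defect as well as it fits bad hardware.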

Jerry Ye (Cisco Employee) Tue, 04/27/2010 - 08:29

SXI2 has a medium buffer memory leak, CSCtb27643, which led to the deferral of the SXI2 code. The fix is in SXI2a and later.


Also, can you post the output of show version?


Regards,

jerry

neovestit Tue, 04/27/2010 - 09:46

Here's my show ver output:


6506E-1#sh ver
Cisco IOS Software, s72033_rp Software (s72033_rp-ADVENTERPRISEK9_WAN-M), Version 12.2(33)SXI2, RELEASE SOFTWARE (fc3)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2009 by Cisco Systems, Inc.
Compiled Wed 15-Jul-09 00:40 by prod_rel_team


ROM: System Bootstrap, Version 12.2(17r)SX5, RELEASE SOFTWARE (fc1)


6506E-1 uptime is 1 day, 19 hours, 17 minutes
Uptime for this control processor is 1 day, 19 hours, 16 minutes
Time since 6506E-1 switched to active is 1 day, 19 hours, 15 minutes
System returned to ROM by s/w reset at 22:23:11 BST Sun Apr 25 2010 (SP by bus error at PC 0x4174A1F0, address 0x0)
System restarted at 22:26:46 BST Sun Apr 25 2010
System image file is "sup-bootdisk:s72033-adventerprisek9_wan-mz.122-33.SXI2.bin"



This product contains cryptographic features and is subject to United
States and local country laws governing import, export, transfer and
use. Delivery of Cisco cryptographic products does not imply
third-party authority to import, export, distribute or use encryption.
Importers, exporters, distributors and users are responsible for
compliance with U.S. and local country laws. By using this product you
agree to comply with applicable laws and regulations. If you are unable
to comply with U.S. and local laws, return this product immediately.


A summary of U.S. laws governing Cisco cryptographic products may be found at:
http://www.cisco.com/wwl/export/crypto/tool/stqrg.html


If you require further assistance please contact us by sending email to
export@cisco.com.


cisco WS-C6506-E (R7000) processor (revision 1.1) with 983008K/65536K bytes of memory.
Processor board ID SAL1133XKNB
SR71000 CPU at 600Mhz, Implementation 0x504, Rev 1.2, 512KB L2 Cache
Last reset from s/w reset
6 Virtual Ethernet interfaces
92 Gigabit Ethernet interfaces
1917K bytes of non-volatile configuration memory.
8192K bytes of packet buffer memory.


65536K bytes of Flash internal SIMM (Sector size 512K).
Configuration register is 0x2102
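
The line worth reading first in this output is "System returned to ROM by ...", which records the reset reason. Here it reports a software reset on the SP caused by a bus error, at 22:23:11 BST, which lines up with the last RPC failure logged at 21:22:59 UTC. A minimal Python sketch for pulling that line apart follows; the function name `parse_reset_reason` and the returned keys are hypothetical, and the pattern assumes the exact wording shown above.

```python
import re

# Assumed wording, taken from the show version output in this post.
RESET_RE = re.compile(
    r"System returned to ROM by (.+?) at "
    r"(\d{2}:\d{2}:\d{2} .+? \d{4})"
    r"(?: \((.+)\))?"
)


def parse_reset_reason(text):
    """Return reason, timestamp, and optional detail, or None if absent."""
    match = RESET_RE.search(text)
    if match is None:
        return None
    return {
        "reason": match.group(1),
        "when": match.group(2),
        "detail": match.group(3),
    }


line = (
    "System returned to ROM by s/w reset at 22:23:11 BST Sun Apr 25 2010 "
    "(SP by bus error at PC 0x4174A1F0, address 0x0)"
)
print(parse_reset_reason(line))
```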

Correct Answer
Jerry Ye (Cisco Employee) Tue, 04/27/2010 - 11:44

Just double checked with the show version, most likely you are hitting CSCtb27643. Sorry, time for an upgrade.


Regards,

jerry

antonkolev Mon, 04/20/2015 - 05:09

neovestit, did the software upgrade fix this bug?

My 6509 ran like a champ for 5 years without a reboot, and then this error message started appearing:

 %RPC-2-FAILED_USERHANDLE: Failed to send RPC request sw_vlan_sp:sw_vlansp_get_4k_vlan_info
-Traceback= 4121D010 41530AF4 42244E20 413F5468 413F5A24 4071837C 40717528 40E9AAB0 40E79A84 40E6D0B4 40E907A8 4102E528 4102E514

 

Here is the IOS image I am running: s72033-ipservicesk9-mz.122-18.SXF5.bin

anumishr (Cisco Employee) Mon, 04/20/2015 - 10:11

Are you possibly seeing high CPU as well?

How frequent is this error?

Any recent configuration/HW changes?

antonkolev Tue, 04/21/2015 - 07:13

I already have graphs set up for CPU and memory.

Average CPU is 6-8%, with no CPU spikes.

Memory is normal.

Here are the errors:

Apr 20 14:30:56: %RPC-2-FAILED_USERHANDLE: Failed to send RPC request sw_vlan_sp:sw_vlansp_get_4k_vlan_info
-Traceback= 4121D010 41530AF4 42244E20 413F5468 413F5A24 4071837C 40717528 40E9AAB0 40E79A84 40E6D0B4 40E907A8 4102E528 4102E514
Apr 20 15:01:10: %IPC-5-WATERMARK: 11015 messages pending in xmt for the port Primary RFS Server Port(10000.B) seat 10000

 

Apr 21 09:59:57: %IPC-4-GET_PAK_MSG: Failed for message size 1524
-Traceback= 4075244C 418EC9B0 418EC7CC 41AA9C34 41AA6B58 41AA6D74 41D48B5C
Apr 21 09:59:57: IPC: no IPC message for HDR: 8776AC8 src: 2040000, dst: 216000A, index: 3, seq: 0, sz: 56, type: 3, flags: 40 hi: 2E, lo: 23ED9088
Apr 21 09:59:57: IPC: no IPC message for HDR: 86D53C8 src: 2040000, dst: 216000A, index: 3, seq: 0, sz: 56, type: 2, flags: 40 hi: 2F, lo: 23EDA4C8
Apr 21 09:59:57: IPC: no IPC message for HDR: 85D5BC8 src: 2040000, dst: 216000A, index: 3, seq: 0, sz: 56, type: 3, flags: 40 hi: 30, lo: 23E3D5C8
Apr 21 09:59:57: IPC: no IPC message for HDR: 872C4C8 src: 2040000, dst: 216000A, index: 3, seq: 0, sz: 56, type: 2, flags: 40 hi: 31, lo: 23E3C188
Apr 21 09:59:57: IPC: no IPC message for HDR: 864E7C8 src: 2040000, dst: 216000A, index: 3, seq: 0, sz: 56, type: 3, flags: 40 hi: 32, lo: 23E41288
Apr 21 09:59:57: IPC: no IPC message for HDR: 86596C8 src: 2040000, dst: 216000A, index: 3, seq: 0, sz: 56, type: 2, flags: 40 hi: 33, lo: 23E3FE48
Apr 21 09:59:57: IPC: no IPC message for HDR: 86F05C8 src: 2040000, dst: 216000A, index: 3, seq: 0, sz: 56, type: 3, flags: 40 hi: 34, lo: 23E44F48
Apr 21 09:59:57: IPC: no IPC message for HDR: 85DE7C8 src: 2040000, dst: 216000A, index: 3, seq: 0, sz: 56, type: 2, flags: 40 hi: 35, lo: 23E43B08
Apr 21 09:59:57: IPC: no IPC message for HDR: 8653BC8 src: 2040000, dst: 216000A, index: 3, seq: 0, sz: 56, type: 3, flags: 40 hi: 36, lo: 23E48C08
Apr 21 09:59:57: IPC: no IPC message for HDR: 86033C8 src: 2040000, dst: 216000A, index: 3, seq: 0, sz: 56, type: 2, flags: 40 hi: 37, lo: 23E4A048
Apr 21 09:59:57: IPC: no IPC message for HDR: 86010C8 src: 2040000, dst: 216000A, index: 3, seq: 0, sz: 56, type: 3, flags: 40 hi: 38, lo: 23E4C8C8
Apr 21 09:59:57:  Frames of RPC sw_vlan_sp process (pid 227) on 6 (proc|slot) after blocking rpc call failed: 402BEF98 404F7E24

Apr 21 09:59:57: %RPC-2-FAILED_USERHANDLE: Failed to send RPC request sw_vlan_sp:sw_vlansp_get_4k_vlan_info
-Traceback= 4121D010 41530AF4 42244E20 413F5468 413F5A24 4071837C 40717528 40E9AAB0 40E79A84 40E6D0B4 40E907A8 4102E528 4102E514
Apr 21 10:01:11: %IPC-5-WATERMARK: 11072 messages pending in xmt for the port Primary RFS Server Port(10000.B) seat 10000
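
The two %IPC-5-WATERMARK lines report 11015 and then 11072 pending messages roughly 19 hours apart, which suggests a backlog that is not draining. To follow that over a longer log, a small Python sketch like the one below could extract the counts in order. The function names `watermark_samples` and `is_growing` are hypothetical, and the pattern assumes the timestamp and message layout shown in this post.

```python
import re

# Assumed layout: "<Mon> <day> <hh:mm:ss>: %IPC-5-WATERMARK: <n> messages pending in xmt ..."
WATERMARK_RE = re.compile(
    r"^(\w{3}\s+\d+\s+[\d:]+): %IPC-5-WATERMARK: (\d+) messages pending in xmt"
)


def watermark_samples(lines):
    """Return (timestamp, pending_count) pairs in log order."""
    samples = []
    for line in lines:
        match = WATERMARK_RE.search(line)
        if match:
            samples.append((match.group(1), int(match.group(2))))
    return samples


def is_growing(samples):
    """True if there are at least two samples and each exceeds the previous one."""
    counts = [count for _, count in samples]
    return len(counts) >= 2 and all(b > a for a, b in zip(counts, counts[1:]))


# The two watermark lines from the log above.
log = [
    "Apr 20 15:01:10: %IPC-5-WATERMARK: 11015 messages pending in xmt "
    "for the port Primary RFS Server Port(10000.B) seat 10000",
    "Apr 21 10:01:11: %IPC-5-WATERMARK: 11072 messages pending in xmt "
    "for the port Primary RFS Server Port(10000.B) seat 10000",
]

samples = watermark_samples(log)
print(samples, is_growing(samples))
```

Two data points are thin evidence on their own; the check is only meaningful over many samples.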
           

anumishr (Cisco Employee) Wed, 04/29/2015 - 10:43

These are IPC messages stuck in the buffer. Do you see any buffer leaks?

show buffer

show buffer summary

show buffer usage

show buffer detail

A reload might resolve this, but to investigate further before reloading, I would suggest opening a TAC case.

Thanks, Anupam

antonkolev Tue, 05/05/2015 - 07:36

This is what is available in the existing version, 12.2(18)SXF5.

The summary, usage, and detail keywords are not part of the syntax in this version:

6509A#sh buffers ?
  address          Buffer at a given address
  all              All buffers
  assigned         Buffers in use
  failures         Buffer allocation failures
  free             Buffers available for use
  input-interface  Buffers assigned to an input interface
  old              Buffers older than one minute
  pool             Buffers in a specified pool
  |                Output modifiers
  <cr>

6509A#sh buffers 
Buffer elements:
     2997 in free list (500 max allowed)
     569549872 hits, 0 misses, 2500 created

Public buffer pools:
Small buffers, 104 bytes (total 1024, permanent 1024, peak 1231 @ 7w0d):
     995 in free list (128 min, 2048 max allowed)
     3502484181 hits, 378 misses, 207 trims, 207 created
     42 failures (0 no memory)
Medium buffers, 256 bytes (total 3000, permanent 3000):
     2993 in free list (64 min, 3000 max allowed)
     28575212 hits, 0 misses, 0 trims, 0 created
     0 failures (0 no memory)
Middle buffers, 600 bytes (total 512, permanent 512):
     506 in free list (64 min, 1024 max allowed)
     182670765 hits, 0 misses, 0 trims, 0 created
     0 failures (0 no memory)
Big buffers, 1536 bytes (total 1000, permanent 1000):
     999 in free list (64 min, 1000 max allowed)
     4224332073 hits, 0 misses, 0 trims, 0 created
     0 failures (0 no memory)
VeryBig buffers, 4520 bytes (total 10, permanent 10, peak 14 @ 7w0d):
     10 in free list (0 min, 100 max allowed)
     57089 hits, 52 misses, 5 trims, 5 created
     52 failures (0 no memory)
Large buffers, 9240 bytes (total 8, permanent 8, peak 10 @ 7w0d):
     8 in free list (0 min, 10 max allowed)
     68944123 hits, 10 misses, 3 trims, 3 created
     10 failures (0 no memory)
Huge buffers, 18024 bytes (total 2, permanent 2, peak 10 @ 7w0d):
     2 in free list (0 min, 4 max allowed)
     5789 hits, 64 misses, 124 trims, 124 created
     3 failures (0 no memory)

Interface buffer pools:
EOBC0/0 buffers, 1524 bytes (total 2400, permanent 2400):
     1056 in free list (0 min, 2400 max allowed)
     1344 hits, 0 fallbacks
     1200 max cache size, 760 in cache
     1808478848 hits in cache, 144 misses in cache
IPC buffers, 4096 bytes (total 12279, permanent 336, peak 12279 @ 00:00:36):
     117 in free list (112 min, 1120 max allowed)
     137672052 hits, 4279 fallbacks, 1011 trims, 12954 created
     0 failures (0 no memory)
Private Huge IPC buffers, 18024 bytes (total 2, permanent 2):
     2 in free list (1 min, 4 max allowed)
     0 hits, 0 misses, 0 trims, 0 created
     0 failures (0 no memory)
Private Huge buffers, 65280 bytes (total 2, permanent 2):
     2 in free list (1 min, 4 max allowed)
     9464879 hits, 0 misses, 0 trims, 0 created
     0 failures (0 no memory)

Header pools:
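
In the output above, the pool that stands out is "IPC buffers": 336 permanent buffers have grown to a total of 12279, with only 117 in the free list. That pattern is consistent with the earlier remark about IPC messages stuck in the buffer, though it is not proof of a leak by itself. The Python sketch below shows one way to flag such pools automatically. The names `Pool`, `parse_pools`, and `suspicious` are hypothetical, and the thresholds (total at least 4 times permanent, less than 10% free) are arbitrary assumptions chosen for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class Pool:
    name: str
    size: int
    total: int
    permanent: int
    free: int = 0


# Assumed layout, taken from the "sh buffers" output in this post.
HEADER_RE = re.compile(r"^(.+?) buffers, (\d+) bytes \(total (\d+), permanent (\d+)")
FREE_RE = re.compile(r"^\s+(\d+) in free list")


def parse_pools(text):
    """Parse pool headers and the free-list line that follows each one."""
    pools = []
    current = None
    for line in text.splitlines():
        header = HEADER_RE.match(line)
        if header:
            current = Pool(
                header.group(1),
                int(header.group(2)),
                int(header.group(3)),
                int(header.group(4)),
            )
            pools.append(current)
            continue
        free = FREE_RE.match(line)
        if free and current is not None:
            current.free = int(free.group(1))
            current = None  # ignore later lines until the next pool header
    return pools


def suspicious(pools, growth_factor=4, min_free_ratio=0.10):
    """Flag pools that grew far past their permanent size and are nearly all in use."""
    return [
        p for p in pools
        if p.total >= growth_factor * p.permanent
        and p.free < min_free_ratio * p.total
    ]


SAMPLE = """Buffer elements:
     2997 in free list (500 max allowed)
Small buffers, 104 bytes (total 1024, permanent 1024, peak 1231 @ 7w0d):
     995 in free list (128 min, 2048 max allowed)
EOBC0/0 buffers, 1524 bytes (total 2400, permanent 2400):
     1056 in free list (0 min, 2400 max allowed)
IPC buffers, 4096 bytes (total 12279, permanent 336, peak 12279 @ 00:00:36):
     117 in free list (112 min, 1120 max allowed)
"""

for pool in suspicious(parse_pools(SAMPLE)):
    print(pool)
```

Comparing two captures taken some hours apart would say more than a single snapshot, since a pool that keeps growing and never trims back is the stronger signal.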

neovestit Tue, 04/27/2010 - 11:45

Thanks for your assistance guys. I'll see what I can do about upgrading.

anumishr (Cisco Employee) Tue, 04/27/2010 - 11:42

To find the root cause, there should be a crashinfo file in bootflash and sup_bootflash.

Can you post its contents? They need to be checked to find the exact cause.


Regards,

Anupam
