Cisco 7204VXR with NPE-G1 high CPU and output drops

Unanswered Question

Hello community,


I Have a  Cisco 7204VXR router with NPE-G1 that  started acting weird for 24  hours. This router is dual attached to two  6k5 routers using multimode  FO and SX GBICs.


The router is used  as an LNS termination point for  PPPoVPDN sessions, we have a bunch of  them.

Here is a show ver  output :


LNS1.IX1#sh  version
Cisco IOS  Software, 7200 Software  (C7200-ADVENTERPRISEK9-M), Version 12.2(33)SRD,  RELEASE SOFTWARE (fc2)
Technical  Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2008 by Cisco Systems, Inc.
Compiled Thu  23-Oct-08  12:58 by prod_rel_team


ROM: System Bootstrap, Version  12.3(4r)T1,  RELEASE SOFTWARE (fc1)
BOOTLDR: 7200 Software  (C7200-KBOOT-M),  Version 12.3(5a), RELEASE SOFTWARE (fc1)

LNS1.IX1 uptime is 21  weeks, 23 hours, 22 minutes
System  returned to ROM by reload at  13:36:53 UTC Wed Nov 25 2009
System  restarted at 13:39:45 UTC Wed  Nov 25 2009
System image file is  "disk0:c7200-adventerprisek9-

mz.122-33.SRD.bin"
Last reload type: Normal  Reload
Last reload reason: Reload command



This  product contains cryptographic features and  is subject to United
States  and local country laws governing import,  export, transfer and
use. Delivery of Cisco cryptographic products  does not imply
third-party  authority to import, export, distribute  or use encryption.
Importers,  exporters, distributors and users are  responsible for
compliance  with U.S. and local country laws. By  using this product you
agree to comply with applicable laws and  regulations. If you are unable
to  comply with U.S. and local laws,  return this product immediately.


A  summary of U.S. laws governing Cisco  cryptographic products may be  found at:
http://www.cisco.com/wwl/export/crypto/tool/stqrg.html


If  you  require further assistance please contact us by sending email to
[email protected].

Cisco 7204VXR (NPE-G1) processor (revision B) with 983040K/65536K   bytes of memory.
Processor board ID 29498611
SB-1 CPU at 700Mhz,   Implementation 0x401, Rev 0.2, 512KB L2 Cache
4 slot VXR midplane,   Version 2.7

Last reset from power-on


PCI bus mb1 (Slots 1, 3 and 5) has a   capacity of 600 bandwidth points.
Current configuration on bus mb1   has a total of 0 bandwidth points.
This configuration is within the   PCI bus capacity and is supported.

PCI bus mb2 (Slots 2, 4 and  6) has a capacity of 600 bandwidth  points.
Current configuration on  bus mb2 has a total of 0 bandwidth  points.
This configuration is  within the PCI bus capacity and is  supported.

Please refer to  the following document "Cisco 7200 Series Port  Adaptor
Hardware  Configuration Guidelines" on Cisco.com <http://www.cisco.com>
for   c7200 bandwidth points oversubscription and usage guidelines.


1 FastEthernet  interface
3 Gigabit Ethernet interfaces
509K  bytes of NVRAM.


1000944K bytes  of ATA PCMCIA card at slot 0  (Sector size 512 bytes).
1000944K bytes  of ATA PCMCIA card at slot 1  (Sector size 512 bytes).
62592K bytes  of ATA PCMCIA card at slot 2 (Sector size 512 bytes).
16384K  bytes  of Flash internal SIMM (Sector size 256K).
Configuration  register is  0x2102




Starting yesterday afternoon, I saw appear  high CPU usage  and numerous output drops. My first instinct was that  GBICs started  dying so I replaced them on both siedes, no change.

Then, I  thought we were victim of a DDoS but my graphs show no  increase of  number of packets or things like that.


I have been  debugging this and found  out this :


LNS1.IX1#sh  interfaces gi0/2  controller
GigabitEthernet0/2 is up, line  protocol is up
  Hardware is BCM1250  Internal MAC, address is  000b.fcdd.c41a (bia 000b.fcdd.c41a)
   Description: F=B, E=BB2.IX1,  P=Gi9/1
  Internet address is 77.246.80.101/31
   MTU 9216  bytes, BW 1000000 Kbit, DLY 10 usec,
     reliability  255/255,  txload 29/255, rxload 34/255
  Encapsulation 802.1Q Virtual  LAN,  Vlan ID  1., loopback not set
  Keepalive set (10 sec)
  Full   Duplex, 1000Mbps, link type is auto, media type is SX
   output  flow-control is XON, input flow-control is XON
  ARP type:  ARPA, ARP  Timeout 04:00:00
  Last input 00:00:00, output 00:00:00,  output  hang never
  Last clearing of "show interface" counters  00:22:35
    Input queue: 1/150/0/0 (size/max/drops/flushes); Total output drops:   2712
  Queueing strategy: Class-based queueing
   Output  queue: 311/1000/0 (size/max total/drops)
  5 minute input  rate  133959000 bits/sec, 28885 packets/sec
   5 minute output rate  115013000 bits/sec, 20137 packets/sec
      39283651 packets input,  1536817418 bytes, 0 no buffer
     Received  535 broadcasts (0 IP  multicasts)
     0 runts, 0 giants, 0 throttles
      164 input  errors, 0 CRC, 0 frame, 164 overrun, 0 ignored
      0 watchdog, 539  multicast, 0 pause input
     27456129 packets  output, 2625296538  bytes, 0 underruns
     0 output errors, 0  collisions, 0 interface  resets
     0 babbles, 0 late collision, 0  deferred
     0 lost  carrier, 0 no carrier, 0 pause output
      0 output buffer failures,  0 output buffers swapped out
Interface  GigabitEthernet0/2 (idb  0x50098218)
Hardware is BCM1250 Internal MAC  (Revision B2/B3)
   Network connection mode is AUTO
  network link  is up 
   Config  is 1000Mbps, Full Duplex
  Selected media-type is GBIC
   GBIC  type is 1000BaseSX
MAC Registers:
  mac_cfg =            0x000000C8000A0176, mac_thrsh_cfg = 0x0000080400084004
  mac_vlantag   =      0x0000000000000000, mac_frame_cfg = 0x241C400000280200
    mac_adfilter_cfg = 0x0000000000000E28, mac_enable    =   0x0000000000000C11
  mac_status =       0x0000000000040004,   mac_int_mask  = 0x00004F0000C300C3
  mac_txd_ctl =        0x000000000000000F, mac_eth_addr  = 0x00001AC4DDFC0B00
    mac_fifo_ptrs =    0x241C400000280200, mac_eopcnt    =   0x000044001B1B1B1B
  MAC RX is enabled  RX DMA - channel 0 is   enabled, channel 1 is disabled
  MAC TX is enabled  TX DMA - channel  0  is enabled, channel 1 is disabled
   Device status = 1000 Mbps,  Full-Duplex
PHY Registers:
  PHY is  Marvell 88E1011S (Rev 1.3)
   Control                =  0x1000           Status                 =  0x796D
  PHY ID  1               = 0x0141           PHY ID  2               = 0x0C62
   Auto Neg Advertisement = 0x01A0            Link Partner Ability   =  0x4120
  Auto Neg Expansion     =  0x0000           Next Page  Tx           = 0x2001
  Link Partner Next  Page = 0x0000            1000BaseT Control      = 0x0000
    1000BaseT Status       = 0x0000           Extended Status        =   0xC000
  PHY Specific Control   = 0x0008           PHY Specific   Status    = 0xAD04
  Interrupt Enable       = 0x6C00             Interrupt Status       = 0x0000
   Ext PHY Spec Control   =  0x0C64           Receive Error Counter  =  0x0000
  LED  Control            = 0x4100
  Ext PHY Spec Control 2 =   0x006A           Ext PHY Spec Status    = 0xA017
  PHY says Link is   UP, Speed 1000Mbps, Full-Duplex [AUTONEG Done]
   Physical Interface -  GBIC
  AUTONEG - Our ability is     1000M/FD  Pause Capable  (Asymmetric)
  AUTONEG - Partner ability is 1000M/FD
GBIC   registers:
  Register 0x00:   01  04  01  00  00  00  01  20
    Register 0x08:   40  0C  01  01  0D  00  00  00
   Register 0x10:    37  1E  00  00  4F  45  4D  20
  Register 0x18:    20  20  20  20   20  20  20  20
  Register 0x20:   20  20  20  20  00   00  00  00
   Register 0x28:   47  42  49  43  2D  53  58  20
   Register 0x30:    20  20  20  20  20  20  20  20
   Register 0x38:   00  00  00  00   03  52  00  BA
  Register 0x40:    00  1A  00  00  42  31  30  34
   Register 0x48:   38  31  39  31  20   20  20  20
  Register 0x50:    20  20  20  20  30  39  30  39
   Register 0x58:   32  39  20  20   68  B0  01  5A
   Register 0x60:   20  20  20  20  20  20  20  20
   Register 0x68:    20  20  20  20  20  20  20  20
  Register 0x70:    20  20  20  20  20   20  20  20
  Register 0x78:   20  20  20  20   20  20  20  20
   PartNumber: GBIC-SX        
   PartRev: B
   SerialNo: B1048191       
  Options:  0
    Length(9um/50um/62.5um): 000/550/300
  Date Code: 090929 
    Gigabit Ethernet Codes:  1
Internal Driver Information:
    lc_ip_turbo_fs = 0x6236BE18, ip_routecache = 0x11 (dfs = 0/mdfs = 0)
    rx cache size = 1000, rx cache end = 15
  max_mtu = 9244
Software  MAC address filter(hash:length/addr/mask/hits):
need_af_check =  0
  0x00:  0  ffff.ffff.ffff   0000.0000.0000         0
  0x2E:   0  0900.2b00.0005   0000.0000.0000         0
   0x2F:  0  0900.2b00.0004   0000.0000.0000         0
  0x5C:  0   0100.5e00.0002   0000.0000.0000         0
  0xC0:  0  0100.0ccc.cccc    0000.0000.0000         0
  0xD6:  0  0180.c200.0014    0000.0000.0000         0
   0xD7:  0  0180.c200.0015   0000.0000.0000         0
  0xE6:  0   000b.fcdd.c41a   0000.0000.0000         0
  ring sizes: RX = 128, TX =  256
   rx_particle_size: 512
Rx Channel 0:
  dma_config0 =       0x0010002000800888, dma_config1 =    0x002D000000600029
    dma_dscr_base =   0x000000000C218A40, dma_dscr_cnt =     0x0000000000000080
  dma_cur_dscr_a =  0x000010000C29FC82,   dma_cur_dscr_b = 0x02D4000000000001
  dma_cur_daddr  =    0x000080000C2190E0
  rxring = 0x0C218A40, shadow = 0x50098FA4, head  =  26 (0x0C218BE0)
   rx_overrun=78512, rx_nobuffer=0, rx_discard=0
   Error Interrupts:  rx_int_dscr = 0, rx_int_derr = 0, rx_int_drop = 53
Tx Channel 0:
   dma_config0 =     0x0000000001001088, dma_config1  =     0x00B6000000000010
   dma_dscr_base =   0x000000000C219280,  dma_dscr_cnt =    0x0000000000000000
  dma_cur_dscr_a =   0x00000F000C27F980,  dma_cur_dscr_b = 0x0000000000000000
   dma_cur_daddr  =   0x000000000C219430
  txring = 0x0C219280, shadow =  0x657D0A14, head =  164, tail = 165, tx_count = 1
   Error  Interrupts: tx_int_dscr = 0, tx_int_derr = 0, tx_int_dzero = 0
    chip_state = 2, ds->tx_limited = 0
  throttled = 0, enabled = 0,   disabled = 0
  reset=6(init=1, restart=5), auto_restart=1
    tx_underflow = 0, tx_overflow = 0
   rx_underflow = 0, rx_overflow =  0, filtered_pak=0
  descriptor  mismatch = 0, fixed alignment = 52530
   bad length = 0 dropped, 0  corrected
  unexpected sop = 0
Address Filter:
  Promiscuous  mode OFF
  Exact match table (for  unicast, maximum 8 entries):
     Entry 0 MAC Addr = 000b.fcdd.c41a
     (All other entries are  empty)
  Hash match table (for multicast,  maximum 8 entries):
     Entry 0 MAC Addr = 0100.0ccc.cccc
     Entry 1 MAC Addr =  0900.2b00.0004
    Entry 2 MAC Addr =  0900.2b00.0005
     Entry 3 MAC Addr = 0180.c200.0014
    Entry 4  MAC Addr =  0180.c200.0015
    Entry 5 MAC Addr = 0100.5e00.0002
     (All  other entries are empty)
Statistics:
  Rx  Bytes                  19767582054   Tx Bytes                  19944011802
   Rx Good Packets             27498925   Tx Good  Packets              27480126
  Rx Multicast                     544
   Rx  Broadcast                       0


  Rx Bad Pkt  Errors                  0   Tx  Bad Pkt Errors                  0
   Rx FCS  Errors                      0   Tx FCS  Errors                      0
   Rx Runt Errors                      0   Tx Runt  Errors                     0
  Rx Oversize  Errors                  0   Tx Oversize Errors                 0
   Rx Length  Errors                   0   Tx  Collisions                      0
   Rx Code  Errors                     0   Tx Late  Collisions                 0
   Rx Dribble Errors                   0   Tx Excessive  Collisions            0
                                           Tx Abort  Errors                    0



My queuing strategy  went from  FIFO to Class-based queuing !?! How is that possible ?


Some  might this could be related to CPU but it's not :

LNS1.IX1#sh processes cpu history
                            
                                
     2222222223333322222444444444455555333332222222222222222222
      6666666666666699999444449999911111666666666688888888888888
100                                                            
90                                                           
80                                                           
   70                                                          
60                                                           
50                         **********                       
40           *****     ********************                  
  30  **********************************************************
20   **********************************************************
10   **********************************************************
     0....5....1....1....2....2....3....3....4....4....5....5....
               0    5    0    5    0    5    0    5    0    5   
                 CPU% per second (last 60 seconds)
                                                               
    4434344555655555455455555454555555555555555555555555565555
      8253536886175400933900221909246456366223331226663324503316
100                                                            
90                                                           
80                                                           
   70                                                          
60         ******                 * ** **        ***    **   *
50  *      **###########*####################################*#
40   *******#################################################*#
  30  ##########################################################
20   ##########################################################
10   ##########################################################
     0....5....1....1....2....2....3....3....4....4....5....5....
               0    5    0    5    0    5    0    5    0    5   
                 CPU% per minute (last 60 minutes)
              * = maximum CPU%   #   = average CPU%
                                                                           
      6567544433244568899987786666543422242347557976575555443422242335455655
      1917692522895458476603000342305575449131462755826358236305308599538984
100                    ***                       *                         
   90                * ***                         *                         
80    *           ******    *                   **                        
70    *            ***###****               *  **** *                   * 
  60  *****         *#######******           * ******** **           *   ***
50 ####** *   ***##############*  *        *************            *******
40 ######**    **################***   *   *######***********   * ***####**
  30  ########****#################***** ***##############**** *  ***########
20  #################################*######################***###########
10   ######################################################################
      0....5....1....1....2....2....3....3....4....4....5....5....6....6....7.
               0    5    0    5    0    5    0    5    0    5    0    5    0
                     CPU% per hour (last 72 hours)
                  * = maximum CPU%    #  = average CPU%


LNS1.IX1#sh processes cpu sorted | e 0.00
CPU utilization  for  five seconds: 32%/28%; one minute: 37%; five minutes: 36%
PID   Runtime(ms)   Invoked      uSecs   5Sec   1Min   5Min TTY Process
167    148648548 669722691        221  1.11%  0.94%  0.71%   0 LFDp  Input  Proc 
   92    84501540 658096034        128  0.55%  0.48%   0.38%   0 IP  Input        
158    29882772  19491105       1533   0.47%  0.50%   0.50%   0 CEF: IPv4 proces
  44    21898608    2506816       8735   0.31%  0.16%  0.17%   0 Net Background  
   163      1811203169702554          0  0.23%  0.15%  0.16%   0 HQF Input   Shaper
230    13727516   2560015       5362  0.23%  0.19%   0.18%    0 Compute load avg
238     4050992  17914481        226   0.15%   0.04%  0.05%   0 L2TP mgmt daemon
  237     3963624  303695077         13  0.15%  0.14%  0.15%   0 L2X Data  Daemon 
260     1429340 404591201          3  0.15%  0.08%  0.08%    0 PPP  Events      
162      2102403169697698          0  0.15%   0.28%   0.27%   0 HQF Shaper Backg
  234       10364      4211       2461   0.07%  0.08%  0.03%   2 Virtual  Exec    
  84       61120  398983803          0  0.07%  0.03%   0.02%   0 ACCT Periodic Pr
254      126412 399597741          0   0.07%  0.06%  0.07%   0 PPP  manager



I have been digging around in the archives, looks like I'm  victim of  microburst trafic :


http://puck.nether.net/pipermail/cisco-nsp/2009-April/060158.html



Any  ideas on what might be  causing this ?

Thanks.


Regards.


Y.
  • 1
  • 2
  • 3
  • 4
  • 5
Overall Rating: 0 (0 ratings)
Loading.
Giuseppe Larosa Mon, 04/26/2010 - 04:52
User Badges:
  • Super Silver, 17500 points or more
  • Hall of Fame,

    Founding Member

Hello Youssef,

answered in WAN forum


let's go on there


Best Regards

Giuseppe

Actions

This Discussion