GRS fail

Apr 1st, 2010

Hello everyone,

I have a problem: my active PRP failed and left the log below.

" SEC  8:.Mar  2 12:51:22: %MBUS-6-FAILEDPEER: Failed peer RP in slot 7 reason peer: pri heartbeat t/o "

My question is: when I get this kind of log, how can I tell whether it is a hardware failure or a software failure?

And if I get the log again, what should I check?

Are there any commands to investigate this failure?

If so, please let me know.

Detailed information is below:

SEC  8:50w5d: Not all config may be removed and may reappear after reactivating the sub-interface
SEC  8:1y2w: Not all config may be removed and may reappear after reactivating the sub-interface
SEC  8:1y12w: Not all config may be removed and may reappear after reactivating the sub-interface
SEC  8:1y12w: Not all config may be removed and may reappear after reactivating the sub-interface
SEC  8:1y32w: Not all config may be removed and may reappear after reactivating the sub-interface
SEC  8:1y36w: Not all config may be removed and may reappear after reactivating the sub-interface
SEC  8:2y6w: Not all config may be removed and may reappear after reactivating the sub-interface
SEC  8:2y20w: Not all config may be removed and may reappear after reactivating the sub-interface
SEC  8:2y21w: Not all config may be removed and may reappear after reactivating the sub-interface

##
SEC  8:.Mar  2 12:51:22: %MBUS-6-FAILEDPEER: Failed peer RP in slot 7 reason peer: pri heartbeat t/o
SEC  8:.Mar  2 12:51:22: %RP-5-NEWPRIMARY: Switchover to new RP
.Mar  2 12:51:23: %FIB-4-FIBNULLIDB: Missing idb for fibidb ATM2/0.14 (if_number 63).
-Traceback= 1DB4E8 161E40 177338 178560 582EDC 583400 58419C 5842EC 584500 57F958 582BFC 2B14EC

.Mar  2 12:51:24: %MBUS-6-RP_STATUS: RP in Slot 8 Mode = MBUS Active
.Mar  2 12:51:30: %MBUS-6-FABCONFIG: Switch Cards 0x1F (bitmask) Primary Clock is CSC_1 Fabric Clock is Redundant
Bandwidth Mode : 10Gbps Bandwidth
##

Mar 22 10:35:49: %SONET-4-ALARM:  ATM4/2: ~SLOF ~SLOS ~LAIS  LRDI ~PAIS  PRDI ~PLOP 
Mar 22 10:35:59: %SONET-4-ALARM:  ATM4/2: ~SLOF ~SLOS ~LAIS  LRDI ~PAIS  PRDI ~PLOP 
Mar 22 10:35:59: %SONET-4-ALARM:  ATM4/2: ~SLOF ~SLOS ~LAIS ~LRDI ~PAIS ~PRDI ~PLOP

GSR_B#           sh redundancy
Redundant System Information :
------------------------------
       Available system uptime = 4 years, 42 weeks, 4 days, 23 hours, 28 minutes, 43 seconds
Switchovers system experienced = 3
              Standby failures = 0
        Last switchover reason = active unit failed

                 Hardware Mode = Duplex
    Configured Redundancy Mode = SSO (Stateful Switchover)
     Operating Redundancy Mode = SSO (Stateful Switchover)
              Maintenance Mode = Disabled
                Communications = Down      Reason: Simplex mode

Current Processor Information :
-------------------------------
               Active Location = slot 8
        Current Software state = ACTIVE
       Uptime in current state = 4 weeks, 2 days, 1 hour, 27 minutes, 51 seconds
                 Image Version = Cisco Internetwork Operating System Software
IOS (tm) GS Software (C12KPRP-P-M), Version 12.0(30)S2, RELEASE SOFTWARE (fc1)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2005 by cisco Systems, Inc.
Compiled Thu 31-Mar-05 13:29 by pwade
                          BOOT = disk0:c12kprp-p-mz.120-30.S2.bin,1;
                   CONFIG_FILE =
                       BOOTLDR =
        Configuration register = 0x2102

Peer (slot: unavailable) information is not available because it is in 'DISABLED' state

GSR_B#  sh context
GSR_B#  sh context ?
  all      show all context info for all slots
  slot     specify a slot for which to show context information
  summary  display list of context information available
  |        Output modifiers
  <cr>

GSR_B#  sh context slot 7


GSR_B#sh gsr slot 7

SLOT STATE TRACE TABLE -- Slot 7  (Current Time is 151982313.864)
+-----------------------------------------------------------------------
|  Timestamp    Pid State    Event                                 Flags
+-----------------------------------------------------------------------
         0.944   3  ABSENT   EV_NULL
         2.284  34  ACTV RP  EV_RP_DEDUCE_PRIMARY
149376152.332  31   RP RDY  EV_RP_INSTANTIATE

Giuseppe Larosa Thu, 04/01/2010 - 23:57

Hello Java,

>> %MBUS-6-FAILEDPEER: Failed peer RP in slot 7 reason peer: pri heartbeat  t/o

>> Peer (slot: unavailable) information is not available because it is in  'DISABLED' state

The PRP in slot 7 was disabled because it stopped communicating with the other PRP (the reason logged is "pri heartbeat t/o", i.e. a heartbeat timeout).

It is not currently operational and is not ready to take over should the PRP in slot 8 fail.

see

http://www.cisco.com/univercd/cc/td/doc/product/core/cis12000/cis12410/icg/hfdm_c06.htm

for diagnostic tests
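
If you save the "sh redundancy" output to a text file, the state described above can also be checked with a small script. This is only a rough sketch in Python, not a Cisco tool: the function name redundancy_summary is made up for illustration, and the sample text is cut down from the output posted in this thread.

```python
import re

def redundancy_summary(text):
    """Pull a few fields out of saved 'sh redundancy' output (hypothetical helper)."""
    fields = {}
    for line in text.splitlines():
        # Field lines look like "   Hardware Mode = Duplex"
        if "=" in line:
            key, _, value = line.partition("=")
            fields[key.strip()] = value.strip()
    # The peer line has no "=", so it is matched separately
    peer_disabled = bool(re.search(r"Peer .*'DISABLED' state", text))
    return {
        "active_slot": fields.get("Active Location"),
        "communications": fields.get("Communications"),
        "last_switchover_reason": fields.get("Last switchover reason"),
        "peer_disabled": peer_disabled,
    }

# Sample lines copied from the output in this thread
sample = """\
Switchovers system experienced = 3
        Last switchover reason = active unit failed
                Communications = Down      Reason: Simplex mode
               Active Location = slot 8
Peer (slot: unavailable) information is not available because it is in 'DISABLED' state
"""

summary = redundancy_summary(sample)
if summary["peer_disabled"]:
    print("Standby RP is DISABLED; no redundancy while active is", summary["active_slot"])
```

With the output posted above, this reports that the peer is disabled while slot 8 is active, which is the situation to fix before the next failure.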

Hope this helps

Giuseppe

antasson Fri, 04/02/2010 - 01:56

Hi Java,

Well, the possible causes range from a badly seated card, in which case a firm reseat might solve the issue, to a crash (is there a crashinfo file on the faulty card?) or a hardware failure of the card or, in the worst case, of the slot in the chassis.

Bottom line: first check whether there is a crashinfo file, then reseat the card carefully but firmly.

Monitor it until the next occurrence. If it happens again, open a TAC case.
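
To help with the monitoring, something like the following could scan a saved copy of the log for the two messages seen at the switchover. It is only a sketch in Python, assuming the log is collected as plain text lines in the same format as the ones pasted above; the function names scan_log and failed_slots are made up for illustration.

```python
import re
from collections import Counter

# The two mnemonics that appeared at the switchover in this thread
EVENT = re.compile(r"%(MBUS-6-FAILEDPEER|RP-5-NEWPRIMARY):\s*(.*)")
SLOT = re.compile(r"Failed peer RP in slot (\d+)")

def scan_log(lines):
    """Return (mnemonic, message) pairs for the redundancy events found."""
    events = []
    for line in lines:
        match = EVENT.search(line)
        if match:
            events.append((match.group(1), match.group(2).strip()))
    return events

def failed_slots(events):
    """Count FAILEDPEER events per slot number."""
    counts = Counter()
    for mnemonic, message in events:
        slot = SLOT.search(message)
        if mnemonic == "MBUS-6-FAILEDPEER" and slot:
            counts[int(slot.group(1))] += 1
    return counts

# Sample lines copied from the log in this thread
log = [
    "SEC  8:.Mar  2 12:51:22: %MBUS-6-FAILEDPEER: Failed peer RP in slot 7 reason peer: pri heartbeat t/o",
    "SEC  8:.Mar  2 12:51:22: %RP-5-NEWPRIMARY: Switchover to new RP",
    ".Mar  2 12:51:23: %FIB-4-FIBNULLIDB: Missing idb for fibidb ATM2/0.14 (if_number 63).",
]

events = scan_log(log)
for slot, count in failed_slots(events).items():
    print(f"slot {slot}: {count} peer failure(s)")
```

If the same slot keeps showing up after a reseat, that is useful evidence to attach to the TAC case.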

Regards,

Antonio
