Fabric Interconnect B not responding on the network

angelommsousa · 07-24-2012

Hi All.

I have a very strange situation. The new UCS 2.0(3a) recently arrived at our site.

After a week of running without problems, fabric interconnect B went down (this has happened twice).

If I run:

porfic03-B# show cluster extended-state
Start time: Thu Jul 12 17:38:02 2012
Last election time: Thu Jul 12 18:29:59 2012

B: UP, PRIMARY
A: UP, SUBORDINATE

B: memb state UP, lead state PRIMARY, mgmt services state: UP
A: memb state UP, lead state SUBORDINATE, mgmt services state: UP
heartbeat state PRIMARY_OK

INTERNAL NETWORK INTERFACES:
eth1, DOWN
eth2, UP

HA READY

On porfic03-A I get:

porfic03-A# show cluster extended-state
Start time: Thu Jul 12 18:29:51 2012
Last election time: Thu Jul 12 18:29:54 2012

A: UP, SUBORDINATE
B: UP, PRIMARY

A: memb state UP, lead state SUBORDINATE, mgmt services state: UP
B: memb state UP, lead state PRIMARY, mgmt services state: UP
heartbeat state PRIMARY_OK

INTERNAL NETWORK INTERFACES:
eth1, UP
eth2, UP

HA READY

So as you can see, everything looks fine inside UCS. However, from outside UCS I cannot ping porfic03-B or the cluster virtual IP (because it is attached to porfic03-B, which is the primary node).

Physically, I see that the management network port on porfic03-B has link but no activity.

Can anyone point me in the right direction to solve this issue?

Rebooting porfic03-B solves the problem, but it comes back after about a week.

Any ideas?
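
One place where the two outputs above differ is the eth1 state under INTERNAL NETWORK INTERFACES (DOWN as seen from B, UP as seen from A). Below is a minimal Python sketch, assuming both outputs have been saved as text, for diffing that block. The function names are hypothetical and not part of any Cisco tool, and whether this difference is related to the management port problem is not established here.

```python
import re

def internal_interfaces(text):
    """Map interface name -> state from the INTERNAL NETWORK INTERFACES block."""
    states = {}
    in_block = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("INTERNAL NETWORK INTERFACES"):
            in_block = True
            continue
        if in_block:
            match = re.fullmatch(r"(\w+),\s*(UP|DOWN)", line)
            if match:
                states[match.group(1)] = match.group(2)
            elif line:
                # First non-blank line that is not an interface ends the block.
                break
    return states

def diff_states(text_b, text_a):
    """Return {interface: (state_on_b, state_on_a)} for interfaces that differ."""
    b = internal_interfaces(text_b)
    a = internal_interfaces(text_a)
    return {name: (b[name], a.get(name)) for name in b if b[name] != a.get(name)}

# Excerpts copied from the two outputs in this post.
FI_B = "INTERNAL NETWORK INTERFACES:\neth1, DOWN\neth2, UP\n\nHA READY\n"
FI_A = "INTERNAL NETWORK INTERFACES:\neth1, UP\neth2, UP\n\nHA READY\n"

print(diff_states(FI_B, FI_A))
```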

Regards

padramas · 07-24-2012

Hello,

Please provide the following information from FI B:

scope monitoring

scope sysdebug

show cores detail

connect nxos b

show version

show system reset-reason

show int mgmt0

------------------

Regarding network connectivity for the FI B mgmt interface, start by verifying the cabling and the upstream switch port configuration.

Padma

angelommsousa · 07-24-2012

I have already checked the FI B mgmt interface cabling and the upstream switch port (no errors); the port is up on the switch. I also moved the cable from mgmt A to mgmt B, and the port still showed no activity.

The output of:

porfic03-B# scope monitoring

porfic03-B /monitoring #

porfic03-B# scope sysdebug

^

% Invalid Command at '^' marker

porfic03-B# show cores detail

^

% Invalid Command at '^' marker

connect nxos b

show version:

Software
BIOS:      version 3.5.0
loader:    version N/A
kickstart: version 5.0(3)N2(2.03a)
system:    version 5.0(3)N2(2.03a)
power-seq: Module 1: version v1.0
             Module 3: version v2.0
uC:        version v1.2.0.1
SFP uC:    Module 1: v1.0.0.0
BIOS compile time:       02/03/2011
kickstart image file is: bootflash:/installables/switch/ucs-6100-k9-kickstart.5.0.3.N2.2.03a.bin
kickstart compile time: 6/19/2012 7:00:00 [06/19/2012 15:21:08]
system image file is:    bootflash:/installables/switch/ucs-6100-k9-system.5.0.3.N2.2.03a.bin
system compile time:     6/19/2012 7:00:00 [06/19/2012 17:04:19]

Hardware
cisco UCS 6248 Series Fabric Interconnect ("O2 32X10GE/Modular Universal Platform Supervisor")
Intel(R) Xeon(R) CPU with 16622556 kB of memory.
Processor Board ID FOC161117SU

Device name: porfic03-B
bootflash: 31266648 kB

Kernel uptime is 11 day(s), 20 hour(s), 13 minute(s), 4 second(s)

Last reset
Reason: Unknown
System version: 5.0(3)N2(2.03a)
Service:

plugin
Core Plugin, Ethernet Plugin, Fc Plugin, Virtualization Plugin

show system reset-reason:

----- reset reason for Supervisor-module 1 (from Supervisor in slot 1) ---
1) No time
    Reason: Unknown
    Service:
    Version: 5.0(3)N2(2.03a)

2) At 462964 usecs after Wed Jul 4 16:04:11 2012
    Reason: Reset Requested by CLI command reload
    Service:
    Version: 5.0(3)N2(2.03a)

3) At 493083 usecs after Wed Jul 4 10:40:51 2012
    Reason: Reset Requested by CLI command reload
    Service:
    Version: 5.0(3)N2(2.03a)

4) At 902919 usecs after Tue Jul 3 15:29:48 2012
    Reason: Reset Requested by CLI command reload
    Service:
    Version: 5.0(3)N2(2.02q)

show int mgmt0:

mgmt0 is down (Administratively down)
Hardware: GigabitEthernet, address: 547f.ee8b.c060 (bia 547f.ee8b.c060)
Internet Address is xxx.xx.xx.xx/24
MTU 1500 bytes, BW 1000000 Kbit, DLY 10 usec,
     reliability 64808/255, txload 1/255, rxload 1/255
Encapsulation ARPA
auto-duplex, 1000 Mb/s
EtherType is 0x0000
1 minute input rate 0 bits/sec, 0 packets/sec
1 minute output rate 0 bits/sec, 0 packets/sec
Rx
    472140 input packets 0 unicast packets 472140 multicast packets
    0 broadcast packets 39500853 bytes
Tx
    0 output packets 0 unicast packets 0 multicast packets
    0 broadcast packets 0 bytes
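
The counters above show packets received on mgmt0 but none transmitted. Below is a minimal Python sketch, assuming the `show int mgmt0` output has been saved as text, of how that receive-only pattern could be flagged automatically. The function names are hypothetical and not part of any Cisco tool.

```python
import re

def parse_mgmt_counters(text):
    """Extract total Rx/Tx packet counts from 'show int mgmt0' text."""
    rx = re.search(r"(\d+)\s+input packets", text)
    tx = re.search(r"(\d+)\s+output packets", text)
    return {
        "rx_packets": int(rx.group(1)) if rx else None,
        "tx_packets": int(tx.group(1)) if tx else None,
    }

def looks_receive_only(counters):
    """True when the port has received packets but never transmitted any."""
    rx = counters["rx_packets"]
    tx = counters["tx_packets"]
    return rx is not None and tx is not None and rx > 0 and tx == 0

# Counter lines copied from the output in this post.
SAMPLE = """\
Rx
    472140 input packets 0 unicast packets 472140 multicast packets
    0 broadcast packets 39500853 bytes
Tx
    0 output packets 0 unicast packets 0 multicast packets
    0 broadcast packets 0 bytes
"""

counters = parse_mgmt_counters(SAMPLE)
print(counters, looks_receive_only(counters))
```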

Thanks for your reply.

Regards

padramas · 07-24-2012

Hello,

Can you please check whether there are any core dumps on the FI by running:

scope monitoring

scope sysdebug

show cores detail

The mgmt0 status being displayed as down is a known issue, so we cannot rely on it in this scenario.

Is the MAC address of the FI B mgmt interface learned on the upstream switch port?

Padma

angelommsousa · 07-24-2012

Hi

No MAC address is learned on the upstream port.

porfic03-B /monitoring/sysdebug # scope monitoring

porfic03-B /monitoring # show cores detail

^

% Invalid Command at '^' marker

porfic03-B /monitoring # scope sysdebug

porfic03-B /monitoring/sysdebug # show cores detail

porfic03-B /monitoring/sysdebug #

Thanks for the reply.

Regards

angelommsousa · 07-25-2012

Hi all

When I run:

porfic03-B(nxos)# show hardware internal cpu-mac mgmt stats

I see a lot of errors on the mgmt port. I will swap the module and check over the next few days whether the problem is solved.

Regards