UCS Chassis Fans

Steven Williams
Level 4

All of the fans in my UCS chassis are being flagged as inoperable.

The fans are screaming in the datacenter. Very loud. Room temp is fine.

Lots of errors like this:

<faultInst
ack="yes"
cause="equipment-inoperable"
changeSet=""
code="F0373"
created="2014-03-06T21:46:44"
descr="Fan 1 in Fan Module 1-1 under chassis 1 operability: inoperable"
dn="sys/chassis-1/fan-module-1-1/fan-1/fault-F0373"
highestSeverity="major"
id="520430"
lastTransition="2014-03-06T21:46:44"
lc=""
occur="1"
origSeverity="major"
prevSeverity="major"
rule="equipment-fan-inoperable"
severity="major"
status="created"
tags="network,server"
type="equipment">
</faultInst>
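
For anyone triaging many of these at once, a fault record like the one above can be read with a few lines of Python. This is only a sketch: it assumes the fault is available as an XML string (trimmed here to a handful of the attributes posted above), and `summarize_fault` is a hypothetical helper name, not part of any Cisco tooling.

```python
import xml.etree.ElementTree as ET

# A trimmed copy of the fault posted above (only some attributes kept).
FAULT_XML = '''<faultInst
 cause="equipment-inoperable"
 code="F0373"
 descr="Fan 1 in Fan Module 1-1 under chassis 1 operability: inoperable"
 dn="sys/chassis-1/fan-module-1-1/fan-1/fault-F0373"
 severity="major"
 type="equipment">
</faultInst>'''


def summarize_fault(xml_text):
    """Return the code, severity, and affected object of one faultInst."""
    elem = ET.fromstring(xml_text)
    dn = elem.get("dn", "")
    # The affected object is the dn without its trailing "/fault-<code>" part.
    affected = dn.rsplit("/fault-", 1)[0]
    return {
        "code": elem.get("code"),
        "severity": elem.get("severity"),
        "affected": affected,
        "descr": elem.get("descr"),
    }


summary = summarize_fault(FAULT_XML)
print(summary["code"], summary["severity"], summary["affected"])
```

Grouping the `affected` values from several faults makes it easy to see whether every fan module in a chassis is reporting at once, as in this case.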

Accepted Solutions

Steven,

See this:

This looks primarily like a communication error (not a hardware failure, at least at first glance) on the IOM attached to FI-B. Compare the two IOMs: the IOM attached to FI-A is affected, but much less so (I trimmed the output to make it easier to read):

DC-UCS-FI-A# connect iom 1
fex-1# show platform soft cmc showi2c
segment 1 chassis
        norxack 5514   <<<<< a count this high indicates a problem
        unfinished 1
        lostarbitration 1
        fixup 2
segment 2 blade
segment 3 fan
segment 4 psu
        pca9541postio2 12
gilroy.error.pca9541_control_state 77
gilroy.error.do_reserve_pca9541_control 77
bus_lost_counter: 991
error_pca9541_per_device:
                c.ms 77   <<<< This is a problem with the chassis management selector, which is located on both IOMs for management
                c.gpio0 1
# I2C Device Statistics
c.seeprom={SUCCESS=262839}
f.fm0.fru={SUCCESS=1}   <<<<<< All fans are talking fine with the IOM that is supposed to manage their behavior
f.fm1.fru={SUCCESS=1}
f.fm2.fru={SUCCESS=1}
f.fm3.fru={SUCCESS=1}
f.fm4.fru={SUCCESS=1}
f.fm5.fru={SUCCESS=1}
f.fm6.fru={SUCCESS=1}
f.fm7.fru={SUCCESS=1}

--------------------------------------------------------------------------------------------------

DC-UCS-FI-B# connect iom 1
fex-1# show platform soft cmc showi2c
segment 1 chassis
        norxack 2795   <<<<<<<
        wait_gt_deadline 207743
segment 2 blade
segment 3 fan
        norxack 108550    <<<<<< The problem is evident when talking to the other IOM
        timeout 27815   <<<<<<
        unfinished 14
        fixup 55614
        pca9541clrerrprs 7
        pca9541seterr 2
        pca9541postio3 2
        wait_gt_deadline 9319746
        hub_sw_mbb 27785
        hub_sw_mbb_to 27785
segment 4 psu
        pca9541postio2 12
        wait_gt_deadline 6664338
gilroy.error.pca9541_control_state 99
gilroy.error.do_reserve_pca9541_control 99
bus_lost_counter: 535
error_pca9541_per_device:
                c.ms 99    <<<<< c.ms again
                f.fm0.fc 3  << and the fans showing up here
                f.fm1.ms 1
                f.fm2.ms 1
                f.fm3.fc 2
                f.fm3.ms 1
                f.fm4.ms 1
                f.fm5.ms 1
                f.fm6.ms 1
                f.fm7.ms 1
# I2C Device Statistics
c.seeprom={SUCCESS=263768}
f.fm0.fc={SUCCESS=4410402,EIO=2,EBUSY=3477}  << this means it tries to communicate but finds the channel busy
f.fm1.fc={SUCCESS=4407270,ENXIO=2,EBUSY=3478}
f.fm2.fc={SUCCESS=4406286,EBUSY=3478}
f.fm3.fc={SUCCESS=4406492,ENXIO=2,EBUSY=3476}
f.fm4.fc={SUCCESS=4406621,EBUSY=3476}
f.fm5.fc={SUCCESS=4407430,EIO=1,ENXIO=1,EBUSY=3476}
f.fm6.fc={SUCCESS=4407772,EBUSY=3476}
f.fm7.fc={SUCCESS=4407739,ENXIO=1,EBUSY=3480}
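
If you want to tally the device statistics rather than read them by eye, something like the following Python sketch could work. It assumes the lines are in the raw `name={KEY=value,...}` form without the `<<` annotations added above, and `parse_device_stats` and `error_count` are hypothetical helper names made up for illustration.

```python
import re

# Three device-statistics lines from the FI-B side, copied from the output.
SAMPLE = """\
f.fm0.fc={SUCCESS=4410402,EIO=2,EBUSY=3477}
f.fm2.fc={SUCCESS=4406286,EBUSY=3478}
f.fm0.fru={SUCCESS=1}
"""

LINE_RE = re.compile(r"^(?P<dev>[\w.]+)=\{(?P<body>[^}]*)\}\s*$")


def parse_device_stats(text):
    """Map each device name to a dict of its result counters."""
    stats = {}
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip headers, prompts, and any non-counter line
        counters = {}
        for pair in m.group("body").split(","):
            if "=" not in pair:
                continue  # tolerate an empty {} body
            name, _, value = pair.partition("=")
            counters[name.strip()] = int(value)
        stats[m.group("dev")] = counters
    return stats


def error_count(counters):
    """Sum every counter except SUCCESS."""
    return sum(v for k, v in counters.items() if k != "SUCCESS")


stats = parse_device_stats(SAMPLE)
for dev, counters in stats.items():
    print(dev, "errors:", error_count(counters))
```

Running the same tally on the output from both IOMs makes the asymmetry obvious: the `.fc` entries with EBUSY counts only appear on the FI-B side.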

I would recommend reseating the IOM connected to FI-B during a maintenance window (leave it out for about 5 minutes to drain all the power), and doing the same with the fans (leave them out for about 3 minutes, for the same reason). Also make sure you are running one of the following versions or later, depending on your firmware train: 2.0.5b or later, 2.1.1f or later, or any version of 2.2. If you are already running one of those versions, then just do the reseat.
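
The version guidance above can be sketched as a small check. This is a hedged illustration only: it assumes version strings written either as `2.1(3a)` or as `2.0.5b`, the function names are hypothetical, and treating trains later than 2.2 as acceptable is my assumption, not something stated in the recommendation.

```python
import re

VERSION_RE = re.compile(r"^(\d+)\.(\d+)[.(](\d+)([a-z]*)\)?$")

# Minimum patch level per release train, taken from the recommendation above.
MINIMUM_PATCH = {(2, 0): (5, "b"), (2, 1): (1, "f")}


def parse_version(text):
    """Split '2.1(3a)' or '2.1.3a' into ((2, 1), (3, 'a'))."""
    m = VERSION_RE.match(text.strip())
    if not m:
        raise ValueError(f"unrecognized version string: {text!r}")
    major, minor, maint, letter = m.groups()
    return (int(major), int(minor)), (int(maint), letter)


def meets_recommendation(text):
    train, patch = parse_version(text)
    if train >= (2, 2):
        return True   # "any version of 2.2"; later trains are an assumption
    if train in MINIMUM_PATCH:
        return patch >= MINIMUM_PATCH[train]
    return False      # trains older than 2.0 are not covered by the advice


for version in ("2.1(3a)", "2.0(4a)", "2.0.5b"):
    print(version, meets_recommendation(version))
```

If the check passes, the reseat is the only remaining step; if it fails, plan the firmware upgrade as well.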

I hope this helps.

Rate ALL helpful answers.

-Kenny

9 Replies

Keny Perez
Level 8

Hello Steven,

Are these messages transient? I mean, do they clear on their own within minutes, or does the issue keep showing in UCSM with nothing clearing it?

Could you please run the following command and paste the output?

connect local a

connect iom X   <<< X is the chassis number with the issue

show platform soft cmc showi2c |no-more

exit

connect local b

connect iom X   <<< X is the chassis number with the issue

show platform soft cmc showi2c |no-more

Regards,

-Kenny

DC-UCS-FI-A# connect iom 1
Attaching to FEX 1 ...
To exit type 'exit', to abort type '$.'

fex-1#
fex-1# show platform soft cmc showi2c
# I2C Bus Statistics Fri Mar  7 10:38:23 CST 2014
# I2C Bus 1
busn=0 nseg=2
segment 0 local
segment 1 extended
        wait_gt_deadline 1102083
bus_lost_counter: 991
error_pca9541_per_device:
# I2C Bus 2
busn=1 nseg=5
segment 0 local
segment 1 chassis
        norxack 5514
        unfinished 1
        lostarbitration 1
        fixup 2
        pca9541seterr 1
        wait_gt_deadline 437329
segment 2 blade
segment 3 fan
segment 4 psu
        pca9541postio2 12
gilroy.error.pca9541_control_state 77
gilroy.error.do_reserve_pca9541_control 77
gilroy.counter.reserve 74
gilroy.counter.release 3
gilroy.counter.reserved 9277908
gilroy.counter.already_reserved 1
gilroy.counter.released 9277136
gilroy.counter.already_released 770
gilroy.counter.status_chassis_off 16828076
bus_lost_counter: 991
error_pca9541_per_device:
                c.ms 77
                c.gpio0 1

# I2C Device Statistics
iom.fru={SUCCESS=12}
iom.rtc={SUCCESS=7}
NAME=iom.woodside
No such path /sys/devices/platform/fsl-i2c.1/i2c-0/0-0048
NAME=iom.dcdc0
        detach_state driver iout iout_cal_offset iout_oc_fault_limit mfr_date mfr_id mfr_location mfr_model mfr_revision mfr_serial name ot_fault_limit ot_warn_limit temperature1 temperature2 vin vin_ov_fault_limit vin_uv_fault_limit vout vout_mode vout_n vout_ov_fault_limit vout_uv_fault_limit
NAME=iom.dcdc1
        detach_state driver iout iout_cal_offset iout_oc_fault_limit mfr_date mfr_id mfr_location mfr_model mfr_revision mfr_serial name ot_fault_limit ot_warn_limit temperature1 temperature2 vin vin_ov_fault_limit vin_uv_fault_limit vout vout_mode vout_n vout_ov_fault_limit vout_uv_fault_limit
iom.gpio0={SUCCESS=984572}
iom.gpio1={SUCCESS=984572}
iom.gpio2={SUCCESS=984572}
iom.gpio3={SUCCESS=1969208}
iom.temp.inlet1={SUCCESS=3937667}
iom.temp.inlet2={SUCCESS=3937667}
iom.temp.woodside={SUCCESS=3445459}
c.fru={SUCCESS=2}
c.seeprom={SUCCESS=262839}
f.fm0.fru={SUCCESS=1}
f.fm1.fru={SUCCESS=1}
f.fm2.fru={SUCCESS=1}
f.fm3.fru={SUCCESS=1}
f.fm4.fru={SUCCESS=1}
f.fm5.fru={SUCCESS=1}
f.fm6.fru={SUCCESS=1}
f.fm7.fru={SUCCESS=1}
p.psu0.fru={SUCCESS=1}
p.psu1.fru={SUCCESS=1}
p.psu2.fru={SUCCESS=1}
p.psu3.fru={SUCCESS=1}
c.gpio0={EIO=1,ENXIO=7}
c.gpio1={ENXIO=8}
c.gpio2={ENXIO=8}
c.gpio3={ENXIO=8}
# I2C Driver sysctl entries
sysctl: error: permission denied on key 'net.ipv4.route.flush'
sysctl: error: permission denied on key 'kernel.cad_pid'
sysctl: error: permission denied on key 'kernel.cap-bound'
dev.i2c.disconnect_retry = 3
dev.i2c.post_trigger = 64
dev.i2c.norxack_blink = 5
dev.i2c.norxack_blink = 5
dev.i2c.fixup_blink = 0
dev.i2c.pca9541-workaround = 16
dev.i2c.wait_deadline = 30
dev.i2c.chassis_reservation.demand = 0
dev.i2c.chassis_reservation.lock_state = 1
dev.i2c.chassis_reservation.auto_release = 1
dev.i2c.chassis_reservation.on_demand = 0
dev.i2c.chassis_reservation.pause_gilroy_thread = 0
dev.i2c.chassis_reservation.min_notheld_ms = 150
dev.i2c.chassis_reservation.wait_extra_ms = 2000
dev.i2c.chassis_reservation.grace_ms = 500
dev.i2c.chassis_reservation.hold_ms = 1000
dev.i2c.chassis_reservation.wait_ms = 1500
dev.i2c.gilroy-debug-level = 3
dev.i2c.debug-level = 1
dev.i2c.pca9541-businit = 1
dev.i2c.pca9541-delay = 250
dev.i2c.bus2.write-cdelay = 100
dev.i2c.bus2.write-delay = 100
dev.i2c.bus1.write-cdelay = 30
dev.i2c.bus1.write-delay = 30
fex-1#

--------------------------------------------------------------------------------------------------

DC-UCS-FI-B# connect iom 1
Attaching to FEX 1 ...
To exit type 'exit', to abort type '$.'

fex-1#  show platform soft cmc showi2c
# I2C Bus Statistics Fri Mar  7 10:39:32 CST 2014
# I2C Bus 1
busn=0 nseg=2
segment 0 local
        wait_gt_deadline 4
segment 1 extended
        wait_gt_deadline 670925
bus_lost_counter: 535
error_pca9541_per_device:
# I2C Bus 2
busn=1 nseg=5
segment 0 local
segment 1 chassis
        norxack 2795
        wait_gt_deadline 207743
segment 2 blade
segment 3 fan
        norxack 108550
        timeout 27815
        unfinished 14
        fixup 55614
        pca9541clrerrprs 7
        pca9541seterr 2
        pca9541postio3 2
        wait_gt_deadline 9319746
        hub_sw_mbb 27785
        hub_sw_mbb_to 27785
segment 4 psu
        pca9541postio2 12
        wait_gt_deadline 6664338
gilroy.error.pca9541_control_state 99
gilroy.error.do_reserve_pca9541_control 99
gilroy.counter.reserve 50
gilroy.counter.release 49
gilroy.counter.reserved 9277823
gilroy.counter.already_reserved 2
gilroy.counter.released 9259317
gilroy.counter.already_released 18459
gilroy.counter.status_chassis_off 12692228
bus_lost_counter: 535
error_pca9541_per_device:
                c.ms 99
                f.fm0.fc 3
                f.fm1.ms 1
                f.fm2.ms 1
                f.fm3.fc 2
                f.fm3.ms 1
                f.fm4.ms 1
                f.fm5.ms 1
                f.fm6.ms 1
                f.fm7.ms 1

# I2C Device Statistics
iom.fru={SUCCESS=12}
iom.rtc={SUCCESS=7}
NAME=iom.woodside
No such path /sys/devices/platform/fsl-i2c.1/i2c-0/0-0048
NAME=iom.dcdc0
        detach_state driver iout iout_cal_offset iout_oc_fault_limit mfr_date mfr_id mfr_location mfr_model mfr_revision mfr_serial name ot_fault_limit ot_warn_limit temperature1 temperature2 vin vin_ov_fault_limit vin_uv_fault_limit vout vout_mode vout_n vout_ov_fault_limit vout_uv_fault_limit
NAME=iom.dcdc1
        detach_state driver iout iout_cal_offset iout_oc_fault_limit mfr_date mfr_id mfr_location mfr_model mfr_revision mfr_serial name ot_fault_limit ot_warn_limit temperature1 temperature2 vin vin_ov_fault_limit vin_uv_fault_limit vout vout_mode vout_n vout_ov_fault_limit vout_uv_fault_limit
iom.gpio0={SUCCESS=922122}
iom.gpio1={SUCCESS=922137}
iom.gpio2={SUCCESS=922122}
iom.gpio3={SUCCESS=1844308}
iom.temp.inlet1={SUCCESS=3687803}
iom.temp.inlet2={SUCCESS=3687803}
iom.temp.woodside={SUCCESS=3226828}
c.fru={SUCCESS=2}
c.seeprom={SUCCESS=263768}
f.fm0.fc={SUCCESS=4410402,EIO=2,EBUSY=3477}
f.fm0.fru={SUCCESS=1}
f.fm1.fc={SUCCESS=4407270,ENXIO=2,EBUSY=3478}
f.fm1.fru={SUCCESS=1}
f.fm2.fc={SUCCESS=4406286,EBUSY=3478}
f.fm2.fru={SUCCESS=1}
f.fm3.fc={SUCCESS=4406492,ENXIO=2,EBUSY=3476}
f.fm3.fru={SUCCESS=1}
f.fm4.fc={SUCCESS=4406621,EBUSY=3476}
f.fm4.fru={SUCCESS=1}
f.fm5.fc={SUCCESS=4407430,EIO=1,ENXIO=1,EBUSY=3476}
f.fm5.fru={SUCCESS=1}
f.fm6.fc={SUCCESS=4407772,EBUSY=3476}
f.fm6.fru={SUCCESS=1}
f.fm7.fc={SUCCESS=4407739,ENXIO=1,EBUSY=3480}
f.fm7.fru={SUCCESS=1}
p.psu3.psmi={SUCCESS=12974716}
p.psu0.fru={SUCCESS=1}
p.psu0.psmi={SUCCESS=13299601}
p.psu1.fru={SUCCESS=1}
p.psu1.psmi={SUCCESS=12974803}
p.psu2.fru={SUCCESS=1}
p.psu2.psmi={SUCCESS=12974774}
p.psu3.fru={SUCCESS=1}
c.gpio0={ENXIO=8}
c.gpio1={ENXIO=8}
c.gpio2={ENXIO=8}
c.gpio3={ENXIO=8}
# I2C Driver sysctl entries
sysctl: error: permission denied on key 'net.ipv4.route.flush'
sysctl: error: permission denied on key 'kernel.cad_pid'
sysctl: error: permission denied on key 'kernel.cap-bound'
dev.i2c.disconnect_retry = 3
dev.i2c.post_trigger = 64
dev.i2c.norxack_blink = 5
dev.i2c.norxack_blink = 5
dev.i2c.fixup_blink = 0
dev.i2c.pca9541-workaround = 16
dev.i2c.wait_deadline = 30
dev.i2c.chassis_reservation.demand = 0
dev.i2c.chassis_reservation.lock_state = 1
dev.i2c.chassis_reservation.auto_release = 1
dev.i2c.chassis_reservation.on_demand = 0
dev.i2c.chassis_reservation.pause_gilroy_thread = 0
dev.i2c.chassis_reservation.min_notheld_ms = 150
dev.i2c.chassis_reservation.wait_extra_ms = 2000
dev.i2c.chassis_reservation.grace_ms = 500
dev.i2c.chassis_reservation.hold_ms = 1000
dev.i2c.chassis_reservation.wait_ms = 1500
dev.i2c.gilroy-debug-level = 3
dev.i2c.debug-level = 1
dev.i2c.pca9541-businit = 1
dev.i2c.pca9541-delay = 250
dev.i2c.bus2.write-cdelay = 100
dev.i2c.bus2.write-delay = 100
dev.i2c.bus1.write-cdelay = 30
dev.i2c.bus1.write-delay = 30
fex-1#
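
When comparing raw dumps like the two above, it can help to pull the per-segment bus counters into a structure so the same counter can be lined up across both IOMs. The Python below is a minimal sketch under a few assumptions: it expects the raw format shown above (a `# I2C Bus N` header, `segment N name` lines, and indented `counter value` lines), and `parse_segment_counters` is a hypothetical helper name.

```python
import re

# Excerpt of the raw FI-B output above (Bus 2 only, a few counters kept).
SAMPLE = """\
# I2C Bus 2
busn=1 nseg=5
segment 0 local
segment 1 chassis
        norxack 2795
        wait_gt_deadline 207743
segment 2 blade
segment 3 fan
        norxack 108550
        timeout 27815
segment 4 psu
        pca9541postio2 12
gilroy.error.pca9541_control_state 99
"""

BUS_RE = re.compile(r"^# I2C Bus (\d+)$")
SEGMENT_RE = re.compile(r"^segment \d+ (\w+)$")
COUNTER_RE = re.compile(r"^\s+(\w+) (\d+)$")


def parse_segment_counters(text):
    """Return {(bus, segment_name): {counter: value}} for one dump."""
    result = {}
    bus = None
    segment = None
    for line in text.splitlines():
        line = line.rstrip()
        bus_m = BUS_RE.match(line)
        seg_m = SEGMENT_RE.match(line)
        cnt_m = COUNTER_RE.match(line)
        if bus_m:
            bus, segment = int(bus_m.group(1)), None
        elif seg_m:
            segment = seg_m.group(1)
            result[(bus, segment)] = {}
        elif cnt_m and segment is not None:
            result[(bus, segment)][cnt_m.group(1)] = int(cnt_m.group(2))
        else:
            segment = None  # any other line ends the current segment
    return result


counters = parse_segment_counters(SAMPLE)
for (bus, segment), values in counters.items():
    print(bus, segment, "norxack:", values.get("norxack", 0))
```

Keying on both the bus number and the segment name matters because the name `local` appears under both buses in the full output.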

Running 2.1(3a)

We don't have a maintenance window per se, so can this be done while production is running? Removing one IOM should be OK, since the traffic will take the path through the other IOM, correct?

If you are sure it is properly configured in the Service Profile, then that is true for Ethernet traffic. For FC traffic, you need to configure multipathing at the OS level.

Or, if you are using vCenter, the Nexus 1000V can also take care of it for you.

Rate ALL helpful answers.

-Kenny

Most blades are running ESXi, so multipath is enabled and working. The other blades are running Linux on bare metal with fabric failover configured, so I assume I am good.

Cool. If this solves the issue, don't forget to mark the question as answered. On the other hand, if the issue is not solved, open a TAC case, as you might need someone to check the hardware; there are some hardware fixes that can help, and TAC can investigate whether anything else is going on.

Rate ALL helpful answers.

-Kenny

Starting with the least risky process, I reseated every fan in the chassis, and all events have cleared.

Thanks Keny!

Glad I helped
