
UCS C220 M3 Fan Speed Problem

Reuben Farrelly

I've recently purchased a Cisco UCS C220 M3 chassis bundle and at the moment am in a testing phase.

As soon as it was unboxed, I upgraded it to the latest HUU bundle - 1.5(4)3 and applied all the updates.  It wasn't far behind anyway - but I wanted to test with the latest builds to start with.  So much easier to upgrade now than later :-)

For a few days the box was quiet and the fans ran at a reasonably quiet speed.  The fans worked fine - but at a slow speed, and it was quite tolerable.  Even after installing XenServer and installing a guest VM it continued to run quietly for some time.

However in the past 24 hours the fan speeds have gone through the roof.  The box is literally screaming now.  What's odd is that there is almost no load on the system, and the environment is cool.  Multiple other devices in the rack are reporting normal temperatures of around 25-30 deg C.  The chassis itself is cool to touch.  Even the CIMC is reporting temperatures all in the 30's.

ucs-cimc /sensor # show temperature
Name                      Sensor Status  Reading    Units      Min. Warning Max. Warning Min. Failure Max. Failure
------------------------- -------------- ---------- ---------- ------------ ------------ ------------ ------------
P1_TEMP_SENS              Normal         34.0       C          N/A          74.0         N/A          79.0
P2_TEMP_SENS              Normal         34.0       C          N/A          74.0         N/A          79.0
RISER1_INLET_TMP          Normal         35.0       C          N/A          60.0         N/A          70.0
RISER2_INLET_TMP          Normal         33.0       C          N/A          60.0         N/A          70.0
RISER1_OUTLETTMP          Normal         38.0       C          N/A          60.0         N/A          70.0
RISER2_OUTLETTMP          Normal         33.0       C          N/A          60.0         N/A          70.0
FP_TEMP_SENSOR            Normal         30.0       C          N/A          60.0         N/A          70.0
DDR3_P1_A1_TEMP           Normal         33.0       C          N/A          65.0         N/A          85.0
DDR3_P2_E1_TEMP           Normal         32.0       C          N/A          65.0         N/A          85.0
PSU1_TEMP                 Normal         28.0       C          N/A          60.0         N/A          65.0
PSU2_TEMP                 Normal         30.0       C          N/A          60.0         N/A          65.0
PCH_TEMP_SENS             Normal         47.0       C          N/A          80.0         N/A          85.0
ucs-cimc /sensor #
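To make the "nothing looks hot" observation concrete, here is a minimal sketch that parses rows in the format of the `show temperature` output above and lists how far each reading sits below its Max. Warning threshold. The function names `parse_temperatures` and `headroom` are made up for illustration, the sample rows are copied from the output above, and the parser assumes the eight-column layout shown there; it does not talk to the CIMC.

```python
# Sample rows copied from the `show temperature` output in this post.
SAMPLE = """\
P1_TEMP_SENS              Normal         34.0       C          N/A          74.0         N/A          79.0
RISER1_OUTLETTMP          Normal         38.0       C          N/A          60.0         N/A          70.0
FP_TEMP_SENSOR            Normal         30.0       C          N/A          60.0         N/A          70.0
PCH_TEMP_SENS             Normal         47.0       C          N/A          80.0         N/A          85.0
"""


def parse_temperatures(text):
    """Return {sensor_name: (reading_c, max_warning_c)} for each data row.

    Assumes the eight-column layout shown above; header, separator and
    prompt lines are skipped because they do not match that layout.
    """
    rows = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 8 or fields[3] != "C":
            continue
        name, _status, reading, _units, _min_w, max_w, _min_f, _max_f = fields
        try:
            rows[name] = (float(reading), float(max_w))
        except ValueError:
            continue  # non-numeric reading or threshold, e.g. N/A
    return rows


def headroom(rows):
    """Return (degrees C below Max. Warning, sensor name), smallest margin first."""
    return sorted((max_w - reading, name) for name, (reading, max_w) in rows.items())


if __name__ == "__main__":
    for margin, name in headroom(parse_temperatures(SAMPLE)):
        print(f"{name:<20} {margin:5.1f} C below Max. Warning")
```

On the sample rows the tightest margin is RISER1_OUTLETTMP at 22 C below its warning threshold, which matches the point that none of the visible sensors is near an alarm.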

ucs-cimc /sensor # show fan
Name                 Sensor Status        Reading    Units      Min. Warning    Max. Warning    Min. Failure    Max. Failure
-------------------- -------------------- ---------- ---------- --------------- --------------- --------------- ---------------
FAN1_TACH1           Normal               10272      RPM        1712            N/A             1284            N/A
FAN1_TACH2           Normal               9844       RPM        1712            N/A             1284            N/A
FAN2_TACH1           Normal               10272      RPM        1712            N/A             1284            N/A
FAN2_TACH2           Normal               9844       RPM        1712            N/A             1284            N/A
FAN3_TACH1           Normal               9844       RPM        1712            N/A             1284            N/A
FAN3_TACH2           Normal               9844       RPM        1712            N/A             1284            N/A
FAN4_TACH1           Normal               10272      RPM        1712            N/A             1284            N/A
FAN4_TACH2           Normal               9844       RPM        1712            N/A             1284            N/A
FAN5_TACH1           Normal               10272      RPM        1712            N/A             1284            N/A
FAN5_TACH2           Normal               9844       RPM        1712            N/A             1284            N/A
ucs-cimc /sensor #

ucs-cimc /chassis # show fan-policy
Fan Policy
---------------
low-power
ucs-cimc /chassis #

[NB: the fan policy was originally set to 'Balanced' and is now set to 'Low Power', but changing it made no difference.]

I've seen a few other postings with similar problems to this going back some time, and they almost all referred to upgrading to the latest firmware, which has all the fixes for this sort of behaviour.  But in my case I've already done that.  A full cold reboot also hasn't helped.

Has anyone got any other ideas on what could be wrong or what the cause could be?   It seems like a bug of some sort but...

Thanks,

Reuben

Accepted Solution

Reuben,

I understand your point and your concern. To give you a better idea, fan speed can actually go up to 17000 RPM, so your server's fans are still in the mid range. To me it looks like the heat is probably not being dissipated effectively when the door is closed. That is why I mentioned our R-Series racks: they are tested with UCS, and the door has a mesh that ensures the heat is properly moved out.

Rate ALL helpful answers.

-Kenny



Reuben Farrelly

By pure chance I may have gotten almost to the bottom of this.  It seems the issue may have related to airflow or temperature around the very front of the chassis near the front panel (where the KVM connector is).  By having the rack door open, the server soon calmed right down within a couple of minutes and the fan speeds have dropped to 1/3 of what they were.  It has been steady this way for the past 2 hours now.

There must be something in the front panel of these units that is monitoring something that doesn't show up in the CIMC, as the temperature table above still has roughly the same values.  Maybe another temperature sensor in the front panel?

This:

http://www.techsupportforum.com/forums/f25/cisco-ucs-c220-m3-console-not-working-fans-being-weird-696413.html

...suggests likewise, that there's something in the front header near the KVM port, that is involved in fan speed regulation.

Having the (glass) door open isn't a good long term solution but it at least now gives me a pretty good idea how to work around the problem.  Maybe a mesh door may be a better long term fix.

The Tech Specs state that a gap of 76 mm is required. I haven't measured yet, but it must be reasonably close (there certainly is a gap).

But I'd be curious to know what exactly in the chassis is causing this behaviour :-)

Reuben,

We definitely have sensors all over the C220 that will regulate the temperature/fan speed.

The fan speed policy will be "ignored" if the server requires a higher fan speed to cool down.  This is expected behavior: it is better to override the configured policy when the server is getting hot. The server itself does not have to be boiling on the surface for the fan speed to increase; as long as any of the sensors in the server detects a high temperature, the fan speed will keep increasing until that sensor's temperature drops and its alarm clears.
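As a rough mental model of the behaviour described above, the configured policy can be thought of as a floor that any single sensor may override. The sketch below is purely illustrative: the function name `effective_fan_duty`, the percentage scale and the example numbers are assumptions, and it is not the CIMC's actual control algorithm, which is not documented in this thread.

```python
def effective_fan_duty(policy_floor_pct, sensor_demands_pct):
    """Return the fan duty (percent) in this toy model.

    policy_floor_pct:   hypothetical duty implied by the fan policy alone
    sensor_demands_pct: hypothetical duty each sensor asks for
    The result is the largest of these, so one hot sensor is enough
    to override the configured policy.
    """
    return max([policy_floor_pct, *sensor_demands_pct])


if __name__ == "__main__":
    # All sensors content: the policy value applies.
    print(effective_fan_duty(30, [20, 25, 10]))
    # One sensor asks for more: the policy is effectively ignored.
    print(effective_fan_duty(30, [20, 60, 10]))
```

In this model, changing the policy only moves the floor, which would be consistent with the policy change making no audible difference while some sensor is demanding more cooling.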

In regards to the rack itself, have you seen our R-Series?

http://www.cisco.com/en/US/products/ps11518/index.html

http://www.cisco.com/en/US/docs/unified_computing/ucs/hw/rack_power/installation/guide/power.html


Rate ALL helpful answers.

-Kenny

Hi Kenny

The question is though, none of the visible sensors actually appeared to be reporting a high temperature, so what exactly would have been triggering the fans to run at high speed?  Without me dismantling the hardware to find out I'm interested to know what input it was that was likely causing this to occur, because as far as I can tell based on the outputs above, it didn't appear that the server was actually hot.  There are no devices directly above or below the chassis and I thought there was plenty of airflow but..

Are there undocumented sensors in the chassis or more sensor inputs that just don't show up in the CIMC?

Obviously the more I can understand about how this works the easier it will be for me (and others) to avoid this in future.

Thanks,

Reuben

Reuben,

There are definitely more than just one sensor in the server but they all report to CIMC.

When you close the door, does it really go up again? I mean, it would be interesting to confirm this was not just a coincidence.

-Kenny

Yes it's entirely reproducible.  Close the front door and within 60 seconds the fans all start cranking up.  It's a glass door so it properly seals the noise and air around it  :-)

The same phenomenon doesn't occur with the rear door though, only the front door.

I understand there's more than one sensor - the output from the show commands above shows quite a few.  But as I keep saying none of them seem to indicate anything is amiss, so if it wasn't one of those sensors that was showing a high temperature, what exactly was it that was signalling to the system to turn the fans up?

Reuben,

I understand your point and your concern. To give you a better idea, fan speed can actually go up to 17000 RPM, so your server's fans are still in the mid range. To me it looks like the heat is probably not being dissipated effectively when the door is closed. That is why I mentioned our R-Series racks: they are tested with UCS, and the door has a mesh that ensures the heat is properly moved out.

Rate ALL helpful answers.

-Kenny
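For reference, the "mid range" remark can be checked with simple arithmetic. This sketch takes the 17000 RPM maximum quoted in the reply above at face value (it is not checked against a spec sheet here) and uses the two readings from the original `show fan` output; the helper name `percent_of_max` is made up for illustration.

```python
MAX_RPM = 17000  # maximum quoted in this thread; treated as an assumption


def percent_of_max(rpm, max_rpm=MAX_RPM):
    """Return a tach reading as a percentage of the assumed maximum, to 1 decimal."""
    return round(100 * rpm / max_rpm, 1)


if __name__ == "__main__":
    for rpm in (10272, 9844):  # readings from the original post
        print(f"{rpm} RPM is {percent_of_max(rpm)}% of {MAX_RPM} RPM")
```

Under that assumption the reported readings work out to roughly 58 to 60 percent of the maximum.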

OK, thanks for your help and explanations, Kenny. I'll proceed down the wire mesh door option for now and remember this for future installations.

Great Reuben, it was nice talking to you.  Have a cool day! 

-Kenny

Hi all, I have a customer who has had this problem too since I installed the server 8 months ago. The fans make an unusually loud noise, as if the server were always booting. The noise increases for about 15 seconds and then slows, then increases and slows again, in a continuous loop. I upgraded the software to the latest version, 2.0(6) (ucs-c220-hoo-2.0.6d.iso), and the problem persists. My question is whether this is a normal, known problem and whether it can lead to more serious damage to the server. The customer doesn't like this noise. The temperature of the room is good, and the temperature sensors and fans are all OK.

 

Best Regards

Open a TAC case

 

-Kenny
 

We have the same problem ... CPU fan speed is very very high and server is very very loud. 

Fan Sensors (Total 12)

Sensor Name   Sensor Status  Speed (RPM)  Warning Threshold Min  Warning Threshold Max  Critical Threshold Min  Critical Threshold Max
------------  -------------  -----------  ---------------------  ---------------------  ----------------------  ----------------------
FAN1_TACH1    Normal         16000        1600                   N/A                    1200                    N/A
FAN1_TACH2    Normal         18400        1600                   N/A                    1200                    N/A
FAN2_TACH1    Normal         16000        1600                   N/A                    1200                    N/A
FAN2_TACH2    Normal         18400        1600                   N/A                    1200                    N/A
FAN3_TACH1    Normal         14100        1600                   N/A                    1200                    N/A
FAN3_TACH2    Normal         17100        1600                   N/A                    1200                    N/A
FAN4_TACH1    Normal         14100        1600                   N/A                    1200                    N/A
FAN4_TACH2    Normal         16000        1600                   N/A                    1200                    N/A
FAN5_TACH1    Normal         14100        1600                   N/A                    1200                    N/A
FAN5_TACH2    Normal         16000        1600                   N/A                    1200                    N/A
FAN6_TACH1    Normal         14100        1600                   N/A                    1200                    N/A
FAN6_TACH2    Normal         16000        1600                   N/A                    1200                    N/A

 

As you can see, the fan speed is 18000+ and sometimes 20000+ RPM. We are on the latest release, 3.0(3f).

Please, we really need some help.

Greetings.

Aside from a CIMC memory leak that predates the code version you are running, there are some common factors that are usually involved with the fans staying at high speeds:

  • The RAID controller is staying busy, and the 'ROC' chip temperature is 85 degrees or higher.  Check the 12Gb controller tab, under Storage, and look for the ROC chip temperature sensor, if one is actually present.
  • Non-Cisco PCIe cards are installed.  Most PCIe adapters on the UCS server spec sheets have a thermal profile known to the CIMC, so the CIMC understands the cooling requirements of the card(s).  If the CIMC does not know what kind of card is installed, it can sometimes rev the fans up. Please confirm that all the installed equipment is listed on the C220 M4 spec sheet.
  • As Kenny previously mentioned, there are a number of thermal sensors, including an inflow temperature sensor at the front (you might call it an ambient temperature sensor).  I have frequently seen cases where the RAID ROC chip sensor goes over 85 °C if the ambient temperature gets much above 73 °F, which in turn drives the fans to jet-engine status ;)
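The three factors above can be written down as a simple checklist. This is a minimal sketch, not a diagnostic tool: the function names and parameters are hypothetical, the values have to be read off the CIMC by hand, and the 85 degree and 73 °F figures are simply the rules of thumb quoted in this post.

```python
def fahrenheit_to_celsius(deg_f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (deg_f - 32) * 5 / 9


def likely_causes(roc_temp_c=None, ambient_c=None, unlisted_pcie_cards=0):
    """Return which of the factors listed above apply to the given readings.

    roc_temp_c:          RAID ROC chip temperature in C, or None if no sensor is present
    ambient_c:           front inlet (ambient) temperature in C, or None if unknown
    unlisted_pcie_cards: number of installed cards not found on the spec sheet
    """
    causes = []
    if roc_temp_c is not None and roc_temp_c >= 85:
        causes.append("RAID ROC chip at or above 85 C")
    if unlisted_pcie_cards > 0:
        causes.append("PCIe card(s) not on the spec sheet")
    if ambient_c is not None and ambient_c > fahrenheit_to_celsius(73):
        causes.append("ambient above 73 F (about 22.8 C)")
    return causes


if __name__ == "__main__":
    # Hypothetical readings, for illustration only.
    for cause in likely_causes(roc_temp_c=86, ambient_c=25, unlisted_pcie_cards=1):
        print(cause)
```

An empty result does not rule out a problem; as noted below, a TAC case and a tech support review are the normal route.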

This is something that would normally require reviewing a tech support from the server, and opening a TAC case.

Thanks

Kirk...

Thanks for the quick reply. I will check those thermal profiles.
