I've installed my first UCS system: 2 UCS 5108 and 2 UCS 6248.
The first chassis holds six blade servers (2 B230 M2 and 4 B200 M2). The second holds 5 B200 M2. I've got two air conditioners in the server room running at their maximum. Over the last week I've received three faults on the first chassis (Fault Code: F0411). The IOM temperature was about 45-46. After that I moved 3 blade servers to the second chassis until I solve this problem.
UCSM version - 2.0.2r
Everything is quite good, except for the thermal problem. All blade servers are discovered, with 0 errors and 0 critical faults.
Please open a TAC service request and attach the UCSM and Chassis 1 and 2 tech support bundles.
We need more logs to investigate the thermal fault.
Please reset the IOM physically present in that chassis. I have done this twice for the thermal issue and the issue never recurred.
I can't open a TAC case at the moment - my SmartNet is still being registered... I've created technical support files for Chassis 1 and 2. Should I post them here or wait for my SmartNet?
This can be caused by an I2C issue on the server.
You can try the following:
Reset fans one by one.
Reset PSUs one by one.
Finally, reset the IOMs, starting with the faulty one.
Also, determine which blade is showing any alarms and try to reseat that blade in the chassis.
Please make sure to wait a couple of minutes between resetting each component.
I think this is a code bug and you need to go to 2.0(q). I'm running two 6248 systems at that level and not seeing the issue. This thermal stuff plagued ALL the 1.4x releases.
If this is a real I2C issue, you may still see the same behavior on a 2.0.x release if the I2C bus was not cleared before the upgrade. (At this point I don't know whether you recently performed an upgrade on the system or not.)
The I2C bus transports information about the different components of the Unified Computing System: chassis, IOMs, fans, PSUs, etc. What happens is that all of those components try to send their status updates at the same time, the I2C bus gets overwhelmed, and then no component can report its real status. So we usually recommend that the customer reseat all major components, one at a time, to clear the bus and then do the upgrade. If that was not done before the upgrade, it should still be done afterwards.
Try reseating the fans and PSUs, one at a time, leaving a minute in between, and then the IOMs, one at a time, leaving three minutes in between and beginning with the subordinate one to cause minimum disruption.
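To keep the order and wait times straight while standing at the rack, the sequence can be written out as a checklist first. The following is only a rough Python sketch of that idea: the function name, the component labels and the counts (8 fans, 4 PSUs, 2 IOMs for a 5108) are illustrative assumptions, and nothing here talks to UCSM.

```python
from dataclasses import dataclass


@dataclass
class Step:
    component: str
    wait_minutes: int  # pause after reseating, before the next step


def build_reseat_plan(fans, psus, ioms_subordinate_first,
                      fan_psu_wait=1, iom_wait=3):
    """Return ordered reseat steps: fans, then PSUs, then IOMs.

    The IOM list must already be ordered with the subordinate side first.
    Wait times default to the 1 and 3 minutes suggested in the post above.
    """
    plan = [Step(name, fan_psu_wait) for name in list(fans) + list(psus)]
    plan += [Step(name, iom_wait) for name in ioms_subordinate_first]
    return plan


if __name__ == "__main__":
    # Hypothetical labels; adjust to what is actually installed.
    plan = build_reseat_plan(
        fans=[f"Fan {i}" for i in range(1, 9)],
        psus=[f"PSU {i}" for i in range(1, 5)],
        ioms_subordinate_first=["IOM (subordinate side)", "IOM (primary side)"],
    )
    for n, step in enumerate(plan, 1):
        print(f"{n:2}. Reseat {step.component}, then wait {step.wait_minutes} min")
```

Printing the plan and ticking off each line makes it harder to pull two components at once or skip a wait.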
If this does not clear the situation, you will need to remove the components already mentioned, one at a time, and run "show tech-support chassis # all brief" after each removal to see what the I2C bus reports segment by segment (chassis, fans, PSUs...). Once you remove a component and the errors on each segment stop incrementing, you have found your faulty piece of hardware, and a TAC case will be needed to get a replacement sent.
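The comparison itself is simple bookkeeping: note the per-segment error counts before and after each removal and look for the removal after which nothing grows. Here is a minimal Python sketch of that logic. It assumes you copy the counts by hand from the tech-support output into dictionaries; the segment names and numbers are made up for illustration, and it does not parse the real output.

```python
def segments_still_incrementing(before, after):
    """Return the segments whose error counter grew between two captures."""
    return sorted(seg for seg, count in after.items()
                  if count > before.get(seg, 0))


def find_suspect(trials):
    """trials: list of (removed_component, counters_before, counters_after).

    Return the first removed component during whose absence no segment's
    error counter increased, or None if errors kept growing in every trial.
    """
    for removed, before, after in trials:
        if not segments_still_incrementing(before, after):
            return removed
    return None


if __name__ == "__main__":
    # Hypothetical counts, entered by hand from two captures per trial.
    trials = [
        ("Fan 1",
         {"chassis": 10, "fans": 40, "psus": 5},
         {"chassis": 12, "fans": 47, "psus": 5}),   # errors still growing
        ("PSU 2",
         {"chassis": 12, "fans": 47, "psus": 5},
         {"chassis": 12, "fans": 47, "psus": 5}),   # errors stopped
    ]
    print("Suspect:", find_suspect(trials))  # prints: Suspect: PSU 2
```

Leave enough time between the two captures of each trial for the counters to move, otherwise a healthy-looking result may just mean the interval was too short.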
For further analysis or assistance, I strongly recommend a TAC case to be opened.
Actually, you don't have to power off the 6248 FIs. An effective, though disruptive, solution is to decommission and power-cycle the chassis that is generating those faults, including removing the power cords. Then wait a minute and recommission the chassis. After that, all thermal faults should go away.