chassis-seeprom errors

Jun 7th, 2012

We have two different UCS systems that show chassis-seeprom errors (one is a brand new production system). E.g.,

cnat-pod1-ucs6248-A# show cluster state
Cluster Id: 0xde5abc12999911e1-0xb8c9547fee4ca524


A: UP, PRIMARY
B: UP, SUBORDINATE

HA READY

Detailed state of the device selected for HA storage:
Chassis 1, serial: FOX1326G5KD, state: active with errors

Fabric A, chassis-seeprom local IO failure:
FOX1326G5KD READ_FAILED, error: TIMEOUT, error code: 10, error count: 37503
Fabric B, chassis-seeprom local IO failure:
FOX1326G5KD READ_FAILED, error: TIMEOUT, error code: 10, error count: 37504

Warning: there are pending SEEPROM errors on one or more devices, failover may not complete
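
To check whether the counters in output like this are actually climbing, two captures taken some time apart can be compared. The following is only a rough sketch: it assumes the line format shown in the output above, and the function names (`parse_seeprom_errors`, `rising_counters`) are made up for illustration and are not part of any Cisco tool.

```python
import re

# Matches lines such as:
#   FOX1326G5KD READ_FAILED, error: TIMEOUT, error code: 10, error count: 37503
# The format is assumed from the pasted output; other releases may differ.
ERROR_LINE = re.compile(
    r"^\s*(?P<serial>\S+)\s+(?P<status>[A-Z_]+), error: (?P<error>\w+), "
    r"error code: (?P<code>\d+), error count: (?P<count>\d+)\s*$"
)
FABRIC_LINE = re.compile(r"^\s*Fabric (?P<fabric>[AB]), chassis-seeprom")


def parse_seeprom_errors(text):
    """Return {(fabric, serial): error_count} from pasted CLI output."""
    counts = {}
    fabric = None
    for line in text.splitlines():
        m = FABRIC_LINE.match(line)
        if m:
            fabric = m.group("fabric")
            continue
        m = ERROR_LINE.match(line)
        if m and fabric is not None:
            counts[(fabric, m.group("serial"))] = int(m.group("count"))
    return counts


def rising_counters(earlier, later):
    """Return the increase for each (fabric, serial) whose count grew."""
    return {
        key: later[key] - earlier[key]
        for key in later
        if key in earlier and later[key] > earlier[key]
    }


SAMPLE = """\
Fabric A, chassis-seeprom local IO failure:
FOX1326G5KD READ_FAILED, error: TIMEOUT, error code: 10, error count: 37503
Fabric B, chassis-seeprom local IO failure:
FOX1326G5KD READ_FAILED, error: TIMEOUT, error code: 10, error count: 37504
"""

if __name__ == "__main__":
    first = parse_seeprom_errors(SAMPLE)
    # Hypothetical second capture: the same text with one counter bumped.
    second = parse_seeprom_errors(SAMPLE.replace("37503", "37510"))
    print(first)
    print(rising_counters(first, second))
```

A counter that grows between every pair of captures suggests the read never succeeds, as opposed to an occasional timeout.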

UCSM seems blissfully unaware of the error. It does occasionally log events:

cnat-pod1-ucs6248-A# show event | include shared-storage
2012-05-12T15:28:14.547     78004 E4196535 device FOX1326G5KD, error accessing shared-storage
2012-05-12T15:28:14.546     78002 E4196535 device FOX1326G5KD, error accessing shared-storage
2012-05-11T05:43:14.544     77962 E4196535 device FOX1326G5KD, error accessing shared-storage
2012-05-10T16:43:14.544     76405 E4196535 device FOX1326G5KD, error accessing shared-storage
2012-05-09T20:43:14.544     74748 E4196535 device FOX1326G5KD, error accessing shared-storage
2012-05-09T13:28:14.544     67198 E4196535 device FOX1326G5KD, error accessing shared-storage

        or in the GUI -- seeprom-event.png
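
For comparison with the counters, the time since the last logged event can be worked out from the event lines above. This is a small sketch that assumes each line starts with an ISO-style timestamp as shown; the helper names are hypothetical, and the "now" value is simply a date passed in by the caller.

```python
from datetime import datetime

# Event lines copied from the "show event" output above.
EVENTS = """\
2012-05-12T15:28:14.547     78004 E4196535 device FOX1326G5KD, error accessing shared-storage
2012-05-12T15:28:14.546     78002 E4196535 device FOX1326G5KD, error accessing shared-storage
2012-05-11T05:43:14.544     77962 E4196535 device FOX1326G5KD, error accessing shared-storage
2012-05-10T16:43:14.544     76405 E4196535 device FOX1326G5KD, error accessing shared-storage
2012-05-09T20:43:14.544     74748 E4196535 device FOX1326G5KD, error accessing shared-storage
2012-05-09T13:28:14.544     67198 E4196535 device FOX1326G5KD, error accessing shared-storage
"""


def parse_event_times(text):
    """Return the event timestamps in ascending order, skipping other lines."""
    times = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        try:
            times.append(datetime.strptime(fields[0], "%Y-%m-%dT%H:%M:%S.%f"))
        except ValueError:
            continue  # not an event line (prompt, header, and so on)
    return sorted(times)


def days_since_last_event(times, now):
    """Whole days between the newest event and 'now', or None if no events."""
    if not times:
        return None
    return (now - times[-1]).days


if __name__ == "__main__":
    times = parse_event_times(EVENTS)
    print("last event:", times[-1])
    print("days since:", days_since_last_event(times, datetime(2012, 6, 7)))
```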

In my case, however, the error counters keep going up every few seconds, yet there hasn't been an event for 3 weeks. Cisco's message/fault guide indicates that this event is a very serious error and that TAC should be contacted. If that's the case, why are these logged as events and not faults? And why is there no call-home trigger for this condition?

-Craig

zvega Thu, 06/07/2012 - 17:58

There is a bug: CSCtu17144

The error accessing shared-storage is not harmful and does not affect system operability.

If the error counters keep going up, try the following workaround:

1. Unplug the IO module.

2. Plug the IO module back in. Make sure the module is firmly in contact with the backplane.

3. Reboot the IO module.

If the event keeps coming and going after the workaround, you can leave it alone, since it does not hurt the system. If it never clears, it may be a chassis issue, so you may want to open a TAC case to confirm that.

cweinhold Mon, 07/09/2012 - 11:58

Zaira,

Does your advice still hold if the error count is constantly increasing? Below is a related TAC note that one of my colleagues received:

Q: Why can the error accessing shared-storage fault happen, and why is it not harmful?

A: In the UCS chassis design, we build a chip called SEEPROM into the backplane. SEEPROM is permanent memory used to store the SAM DB version, so that the SAM DB is not overwritten by an old version when failover happens. The communication between the IO module and the SEEPROM, over wiring on the backplane, is not reliable. To overcome this difficulty, we store the identical SAM DB version in three chassis rather than in one (so-called three-chassis redundancy). Because the communication between the IO module and the SEEPROM is not reliable, the error accessing shared-storage fault can happen sometimes; this is system behavior per specification and design. So, as long as one SEEPROM is readable, the UCS works normally. In your reported case, one chassis SEEPROM has a read problem and the other two work. So, the error accessing shared-storage fault is not harmful and does not affect system operability.

It seems that this isn't harmful if the error happens periodically. But what if the SEEPROM error count is increasing every minute? Doesn't that indicate that the SEEPROM can never be read? Wouldn't that be very serious if the UCS system had only one or two chassis?
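
To make the concern concrete, here is a toy model of the rule stated in the TAC note ("as long as one SEEPROM is readable, the UCS works normally"). It is only a sketch under stated assumptions and not how UCSM is implemented: the `SeepromRead` type, the function names, the placeholder serials `CHASSIS-2` and `CHASSIS-3`, the version number 41, and the choice of taking the highest readable version are all made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class SeepromRead:
    chassis_serial: str
    db_version: Optional[int]  # None models a failed read (e.g. TIMEOUT)


def readable_copies(reads: Sequence[SeepromRead]) -> int:
    """Count how many chassis SEEPROMs returned a version."""
    return sum(1 for r in reads if r.db_version is not None)


def newest_readable_version(reads: Sequence[SeepromRead]) -> Optional[int]:
    """Return the highest version among readable copies, or None if none.

    Picking the maximum is an assumption of this sketch; the TAC note only
    says the stored version guards against an old SAM DB overwriting a
    newer one.
    """
    versions = [r.db_version for r in reads if r.db_version is not None]
    return max(versions) if versions else None


if __name__ == "__main__":
    # Three chassis, one SEEPROM permanently unreadable: still two copies.
    three = [
        SeepromRead("FOX1326G5KD", None),
        SeepromRead("CHASSIS-2", 41),  # hypothetical serial and version
        SeepromRead("CHASSIS-3", 41),  # hypothetical serial and version
    ]
    print(readable_copies(three), newest_readable_version(three))

    # Single chassis with an unreadable SEEPROM: no copy left at all.
    single = [SeepromRead("FOX1326G5KD", None)]
    print(readable_copies(single), newest_readable_version(single))
```

In this model a permanently failing SEEPROM costs one of three copies on a three-chassis system but the only copy on a single-chassis system, which is the case the question is about.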

zvega Mon, 07/09/2012 - 12:09

cweinhold,

If the fault is still there after the workaround and never clears, a chassis replacement could be considered, but TAC has to check the logs first to confirm that.

cweinhold Mon, 07/09/2012 - 12:31

Thanks for the reply.

One follow-up: how does UCSM handle SAM DB versioning and dual-active detection on a UCS system that has no chassis and is used entirely for C-Series integration? I.e., when there are no SEEPROMs.

Posted June 7, 2012 at 9:53 AM
Categories: General UCS Hardware