
Fabric Interconnect B, management services are unresponsive

Amit Vyas
Level 1

Hi,

We have configured the Call Home option in UCSM, and since last Saturday we have been receiving the error below from Call Home. We have opened a TAC case with Cisco to troubleshoot it, but according to TAC, "The error is a transient error from which the fabric interconnects can automatically recover."

Below are the error messages we are getting:

E-mail-1:

Subject:

System Notification from System-A - diagnostic:GOLD-major - 2011-12-27 17:54:09 GMT-00:00 Fabric Interconnect B, management services are unresponsive

Body Message:

System Name:System-A

Time of Event:2011-12-27 17:54:09 GMT-00:00

Event Description:Fabric Interconnect B, management services are unresponsive

Severity Level:6

E-mail-2:

Subject:

System Notification from System-A - diagnostic:GOLD-major - 2011-12-27 17:54:09 GMT-00:00 Fabric Interconnect B, management services are unresponsive

Body Message:

<?xml version="1.0" encoding="UTF-8" ?>

<soap-env:Envelope xmlns:soap-env="http://www.w3.org/2003/05/soap-envelope">

<soap-env:Header>

<aml-session:Session xmlns:aml-session="http://www.cisco.com/2004/01/aml-session" soap-env:mustUnderstand="true" soap-env:role="http://www.w3.org/2003/05/soap-envelope/role/next">

<aml-session:To>http://tools.cisco.com/neddce/services/DDCEService</aml-session:To>

<aml-session:Path>

<aml-session:Via>http://www.cisco.com/appliance/uri</aml-session:Via>

</aml-session:Path>

<aml-session:From>http://www.cisco.com/appliance/uri</aml-session:From>

<aml-session:MessageId>1058:SSI1442BFRC:4EFA0641</aml-session:MessageId>

</aml-session:Session>

</soap-env:Header>

<soap-env:Body>

<aml-block:Block xmlns:aml-block="http://www.cisco.com/2004/01/aml-block">

<aml-block:Header>

<aml-block:Type>http://www.cisco.com/2005/05/callhome/diagnostic</aml-block:Type>

<aml-block:CreationDate>2011-12-27 17:54:09 GMT-00:00</aml-block:CreationDate>

<aml-block:Builder>

<aml-block:Name>UCS 6100 Series Fabric Interconnect</aml-block:Name>

<aml-block:Version>4.2(1)N1(1.43q)</aml-block:Version>

</aml-block:Builder>

<aml-block:BlockGroup>

<aml-block:GroupId>1059:Serial Number:4EFA0641</aml-block:GroupId>

<aml-block:Number>0</aml-block:Number>

<aml-block:IsLast>true</aml-block:IsLast>

<aml-block:IsPrimary>true</aml-block:IsPrimary>

<aml-block:WaitForPrimary>false</aml-block:WaitForPrimary>

</aml-block:BlockGroup>

<aml-block:Severity>6</aml-block:Severity>

</aml-block:Header>

<aml-block:Content>

<ch:CallHome xmlns:ch="http://www.cisco.com/2005/05/callhome" version="1.0">

<ch:EventTime>2011-12-27 17:54:09 GMT-00:00</ch:EventTime>

<ch:MessageDescription>Fabric Interconnect B, management services are unresponsive</ch:MessageDescription>

<ch:Event>

<ch:Type>diagnostic</ch:Type>

<ch:SubType>GOLD-major</ch:SubType>

<ch:Brand>Cisco</ch:Brand>

<ch:Series>UCS 6100 Series Fabric Interconnect</ch:Series>

</ch:Event>

<ch:CustomerData>

<ch:UserData>

<ch:Email>xyz@xyz.com</ch:Email>

</ch:UserData>

<ch:ContractData>

<ch:CustomerId>abc@abc.com</ch:CustomerId>

<ch:ContractId>ContractID</ch:ContractId>

<ch:DeviceId>N10-S6100@C@SSI1442BFRC</ch:DeviceId>

</ch:ContractData>

<ch:SystemInfo>

<ch:Name>System-A</ch:Name>

<ch:Contact>Name</ch:Contact>

<ch:ContactEmail>xyz@xyz.com</ch:ContactEmail>

<ch:ContactPhoneNumber>+00-0000000000</ch:ContactPhoneNumber>

<ch:StreetAddress>Office Address</ch:StreetAddress>

</ch:SystemInfo>

</ch:CustomerData>

<ch:Device>

<rme:Chassis xmlns:rme="http://www.cisco.com/rme/4.0">

<rme:Model>N10-S6100</rme:Model>

<rme:HardwareVersion>0.0</rme:HardwareVersion>

<rme:SerialNumber>SerialNumber</rme:SerialNumber>

</rme:Chassis>

</ch:Device>

</ch:CallHome>

</aml-block:Content>

<aml-block:Attachments>

<aml-block:Attachment type="inline">

<aml-block:Name>sam_content_file</aml-block:Name>

<aml-block:Data encoding="plain">

<![CDATA[

<faultInst

ack="no"

cause="management-services-unresponsive"

changeSet=""

code="F0452"

created="2011-12-27T23:24:09.681"

descr="Fabric Interconnect B, management services are unresponsive"

dn="sys/mgmt-entity-B/fault-F0452"

highestSeverity="critical"

id="2036245"

lastTransition="2011-12-27T23:24:09.681"

lc=""

occur="1"

origSeverity="critical"

prevSeverity="critical"

rule="mgmt-entity-management-services-unresponsive"

severity="critical"

status="created"

tags=""

type="management"/>]]>

</aml-block:Data>

</aml-block:Attachment>

</aml-block:Attachments>

</aml-block:Block>

</soap-env:Body>

</soap-env:Envelope>
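
For anyone who wants to pull the key fields out of these XML mails instead of reading them by eye, here is a minimal Python sketch. It assumes the mail body has already been saved as a string; the function name `summarize_call_home` and the trimmed `SAMPLE` text are only illustrative, and the namespaces are copied from the message above. The fault record sits inside the attachment as CDATA text, so it is parsed in a second step.

```python
import xml.etree.ElementTree as ET

# Namespace URIs copied from the Call Home message; the prefixes are local aliases.
NS = {
    "block": "http://www.cisco.com/2004/01/aml-block",
    "ch": "http://www.cisco.com/2005/05/callhome",
}

# Trimmed copy of the message above, kept only to make the sketch runnable.
SAMPLE = """<soap-env:Envelope xmlns:soap-env="http://www.w3.org/2003/05/soap-envelope">
<soap-env:Body>
<aml-block:Block xmlns:aml-block="http://www.cisco.com/2004/01/aml-block">
<aml-block:Content>
<ch:CallHome xmlns:ch="http://www.cisco.com/2005/05/callhome" version="1.0">
<ch:EventTime>2011-12-27 17:54:09 GMT-00:00</ch:EventTime>
<ch:MessageDescription>Fabric Interconnect B, management services are unresponsive</ch:MessageDescription>
<ch:Event><ch:Type>diagnostic</ch:Type><ch:SubType>GOLD-major</ch:SubType></ch:Event>
</ch:CallHome>
</aml-block:Content>
<aml-block:Attachments>
<aml-block:Attachment type="inline">
<aml-block:Data encoding="plain"><![CDATA[
<faultInst cause="management-services-unresponsive" code="F0452"
 dn="sys/mgmt-entity-B/fault-F0452" occur="1" severity="critical"/>]]>
</aml-block:Data>
</aml-block:Attachment>
</aml-block:Attachments>
</aml-block:Block>
</soap-env:Body>
</soap-env:Envelope>"""


def summarize_call_home(xml_text):
    """Return a dict with the main fields of one Call Home XML message."""
    root = ET.fromstring(xml_text.strip())
    summary = {
        "event_time": root.findtext(".//ch:EventTime", namespaces=NS),
        "description": root.findtext(".//ch:MessageDescription", namespaces=NS),
        "subtype": root.findtext(".//ch:SubType", namespaces=NS),
    }
    # The attachment holds the UCSM fault as plain text (CDATA), not as child
    # elements, so it has to be parsed as a separate XML fragment.
    data = root.findtext(".//block:Data", namespaces=NS)
    if data and data.strip():
        fault = ET.fromstring(data.strip())
        for key in ("code", "cause", "severity", "dn", "occur"):
            summary[key] = fault.get(key)
    return summary


if __name__ == "__main__":
    for field, value in summarize_call_home(SAMPLE).items():
        print(f"{field}: {value}")
```

Running this over a batch of saved mails would make it easier to see whether every alert carries the same fault code and dn, which is the kind of detail TAC tends to ask for.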

We want to understand the impact of this error and whether there is anything we can do to prevent it. We would also like to know what might be causing it.

Let me know if anything else is needed from my side.

show-tech file uploaded.


padramas
Cisco Employee

Amit,

Since you already have a TAC SR for this issue, please get in touch with the TAC engineer with an update about the recurring alerts.

We would need logs to better understand the behavior.

Additional information such as the following would be helpful:

Is the alert generated only for FI B, or for both FIs?

Any change in cluster state corresponding to the alert timestamp

Cluster physical link status

Does the FI have any core dumps?

Padma

Padma,

The TAC engineer sent the mail below:

Hi Amit,

I’ve checked through the show tech you’ve uploaded and have not found any indicators of errors for the error message you are seeing.

As I mentioned on the call, the error is a transient error from which the fabric interconnects can automatically recover. The recommended action is to wait a few minutes (10-15 min) to see if the error clears automatically. If the error does not clear, we will need to do further troubleshooting. This error on its own is not a cause for worry. As you have HA in your system, the management services would have failed over to the other fabric interconnect, and this would not affect your system performance.

We can leave the system under observation for a few days to see if other errors occur concurrently with this error.

I will upload the show-tech logs here. My replies are below:

Is the alert generated only for FI B or both FIs ->> Amit: The alert is generated for FI-B only

Any change in cluster state corresponding to alert time stamp ->> Amit: Unfortunately, because of the timing, we are unable to see the cluster state at the moment this error is generated. If you can suggest another place where I can find the state, that would be helpful

Cluster physical link status ->> Amit: Cluster link is OK

Does FI have any core dumps ->> Amit: I don't know. How can I check this?

Regards,

Amit Vyas

Amit,

I have reached out to the TAC engineer and will get back to you. Also, please upload the latest UCSM show tech to the SR.

"show cluster extended-state" would show the cluster state.

For core dumps, you can check from the Admin tab of UCSM.

Padma

Padma,

Below is the screenshot of the "show cluster extended-state" command. Something looks really strange here: we have 4 chassis in total, but I can see HA READY for only 3 chassis.

I will upload the latest "show-tech" to the SR. There are no core dumps available under the "UCSM-> Admin-> Core Files" option.

-Amit

Amit,

It is normal for only 3 chassis to be displayed in "show cluster extended-state." UCS uses up to 3 chassis for quorum when determining primary/subordinate roles. The above screenshot shows a stable system.

Although the system is on 1.4(3q), we are observing PSU I2C errors (CSCtq10987), likely carried forward from an upgrade. Additional details and resolution steps for your particular case are provided in the TAC SR.

For reference, customers can review the "error_pca9541_per_device" sections of the Chassis ->IOM->I2C.log file for EBUSY errors, which indicate which device is causing the I2C bus noise.

Thanks,

Matthew

mwronkow
Cisco Employee

Please send me a private message with the TAC SR. I will follow up when I return to the office tomorrow.

Sent from Cisco Technical Support iPhone App

Hi Matthew,

I have sent you a private message.

-Amit

Any resolution on this? We are seeing the same issue running 2.0(1s) and have an open SR. Very annoying. Over 30 Call Home email alerts (pairs) in less than 30 days. Some days we get none, other days we get multiple...

Hi Robert,

I am not sure whether the workaround will work on 2.0(1s), because we are facing this issue on 1.4(3q).

We got the workaround below for this.

     As suspected, I2C communication is causing the SEEPROM errors, which in turn cause Call Home alerts.

     To move forward, we need to identify the noisy PSU that causes the I2C issues.

     -- To be on the safe side, make sure that you are not running any critical apps on chassis 1

     -- Remove PSU X and gather the output of the following commands every 60 seconds for a period of 3-5 min

          connect local-mgmt a

          show tech chassis 1 iom 1 brief | no-more

          show tech chassis 1 iom 2 brief | no-more

          show tech chassis 1 iom 1 brief | egrep 'fixup|lostar'

          show tech chassis 1 iom 2 brief | egrep 'fixup|lostar'

      If the values of these two counters stop incrementing, then we have removed the defective PSU from the system.

     -- If it still increments, repeat the above steps by removing one PSU at a time.
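
If you save the egrep output from each 60-second run, the comparison can be sketched in Python as below. This is only a rough sketch: the exact format of the show tech output is not shown in this thread, so the parsing rule (take the last integer on any line that mentions `fixup` or `lostar`) is an assumption, and the helper names `read_counters` and `still_incrementing` are made up for illustration. Adjust the parsing to what your IOM actually prints.

```python
import re

# Counter names taken from the egrep pattern in the workaround above.
COUNTERS = ("fixup", "lostar")


def read_counters(sample_text):
    """Total the counter values found in one saved egrep output.

    ASSUMPTION: the counter value is the last integer on each line that
    mentions the counter name. The real output format may differ.
    """
    totals = {name: 0 for name in COUNTERS}
    for line in sample_text.splitlines():
        numbers = re.findall(r"\d+", line)
        if not numbers:
            continue
        for name in COUNTERS:
            if name in line:
                totals[name] += int(numbers[-1])
    return totals


def still_incrementing(samples):
    """Return the counters that grew between the first and the last sample."""
    first = read_counters(samples[0])
    last = read_counters(samples[-1])
    return [name for name in COUNTERS if last[name] > first[name]]


if __name__ == "__main__":
    # Hypothetical sample lines, only to show the calling convention.
    before = "fixup: 10\nlostar: 4\n"
    after = "fixup: 13\nlostar: 4\n"
    growing = still_incrementing([before, after])
    if growing:
        print("Still incrementing:", ", ".join(growing))
    else:
        print("Counters are stable; the removed PSU is the likely source.")
```

An empty result after pulling a PSU would correspond to the "value stops incrementing" case in the workaround; a non-empty one means moving on to the next PSU.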

I guess your TAC engineer will give you more clarity on this error for 2.0(1s).

Regards,

Amit

Excellent! It seems resetting the PSUs addressed this for us, as well as addressing some "device CHASSIS_SN, error accessing shared-storage" warning faults we were seeing.
