UCS suffered a major meltdown today when both fabric interconnects decided to reboot. As far as I know, I have no HA policy set that would cause a fabric interconnect to reboot on its own. A similar error is on the other fabric interconnect. My VAR is troubleshooting this, but I wanted to get some other brains on it because the impact was total. VMware could not fail over properly, and domain controllers, SQL, pretty much everything was pooched. I have attached the first couple hundred events from UCS. The VAR thinks I hit a bug and should just upgrade.
cucs-1-A(nxos)# show system reset-reason
----- reset reason for Supervisor-module 1 (from Supervisor in slot 1) ---
1) At 633300 usecs after Wed Apr 9 08:53:29 2014
    Reason: Reset triggered due to HA policy of Reset
    Service: monitor hap reset
    Version: 5.0(3)N2(2.1w)
Software
  BIOS: version 3.5.0
  loader: version N/A
  kickstart: version 5.0(3)N2(2.1w)
  system: version 5.0(3)N2(2.1w)
  power-seq: Module 1: version v1.0
             Module 3: version v2.0
  uC: version v188.8.131.52
  SFP uC: Module 1: v184.108.40.206
  BIOS compile time: 02/03/2011
  kickstart image file is: bootflash:/installables/switch/ucs-6100-k9-kickstart.5.0.3.N2.2.1w.bin
  kickstart compile time: 2/3/2012 18:00:00 [02/03/2012 18:15:13]
  system image file is: bootflash:/installables/switch/ucs-6100-k9-system.5.0.3.N2.2.1w.bin
  system compile time: 2/3/2012 18:00:00 [02/03/2012 20:16:06]

Hardware
  cisco UCS 6248 Series Fabric Interconnect ("O2 32X10GE/Modular Universal Platform Supervisor")
  Intel(R) Xeon(R) CPU with 16622556 kB of memory.
  Processor Board ID FOC15485QRG
I am really never on the Fabric Interconnects to execute any CLI commands, but a memory leak from some other cause would certainly be a possibility. The fact that the system's own High Availability process caused a double reset in its own High Availability architecture is troubling.
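For anyone comparing the reset reasons from both fabric interconnects, here is a minimal Python sketch that pulls the timestamp, reason, service, and version out of saved `show system reset-reason` output. The field layout is assumed from the single sample posted above, and `parse_reset_reasons` is a hypothetical helper name, not a Cisco tool.

```python
import re
from datetime import datetime

# Sample text copied from the output posted above (one entry only).
SAMPLE = """\
----- reset reason for Supervisor-module 1 (from Supervisor in slot 1) ---
1) At 633300 usecs after Wed Apr 9 08:53:29 2014
    Reason: Reset triggered due to HA policy of Reset
    Service: monitor hap reset
    Version: 5.0(3)N2(2.1w)
"""

# Assumed line shapes, inferred from the sample rather than from a spec.
ENTRY_RE = re.compile(r"\s*(\d+)\)\s+At\s+(\d+)\s+usecs after\s+(.+?)\s*$")
FIELD_RE = re.compile(r"\s*(Reason|Service|Version):\s*(.*?)\s*$")


def parse_reset_reasons(text):
    """Return one dict per numbered reset entry found in the text."""
    entries = []
    current = None
    for line in text.splitlines():
        m = ENTRY_RE.match(line)
        if m:
            # Collapse repeated spaces so single-digit days parse cleanly.
            stamp = " ".join(m.group(3).split())
            when = datetime.strptime(stamp, "%a %b %d %H:%M:%S %Y")
            current = {
                "index": int(m.group(1)),
                "time": when.replace(microsecond=int(m.group(2))),
            }
            entries.append(current)
            continue
        m = FIELD_RE.match(line)
        if m and current is not None:
            current[m.group(1).lower()] = m.group(2)
    return entries


if __name__ == "__main__":
    for entry in parse_reset_reasons(SAMPLE):
        print(entry["time"].isoformat(), "|", entry.get("service"), "|", entry.get("reason"))
```

Running the same function over the output saved from each fabric interconnect and sorting the results by `time` would show how far apart the two resets were.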