04-14-2014 08:20 AM
Hi,
Hopefully somebody can help?
We were upgrading our UCS cluster; one of the 6248s upgraded okay, but the other one crashed at 51% and now boots to a bash-2.05b# prompt, accompanied by the following messages:
bash-2.05b# 2014 Apr 14 15:12:36 %$ VDC-1 %$ %CALLHOME-2-EVENT: svc_sam_extvmmAG crashed with crash type:134
2014 Apr 14 15:12:36 %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH
2014 Apr 14 15:12:38 %$ VDC-1 %$ %CALLHOME-2-EVENT: svc_sam_statsAG crashed with crash type:134
2014 Apr 14 15:12:38 %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH
2014 Apr 14 15:12:39 %$ VDC-1 %$ %CALLHOME-2-EVENT: svc_sam_portAG crashed with crash type:134
2014 Apr 14 15:12:39 %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH
2014 Apr 14 15:15:34 %$ VDC-1 %$ %CALLHOME-2-EVENT: svc_sam_dcosAG crashed with crash type:134
2014 Apr 14 15:15:34 %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH
2014 Apr 14 15:17:30 %$ VDC-1 %$ %CALLHOME-2-EVENT: svc_sam_extvmmAG crashed with crash type:134
2014 Apr 14 15:17:30 %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH
2014 Apr 14 15:17:32 %$ VDC-1 %$ %CALLHOME-2-EVENT: svc_sam_portAG crashed with crash type:134
2014 Apr 14 15:17:32 %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH
2014 Apr 14 15:17:33 %$ VDC-1 %$ %CALLHOME-2-EVENT: svc_sam_statsAG crashed with crash type:134
2014 Apr 14 15:17:33 %$ VDC-1 %$ %CALLHOME-2-EVENT: SW_CRASH
I cannot find how to recover from this, as most of the documentation I have found does not quite cover this error.
Regards
Scott.
04-14-2014 11:11 AM
From which UCS version did you upgrade, and what was the target version?
Did you do a manual upgrade, or use Auto Install for the infrastructure firmware?
Disable Call Home before Upgrading to Avoid Unnecessary Alerts (Optional)
When you upgrade a Cisco UCS domain, Cisco UCS Manager restarts the components to complete the upgrade process. This restart causes events that are identical to service disruptions and component failures that trigger Call Home alerts to be sent. If you do not disable Call Home before you begin the upgrade, you can ignore the alerts generated by the upgrade-related component restarts.
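For reference, Call Home can be disabled from the UCS Manager CLI before an upgrade along these lines (a sketch from memory of the UCS Manager CLI; exact prompts may differ by version, so check the configuration guide for your release):

```
UCS-A# scope monitoring
UCS-A /monitoring # scope callhome
UCS-A /monitoring/callhome # disable
UCS-A /monitoring/callhome* # commit-buffer
```

Re-enable it the same way (`enable` followed by `commit-buffer`) once the upgrade has completed, so that genuine failures still generate alerts.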
04-15-2014 01:20 AM
Hi,
The upgrade was manual via the GUI, not Auto Install. We didn't disable Call Home :-(.
I think the original version was 2.0.2 and we were upgrading to 2.2.1b.
Any idea how we can recover the failed node? Here is an additional (hopefully helpful) extract from the log:
2014 Apr 15 07:28:02 %$ VDC-1 %$ Apr 15 07:28:02 %KERN-2-SYSTEM_MSG: mts_acquire_q_space() failing - no space in dst sap 28, uuid 26, src sap 980, opcode 3176 - kernel
2014 Apr 15 07:28:02 %$ VDC-1 %$ Apr 15 07:28:02 %KERN-2-SYSTEM_MSG: mts_acquire_q_space() failing - no space in sap 28, uuid 26 send_opc 3176, pid 6441, proc_name stats_client - kernel
2014 Apr 15 07:28:02 %$ VDC-1 %$ Apr 15 07:28:02 %KERN-2-SYSTEM_MSG: node=4 sap=66 rq=0(0) lq=0(0) pq=801(2229984) nq=0(0) sq=0(0) buf_in_transit=0, bytes_in_transit=0 - kernel
2014 Apr 15 07:28:02 %$ VDC-1 %$ Apr 15 07:28:02 %KERN-2-SYSTEM_MSG: node=4 sap=980 rq=0(0) lq=0(0) pq=0(0) nq=1(0) sq=0(0) buf_in_transit=204, bytes_in_transit=19496688 - kernel
2014 Apr 15 07:28:02 %$ VDC-1 %$ Apr 15 07:28:02 %KERN-2-SYSTEM_MSG: node=4 sap=28 rq=209(19392060) lq=0(0) pq=0(0) nq=0(0) sq=0(0) buf_in_transit=0, bytes_in_transit=0 - kernel
2014 Apr 15 07:28:02 %$ VDC-1 %$ Apr 15 07:28:02 %KERN-2-SYSTEM_MSG: mts_deliver_local_atomic:mts_acquire_q_space failed for opcode 3176, src_sap = 980, num_dst 1, erro -16 - kernel
2014 Apr 15 07:28:53 %$ VDC-1 %$ %SYSMGR-2-SERVICE_CRASHED: Service "fwm" (PID 6440) hasn't caught signal 6 (core will be saved).
04-15-2014 08:02 AM
Did you upgrade UCS Manager to 2.2.1 as the first step?
Maybe these help a bit:
http://jeffsaidso.com/2013/01/when-disaster-strikes/
https://supportforums.cisco.com/discussion/11964846/6248u-fabric-interconnect-bootloader-prompt
However, I would recommend opening a TAC case.
04-23-2014 12:06 PM
Jeff has mentioned a similar case, see
http://jeffsaidso.com/2014/04/fabric-interconnect-booting-to-bash/?utm_source=feedburner&utm_medium=email&utm_campaign=Feed%3A+JeffSaidSo+%28Jeff+Said+So%29
His solution was very simple:
bash# erase configuration
That should do it. The FI will reboot and come back up as if it were brand new, asking you to create or join a cluster.
It goes without saying that this should not happen under normal circumstances, but I’ve heard rumblings of people seeing it here and there after upgrading to 2.2.x.
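For anyone landing in the same state, the recovery flow looks roughly like this (a sketch based on the thread; the setup prompts are paraphrased and may vary by UCS version):

```
bash-2.05b# erase configuration
! The FI reboots and runs the initial setup dialog on the console.
! With the healthy peer still up, it should detect the existing cluster:
Installer has detected the presence of a peer Fabric Interconnect.
This Fabric Interconnect will be added to the cluster. Continue (y/n)? y
Enter the admin password of the peer Fabric Interconnect:
```

After joining, the subordinate FI syncs its configuration from the surviving peer, so the domain configuration itself is not lost.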