01-29-2009 11:13 PM - edited 03-15-2019 03:52 PM
We upgraded a cluster of three CUCM servers from software version 6.0.2 to 6.1.2.
These are HP DL380 G5 boxes.
After the upgrade, ASR (Automatic System Recovery) caused two of the three servers (the Publisher and one Subscriber) to reboot once or twice a day. We then upgraded to 6.1.3, hoping that would resolve it, but the problem persisted.
The two servers with the reboot problem (one Publisher and one Subscriber, or in the newer terminology, the First Node and one Subsequent Node) are in Site 1, and the stable Subscriber is in Site 2, about 20 km away. We ruled out AC power: this data centre is on UPS and hosts a large number of core Catalyst 6500 switches and many other Windows and Linux servers with no problems.
Disabling ASR only stopped the automatic reboots; instead, the servers simply hung. When they hung, we could still reach them through iLO and click Reset to restart them.
The Publisher and the affected Subscriber never hung at the same time, only at different times.
Fix:
====
Site 1 also had a CUPS Presence Server running 6.0.2.
Interestingly, after we rebooted this CUPS server, the problem has not recurred for a week now.
I wonder if anyone else has seen this symptom.
02-04-2009 02:30 PM
Since the server itself reboots, the issue might not be related to CallManager but more of a platform or hardware issue.
There is an issue with the HP ASR (Automatic System Recovery) agent that causes the server to reboot randomly.
Check the bug: CSCsi75567
02-04-2009 03:21 PM
We initially suspected CSCsi75567, and TAC applied the ciscocm.disable-hpasm.cop.sgn package, but the problem persisted. Rebooting the CUPS server fixed it. It seems to have something to do with CUPS pushing policies to the CUCM servers.
02-06-2009 12:11 AM
Be aware of the following bug in CUCM 6.1(2): CSCsv49493
7828-H3 server goes down with Journal Aborted error
Symptom:
Phone services will go down, and the server will be only semi-responsive. The local console will show the following error constantly scrolling across the screen:
EXT3-fs error (device sd(8,6)) in start_transaction: Journal has aborted
Conditions:
Services go down during normal operation. A reboot brings services back up for a while, anywhere from a couple of hours to a couple of days. Seen most frequently on MCS7828-H3-K9/BE, but has also been reported on MCS7825-H2-IPC1 and MCS7825-H3.
Workaround:
Shut down the server, and remove the first hard drive until a final fix is available.
If server still fails, try switching to the other drive. Watch during boot up for any errors which might indicate hardware failure (SMART errors in particular).
If the server still fails on the second drive, leave one drive in and reinstall CUCM.