I am facing a strange issue here. The CallManager service suddenly crashes on a subscriber. When I checked some historical records, I found that this has been happening repeatedly, roughly every 2 months. The CallManager service stops responding and gets restarted automatically, and it can happen on any of the subscriber servers. In the latest event, the service did not come back up on its own and I had to restart it manually.
Is this happening at a certain time of day, or at completely random times? If it is happening late at night or in the early morning hours, then the nightly jobs (SQL jobs, BARS, etc.) may be starving the server of resources. I have seen this cause ccm to fail at these times. In those cases the cause was either a large CDR database that took a while to clean up, or BARS failing. When BARS fails, the databases may not be truncated automatically, so they get bigger and the nightly jobs need more time/CPU to run against them.
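To answer the time-of-day question quickly, here is a minimal Python sketch that buckets crash times by hour and reports what share fall in a night window. The timestamps, the timestamp format, and the 00:00 to 05:00 window are all made-up assumptions for illustration; substitute the restart times from your own event logs.

```python
from collections import Counter
from datetime import datetime

# Assumed timestamp format; adjust to match how you record crash times.
FMT = "%Y-%m-%d %H:%M"


def crashes_by_hour(timestamps, fmt=FMT):
    """Count crashes per hour of day (0-23), sorted by hour."""
    hours = Counter(datetime.strptime(ts, fmt).hour for ts in timestamps)
    return dict(sorted(hours.items()))


def share_in_window(timestamps, start_hour=0, end_hour=5, fmt=FMT):
    """Fraction of crashes whose hour falls in [start_hour, end_hour)."""
    if not timestamps:
        return 0.0
    hits = sum(
        1
        for ts in timestamps
        if start_hour <= datetime.strptime(ts, fmt).hour < end_hour
    )
    return hits / len(timestamps)


# Hypothetical crash times, not taken from any real server.
sample = [
    "2008-01-14 01:07",
    "2008-03-20 01:12",
    "2008-05-18 14:30",
    "2008-07-22 02:45",
]

print(crashes_by_hour(sample))
print(share_in_window(sample))
```

If most of the crashes land in the same hour as a scheduled job, that job is the first thing to look at. A flat spread points away from the nightly jobs.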
Are there any errors before the time the service terminates in the application/system event logs?
If you have detailed ccm and sdl traces turned on, and the traces from the time of the last crash are still there, save them to a safe place. Also see if there are any DrWatson files from the event.
This can be caused by several things (it may be wise to pursue it with TAC since it can get quite involved), but the CallManager service will terminate itself and try to restart if it doesn't get enough CPU cycles in a 30 second period. The number of restarts is controlled by Windows, under the settings for the service in Services. There are 3 failure options, so after 3 failures it won't start itself automatically anymore and has to be started manually.
Initially I also thought that this was happening because of BARS. In fact, it once happened exactly while BARS was running. But later I noticed that it happens at random times, though mostly at night. Apart from BARS, I am not sure what other processes run during the night.
I have collected all the CCM and SDL traces from that time and have sent them to TAC.
Of course, as you have suggested, this most probably looks like a resource crunch issue. But is there any way to find the exact cause of this resource crunch? The event logs do not show any errors before the crash.
How big are your CDR databases? There is an Optimize CIPT SQL job that runs at 1am by default, and the first step of this job works on the CDR database. If that database is huge because it has not been maintained, I have seen the Optimize CIPT job sometimes cause CCM to fail due to resource issues at that time.
If you go into SQL Enterprise Manager, look under the Management section and check the history of the Optimize CIPT SQL job, you can see how long it takes to run. If it is taking a long time (usually 5+ minutes), this is probably the problem. There's a view details button there to see how long each step of the job takes. Or you can look at the job steps and paste each one into SQL Query Analyzer, one at a time, to see which step is taking a while to run and whether it produces any errors.
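If you would rather check the step durations outside the GUI, here is a rough Python sketch. It assumes you have copied the per-step values from the run_duration column of msdb.dbo.sysjobhistory, which stores the duration as an integer in HHMMSS form (so 1230 means 12 minutes 30 seconds). The step names and numbers below are hypothetical, and the 5 minute threshold is just the rule of thumb mentioned above.

```python
def run_duration_to_seconds(run_duration):
    """Convert an HHMMSS-encoded integer (e.g. 1230 -> 00:12:30) to seconds."""
    hours, rest = divmod(run_duration, 10000)
    minutes, seconds = divmod(rest, 100)
    return hours * 3600 + minutes * 60 + seconds


def slow_steps(steps, threshold_seconds=300):
    """Return (step_name, seconds) for steps at or above the threshold."""
    result = []
    for name, run_duration in steps:
        secs = run_duration_to_seconds(run_duration)
        if secs >= threshold_seconds:
            result.append((name, secs))
    return result


# Hypothetical job history rows: (step name, run_duration as HHMMSS integer).
history = [
    ("step 1", 1230),   # 00:12:30
    ("step 2", 45),     # 00:00:45
    ("step 3", 10005),  # 01:00:05
]

for name, secs in slow_steps(history):
    print(f"{name}: {secs} seconds")
```

Any step that shows up here is a good candidate to run by hand in Query Analyzer to see what it is doing.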
You may want to check that BARS is running successfully and truncating the CDR/CCM/ART databases as it should. Is BARS backing up CDR? If not, it is not going to truncate CDR and you'll need to handle that database manually. Under the CDR Tools/Analysis plugin (if installed), are you purging CDR records after a set number of days, or is it not set to purge? If it is not set, the CDR database will grow to the max size (default is 1.5 million records) and you may have this problem. You may have to manually purge records and/or shrink this database.
Well, BARS does back up CDR. I am also checking the parameters Eric has suggested. But the main issue here is that the CCM service has crashed on the subscribers, not on the Publisher. Still, just to make sure, I am also checking the BARS details.
The trick below might come in handy when you have to add a new node to a cluster but you don't have, or are unsure of, the security password for the publisher. This procedure has been around for ages.
1) Log in to the CLI of the Publisher.