We are in a bit of a situation with our CallManager network, and I wanted to bounce a few things off everyone in this community.
We have 1 publisher, 4 subscribers, and about 1800 phones, on the way to 5000. We are a large medical facility in SC and are starting to experience some major pains with our system. We have opened a TAC case with Cisco about delays, echoes, transfer problems, etc. The problems started off minor and have gradually gotten worse. Cisco is telling us we should reboot the servers, starting with the publisher. They also told us that we should plan on rebooting them about every three to six months.
I see this as a big problem. As a medical facility, we cannot accommodate rebooting 5 servers every 3 months. We have several services that run 24/7/365 and cannot go down.
My question is really for others who run large IPT networks where downtime is not really an option. How do you handle these issues?
We have a similar deployment, 1 pub, 1 tftp + 4 subs and about 7500 devices.
Up until April, we were doing a monthly restart of all our IP telephony servers, including Unity, Exchange, IPCCx, etc.
We were hoping that 4.1(3)SR4 would be stable enough, but alas, we are now running into problems: forwarding not working on phones, CTI route points failing, voice ports getting into a hung state, etc.
Cisco's recommendation of rebooting every three to six months is way too long, in my opinion.
We are now considering going back to monthly restarts. We might consider once every two months, but a longer interval isn't worth the risk of unexpected service interruptions. It's a lot easier to announce that you are rebooting your servers between 3 AM and 6 AM and let people know what can happen.
The services don't really go down: the phones will fail over to the backup server, and any calls in progress will be maintained, albeit with limited features.
I strongly disagree with TAC on this one. I don't believe you should take this advice at all. Please seek another opinion from a different TAC engineer (they are not all created equal). You should not have to reboot your servers every 3 to 6 months. We have 10,000 phones in about 72 campuses. We run IPCC for our call center, CER, and many other Cisco VoIP products. We have not had to reboot any of the 6 CallManager servers in our cluster in over a year. I have seen clusters that have gone even longer without a reboot.
The symptoms that you are describing sound more like network issues to me and might not have anything to do with CallManager. Call setup delays like the ones you describe are almost always caused by network issues or by high CPU usage on CallManager (in my experience, about 90% of the time it's the network). As for the echo, that has to do with ports on the gateways and can be easily fixed. If you search CCO, there are several good docs on troubleshooting echo problems: http://www.cisco.com/en/US/tech/tk652/tk698/technologies_tech_note09186a0080149a1f.shtml
As for the transfer problem, that can be a lot of different things. You would have to be more specific.
Again, in my opinion you should not have to reboot every 3 to 6 months.
Thanks, guys. I agree; I don't think we should ever have to reboot. We have a legacy 9 5 phone switch that hasn't been rebooted in 8 years. I think we are looking for five 9's of service out of these systems moving forward.
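For reference, five 9's (99.999% availability) leaves very little room: (1 - 0.99999) x 525,600 minutes per year works out to roughly 5.26 minutes of downtime per year. A quick Python sketch of that arithmetic, assuming a 365-day year (leap years ignored); `downtime_budget_minutes` is just an illustrative helper name:

```python
# Minutes in a non-leap year: 365 days * 24 hours * 60 minutes = 525,600.
MINUTES_PER_YEAR = 365 * 24 * 60


def downtime_budget_minutes(availability: float) -> float:
    """Allowed downtime per year, in minutes, for a given availability fraction."""
    return (1.0 - availability) * MINUTES_PER_YEAR


# Compare three, four, and five 9's.
for nines, availability in [(3, 0.999), (4, 0.9999), (5, 0.99999)]:
    print(f"{nines} nines: {downtime_budget_minutes(availability):.2f} min/year")
```

That prints about 525.60, 52.56, and 5.26 minutes per year respectively, which is why any planned reboot that actually interrupts service eats into a five 9's budget very quickly.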
I don't think we are having network issues. We have what I consider to be a very stable network infrastructure. Jason, who did your installation? Maybe we need to have them look at our installation.
I did the installation for us. There are a ton of companies out there that do IPT installations; I have no idea which ones would be in SC. The symptoms you describe really sound like a network issue, and I would investigate the network side further. IPT brings a whole new level of complexity to the network, such as QoS. One way to tell whether the delays come from the network or from CallManager is to capture a CallManager trace for the time the call is placed and, at the same time, put a sniffer on the phone. Look at the time between when CallManager sends the start media transmission message and when RTP packets start arriving at the phone. This will tell you for sure whether there is a network problem. You should be able to reach the five 9's mark with Cisco VoIP.
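The comparison described above boils down to subtracting two timestamps. Below is a minimal Python sketch, assuming you have already pulled the start media transmission time out of the CallManager trace and the first RTP packet time out of the sniffer capture. The `HH:MM:SS.fff` format, the `media_delay_ms` helper, and the sample times are all hypothetical; real trace and capture exports have their own formats. It also assumes the CallManager server and the sniffer have synchronized clocks (e.g. via NTP), otherwise the difference is meaningless.

```python
from datetime import datetime

# Hypothetical timestamp format; adjust to whatever your trace and capture tools export.
TS_FORMAT = "%H:%M:%S.%f"


def media_delay_ms(start_media_ts: str, first_rtp_ts: str) -> float:
    """Milliseconds between the start media transmission message in the
    CallManager trace and the first RTP packet seen by the sniffer.

    Assumes both clocks are synchronized (e.g. via NTP).
    """
    start = datetime.strptime(start_media_ts, TS_FORMAT)
    first_rtp = datetime.strptime(first_rtp_ts, TS_FORMAT)
    return (first_rtp - start).total_seconds() * 1000.0


# Made-up sample times for illustration only.
delay = media_delay_ms("10:15:02.120", "10:15:02.185")
print(f"Start media -> first RTP: {delay:.1f} ms")
```

A consistently small gap points away from the network path for media; a large or highly variable gap is worth chasing on the network side (QoS, congestion, etc.).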
We've had our VoIP installation for about 7 years now, and before implementing our monthly restarts we were plagued with issues. Nothing related to voice quality; they were more feature related. They covered everything from setting/unsetting forwarding targets to park numbers not working, hunt pilots not working, voicemail ports hanging, etc. The standard response from TAC was that we consider rebooting during off-hours and, failing that, upgrading our version of CallManager, TSP, etc. While on a number of occasions we spent some time troubleshooting with the trace files, they always said: please reboot and see if that fixes the problem.
I don't think it's an issue with Cisco programming per se, the underlying platform is Microsoft Windows and Microsoft SQL.
Then I wonder if going to a Linux platform with version 5 or 6 would buy us anything? I have heard from a Cisco engineer that the upgrade would possibly buy us more time between reboots. Who is running on the Linux platform today?
I would put my money on a Linux platform buying more time between reboots as well as fewer 'bugs'. Perhaps not fewer bugs in the first few releases, but fewer and fewer over time. While many bugs can be attributed to programming issues, I'll bet some of them, if not most, are related to underlying issues in Windows itself, e.g. memory leaks.
It sort of reminds me of our old Novell servers, which on most occasions went more than a year without a reboot. When you start with a solid foundation, you're good to go! ;)