CUC 8.6(1a) -- Experiencing Memory Leak with setroubleshoot -- Low SWAP & VirtualMem

Dec 1st, 2011

We have a newly deployed CUC 8.6(1a) cluster that is about a week and a half old. The primary (active) server has been rapidly sending RTMT alerts for LowSwap and LowVirtualMemory available, and it now looks to be maxing out (swap was at 100% the last time I checked). Available swap and virtual memory each tend to sit in the 0-8% range.

Under processes I see that "setroubleshoot" alone is using over 3GB of VmSize. I've been able to find very little information about this: nothing in the Bug Toolkit, pretty much just this:

- https://puck.nether.net/pipermail/cisco-voip/2011-October/024706.html
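For anyone comparing numbers on a generic Linux host with shell access (CUC itself normally only exposes this through RTMT), the VmSize figure mentioned above is the value the kernel reports in /proc/<pid>/status. A minimal Python sketch, assuming that standard layout; the function name and the sample text are made up for illustration and are not data from this cluster:

```python
import re

def vmsize_kb(status_text):
    """Return the VmSize value in kB from /proc/<pid>/status text, or None."""
    match = re.search(r"^VmSize:\s+(\d+)\s+kB$", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None

# Made-up sample in the /proc/<pid>/status layout, not real output.
sample = (
    "Name:\tsetroubleshootd\n"
    "VmPeak:\t 3250000 kB\n"
    "VmSize:\t 3200000 kB\n"
    "VmRSS:\t  150000 kB\n"
)

size_kb = vmsize_kb(sample)
print(f"VmSize: {size_kb / 1024 / 1024:.2f} GB")
```

Sampling this value at intervals would show whether the process keeps growing or levels off.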

I'm worried the server might crash soon. Has anyone come across this previously? Any workaround? Is there a fix or a solution? Any help would be greatly appreciated!

For now we have set SELinux to permissive instead of secure. This seems to slow down the memory leak but does not resolve it: used virtual memory and swap still increase by about 1% every 1-2 days.

Thanks!

Joel_Jones Mon, 12/05/2011 - 14:47

We are experiencing the same thing. Our CUC 8.6(1a) HA cluster reached 2GB of Virtual.Memory and crashed today at 12:50pm. This cluster has only been up since October. After rebooting the publisher we noticed it came back up at 0MB of Virtual.Memory, as you would expect after a reboot, and has since climbed to 320MB. Between 1:30pm and 5:30pm we saw the server go up 120MB at 3:00pm and then up to 320MB at 5:30pm. This HA cluster only has about 500 users at this point.

We have another client using a CUC 8.0 HA cluster with 4,000 users. That 8.0 HA cluster has never passed 6MB of Virtual.Memory. There is an obvious issue with 8.6(1a).

Anyone else seen this?!

(We're currently working a TAC case on this and Cisco does not have an answer yet.)

Thanks,

Jonathan Schulenberg Mon, 12/05/2011 - 14:56

We had a customer with a similar issue on 8.5.1. The system exhausted physical and virtual memory before it crashed. Note that in this case there was suspicious SAN disk contention, which is believed to have been the virtual memory thrashing the controller. The problem started after upgrading from CUC 8.0.

It was a massive dramafest. Ultimately TAC and the BU came back to say that it's "expected" that CUC will use as much memory as you give it. It didn't matter if we threw another 8GB at the problem; it chewed it right up. As weak an explanation as that is, it hasn't happened again. So, good luck getting TAC to nail this one down.

Joel_Jones Tue, 12/06/2011 - 06:22

Thanks Jonathan for the quick response! We are on the phone with TAC again and working on getting this escalated to the BU. We don't have much faith in the support either, but we're not letting them go because we had the same issue with UCCX. We're still dealing with the BU on the UCCX issue, so we're trying to get the same relationship on this CUC issue. Hopefully we get a better response than you got. Either way I'll post back later.

Thanks again,

koziollz1 Tue, 12/06/2011 - 06:44

Thank you for your input.

I am working with Cisco TAC as well, and similar to your findings not really getting anywhere.

I rebooted the client's primary CUC server as a preventative measure last Thursday, and virtual memory and swap usage both dropped down to 0-4%. As of today virtual memory usage is at 70% and swap is at 60%. I don't want to have to keep rebooting their server every few days to maintain stability.
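As a rough back-of-the-envelope check on those numbers, here is a small Python sketch that linearly extrapolates when usage would hit 100%. It assumes growth is steady, that usage started at about 0% after the reboot (the post says 0-4%), and that five days have passed between Thursday and today; the function name is made up for illustration.

```python
def days_until_full(start_pct, current_pct, elapsed_days):
    """Linearly extrapolate how many more days until usage reaches 100%."""
    rate = (current_pct - start_pct) / elapsed_days  # percentage points per day
    if rate <= 0:
        return None  # usage is flat or falling; no exhaustion predicted
    return (100.0 - current_pct) / rate

# Figures from the post: roughly 0% after the reboot, about five days ago.
for name, pct in (("virtual memory", 70.0), ("swap", 60.0)):
    remaining = days_until_full(0.0, pct, 5.0)
    print(f"{name}: about {remaining:.1f} more days at the current rate")
```

A real leak may not grow linearly, so this only gives a sense of how soon the next preventative reboot would be needed.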

I will continue to push TAC to drive this further. Please keep me in the loop on your findings and I will do the same from my end.

Thank you.

Joel_Jones Tue, 12/06/2011 - 11:15

Hey everyone, it looks like Cisco is saying the following defects/bugs have been found. Thought I'd communicate what they told us.

Here is the list of bugs found in the toolkit (these were sent from our TAC investigation):

CSCtu04746

CSCtq86413

CSCtu26663

We have more information coming soon, but thought I'd communicate this out.

koziollz1 Fri, 12/16/2011 - 09:01

Setting the SELinux to permissive instead of secure slows down the memory leak, but does not resolve it still...

Joel_Jones Mon, 12/19/2011 - 06:05

Cisco TAC/BU is saying that CUC 8.6(1)ES9 will resolve the issues. We are upgrading our non-production cluster this week. Hopefully it resolves it.

Joel_Jones Tue, 12/20/2011 - 09:04

koziollz1 wrote:

Setting the SELinux to permissive instead of secure slows down the memory leak, but does not resolve it still...

koziollz1, we've noticed SELinux seems to be an issue across all 8.6.1 platforms (CUCM, CUC, UCCX, UCCE). We only have a CUC cluster on 8.6.1, so everything else is from what I've read. Since it's a new feature in the 8.6 core OS, it seems they are starting to notice that it is blocking/disturbing certain features as well. All I know is that Cisco as well as other forums are suggesting SELinux be set to permissive until future updates. We are keeping our one cluster set to permissive even after the CUC 8.6(1)ES9.

Jonathan Schulenberg Tue, 12/20/2011 - 09:09

It should be noted that setting SELinux to permissive is roughly equivalent to disabling CSA in previous releases. SELinux replaced CSA in 8.6. Be sure your security policy allows for this; it leaves the server/daemons unprotected.

koziollz1 Tue, 12/20/2011 - 09:16

Joel Jones wrote:

koziollz1 wrote:

Setting the SELinux to permissive instead of secure slows down the memory leak, but does not resolve it still...

koziollz1, we've noticed SELinux seems to be an issue across all 8.6.1 platforms (CUCM, CUC, UCCX, UCCE). We only have a CUC cluster on 8.6.1, so everything else is from what I've read. Since it's a new feature in the 8.6 core OS, it seems they are starting to notice that it is blocking/disturbing certain features as well. All I know is that Cisco as well as other forums are suggesting SELinux be set to permissive until future updates. We are keeping our one cluster set to permissive even after the CUC 8.6(1)ES9.

Thank you for your details. Is the ES9 available in the download center or do I need to request it from my TAC engineer for download?

Looking forward to hearing the outcome with your lab environment to see if this resolves the issue. I am planning another reboot as a preventative measure this week, which might be a good time to complete the update as well.

My TAC engineer did point me in the direction of bug CSCtq86413:

http://tools.cisco.com/Support/BugToolKit/search/getBugDetails.do?method=fetchBugDetails&bugId=CSCtq86413

He pointed to it because he was able to locate the following error message within the system logs:

  • "Dec 12 14:07:14 UC01 user 6 setroubleshoot: SELinux is preventing the CuSnmpAgent (cuc_snmp_t) from connecting to port 20500. For complete SELinux messages. run sealert -l e6c0aa1f-41e6-40eb-aa74-aee6dcc87abe"
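If you need to search collected logs for more of these denials, a small Python sketch like the one below can pull out the process, SELinux domain, port, and sealert ID. The pattern is modeled only on the single message quoted above, so other setroubleshoot messages may be worded differently; the names DENIAL and parse_denial are made up for illustration.

```python
import re

# Pattern modeled on the one message quoted in this thread; treat it as
# illustrative, since other setroubleshoot messages may not match.
DENIAL = re.compile(
    r"SELinux is preventing the (?P<process>\S+) \((?P<domain>\w+)\) "
    r"from connecting to port (?P<port>\d+)\..*"
    r"sealert -l (?P<alert_id>[0-9a-f-]+)"
)

def parse_denial(line):
    """Return process, SELinux domain, port, and sealert ID, or None."""
    match = DENIAL.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["port"] = int(fields["port"])
    return fields

line = ("Dec 12 14:07:14 UC01 user 6 setroubleshoot: SELinux is preventing the "
        "CuSnmpAgent (cuc_snmp_t) from connecting to port 20500. For complete "
        "SELinux messages. run sealert -l e6c0aa1f-41e6-40eb-aa74-aee6dcc87abe")
print(parse_denial(line))
```

Counting how often each process/port pair shows up would indicate whether the SNMP agent denial is the main thing feeding setroubleshoot.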

I hope to try the workaround this afternoon.

Thx again!

Joel_Jones Tue, 12/20/2011 - 09:21

ES9 is not available yet, but we're waiting to hear back from them tomorrow during our conference call. I'll let you know how to get it.

We tried the workaround for the CSCtq86413 bug and saw no change from it... So good luck, hah.

koziollz1 Thu, 02/02/2012 - 09:28

I was able to install ES9 from Cisco TAC. This has fully resolved the issue and taken care of the bug; the cluster has been running stable for a few weeks now. ES9 is the solution. Thank you all for your help!

Posted December 1, 2011 at 10:45 AM