I have a subscriber that periodically logs the following error throughout the day.
Error: CodeYellowExit - CodeYellowExit
Expected Average Delay: 0
Entry Latency: 60
Exit Latency: 24
Sample Size: 10
UNKNOWN_PARAMTYPE:Time Spent in Code Yellow: 13969
Number of Calls Rejected Due to Call Throttling: 18
Total Code Yellow Exit: 38
High Priority Queue Depth: 0
Normal Priority Queue Depth: 0
Low Priority Queue Depth: 0
App ID: Cisco CallManager
I have run tracing and performance monitor and I'm seeing no spikes in CPU usage.
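When the same alarm shows up many times a day, it can help to turn each occurrence into structured data so the fields can be compared across events. A minimal Python sketch, assuming the alarm text is available as plain "Key: Value" lines exactly as pasted above (the function name `parse_alarm` is made up for illustration and is not part of any Cisco tool):

```python
def parse_alarm(text):
    """Parse 'Key: Value' lines of an alarm into a dict; numeric values become ints."""
    fields = {}
    for line in text.strip().splitlines():
        # Split on the last ": " so a key that itself contains a colon stays intact.
        key, sep, value = line.rpartition(": ")
        if not sep:
            continue  # skip lines that are not in "Key: Value" form
        value = value.strip()
        fields[key.strip()] = int(value) if value.isdigit() else value
    return fields


# Alarm text copied from the post above.
ALARM = """\
Error: CodeYellowExit - CodeYellowExit
Expected Average Delay: 0
Entry Latency: 60
Exit Latency: 24
Sample Size: 10
UNKNOWN_PARAMTYPE:Time Spent in Code Yellow: 13969
Number of Calls Rejected Due to Call Throttling: 18
Total Code Yellow Exit: 38
High Priority Queue Depth: 0
Normal Priority Queue Depth: 0
Low Priority Queue Depth: 0
App ID: Cisco CallManager
"""

alarm = parse_alarm(ALARM)
print(alarm["Entry Latency"], alarm["Exit Latency"])
print(alarm["Number of Calls Rejected Due to Call Throttling"])
```

Collecting these dicts for every occurrence makes it easier to see whether, for example, the number of rejected calls or the time spent in Code Yellow changes from one event to the next.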
What is unique about this CallManager subscriber? How many servers are in the cluster? Are device registrations equally distributed between the different subscribers? Were the traces and perfmon logs from the same time period as when the codeyellows occurred? Does your CallManager service anything more than just phones, such as agents in a Call Center or something else?
CallManager goes into codeyellow, in an attempt at self-preservation, when one or more processes take up an inordinate amount of CPU cycles. In your case, it is rejecting calls because processing those calls at that particular time would cost it precious CPU cycles.
There are 3 servers in this cluster: 1 Pub and 2 Subs. The number of serviced phones is almost equal between the 2 subs. When traces were run, the event did occur during the same time. It only services phones and H.323 sites. No call center agents. I have compared it to other servers not having the issue to confirm there are no services running that are different. I was told a possible cause could be too many devices unregistering and trying to register back at the same time, but when I look at the log times there are no unregistration or registration attempts 5-10 prior to the error.
Is there a pattern to when the codeyellows are encountered? Do you notice that they happen more at a particular time of the day or week? See if you can identify a trend or pattern.
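One way to look for such a trend is to bucket the alarm timestamps by hour of day and to look at the gaps between consecutive alarms. A minimal Python sketch, assuming the timestamps have already been pulled out of the event log as "YYYY-MM-DD HH:MM:SS" strings; the sample values and the function name `summarize` are hypothetical and only illustrate the idea:

```python
from collections import Counter
from datetime import datetime


def summarize(timestamps):
    """Return (alarm count per hour of day, gaps in minutes between consecutive alarms)."""
    times = sorted(datetime.strptime(t, "%Y-%m-%d %H:%M:%S") for t in timestamps)
    per_hour = Counter(t.hour for t in times)
    gaps = [(b - a).total_seconds() / 60 for a, b in zip(times, times[1:])]
    return per_hour, gaps


# Hypothetical timestamps, for illustration only (not taken from the poster's logs).
SAMPLE = [
    "2008-01-07 09:14:02",
    "2008-01-07 09:15:40",
    "2008-01-07 13:02:11",
    "2008-01-08 09:20:55",
]

per_hour, gaps = summarize(SAMPLE)
for hour, count in sorted(per_hour.items()):
    print(f"{hour:02d}:00  {count}")
print("shortest gap (min):", round(min(gaps), 1))
```

If most alarms land in the same hour or two, that points at something scheduled or at a busy-hour load; an even spread across the day suggests looking elsewhere.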
Because codeyellows could also be caused by excessive disk and network writes, also check the level of traces being written. Are SDL and SDI traces set to detailed? If they are, consider reducing them to at least the default level. Also, if the server has a drive other than C:\, consider moving the trace directory to that drive instead of leaving it on C:\.
Hope this helps.
The times are very random. It could occur as often as twice in 2 minutes or happen 3-4 hours apart. Also, I confirmed that all tracing is turned off and has been for some time.
I also have a similar problem, and I think it could be the same one. I opened a TAC case and don't have a solution yet. The TAC engineer recommended verifying the QoS between the remote sites and the cluster, but that doesn't seem to be the problem in my network. I tried many things; I even reinstalled the three servers (Pub and subscribers) with the latest patch version. I'm still looking for a solution because TAC hasn't given me anything useful.
If you are still having the same problem, we can try together to solve it.
I am having a similar issue with a customer I support. Has there been a resolution? Was QoS applied correctly across the WAN?
I am also having a very similar issue:
The problem appeared after an upgrade from 4.1(3) to 6.1(1a) on a cluster of 2 servers.
It happens with only 4 phones out of 200.
All phones should register to the sub, but these phones register to the sub and then flap between the pub, the sub, and an unregistered state...
These 4 phones are 7970s, but I have other 7970s without any problem. I can ping these phones and access their web interface without any problems...
Still have no solution.
Does this issue have a workaround?
We received this kind of alarm just today. Hopefully someone can shed some light on what is causing it.
OK, let me first explain what Code Yellow is.
Code Yellow automatically throttles call attempts based on the expected delay in the SDL high or normal priority queue. This protects the CCM from high CPU usage caused by a burst of call attempts that exceeds the threshold the CCM can support, or by an incorrectly configured routing step.
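To make the throttling idea concrete, here is a simplified Python sketch of an enter/exit threshold pair (hysteresis). It is only a model: the function name `code_yellow_states` is made up, the 60 and 24 defaults are the "Entry Latency" and "Exit Latency" values from the alarm in the first post, and treating them as thresholds compared directly against the expected delay is an assumption. The real CCM logic (sampling, averaging, exact comparisons) is not shown in this thread.

```python
def code_yellow_states(delays, entry_latency=60, exit_latency=24):
    """Yield True/False per sample: whether the (simplified) throttle is active.

    Assumed model: throttling starts when the expected delay rises above
    entry_latency and stops once it falls to exit_latency or below.
    """
    throttling = False
    for delay in delays:
        if not throttling and delay > entry_latency:
            throttling = True   # enter Code Yellow: start rejecting new calls
        elif throttling and delay <= exit_latency:
            throttling = False  # exit Code Yellow: accept calls again
        yield throttling


# Hypothetical expected-delay samples, for illustration only.
samples = [5, 30, 75, 50, 30, 20, 0]
print(list(code_yellow_states(samples)))
# -> [False, False, True, True, True, False, False]
```

The gap between the two thresholds is what keeps the server from bouncing in and out of throttling when the delay hovers around a single limit.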
To find what is triggering this condition we need the CUCM SDL and SDI traces and the Perfmon logs from when the event occurs. This requires a deep analysis of the specific server or customer. The analysis will show how the SDL queues looked before the event; if the SDL queues are OK, the Perfmon logs may show whether there is a problem with CPU, I/O, or memory.
As a previous poster did, you may need to open a TAC case for this analysis.