We support many 2811 gateways at customer sites, all handling T.37 fax (almost entirely onramp) and nothing else. Occasionally we see bursts of the following in the logs, showing that incoming fax calls are being rejected due to high CPU:
Sep 12 12:07:14.980 CDT: %IVR-3-LOW_CPU_RESOURCE: IVR: System experiencing high cpu utilization (96/100).
Call (callID=1365621) is rejected.
Sep 12 12:07:20.864 CDT: %IVR-3-LOW_CPU_RESOURCE: IVR: System experiencing high cpu utilization (96/100).
Call (callID=1365622) is rejected.
Sep 12 12:07:50.564 CDT: %IVR-3-LOW_CPU_RESOURCE: IVR: System experiencing high cpu utilization (98/100).
Call (callID=1365625) is rejected.
In the above example, the rejections continued for about 7 minutes. A 'show processes cpu history' run about half an hour after the event showed, in the 'last 60 minutes' graph (most recent sample on the left), the CPU ramping up over 20-25 minutes, holding at 100% for about 8 minutes, then dropping abruptly back to normal. Uptime was 30 weeks when this happened. Because the problem is so infrequent and we have no way to reproduce it, it's unlikely that we'll be able to grab diags while the CPU is at 100%.
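One way around not being able to catch it live would be an EEM applet that fires on the syslog message itself and dumps diagnostics to flash. This is only a sketch (the applet name and filename are mine, and I haven't verified EEM behaviour on 12.4(25f) specifically), but the general shape would be:

```
event manager applet CAPTURE-HIGH-CPU
 ! Fire whenever the IVR high-CPU rejection message is logged
 event syslog pattern "IVR-3-LOW_CPU_RESOURCE"
 ! Capture the top CPU consumers and the active calls at that moment
 action 1.0 cli command "enable"
 action 2.0 cli command "show processes cpu sorted | append flash:highcpu.txt"
 action 3.0 cli command "show call active voice brief | append flash:highcpu.txt"
 action 4.0 syslog msg "High-CPU diagnostics appended to flash:highcpu.txt"
```

Since the message repeats for every rejected call, you'd probably also want to rate-limit it (EEM supports this on the event line) so the applet doesn't itself add load during the burst.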
The IOS version is 12.4(25f), running on a 2811 with the NM-HDV2-2T1/E1 board and the PVDM2-48 DSP card. The DSP firmware is overridden to 26.4.501 due to previous issues with out-of-spec tones and outbound fax not working. It has 512MB memory and 128MB flash. I see there's a 12.4(25g) IOS version available, but the caveats list doesn't seem to contain anything relevant.
Any ideas as to what might be causing this? A rogue fax? Is IOS 15 likely to help? Some of the fax gateways we support only have 256MB memory so won't be able to go to IOS 15 easily.
An update: we've since hit the same problem, this time on a 2811 running IOS 15.1(4)M7, with two fax calls that had been up for over 3 hours.
We cured the high CPU by killing the two long-lived calls:
fgw#clear call voice causecode 31 id 32FF
fgw#clear call voice causecode 31 id 3302
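For anyone else chasing this: we found the hex call IDs to clear by looking for the legs with unusually long durations in the active-call display. Roughly (exact output format varies by IOS version):

```
fgw#show call active voice brief
! Look for call legs with multi-hour durations; note their hex call IDs
fgw#clear call voice causecode 31 id <hex-call-id>
! Cause code 31 is "normal, unspecified"
```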
We worked with Cisco TAC on this a while back and found that when it happens, the CPU is consumed by the DocMSP process, which gets into a loop. However, TAC refused to go any further because our onramp TCL script is customized to pass through the RDNIS when present, so that we can use it as the fax target number. Not really a relevant change, but rules are rules.
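For context (since TAC flagged it), the RDNIS customization is small. This is a simplified sketch of the kind of change involved, not our actual script, using the standard TCL IVR infotag API:

```tcl
# Sketch only: choose the fax target number.
# Prefer the redirecting number (RDNIS) when the call was forwarded;
# otherwise fall back to the dialed number (DNIS).
set rdnis [infotag get leg_rdnis]
if { $rdnis != "" } {
    set faxTarget $rdnis
} else {
    set faxTarget [infotag get leg_dnis]
}
```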
Anyway, we had DocMSP debugs enabled in this case (debug fax dmsp all, with console logging disabled). They confirmed looping for the two calls; here's one of the loops: