We are doing T.37 fax on 2811 and 2911 routers, with calls coming in over two T1 links. Occasionally the unit hits a bug where a call gets stuck and the CPU gradually rises to nearly 100% over about 45 minutes. It stays close to 100%, almost all of it in the DocMSP process, and the unit rejects all incoming calls until something tears down the stuck call, at which point everything returns to normal.
Cisco TAC have been unable to identify or fix the bug, so we have implemented an EEM script to detect the high CPU and bounce the two T1 links. Here is the script, triggered on the call rejection logs:
The script works fine functionally (tested by having it trigger on a user-defined log event instead of the high-CPU event). When the CPU is very high, however, the script is definitely triggered but often does not appear to run: 30 minutes or an hour later, it still hasn't bounced the T1 links.
We have the following config line attempting to give more priority to the EEM script, but it doesn't seem to be helping much:
scheduler allocate 40000 5000
I have also seen mention of a 'scheduler interval' command to allow time for low-priority processes, but that doesn't seem to be available on this platform.
Any suggestions for other ways to give more priority to the EEM script, or better values for the 'scheduler allocate' command?
event manager applet high_cpu_recovery
 event ioswdsysmon sub1 cpu-proc taskname "DocMSP" op gt val 50 is-percent true period 60
 action 1.0 syslog msg "----HIGH CPU DETECTED, BOUNCING T1s----"
 ... and so on ...
The difference from your script is that this one triggers on IOS system monitor counters rather than on a syslog message. The theory is that watching the counters lets you track CPU utilization for the DocMSP process and run your script before the CPU reaches 100%, so there is still some CPU left to run it. I don't know whether 50% ("val 50" above) is the right threshold; given your long experience with this issue, you know which values aren't sane for DocMSP CPU utilization.
My syntax above may not be 100% correct; if not, it's documented here:
This will not help if, as I propose, the maxrun time is being hit. When the CPU is high, and especially if AAA command authorization is in use, each command can take a long time to execute, pushing the policy toward its default 20-second maxrun time. I would look at maxrun first, especially if "show logg" shows that the syslog message is being generated.
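As a rough sketch of what raising maxrun could look like on a syslog-triggered applet: the pattern string below is a placeholder (the original script's trigger isn't shown in this thread), and 180 seconds is only an illustrative value, not a tested recommendation. The maxrun keyword on the event line is available in reasonably recent EEM versions; check that your IOS release accepts it.

```
event manager applet high_cpu_recovery
 ! Placeholder pattern: substitute the call rejection message you already match on
 ! maxrun 180 is an assumed value; size it to how long the actions take under load
 event syslog pattern "YOUR_CALL_REJECTION_PATTERN" maxrun 180
 action 1.0 syslog msg "----HIGH CPU DETECTED, BOUNCING T1s----"
 ! ... your existing actions that bounce the T1s go here, unchanged ...
```

Afterwards, "show event manager policy registered" should list the applet along with the maxrun value in effect, which is a quick way to confirm the change took. If AAA command authorization turns out to be the slow part, some later IOS releases also accept "authorization bypass" on the "event manager applet" line, but whether that is available depends on the release you are running.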