High CPU - Per-Second jobs

hegleran · 08-24-2009

I have a 3845 that I use as a BGP and EIGRP router. Recently I have been getting spikes of high CPU utilization. The top process is 'Per-Second Jobs'. Cisco's description does not give much detail about what this process does:

Per-second Jobs

Performs a variety of tasks every second; executes registered one_second jobs

Is there more detail on what this process does specifically? Or, even better, are there some debugs I could run that would show more specifically what is causing the CPU utilization to spike? My configuration has not changed significantly since before this issue began. I recently turned on NBAR protocol discovery, but that is about the only recent feature change on the box. Any help would be greatly appreciated.

pompeychimes · 08-24-2009

I'm not familiar with that process either. However, what percentage is typically associated with it?

hegleran · 08-25-2009

It varies. Right now it is the second-highest consuming process:

43 636182900 44096272 14427 1.22% 1.29% 1.30% 0 Per-Second Jobs

However, it spikes and can take significantly more process resources. I have the router send a trap to my NMS every time CPU stays at 85% or more for over 60 seconds, and again when it drops back below 50% for 60 seconds. The processor hits the 85% watermark dozens of times a day.
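The trap behaviour described above is a rising/falling threshold with hysteresis. Below is a minimal Python sketch of that logic, assuming one CPU sample per second; the function name and the sampling model are hypothetical and say nothing about how the poster's NMS or IOS actually implements it.

```python
def threshold_events(samples, rising=85, falling=50, interval=60):
    """Return (sample_index, event) pairs for a rising/falling CPU threshold.

    Assumes one sample per second. A "rising" event fires after `interval`
    consecutive samples at or above `rising`; once raised, a "falling" event
    fires after `interval` consecutive samples below `falling`. Any sample
    that breaks the current condition resets the counter.
    """
    events = []
    alarmed = False
    run = 0  # consecutive samples satisfying the current condition
    for i, cpu in enumerate(samples):
        condition = cpu < falling if alarmed else cpu >= rising
        run = run + 1 if condition else 0
        if run == interval:
            alarmed = not alarmed
            events.append((i, "rising" if alarmed else "falling"))
            run = 0
    return events


# 60 s at 90%, 30 s at 70%, then 60 s at 40%:
print(threshold_events([90] * 60 + [70] * 30 + [40] * 60))
# [(59, 'rising'), (149, 'falling')]

# A 30 s spike never satisfies the 60 s rising condition:
print(threshold_events([95] * 30 + [20] * 100))
# []
```

The gap between the two thresholds keeps a CPU hovering around 85% from producing a storm of traps; it also means a router sitting anywhere between 50% and 85% stays in the raised state.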

For instance, while typing this post, it has jumped up past the threshold:

CPU utilization for five seconds: 89%/84%; one minute: 58%; five minutes: 45%

PID Runtime(ms) Invoked uSecs 5Sec 1Min 5Min TTY Process

43 636188344 44096493 14427 1.31% 1.28% 1.30% 0 Per-Second Jobs
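The uSecs column is the average time per invocation in microseconds, i.e. Runtime(ms) × 1000 / Invoked; the rows in this thread match that figure truncated to an integer. Comparing two snapshots of the same row also gives the average cost of only the invocations that happened between them. A small Python sketch using the two Per-Second Jobs rows above (the helper names are illustrative, not part of any Cisco tool):

```python
def usecs_per_invocation(runtime_ms, invoked):
    """Average microseconds per invocation over the process lifetime.

    Integer division; this matches the uSecs values shown in this thread.
    """
    return runtime_ms * 1000 // invoked


def interval_ms_per_invocation(earlier, later):
    """Average ms per invocation between two (runtime_ms, invoked) snapshots."""
    d_runtime = later[0] - earlier[0]
    d_invoked = later[1] - earlier[1]
    return d_runtime / d_invoked


first = (636182900, 44096272)   # Per-Second Jobs, first snapshot
second = (636188344, 44096493)  # Per-Second Jobs, second snapshot

print(usecs_per_invocation(*first))                          # 14427
print(round(interval_ms_per_invocation(first, second), 1))   # 24.6
```

Between those two snapshots Per-Second Jobs averaged about 24.6 ms per run, against a lifetime average of about 14.4 ms, which hints that individual runs were getting more expensive, although two snapshots are thin evidence. Rows in the later output whose Invoked value is negative appear to have wrapped their counter, so this arithmetic doesn't apply to them as-is.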

Mohitkumarp · 08-25-2009

Hi,

Please paste the output of "sh proc cpu | e 0.0%" and "sh proc cpu history" while the CPU utilization is high.

hegleran · 08-25-2009

The history is attached as a .txt file because of message length restrictions.

CPU utilization for five seconds: 83%/80%; one minute: 80%; five minutes: 72%

 PID Runtime(ms)      Invoked   uSecs   5Sec   1Min   5Min TTY Process
  43   636462376     44106418   14430  1.22%  1.27%  1.28%   0 Per-Second Jobs
 111   489137264  -1791332276       0  0.49%  0.44%  0.47%   0 IP Input
 289     5377804   -711957626       0  0.32%  0.29%  0.30%   0 CH_GT96K Backgro
  18    97547916    640478641     152  0.32%  0.27%  0.25%   0 ARP Input
 293     2488720   1352935716       1  0.16%  0.10%  0.10%   0 PPP manager
   2      762760      8663794      88  0.08%  0.13%  0.14%   0 Load Meter
 294     1660260   1352935941       1  0.08%  0.05%  0.08%   0 PPP Events
  39    48283664     26482667    1823  0.08%  0.12%  0.14%   0 Net Background
 137     1574344     68859647      22  0.08%  0.08%  0.08%   0 CEF process
 301   542705332   -805060081       0  0.08%  0.05%  0.05%   0 IP SNMP
 106     1424160   1341520080       1  0.08%  0.04%  0.05%   0 ACCT Periodic Pr
 132     2657276     92965231      28  0.08%  0.03%  0.01%   0 DHCPD Receive

AH-3845-QoS#sh proc cpu hi

AH-3845-QoS 11:29:07 AM Tuesday Aug 25 2009 EDT

Joseph W. Doherty · 08-25-2009

From your follow-on posts, it appears your processor is busy with interrupt CPU, not process CPU (and "Per-Second Jobs", in particular, only showed a fraction past 1%).

Interrupt CPU generally reflects "normal" packet forwarding. How much traffic is passing through the router during these CPU spikes?
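In the "CPU utilization for five seconds: X%/Y%" line, X is the total CPU and Y is the portion spent at interrupt level, so X minus Y is roughly what the processes listed underneath account for. A small Python sketch that splits the line; the regex is written only for the format shown in this thread and is an assumption about other IOS versions:

```python
import re

CPU_LINE = re.compile(
    r"CPU utilization for five seconds: (\d+)%/(\d+)%; "
    r"one minute: (\d+)%; five minutes: (\d+)%"
)


def split_cpu(line):
    """Split an IOS 'CPU utilization' line into total, interrupt and process load."""
    m = CPU_LINE.search(line)
    if m is None:
        raise ValueError("unrecognized CPU line: " + line)
    total, interrupt, one_min, five_min = map(int, m.groups())
    return {
        "total_5s": total,
        "interrupt_5s": interrupt,
        "process_5s": total - interrupt,  # load visible in the process list
        "one_min": one_min,
        "five_min": five_min,
    }


for line in (
    "CPU utilization for five seconds: 89%/84%; one minute: 58%; five minutes: 45%",
    "CPU utilization for five seconds: 83%/80%; one minute: 80%; five minutes: 72%",
):
    print(split_cpu(line))
```

For both readings posted in this thread, only 5% and 3% of the CPU respectively is process-level load, which is why no single process in "show processes cpu" looks busy.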

hegleran · 08-25-2009

Attached are traffic stats during a time of high CPU load.

The traffic load is not excessive. I've got a DS3, a handful of T1s, and 2 GigE interfaces. The GigE interfaces provide connectivity to the LAN and redirect traffic to the WAAS appliances. Anything arriving inbound on gig0/0 is policy routed. The total load, in bps or pps, does not seem to exceed the capacity of the router.
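Packet rate, rather than bit rate, is usually what drives forwarding CPU on a software-forwarding router like the 3845, so a bps figure is easier to judge once converted to pps. A back-of-the-envelope Python sketch; the average packet sizes are hypothetical values, not measurements from this router (the interface counters or NetFlow would give the real average):

```python
def packets_per_second(bits_per_second, avg_packet_bytes):
    """Convert a bit rate to a packet rate for a given average packet size."""
    return bits_per_second / (8 * avg_packet_bytes)


for mbps in (10, 40):
    for size in (1500, 500, 64):  # hypothetical average packet sizes in bytes
        pps = packets_per_second(mbps * 1_000_000, size)
        print(f"{mbps} Mbps at {size}-byte average: {pps:,.0f} pps")
```

With PBR and WCCP redirection to WAAS in the path, LAN-side packets also cross the GigE interfaces, so the pps worth comparing against capacity is the sum across all interfaces, not the DS3 figure alone.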

Edison Ortiz · 08-25-2009

A DS3 and a handful of T1s? As in 5 T1s and a DS3? Hmm, I think you are overloading this router, and PBR isn't helping either.

__

Edison.

Joseph W. Doherty · 08-25-2009

I too would expect a 3845 to handle a T3 without any issues. However, I'm not sure of the impact of WAAS and PBR. With WAAS, LAN-side bursts are typically at a much higher rate, on top of the WAAS redirection processing overhead. (I recall seeing WAAS redirection add considerably to a router's CPU load.) It's possible these services, with your traffic load, are what's causing the higher-than-expected load.

I still see that almost all the CPU load is at interrupt level. The few throttles in your interface summary might indicate that the overall load, in your configuration, is just too much.

If you haven't already, you might review: http://www.cisco.com/en/US/products/hw/routers/ps359/products_tech_note09186a00801c2af0.shtml. You might also, if you haven't already, try NetFlow and policy caching. (NB: I believe the latter isn't really necessary with CEF.)
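One way to confirm that the load is interrupt-level rather than process-level is to add up the 5Sec column and compare it with total minus interrupt from the header line. A Python sketch using the rows posted earlier under the 83%/80% reading; only the posted rows are included, so any process left out of the paste is not counted:

```python
# 5Sec column from the posted "sh proc cpu" rows (83%/80% reading).
five_sec = {
    "Per-Second Jobs": 1.22,
    "IP Input": 0.49,
    "CH_GT96K Backgro": 0.32,
    "ARP Input": 0.32,
    "PPP manager": 0.16,
    "Load Meter": 0.08,
    "PPP Events": 0.08,
    "Net Background": 0.08,
    "CEF process": 0.08,
    "IP SNMP": 0.08,
    "ACCT Periodic Pr": 0.08,
    "DHCPD Receive": 0.08,
}

total, interrupt = 83, 80  # "CPU utilization for five seconds: 83%/80%"
process_level = total - interrupt
listed_total = sum(five_sec.values())

print(f"listed processes:       {listed_total:.2f}%")
print(f"total minus interrupt:  {process_level}%")
print(f"interrupt share of CPU: {interrupt / total:.0%}")
```

The listed processes add up to about 3%, matching total minus interrupt, while roughly 96% of the busy CPU is interrupt-level work that "show processes cpu" does not attribute to any process.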

hegleran · 08-25-2009

I appreciate everyone's feedback. I am already using NetFlow on this router, but not policy caching. The DS3 doesn't often push more than 8-10 Mbps of traffic; very rarely does it creep into the 30-40 Mbps range. If raw throughput were the problem, wouldn't I expect to see IP Input consuming a lot of processor resources, especially since all LAN-side traffic is routed via PBR? Also, if the WAAS redirection were a problem, I would expect to see WCCP eating up processor resources, which again isn't the case. I'm really stumped here!

Joseph W. Doherty · 08-25-2009

As to whether WAAS and PBR show up under show processes, it depends on whether the forwarding logic is process switched or fast switched. Over the years, Cisco has moved more and more forwarding logic into the fast-switching path. I.e., either could be using up processor cycles, but if that work happens at interrupt level, you don't see a per-function breakdown of the percentages.

Also, since WAAS is involved, it's not just a question of bandwidth consumption on the DS3 but also of what's being used on the LAN interfaces.

Mohitkumarp · 08-25-2009

Hi,

From the output you pasted, I do not see high process CPU utilization; the maximum is 1.28%.

From the history, it looks to me like the traffic is too high.

As far as I know, a 3845 can handle up to about 40-42 Mbps. If you are utilizing the DS3 (45 Mbps) with additional configuration such as IGP, EGP, access lists, etc., the router might not be able to handle the traffic.

Try the options below:

Reboot the router, which would clear dead allocated memory, etc.

Check whether fast switching is enabled on all the LAN interfaces; if not, try enabling it and check the utilization.

hegleran · 08-26-2009

Thanks for the suggestions. I have fast switching enabled on all applicable interfaces, and I have confirmed that CEF is enabled as well. I do have some long ACLs that pertain to PBR, but I have these on many of our routers and have not run into this issue before.

I'm still not certain what the trouble could be, and I'm a bit hesitant to attribute it to traffic volume. Per http://www.cisco.com/en/US/prod/collateral/modules/ps2797/ps4909/product_data_sheet09186a008010fba2.html the 3845 and 3745 platforms can handle 2 DS3s. In fact, our other major data center has a somewhat similar configuration running on a 2821. It doesn't have the extra T1s that this router has, but it is doing PBR, DS3, BGP, EIGRP, and WCCP, all with similar ACLs. That router's processor never reaches the levels I see on this router. Maybe a good old-fashioned IOS upgrade and reboot is in order.

pompeychimes · 08-26-2009

If you haven't already I'd change the IOS just to see if the problem persists.

jeff.bly · 05-18-2010

I too am having a problem with high CPU after installing a WAAS device. I have a 7204 with an E3, and the CPU is running at 80 to 100%. From what some of you are posting, you may not have noticed something. Do a "sho proc cpu sorted 1min" (or 5min). This lists the processes sorted from highest use to lowest. The process totals do not add up to the total CPU usage, because hardware interrupts are requiring a considerable amount of CPU time. I have had Cisco in my router twice, and while the processes have remained very low, they are seeing a problem with an access list and QoS requests. Cisco had me run the following commands whenever the CPU ran over 75%:

Sh ip int
Sh align
Sh cef not
Sh cef drop
Sh ip tra
Sh ip ACL
sh ip wccp web-cache detail
sh ip wccp <#> detail
sh interfaces
ip accounting

Cisco also turned on CPU profiling while the CPU was high to record requests. The command below showed what the CPU was spending a lot of time on.

The decodes from "show profile terse":

ipaccess_check_acl_common
ipfib_pas_feature_fs

Because of the output from show profile terse, Cisco recommended the following.

-Removing the keyword "log-input" from the access-list (or reducing its use as much as possible) would help as well.

-Another best practice you could use is to apply Turbo ACLs, but typically this creates more issues, so be careful when suggesting this. For more about Turbo ACL, see the following URL: http://www.cisco.com/en/US/products/sw/iosswrel/ps1829/products_feature_guide09186a00800881a7.html

If you still see high CPU usage, then we can try to compile them so we don't have to match subsequent packets going in and out after one successful access-list lookup. The command would be:

config t
access-list compiled

Summarizing, ACL configuration and its optimization are usually the key to dropping the CPU to lower values. Also, never forget to check for a software bug that could be causing the issue.
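The reason a long ACL shows up in the profile (ipaccess_check_acl_common) is that an ACL is evaluated top-down, first match wins, so a packet that matches near the bottom, or matches nothing, is compared against every entry. As I understand Cisco's description, Turbo ACL compiles ACLs into lookup tables so a lookup takes a small, fixed number of steps regardless of list length. The Python sketch below is a deliberately simplified, hypothetical model of that contrast: entries match only on protocol and destination port, and the "compiled" form is just a dictionary. It is not how IOS implements Turbo ACL.

```python
# Hypothetical ACL: (action, protocol, dst_port) entries, evaluated top-down.
ACL = [("permit", "tcp", port) for port in range(1000, 1200)] + [
    ("permit", "udp", 53),
    ("deny", "tcp", 23),
]
IMPLICIT = "deny"  # every ACL ends with an implicit deny


def linear_lookup(acl, protocol, dst_port):
    """First-match evaluation; returns (action, entries_checked)."""
    for checked, (action, proto, port) in enumerate(acl, start=1):
        if proto == protocol and port == dst_port:
            return action, checked
    return IMPLICIT, len(acl)


def compile_acl(acl):
    """Build an exact-match table, keeping only the first entry per key
    so the result agrees with first-match evaluation."""
    table = {}
    for action, proto, port in acl:
        table.setdefault((proto, port), action)
    return table


def compiled_lookup(table, protocol, dst_port):
    """One dictionary lookup regardless of how long the ACL is."""
    return table.get((protocol, dst_port), IMPLICIT)


table = compile_acl(ACL)
for pkt in (("tcp", 1000), ("udp", 53), ("tcp", 80)):
    print(pkt, linear_lookup(ACL, *pkt), compiled_lookup(table, *pkt))
```

Both functions return the same action for every packet; the difference is that the linear version checks 201 or 202 entries for the last two packets while the table lookup cost stays constant, which is the effect the compiled-ACL suggestion is aiming for.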