Cisco 3745 high CPU usage

Answered Question
Sep 3rd, 2007

Hi. For some time, one of our Cisco 3745 routers has been showing very high CPU usage, around 70%. I know this is very high and wondered if you could help me find the cause. Below is the output of the sh proc cpu | exclude 0.00%__0.00%__0.00% command.



CPU utilization for five seconds: 60%/25%; one minute: 69%; five minutes: 72%

PID  Runtime(ms)  Invoked     uSecs  5Sec    1Min    5Min    TTY  Process
4    22848424     2914987     7838   0.00%   0.05%   0.05%   0    Check heaps
22   3151572      24641326    127    0.00%   0.02%   0.00%   0    Per-Second Jobs
35   6275548      425586      14745  0.00%   0.01%   0.00%   0    Per-minute Jobs
47   349044856    2875681874  0      0.49%   0.74%   0.66%   0    IP Input
73   35722664     2514347994  0      0.08%   0.04%   0.05%   0    Socket Timers
115  9046748      21118779    428    0.08%   0.03%   0.02%   0    SAA Event Proces
126  310612044    34928714    8892   32.42%  25.41%  26.10%  0    FRF9 manager
131  10119524     139770053   72     0.00%   0.02%   0.03%   0    IP-EIGRP Hello
138  1157995044   2514347994  0      1.39%   2.08%   2.15%   0    Rtt Responder
149  5092728      74037445    68     0.00%   0.01%   0.00%   0    IP-EIGRP Router


The router has 128 MB of memory and runs IOS image c3745-is-mz.122-13.T1.bin.


Any help is greatly appreciated.

Paolo Bevilacqua Mon, 09/03/2007 - 11:55

Hi,


Verify that "ip cef" is configured.

Send the output of "show interface summary" if it happens again.


purohit_810 Mon, 09/03/2007 - 18:35


" 126 310612044 34928714 8892 32.42% 25.41% 26.10% 0 FRF9 manager "


Do you have QoS configured? Is it functioning properly?


Second, put a sniffer on the link and check the top ten talkers.


Regards,

Dharmesh Purohit

Joseph W. Doherty Tue, 09/04/2007 - 06:06

"126 310612044 34928714 8892 32.42% 25.41% 26.10% 0 FRF9 manager"


Are you doing frame-relay payload compression? If so, and if the compression is being done in software, as it might be in this case, expect it to consume CPU especially as the actual data rate increases.


If you want to eliminate this CPU consumption, and if it is caused by frame-relay payload compression, you can either stop doing the compression or use a hardware module that offloads some types of compression. (Without research, I don't know whether any are available for the 3745 or would do so.)

IgorHamzic Mon, 09/10/2007 - 03:15

Hi. Sorry for the late reply, but I've been out of action, so to speak, since I last posted. Here is more information:


1. ip cef is configured

2. QoS is configured on some frame-relay sub-interfaces. It was set up by the previous administrator and we have never had any problems with it.

3. frame relay compression is enabled for 1 frame relay sub-if


Output of show interface summary:


  Interface        IHQ  IQD     OHQ  OQD      RXBS     RXPS  TXBS     TXPS  TRTL
--------------------------------------------------------------------------------
* FastEthernet0/0  0    505245  0    0        3466000  498   949000   448   0
* Serial0/0        0    688     0    2753634  15000    51    0        0     0
* FastEthernet0/1  0    438139  0    0        1114000  480   2472000  524   0
* Virtual-Access1  0    0       0    0        0        0     0        0     0




IQD seems very high on the Serial and FastEthernet interfaces.


But I've been thinking of something else as well. On the same router we have several sub-interfaces on the FastEthernet0/1 interface that we use to connect to several remote locations. Could the problem be the multiple sub-interfaces on the FastEthernet and Serial (frame relay) interfaces? There are 7 sub-interfaces on each interface. Is that perhaps too many?

IgorHamzic Mon, 09/10/2007 - 03:29

Here it is. The one here is about 15 minutes newer than the one above.


  Interface        IHQ  IQD     OHQ  OQD      RXBS     RXPS  TXBS     TXPS  TRTL
--------------------------------------------------------------------------------
* FastEthernet0/0  1    505245  0    0        3530000  534   1283000  504   0
* Serial0/0        0    688     0    2753634  14000    51    1000     2     0
* FastEthernet0/1  35   439826  0    0        2249000  631   1948000  590   0
* Virtual-Access1  0    0       0    0        0        0     0        0     0


The high processor time is almost constant through the day but it doesn't seem to affect network performance at this moment.

Paolo Bevilacqua Mon, 09/10/2007 - 03:50

Hi,


It doesn't seem like you have very high traffic, but you are experiencing many input drops. S0/0 appears to be severely congested with only 14 kbps on output. Do you have FR shaping on it?

This router needs to be looked at interactively; something doesn't seem right.

Do you have CEF enabled?

IgorHamzic Mon, 09/10/2007 - 04:08

Yes, ip cef is enabled. The serial interface isn't very active. Its main purpose is to serve as a frame relay backup to some remote sites (which we usually reach over FastEthernet sub-interfaces), and it will be phased out in the near future.

Paolo Bevilacqua Mon, 09/10/2007 - 05:07

OK, then it's an FE-to-FE issue. Please monitor whether the drops are increasing; it could be a buffer issue.

IgorHamzic Mon, 09/10/2007 - 10:41

I have been monitoring the situation for some time, with the following results. Running the show interface summary command repeatedly, I have seen that packets are sometimes dropped when there are many packets in the input queue of FastEthernet0/1. Here are the outputs of 3 consecutive show interface summary commands:


*: interface is up
IHQ: pkts in input hold queue     IQD: pkts dropped from input queue
OHQ: pkts in output hold queue    OQD: pkts dropped from output queue
RXBS: rx rate (bits/sec)          RXPS: rx rate (pkts/sec)
TXBS: tx rate (bits/sec)          TXPS: tx rate (pkts/sec)
TRTL: throttle count

  Interface        IHQ  IQD     OHQ  OQD      RXBS     RXPS  TXBS     TXPS  TRTL
--------------------------------------------------------------------------------
* FastEthernet0/0  2    505245  0    0        1140000  153   715000   137   0
* Serial0/0        3    688     0    2753634  14000    51    0        0     0
* FastEthernet0/1  76   471420  0    0        406000   248   2855000  393   0
* Virtual-Access1  0    0       0    0        0        0     0        0     0
NOTE:No separate counters are maintained for subinterfaces
Hence Details of subinterface are not shown


*: interface is up
IHQ: pkts in input hold queue     IQD: pkts dropped from input queue
OHQ: pkts in output hold queue    OQD: pkts dropped from output queue
RXBS: rx rate (bits/sec)          RXPS: rx rate (pkts/sec)
TXBS: tx rate (bits/sec)          TXPS: tx rate (pkts/sec)
TRTL: throttle count

  Interface        IHQ  IQD     OHQ  OQD      RXBS     RXPS  TXBS     TXPS  TRTL
--------------------------------------------------------------------------------
* FastEthernet0/0  1    505245  0    0        1330000  171   708000   145   0
* Serial0/0        0    688     0    2753634  15000    52    0        0     0
* FastEthernet0/1  18   471429  0    0        428000   313   4353000  529   0
* Virtual-Access1  0    0       0    0        0        0     0        0     0
NOTE:No separate counters are maintained for subinterfaces
Hence Details of subinterface are not shown


*: interface is up
IHQ: pkts in input hold queue     IQD: pkts dropped from input queue
OHQ: pkts in output hold queue    OQD: pkts dropped from output queue
RXBS: rx rate (bits/sec)          RXPS: rx rate (pkts/sec)
TXBS: tx rate (bits/sec)          TXPS: tx rate (pkts/sec)
TRTL: throttle count

  Interface        IHQ  IQD     OHQ  OQD      RXBS     RXPS  TXBS     TXPS  TRTL
--------------------------------------------------------------------------------
* FastEthernet0/0  0    505245  0    0        1330000  171   708000   145   0
* Serial0/0        1    688     0    2753634  15000    52    0        0     0
* FastEthernet0/1  4    471433  0    0        428000   313   4353000  529   0
* Virtual-Access1  0    0       0    0        0        0     0        0     0
NOTE:No separate counters are maintained for subinterfaces
Hence Details of subinterface are not shown


Any advice if it is a buffers issue? What could I check to confirm it?

IgorHamzic Tue, 09/11/2007 - 05:03

One other thing I noticed: I have been pinging the router at random intervals, and sometimes the ping reply takes around 300 ms. Drops happen on FE0/1 at those times.

Could that also be caused by the queue size?


I'll try doubling the input queue size to 150 with the hold-queue command (the default for the interface is 75) and see if it resolves anything.

Joseph W. Doherty Tue, 09/11/2007 - 05:26

Possibly the delay in pings is caused by the box being busy doing whatever is also causing the input queue to fill. Routers respond to pings when they get around to it.


Increasing any FIFO queue also adds latency, so we're trying to reduce drops without dramatically increasing latency. Increasing the queue size by 75 would add roughly 9 or 10 ms, for 1500-byte packets at 100 Mbps. So it's best to try a bit and see what happens.
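The 9 to 10 ms figure follows from serialization arithmetic. A minimal Python sketch, assuming the worst case in which every added queue slot holds a full 1500-byte packet and framing overhead is ignored (the function name is illustrative, not from any Cisco tool):

```python
def added_queue_delay_ms(extra_packets: int, packet_bytes: int, link_bps: float) -> float:
    """Worst-case extra queuing delay in milliseconds.

    Time needed to move `extra_packets` full-size packets at line rate.
    """
    bits = extra_packets * packet_bytes * 8
    return bits / link_bps * 1000.0


# 75 additional slots, 1500-byte packets, 100 Mbps FastEthernet
delay = added_queue_delay_ms(75, 1500, 100e6)
print(f"{delay:.1f} ms")
```

Smaller packets or a partially filled queue would add proportionally less delay.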

IgorHamzic Thu, 10/11/2007 - 00:25

Sorry for the very late reply. I did increase the input queue by 75, but I'm still seeing input queue drops. In fact, CPU usage has gone up by about 3 percent in this time, with no configuration changes.

And from time to time I also see this error in the logs: %ERROR: delta cannot be less than 0.


I think I may know what the problem is, though. The regional offices are generating more and more traffic toward our central site. I'll try to split them over 2 routers and see what happens.


If you have any more advice, please post it and I'll try it before splitting the traffic.

Joseph W. Doherty Thu, 10/11/2007 - 03:20

Even with an increase of the input queue size, I would still expect you'll see drops. The question is whether the percentage of drops decreased.
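One rough way to answer that question from two show interface summary samples is to divide the IQD delta by an estimate of the packets received over the same interval. A minimal Python sketch, assuming the RXPS column is a fair average for the whole interval and that it counts every received packet (neither is confirmed in this thread); the function name is hypothetical:

```python
def input_drop_percent(iqd_before: int, iqd_after: int,
                       avg_rx_pps: float, interval_s: float) -> float:
    """Estimate input-queue drops as a percentage of received packets."""
    dropped = iqd_after - iqd_before
    received = avg_rx_pps * interval_s  # estimate only; see assumptions above
    if received <= 0:
        raise ValueError("need a positive packet estimate")
    return 100.0 * dropped / received


# FastEthernet0/1 samples posted on 09/10, "about 15 minutes" apart:
# IQD 438139 -> 439826, RXPS 480 and 631 (averaged here)
pct = input_drop_percent(438139, 439826, (480 + 631) / 2, 15 * 60)
print(f"{pct:.2f}% of received packets dropped")
```

Running the same calculation on samples taken before and after a hold-queue change gives a like-for-like comparison, which raw IQD counters alone do not.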


Going back to the CPU issue: it is likely caused by frame-relay compression. If you can, disable it and see what happens to your CPU utilization.

Paolo Bevilacqua Thu, 10/11/2007 - 08:25

Can you get an L3 switch to handle the FE-to-FE traffic? There is an 8-port model of the 3560 that has an unsurpassed price/performance ratio.

The router would then be relieved of all issues with FE and have the resources to handle WAN with any feature you want on it.


hope this helps, please rate post if it does!

IgorHamzic Thu, 10/25/2007 - 11:22

I did remove frame-relay compression from the serial sub-interface, but CPU usage did not go down. In fact, I started seeing drops on the serial interface when I removed it, and they stopped when I re-enabled compression.

I moved the remote office generating the most traffic to a different router, and the drop rate decreased considerably.


And did you mean I should put a L3 switch in front of the router and let it handle all of the remote offices and then pass on that information to the router?

Paolo Bevilacqua Thu, 10/25/2007 - 11:33

Yes, use an L3 switch for LAN traffic and have the router use only one LAN interface and the serial ones. There are also L3 switches in the form of network modules that you can use in the 3745, but a 3560-8 should be much cheaper.

IgorHamzic Thu, 10/25/2007 - 12:05

So in effect I should have something like this:


remote offices---L3 switch---router


And the configuration something like this:


- On the L3 switch: the IP addresses I now have on the router's sub-interfaces. The switch will handle all of the metro Ethernet traffic to the remote offices and the connection to the rest of the network.


- On the router: just the serial sub-interfaces and the connection to the L3 switch.


Something like that?

IgorHamzic Wed, 11/07/2007 - 12:46

Hi all. After a period of inactivity I have done some further tests on the router in question.

I have managed to eliminate the drops on the interface, but the problem with high CPU still persists.

Here is the output of the sh proc cpu | exclude 0.00%__0.00%__0.00% command:


CPU utilization for five seconds: 79%/77%; one minute: 82%; five minutes: 81%

PID  Runtime(ms)  Invoked     uSecs  5Sec    1Min    5Min    TTY  Process
3    44740        2493        17946  0.00%   0.00%   0.39%   162  Virtual Exec
4    33567892     3562379     9422   0.00%   0.09%   0.06%   0    Check heaps
5    131252       448630      292    0.00%   0.06%   0.03%   0    Pool Manager
47   405279072    3095525684  0      1.30%   0.47%   0.47%   0    IP Input
73   39449580     2726608548  0      0.00%   0.03%   0.00%   0    Socket Timers
115  13248812     22562505    587    0.16%   0.04%   0.03%   0    SAA Event Proces
131  14046112     164918057   85     0.08%   0.03%   0.02%   0    IP-EIGRP Hello
138  1350604196   2726608548  0      0.40%   1.87%   1.90%   0    Rtt Responder
141  2287760      282632683   8      0.08%   0.00%   0.00%   0    fastblk backgrou


I'm losing this battle. I have removed the office with the highest usage from this router and eliminated the drops, but there are still times when the processor is maxed out due to interrupts, as you can see from the show command.

IP CEF is enabled on all FastEthernet interfaces and their associated sub-interfaces. I'm running out of ideas on this one.

BTW good advice in previous posts.

Paolo Bevilacqua Wed, 11/07/2007 - 13:27

Hi,


What is the traffic volume over all interfaces when you take the show proc cpu?

Considering that the 3745 is rated for a max of 225 Kpps (see attached), at 80% CPU you could be at around 150 Kpps, which is nothing out of the ordinary for sustained LAN-to-LAN traffic.


That router performance limitation is the reason for the suggestion to use an L3 switch for inter-VLAN routing.



IgorHamzic Wed, 11/07/2007 - 14:00

The PDF was really helpful, but even when CPU usage on the router is high I only see about 180 packets/second on input and about 174 packets/second on output when I run the show interface command. That doesn't seem anywhere close to the 150,000 packets/second you mentioned.
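To put that gap in numbers, the observed rate can be expressed as a share of the rated forwarding capacity. A small Python sketch, taking the 225 Kpps rating and the 180 packets/second figure from the posts above; the helper name is made up for illustration, and a linear relation between packet rate and CPU is only a rough assumption:

```python
RATED_PPS = 225_000  # 3745 rating quoted earlier in the thread


def percent_of_rated(observed_pps: float, rated_pps: float = RATED_PPS) -> float:
    """Observed packet rate as a percentage of the rated maximum."""
    return 100.0 * observed_pps / rated_pps


share = percent_of_rated(180)
print(f"{share:.2f}% of rated capacity")
```

At well under one percent of the rating, forwarded traffic seen in the interface counters is unlikely to explain CPU usage near 80% on its own.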

Paolo Bevilacqua Wed, 11/07/2007 - 14:11

OK, if you are positive about the low traffic, then it is something else.


Unfortunately, it is difficult to diagnose what.

I've seen routers spike CPU, but most of the time it was due to some rogue traffic that could be "seen" with regular show commands. Please check the router's counters again against those of the connected switch. There is a small chance that some high traffic is not being counted by the router.


Also, if at all possible, could you reload the router while it exhibits high CPU? If it comes back with low CPU, that could point to some kind of strange bug.


Going forward, you might ultimately need to 'span' a port from the switch to a network analyzer like Wireshark to find out what is really going on there.


Thanks again for the nice rating and good luck!

IgorHamzic Wed, 11/07/2007 - 15:46

I have connected Wireshark to the switch in front of the router and so far have seen a lot of UDP traffic between different routers, with source ports above 50000 and destination ports around 14000, and vice versa.

Any idea what these might be?

Paolo Bevilacqua Wed, 11/07/2007 - 16:49

That could easily be some kind of P2P. Any more detail on the packets? Can you confirm that the source and destination addresses are those of the routers? That seems strange.

IgorHamzic Wed, 11/07/2007 - 17:46

Yes, they are the addresses of my router at the central location and one of my routers at the remote location. I also see the address of the switch at the remote location, as the router is configured as a router on a stick, with the ISP link first going in the router and then from the switch into the router.

It strikes me as really odd that routers and switches should be talking to each other using UDP on such high ports.

What would you like to know about the packet, so I can copy and paste it from Wireshark?

Joseph W. Doherty Wed, 11/07/2007 - 17:43

Enough traffic will load down the interrupt CPU %. Can you estimate the total traffic flow going through the box?


Otherwise, the delta of 2% looks great between the total CPU and interrupt CPU.
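The two figures in the "five seconds" field of show proc cpu are total CPU and the portion spent at interrupt level; the difference is what scheduled processes consume. A minimal Python sketch of that split, where `split_cpu` is my own hypothetical helper rather than anything provided by IOS:

```python
import re


def split_cpu(summary_line: str) -> dict:
    """Split the 'five seconds: total%/interrupt%' field of a
    'show processes cpu' summary line into its components."""
    m = re.search(r"five seconds:\s*(\d+)%/(\d+)%", summary_line)
    if m is None:
        raise ValueError("not a 'show processes cpu' summary line")
    total, interrupt = int(m.group(1)), int(m.group(2))
    # Whatever is not interrupt-level is spent in scheduled processes.
    return {"total": total, "interrupt": interrupt, "process": total - interrupt}


line = "CPU utilization for five seconds: 79%/77%; one minute: 82%; five minutes: 81%"
print(split_cpu(line))
```

For the first output in this thread (60%/25%) the same function reports 35% at process level, which is roughly what the FRF9 manager line accounted for. In the later 79%/77% output almost everything is interrupt-level, which is why the per-process percentages do not add up to the total.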

IgorHamzic Wed, 11/07/2007 - 18:02

I think I might have found the guilty party for the problem.

I have found the following on a router:


rtr responder
rtr responder type udpEcho port 14400
rtr responder type udpEcho port 14401
rtr responder type udpEcho port 14402
rtr responder type udpEcho port 14403
rtr 17
 type jitter dest-ipaddr x.x.x.x dest-port 14388 num-packets 50
 request-data-size 172
 frequency 12
 hours-of-statistics-kept 25
rtr schedule 17 start-time now life forever
rtr 21
 type jitter dest-ipaddr y.y.y.y dest-port 14402 num-packets 3000
 request-data-size 172
 frequency 70


The UDP ports seem to match as far as I can tell (it's past 3 AM here). I'm not familiar with these commands, but as far as I can tell they send UDP packets to specific ports to measure jitter. The IP addresses match those of my remote office router and switch, and there are similar configurations on the router and switch in the remote office. Could this be the cause of the high CPU usage?
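For a sense of scale, the probe settings in that configuration can be turned into an average packet rate. A rough Python sketch, assuming each operation sends num-packets probes once every frequency seconds and counting only the sending direction and the 172-byte payload (no UDP/IP headers, no responder replies); the class and field names are illustrative:

```python
from dataclasses import dataclass


@dataclass
class JitterProbe:
    num_packets: int    # 'num-packets' in the rtr entry
    frequency_s: int    # 'frequency' in seconds
    payload_bytes: int  # 'request-data-size'

    def avg_pps(self) -> float:
        """Average probe packets per second, one direction only."""
        return self.num_packets / self.frequency_s

    def avg_payload_bps(self) -> float:
        """Average payload bit rate, ignoring all header overhead."""
        return self.avg_pps() * self.payload_bytes * 8


# Values copied from the configuration posted above
probes = {17: JitterProbe(50, 12, 172), 21: JitterProbe(3000, 70, 172)}

for op, p in probes.items():
    print(f"rtr {op}: {p.avg_pps():.1f} pps, {p.avg_payload_bps() / 1000:.1f} kbps payload")

total_pps = sum(p.avg_pps() for p in probes.values())
print(f"total: {total_pps:.1f} pps sent by this router")
```

This only counts packets. It says nothing about the per-packet CPU cost of generating and answering timestamped probes, and the remote router and switch run similar operations against this router's responder ports, so the real load is higher than the figure printed here.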

Joseph W. Doherty Wed, 11/07/2007 - 18:28

RTR (now known as SAA; the prior name is still in the command syntax) will add load, but I'm not sure it would be enough to account for your CPU load, especially at the interrupt level. If no one is monitoring or analyzing the stats, it's something you could turn off to see the impact.

IgorHamzic Wed, 11/07/2007 - 18:55

Seems that disabling the rtr did the trick. The CPU usage dropped visibly after disabling the rtr.

The result of sh proc cpu | exclude 0.00%__0.00%__0.00%:


CPU utilization for five seconds: 1%/0%; one minute: 0%; five minutes: 12%

PID  Runtime(ms)  Invoked     uSecs  5Sec    1Min    5Min    TTY  Process
3    15220        530         28716  0.00%   0.00%   0.23%   162  Virtual Exec
4    33746820     3566160     9463   0.65%   0.11%   0.06%   0    Check heaps
47   406005916    3096394853  0      0.08%   0.01%   0.13%   0    IP Input
115  13374740     22568152    592    0.08%   0.02%   0.01%   0    SAA Event Proces
126  3593591000   179190592   20054  0.16%   0.01%   0.00%   0    FRF9 manager
138  1354765736   2727439431  0      0.00%   0.00%   0.41%   0    Rtt Responder
149  6738796      89963122    74     0.00%   0.01%   0.00%   0    IP-EIGRP Router


I'll keep monitoring for a while and report the results. If this keeps up I'll disable rtr on the other router and switch too.

Correct Answer
Joseph W. Doherty Wed, 11/07/2007 - 19:15

Quite a difference, especially if it holds!


If no one is actually using RTR/SAA, deactivation might make sense. Or, check the parameters and have the tests sample much, much less frequently.

IgorHamzic Wed, 11/07/2007 - 19:40

It sure is quite a difference. I believe no one is using RTR/SAA currently, but I'll wait a couple of days and see if someone complains.

I sure hope it holds. The real test will come in about a couple of hours when people in remote offices start work.

I think it will be a much smoother ride as there won't be any interrupts now that will use CPU resources.

ariesc_33 Thu, 11/08/2007 - 21:46

Hi,


Can you post the complete output of the "show proc cpu"?


The output above shows that the overall CPU is 81 percent, but the per-process figures you posted do not add up to 81 percent.


The full output should give a clear clue about what is really causing the high CPU, by showing how much CPU each process consumes.


The output in your first post shows that one of the problems has something to do with frame relay.







IgorHamzic Tue, 11/13/2007 - 00:43

Hi all. The problem was with RTR. After I removed it from the configuration the CPU usage dropped to around 1% even during peak times.

The problem is solved now.

Thank you guys for the great advice that helped me solve this.
