One of our routers, a 3825 running IOS 12.4(22)T1, has been reporting non-process (i.e. interrupt-related) CPU utilization in the 60%-80% range, with peaks even higher. We are not actually experiencing any performance issues, and I have followed the instructions in the Cisco document entitled "Troubleshooting High CPU Utilization Due to Interrupts" (Doc. ID: 41120).
The document is not clear as to how the output of the "show profile terse" command should be used, so I may be missing something useful.
Basically, CEF seems to be working correctly and "show align" reveals: "No alignment data has been recorded. No spurious memory references have been recorded."
My core question is:
"How can I identify the precise cause of high CPU utilization due to interrupts on a 3825?"
The router is acting as a DMVPN headend.
Given this fact, would installation of a VPN AIM (AIM-VPN/SSL-3) be in any way beneficial?
I assume your overall data rate is far under what we would expect for a 3825.
I'm wondering whether you might have packet fragmentation CPU processing, but with such a recent IOS, it might be counted under interrupt CPU. Are you also configured for PMTU and tcp-mss-adjust?
We are, in fact, using "ip tcp adjust-mss 1360" on the DMVPN tunnel interfaces (two of them), as per Cisco recommendation. We aren't using path MTU discovery, but rather have a fixed "ip mtu 1400" on these interfaces, as well. As for the other interfaces, we have a gigabit ethernet interface, with no MTU specified, and a serial (DS3) interface with no MTU specified either.
Assuming packet fragmentation is involved, although the Cisco recommendations address most situations, there are still issues. First, a TCP header can use more than 40 bytes (i.e. the adjust-mss statement would need to be set even smaller - this is uncommon, though), and/or there might be max sized non-TCP traffic that relies on PMTU to work correctly (even if it does, many short lived max sized flows can be a performance issue). Again, don't know this could be the issue, just a couple of points to be aware of. (You might want to sniff your traffic to know what's being passed through the router. BTW, later 12.4T IOS versions support router sniffing.)
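As a rough illustration of the numbers discussed above, here is a minimal Python sketch. It assumes plain 20-byte IPv4 and 20-byte TCP headers with no options; the function names are made up for this example and are not part of any Cisco tool.

```python
# Assumed header sizes: IPv4 and TCP headers without any options.
IP_HEADER = 20
TCP_HEADER = 20

def mss_for_ip_mtu(ip_mtu: int) -> int:
    """Largest TCP payload that fits in one IP packet of size ip_mtu."""
    return ip_mtu - IP_HEADER - TCP_HEADER

def exceeds_ip_mtu(ip_packet_len: int, ip_mtu: int) -> bool:
    """True if a packet of this total IP length does not fit the tunnel IP MTU."""
    return ip_packet_len > ip_mtu

tunnel_ip_mtu = 1400  # "ip mtu 1400" from the post above

print(mss_for_ip_mtu(tunnel_ip_mtu))        # 1360, matching "ip tcp adjust-mss 1360"
print(exceeds_ip_mtu(1500, tunnel_ip_mtu))  # True: a full-size non-TCP packet does not fit
```

The second call is the non-TCP case: adjust-mss does nothing for it, so such a packet has to be fragmented or handled through PMTU.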
With your DMVPN topology, is it strictly hub-and-spoke or do you allow spoke-to-spoke tunnels, and is the 3825 the hub? (BTW, I see you note the router is the head-end, but just wanted to confirm topology.) Reason I ask, believe Cisco notes many spoke-to-spoke tunnel set ups can impact performance. (Don't know, though, whether it would be in the fast path or not.)
As to your original question about whether the optional VPN module might improve performance; it's possible, although I recall the on-board modules do a fairly good job. I recall seeing a paper that documented the difference between the two, for the various ISRs and optional modules, but couldn't find it. (Again, don't know for sure, but I would suspect encryption off-load wouldn't be counted against the main CPU.)
Unfortunately, don't have any other ideas, beyond if you have a support contract you might raise the issue with TAC.
The DMVPN is fully meshed spoke-to-spoke.
I see no evidence of fragmentation.
Although I have been sniffing on the Ethernet interfaces, I have not had an opportunity to sniff the traffic on the serial interface (T3 controller). I'll have to take a look at Embedded Packet Capture for this.
I thought it was always handled in hardware on the 3825:
#sh crypto eng conf
crypto engine name: Virtual Private Network (VPN) Module
crypto engine type: hardware
Location: onboard 0
Product Name: Onboard-VPN
FW Version: 01100200
Time running: 4294967 seconds
3 DES: Yes
AES CBC: Yes (128,192,256)
AES CNTR: No
Maximum buffer length: 4096
Maximum DH index: 0500
Maximum SA index: 0500
Maximum Flow index: 1000
Maximum RSA key size: 2048
crypto lib version: 20.0.0
crypto engine in slot: 0
platform: VPN hardware accelerator
crypto lib version: 20.0.0
"I thought it was always handled in hardware on th 3825: "
I believe that's true by default, but recall it might be possible to force encryption via software (config command) and/or if there's a crypto hardware error it might fall back to software encryption.
I also recall there's a show command to confirm what's doing the encryption (software or hardware), but it's been a while since I've used crypto stuff.
Were you ever able to figure out the problem with your high CPU? I have a similar config and am seeing high CPU due to interrupts.
A further question:
Many of the DMVPN tunnels emanating from the router are in tunnel mode. Some of them are in transport mode, as per "Cisco IOS Security Configuration Guide: Secure Connectivity" (http://www.cisco.com/en/US/docs/ios/sec_secure_connectivity/configuration/guide/sec_DMVPN_ps6441_TSD_Products_Configuration_Guide_Chapter.html), which states, quite accurately, that "for the NAT-Transparency Aware enhancement to work, you must use IPsec transport mode".
I am quite aware of the packaging differences between tunnel and transport mode, but my question is this: Would switching the existing tunnel mode tunnels to transport mode reduce CPU utilization due to interrupts?
I tried switching the tunnels to transport mode and it made no discernible difference to CPU utilization.
You have probably solved your problem already, but as far as I know, utilization of 60-80% at the interrupt level is not a problem at all. The key point is that CPU utilization does not scale linearly with the ingress traffic rate. For example, the maximum IPsec throughput on our 1841 ISR is 40 Mbps. A 20 Mbps flow causes about 80% CPU utilization at the interrupt level, but that does not mean a 40 Mbps flow causes 160% utilization; it is about 99%.
One more note: with an IPsec hardware accelerator, the CPU is utilized almost exclusively by interrupts. Without a hardware accelerator (encryption is done by the main CPU), you never see high interrupt utilization, because the CPU is consumed mostly by the encryption process itself (high utilization at the process level), which of course also results in significantly lower throughput.
Hello, this issue is likely already resolved but I am posting this in the hopes that it may help others in the future with this issue.
The first thing you want to look at when you have high CPU at interrupt level is how much traffic the router is passing. If you do a show interface and add up either all of the ingress, or all of the egress, that will give you a rough idea of the total amount of traffic.
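If there are many interfaces, the adding-up can be scripted. The following is only a sketch in Python: it assumes the usual "N minute input rate X bits/sec" / "output rate X bits/sec" lines that show interface prints, and the sample text and all numbers in it are invented for illustration, not taken from the router in this thread.

```python
import re

# Matches e.g. "5 minute input rate 60000000 bits/sec" (interval wording is ignored).
RATE_LINE = re.compile(r"(input|output) rate (\d+) bits/sec")

def total_rates_mbps(show_interface_text: str) -> dict:
    """Sum the input and output bit rates found in the pasted text, in Mbps."""
    totals = {"input": 0, "output": 0}
    for direction, bps in RATE_LINE.findall(show_interface_text):
        totals[direction] += int(bps)
    return {direction: bps / 1_000_000 for direction, bps in totals.items()}

# Hypothetical sample, made up for this example.
sample = """\
GigabitEthernet0/0 is up, line protocol is up
  5 minute input rate 60000000 bits/sec, 9000 packets/sec
  5 minute output rate 45000000 bits/sec, 8000 packets/sec
Serial1/0 is up, line protocol is up
  5 minute input rate 30000000 bits/sec, 5000 packets/sec
  5 minute output rate 40000000 bits/sec, 6000 packets/sec
"""

print(total_rates_mbps(sample))  # {'input': 90.0, 'output': 85.0}
```

It is probably best to paste only the physical interfaces, since traffic on tunnel interfaces also shows up on the physical interface that carries it and would otherwise be counted twice.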
Then you want to see if the amount of traffic that your router is passing is relatively high for what the platform is capable of. You can reference your device using this document:
Please note these numbers are ideal scenarios for routers, with zero features enabled. This is the best case scenario. The goal then is to maximize the efficiency so you can get as much out of your router as possible. Different features will add CPU overhead for the same amount of traffic. A realistic goal, with different features applied, is 60 to 80% of the best case scenario. You also need to extrapolate your current bandwidth-to-CPU ratio so you know how much traffic your router would be passing if its CPU were at 99%.
Your 3825 is at 85% CPU while passing 107Mbps of traffic. You need to know what your router would be passing at ~99% in order to know how efficiently your router is passing traffic.
x = (107Mbps * 100) / 85
x ~= 126Mbps
The 3825 has a theoretical maximum of ~179Mbps.
Therefore your efficiency is:
y = (126Mbps / 179Mbps) * 100
y ~= 70.4%
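The same arithmetic as a small Python sketch, in case you want to plug in your own numbers. The function names are made up for this example, and the calculation assumes, as the steps above do, that traffic scales linearly with CPU.

```python
def projected_max_mbps(observed_mbps: float, cpu_percent: float) -> float:
    """Scale the observed traffic linearly up to roughly full CPU."""
    return observed_mbps * 100 / cpu_percent

def efficiency_percent(projected_mbps: float, platform_max_mbps: float) -> float:
    """Projected maximum as a percentage of the platform's best-case figure."""
    return projected_mbps / platform_max_mbps * 100

x = projected_max_mbps(107, 85)   # about 125.9 Mbps, i.e. ~126 Mbps
y = efficiency_percent(x, 179)    # about 70.3% without rounding x first

print(round(x))                               # 126
print(round(efficiency_percent(126, 179), 1)) # 70.4, as in the steps above
print(round(y, 1))                            # 70.3
```

The small difference between 70.3% and 70.4% comes only from rounding x to 126 before the second step.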
This is in the ballpark of what one would expect from a real world router with features enabled. If you work out these results and your efficiency is something like 30%, -then- we really need to worry about the other things noted in the document referenced before, such as a software bug. (The direct link to that follows.)
Please also note that CPU profiling is usually only worth doing once you believe your efficiency is very low. You can run CPU profiling yourself, but the output will be effectively useless because you need Cisco proprietary tools to decode it. Therefore you would need to open a case with Cisco TAC at this stage.
I hope this helps some.