High CPU utilization on Cisco Catalyst 4500 switches when using NLB

Dec 3rd, 2007

hello

we're having problems with Microsoft NLB causing high cpu usage (peaks of 95%) on catalyst 4500s (IOS Version 12.2(31)SGA1, RELEASE SOFTWARE (fc3)). please see attached powerpoint slide for network diagram and configs (ip addresses aren't the real ones).

the 2003 servers are configured for IGMP multicast.
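For readers following along: in IGMP multicast mode, NLB derives the cluster MAC address and the IGMP group from the cluster IP address, which helps when matching entries in a CPU capture or an ARP table. The sketch below shows the commonly documented mapping. The function names and the cluster IP 1.1.1.100 are hypothetical and are not taken from the attached configs.

```python
import ipaddress


def nlb_cluster_mac(cluster_ip: str, mode: str = "igmp") -> str:
    """Return the NLB cluster MAC for a cluster IP (hypothetical helper).

    Mapping assumed here, as commonly documented for Microsoft NLB:
      unicast        -> 02-bf-w-x-y-z
      multicast      -> 03-bf-w-x-y-z
      igmp multicast -> 01-00-5e-7f-y-z
    where w.x.y.z is the cluster IP address.
    """
    octets = ipaddress.IPv4Address(cluster_ip).packed
    if mode == "unicast":
        mac = bytes([0x02, 0xBF]) + octets
    elif mode == "multicast":
        mac = bytes([0x03, 0xBF]) + octets
    elif mode == "igmp":
        # Only the last two octets of the cluster IP are used.
        mac = bytes([0x01, 0x00, 0x5E, 0x7F]) + octets[2:]
    else:
        raise ValueError(f"unknown NLB mode: {mode}")
    return "-".join(f"{b:02x}" for b in mac)


def nlb_igmp_group(cluster_ip: str) -> str:
    """Return the IGMP group an NLB host joins: 239.255.y.z."""
    octets = ipaddress.IPv4Address(cluster_ip).packed
    return f"239.255.{octets[2]}.{octets[3]}"


if __name__ == "__main__":
    ip = "1.1.1.100"  # hypothetical cluster IP, not from the original post
    print(nlb_cluster_mac(ip, "igmp"))
    print(nlb_igmp_group(ip))
```

Traffic sent to that MAC address is what the switches have to handle as multicast, so it is the address to look for when reading the capture.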

i've followed the http://www.cisco.com/warp/public/473/cat4500_high_cpu.pdf document to troubleshoot.

Found:

• K2CpuMan Review was taking the most cpu

• CPU queue L3 Fwd Low was getting a high volume of traffic so the document recommended a cpu monitor session

CPU monitor session showed:

• High volume of traffic between NLB server (1.1.1.13) and proxy (1.1.1.20)

• High volume of Client (both external/internal) traffic to proxy

Tried

switchport block multicast

switchport block unicast

on VLAN 222 but this made no difference to 4500 cpu

Tried

enabling cgmp on vlan 222 on the 6500, but its cpu jumped to 99% (from 1%) and the 4500s also went to 99%, so cgmp was switched off again on 6500 vlan 222

the traffic we're seeing in the capture isn't unexpected, but what can we do to stop the high cpu on the 4500s?

thanks

andy

andrewswanson Tue, 12/11/2007 - 07:19

thanks for the feedback. the 4500 cpu queue causing the problem is L3 Fwd Low. the documentation states that:-

"If the ARP is unresolved for the destination IP address, packets are sent to this queue."

i have static arp entries for the destination ip address (the web server) on all the L3 switches but i'm still getting very high cpu usage because of hits on the L3 Fwd Low queue.

cheers

andy

andrewswanson Tue, 06/03/2008 - 00:54

Hi - couldn't find a solution for this. MS NLB worked fine when all load balanced servers were on the same site/campus - as soon as these servers were geo-dispersed we started getting the high cpu problems. we're currently looking at a hardware solution with either Cisco ACE or F5 BigIP.

cheers

andy

Ryan Carretta Tue, 06/03/2008 - 01:05

This sounds like a reasonably simple case of process switching. The key here will be to find out why the packets are getting punted to the CPU rather than forwarded in hardware. Is there a route to the destination? An ACL that is logging the packets? Are the packets being sent with IP options? You mentioned you have static routes - do they point to an interface or to an IP address? If they point to an IP address, can you resolve ARP for that address? Are there TTL failures or ICMP redirects being sent out the interface? (Try a SPAN session to find out.)
