Cat6500 - HSRP Failing / High CPU Load

cisco_lite · ‎12-31-2008

CPU utilization in 'show processes cpu' has gone up to 98% on both Cat switches, which are configured as a redundant pair.

The process consuming highest CPU is 'IP Input'.

Is there any way I can identify the host that is causing the broadcast traffic?

show cdp nei shows other switches only.

Please assist.

MATTHEW BECK · ‎12-31-2008

I wouldn't expect broadcast traffic to do that to a 6509 CPU. Multicast, perhaps? Is there a multicast stream with a TTL of 1 expiring on the router? Something is getting process switched instead of hardware switched. Has anything changed in the config recently? You can start here:

http://www.cisco.com/en/US/products/hw/switches/ps708/products_tech_note09186a00804916e0.shtml

for more help.
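As a first pass, a few standard exec commands may help confirm which process is busy and whether TTL expiry is involved. This is only a sketch and is not taken from the linked tech note; output format varies by release.

```text
! Processes ordered by CPU use, busiest first
show processes cpu sorted
! CPU trend over the last 60 seconds, 60 minutes and 72 hours
show processes cpu history
! IP statistics; a fast-growing "bad hop count" counter points at TTL-expired packets
show ip traffic
```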

Good luck!

Matt

arohyans · ‎12-31-2008

One quick way to find out might be to enable storm-control on the host ports (I'm not sure what the CatOS equivalent is, but here's IOS). Be aware that this will be service impacting when the offending port is shut down. Also, you will need to re-enable the port manually unless you define errdisable recovery for storm-control globally:

storm-control broadcast level 10

storm-control multicast level 15

storm-control unicast level 10

storm-control action shutdown
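For the automatic recovery mentioned above, a minimal sketch in global configuration mode could look like the following. The 300-second interval is an illustrative assumption, not a value from this thread, and whether the storm-control cause keyword is available depends on platform and software release, so check with `errdisable recovery cause ?` first.

```text
! Global configuration: let ports err-disabled by storm-control recover on their own
errdisable recovery cause storm-control
! Retry interval in seconds (300 is an assumed example value)
errdisable recovery interval 300
! Exec mode: confirm which causes have recovery enabled
show errdisable recovery
```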

HTH,

Aaron

glen.grant · ‎12-31-2008

Look at all your SVIs and you will probably find one or more with traffic much higher than normal. These counters should normally be very low, because most traffic is switched in hardware and should never hit the CPU. Something is causing a large amount of traffic to be sent to the CPU. If you find one SVI much higher than the rest, you can turn on NetFlow for that SVI and you will probably get a pretty good idea of who is sending the traffic. You can also look at the counters on that SVI to see what traffic is really driving it. A single multicast stream, even on a 720, can bring the CPU to its knees if the switch is not configured for multicast. HSRP will fail and start to flap when the CPU gets that high.
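The SVI check described above could be sketched as follows. Vlan10 is a hypothetical SVI used only for illustration, and the NetFlow interface command differs between releases (some older images use `ip route-cache flow` instead of `ip flow ingress`), so treat this as an assumption to verify on your software version.

```text
! Optionally zero the counters first so the numbers reflect current traffic
clear counters
! Compare input rate, input-queue drops and broadcast counts across interfaces
show interfaces | include line protocol|input rate|Input queue|broadcasts
! Assumed example: enable NetFlow on the busiest SVI (Vlan10 is hypothetical)
configure terminal
 interface Vlan10
  ip flow ingress
 end
! List the flows seen in software, to look for the top talkers
show ip cache flow
```

An SVI whose input queue shows a size above its maximum, or a drop counter that keeps climbing, is the one to look at first.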

mwong · ‎12-31-2008

Hi there,

Do you have a syslog server that has been shut down for maintenance or has failed?

I have run into a similar problem before: our syslog server failed, it could not accept syslog messages, and the messages were bounced back to the CPU for processing.

I hope this helps.

cisco_lite · ‎12-31-2008

Glen,

Could you please explain how I can check the traffic on an SVI, as described in your post?

Thanks.

cisco_lite · ‎01-01-2009

OK. I have noticed that when I shut down the trunk (4 Gigabit Ethernet links in an EtherChannel) between the two Cat6500 switches, the CPU utilization goes down from 99% to 0%.

Does this give any clue?

Thanks.

cisco_lite · ‎01-01-2009

I ran debug ip packet detail (buffered) and found excessive multicast flooding such as below. Every msec I can see 10 of these.

*Jan 1 09:13:45.918: IP: s=10.50.50.2 (Vlan12), d=224.0.0.2, len 48, rcvd 0

*Jan 1 09:13:45.918: UDP src=1985, dst=1985

*Jan 1 09:13:45.918: IP: s=10.50.50.3 (Vlan12), d=224.0.0.2, len 48, rcvd 0

*Jan 1 09:13:45.918: UDP src=1985, dst=1985

10.50.50.2 & 10.50.50.3 are both the SVI IPs (HSRP) of Vlan12 on the two Cat6500 switches.

Could you please tell me why an SVI is generating the multicast? Could some of the hosts have joined the multicast session? If so, why did they do so, and how could I identify those hosts? In our network, multicast is not an application requirement. How can I check it and minimize it?

Please assist. Thanks.

Giuseppe Larosa · ‎01-01-2009

Hello Cisco_lite,

Don't worry about these messages; they are just HSRP hellos.

UDP port 1985, destination 224.0.0.2 (all routers on the subnet). Both routers send hello messages every 3 seconds with default timers.

Your issue is a bridging loop; see my other post.

Turn off all debugging and inspect the log messages for any event reported by UDLD or STP.
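A possible sequence for this step, as a sketch only. VLAN 12 is the SVI seen in the debug output; the exact facility name in HSRP log messages differs by release (STANDBY in older images, HSRP in newer ones), which is why both are included in the filter.

```text
! Stop all running debugs
undebug all
! Confirm HSRP state and timers on the SVI seen in the debug output
show standby vlan 12
show standby brief
! Search the buffered log for UDLD, spanning-tree or HSRP state-change events
show logging | include UDLD|SPANTREE|STANDBY|HSRP
```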

Hope to help

Giuseppe

Giuseppe Larosa · ‎01-01-2009

Hello,

This means you are experiencing a bridging loop in at least one L2 VLAN.

The problem can also be on the uplink of an access-layer switch, not only on the EtherChannel between the two devices.

Look in the Cat6500 logs for messages from UDLD, or STP messages about inconsistent ports.

Note that we have seen the following:

If a new VLAN has to be added, and it is added in the configuration of one member link instead of on the port-channel interface, a bridging loop is formed.

This happened twice in two different campus networks.

Other cases were caused by the uplinks of an access-layer switch: for example, in one case UDLD tried unsuccessfully to tear down a port.

We later changed the GBIC.
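To check for the member-link misconfiguration described above, something like the following might be used. Port-channel1 and VLAN 20 are hypothetical names chosen for illustration; substitute the real bundle and VLAN.

```text
! Check that all member links are bundled (flag P) in the port-channel
show etherchannel summary
! Allowed and forwarding VLANs per trunk; a member link listed on its own
! instead of under the port-channel may have dropped out of the bundle
show interfaces trunk
! List any ports that spanning tree holds in an inconsistent state
show spanning-tree inconsistentports
! Assumed example: add a new VLAN on the port-channel interface, not on a member link
configure terminal
 interface Port-channel1
  switchport trunk allowed vlan add 20
 end
```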

Hope to help

Giuseppe

cisco_lite · ‎01-01-2009

I have removed all the uplinks/connected switches. Now I have only two Cat6500 connected to each other via etherchannel.

I was actually doing new vlan configuration when this problem happened. I have deleted the vlan from both switches and it is not allocated to any of the service modules anymore.

The port channel interface has all the vlans as 'allowed' by default.

After turning off the debug and clearing the log, I don't see any messages in the log. Logging buffered has been enabled.

Please assist.

Thanks.

Giuseppe Larosa · ‎01-01-2009

Hello Cisco_Lite,

You have the two C6500s with an EtherChannel enabled between them, allowing all possible VLANs.

The next step I suggest is:

Choose one C6500 and re-enable only the uplinks from this device to all the access-layer switches.

This shouldn't cause any problem as long as only one uplink is up on each access-layer switch.

sh proc cpu hist

sh log

Try to see if there is one device with messages related to UDLD or STP events.

If you find something meaningful, you have found the problem.

An alternative way can be:

Re-enable the second uplinks on the second C6500 one at a time, then wait two minutes after each and look at the CPU usage and log messages.

This second method exposes you to the chance of making the loop happen again, but this time you can find the troubled link.

By clearing the log you have lost the previous messages. If you have a syslog server, you can look at the messages there.

Hope to help

Giuseppe

cisco_lite · ‎01-01-2009

Hello Giuseppe,

Just want to correct one thing. There wasn't any uplink to a switch before. It was the ASA firewall. ASA Inside is connected to the Cat6500. Didn't find anything unusual in the ASA log though.

Since the two redundant Cat6500s are now not connected to any other network device except the servers on the Ethernet module, I don't think there is any need to connect uplinks for troubleshooting. The Cat6500 shows high CPU with the trunk up and functions well without it.

Is there any debug command for a bridging loop? It is strange that I don't see anything in the logs.

Please assist.

Thanks.

cisco_lite · ‎01-01-2009

Hi Giuseppe,

Do you find any abnormal values in the SVI interface output below? At the moment, I have brought down all the SVIs except Vlan12 on both Cat6500s.

Vlan12 is up, line protocol is up
  Hardware is EtherSVI, address is 0023.3457.0e00 (bia 0023.3457.0e00)
  Internet address is 10.50.50.2/24
  MTU 1500 bytes, BW 1000000 Kbit, DLY 10 usec,
     reliability 255/255, txload 1/255, rxload 1/255
  Encapsulation ARPA, loopback not set
  Keepalive not supported
  ARP type: ARPA, ARP Timeout 04:00:00
  Last input 00:00:00, output 00:00:01, output hang never
  Last clearing of "show interface" counters never
  Input queue: 1086/75/1702325563/4707 (size/max/drops/flushes); Total output drops: 0
  Queueing strategy: fifo
  Output queue: 0/40 (size/max)
  5 minute input rate 4233000 bits/sec, 8035 packets/sec
  5 minute output rate 0 bits/sec, 0 packets/sec
  L2 Switched: ucast: 134646434 pkt, 12547593600 bytes - mcast: 22281592735 pkt, 1729637921319 bytes
  L3 in Switched: ucast: 17598303 pkt, 9025600495 bytes - mcast: 0 pkt, 0 bytes mcast
  L3 out Switched: ucast: 18650247 pkt, 13911482101 bytes mcast: 0 pkt, 0 bytes
     522294062 packets input, 42388663968 bytes, 0 no buffer
     Received 504675200 broadcasts (12 IP multicasts)
     0 runts, 0 giants, 1113898 throttles
     0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
     19203008 packets output, 13873692330 bytes, 0 underruns
     0 output errors, 0 interface resets
     0 output buffer failures, 0 output buffers swapped out

Giuseppe Larosa · ‎01-01-2009

Hello Cisco_lite,

This SVI is receiving a high input rate:

5 minute input rate 4233000 bits/sec, 8035 packets/sec

Input queue: 1086/75/1702325563/4707 (size/max/drops/flushes); Total output drops: 0

You have had a lot of drops in the input queue and, more seriously, the current queue size is 1086 when the maximum size is 75.

Most of the received traffic is broadcast (about 97% of the input packets):

522294062 packets input, 42388663968 bytes, 0 no buffer

Received 504675200 broadcasts (12 IP multicasts)

0 runts, 0 giants, 1113898 throttles

From the latest details you have given, you are facing a very high volume of broadcast traffic.

As explained by other colleagues, this traffic hits the CPU and causes the high CPU usage.

I would try IP accounting to find the source of the traffic.

The broadcast traffic can also be the result of a loop.

Check STP in VLAN 12 on both devices with

show spanning-tree vlan 12 detail

Verify that both devices agree on the root bridge ID and that one of them is the root.

Avoid the STP debugs; they are very heavy on the CPU.
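The STP check for VLAN 12 could be run on both switches roughly as follows; this is a sketch, and the `include` filter on the last line is just one common way to pull out the topology-change lines from the detailed output.

```text
! Root bridge ID, root cost and root port for VLAN 12; compare the output of both switches
show spanning-tree vlan 12 root
! Per-port roles and states for VLAN 12
show spanning-tree vlan 12
! Number of topology changes, time since the last one and the port it came from
show spanning-tree vlan 12 detail | include ieee|occurr|from|is exec
```

If the two switches report different root bridge IDs, or the topology-change count keeps increasing, the port named in the last output is a reasonable place to start looking.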

Hope to help

Giuseppe