6509 High CPU

niro
Level 1

I'm having a problem with one of our 6509s: the CPU constantly spikes to close to 100% for about a minute and then drops back down to single digits, and this happens every few minutes. It isn't causing connectivity issues yet, but I'm worried that it eventually will. I can't really figure out what's going on, though I'm fairly sure multicast traffic is the cause. I've attached a very basic diagram of our network.

Most of the multicast traffic originates from either Core1 or Core2, although some multicast originates from the distribution switches or the access switches. All the interfaces and VLANs are set up with sparse-dense mode and we're not using any RPs right now.

The switch that's having the CPU spikes is Distribution 1...we're running EIGRP for routing.

Can anybody help me figure out what's going on? Here is the output of show proc cpu when the CPU jumps up.

CPU utilization for five seconds: 95%/79%; one minute: 24%; five minutes: 27%

PID Runtime(ms) Invoked uSecs 5Sec 1Min 5Min TTY Process

86 3728919922187837326 0 14.58% 3.07% 4.77% 0 IP Input

170 109925716 998945240 110 0.57% 0.22% 0.23% 0 Port manager per

Thanks for any help!

21 Replies

Edison Ortiz
Hall of Fame

Just the fact that you are running multicast could be a factor in the high CPU if it isn't configured properly.

For further troubleshooting we need to see the switch configuration along with the show ip mroute and show version output.

HTH,

__

Edison.

Here is a PuTTY capture with the output you asked for; this is from dist1. I took out some parts of the config that wouldn't be relevant (like SNMP).

You have a lot of multicast packets being switched in software, not a good thing.

Please post the output from typing show mls ip multicast

__

Edison.

Here it is...

I highly recommend you configure an RP in your environment. The show ip mroute output shows a concerning number of multicast groups being software switched.

Any group in that output without the RPF-MFD flag is software switched, which punts the packets to the CPU.

With an RP design, the switches don't need to populate unwanted multicast groups in their tables unless those groups have been requested by the clients they are servicing.
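
For reference, a quick way to spot which groups are being software switched (a rough sketch; 231.1.1.1 is just an example group, not one taken from your output):

show ip mroute 231.1.1.1
! RPF-MFD on the incoming interface line means the flow is fully hardware switched;
! if it's missing, those packets are being punted to the CPU
show mls ip multicast
! lists the multicast shortcuts actually installed in hardware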

HTH,

__

Edison.

Is dist1 the only switch affected?

From your dist1 switch, which switch in your diagram is the layer 3 port-channel4 connected to? Dist1 is receiving multicast traffic for several different multicast groups on that port-channel. You might want to limit the amount of multicast or broadcast traffic allowed on port-channel4.

Have you tried using the storm-control broadcast level 70 command on port-channel4 on the cat to limit the increasing broadcast/multicast traffic?

That command monitors broadcasts in one-second intervals; once the broadcast/multicast traffic load goes beyond 70% within that window, subsequent packets for the remainder of the one-second window are dropped.

I would think this could help avoid the excessive CPU. Also, be aware that the 6500 processes broadcasts in software, which always affects the CPU unless you are using CatOS.
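
As a rough sketch (the 70% level is just the example value from above; multicast suppression support depends on the line card, and on some platforms/releases storm control has to be applied to the physical member ports rather than the port-channel):

interface Port-channel4
 storm-control broadcast level 70
 storm-control multicast level 70
 ! traffic above the threshold within each one-second window is dropped
end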

Franco.

Thanks ediortiz and francisco.

Po4 is the EtherChannel to Core1. Most multicast traffic originates from Core1 or Core2, so it makes sense that a lot of it is coming in on Po4. Actually, dist1 is not the only switch affected; Core1 also has this problem, but much less frequently.

My concern with storm-control is that this multicast traffic is very business critical, and a large burst resulting in dropped packets could be a big problem.

I have tried setting Core1 and Core2 up as RPs in the past. However, for some reason, when I do that, Core1 seems to flood some ports with multicast traffic even though those ports did not subscribe to it. The traffic exceeds 1 Gb/s, so some of our servers (which do not subscribe to any multicast) had 1 Gb/s of multicast hitting their ports, causing all the other apps on those servers to drop. I probably have an error in that RP config; can you help me figure out what the problem with this config is?

Core1:

ip pim send-rp-announce Loopback0 scope 15 group-list 21

ip pim send-rp-discovery scope 15

access-list 21 permit 231.0.0.0 0.255.255.255

access-list 21 permit 230.0.0.0 0.255.255.255

interface Loopback0

ip address 10.20.20.1 255.255.255.255

ip pim sparse-dense-mode

end

Core2:

ip pim send-rp-announce Loopback0 scope 15 group-list 21

access-list 21 permit 231.0.0.0 0.255.255.255

access-list 21 permit 230.0.0.0 0.255.255.255

interface Loopback0

ip address 10.20.20.2 255.255.255.255

ip pim sparse-dense-mode

end

Thanks for all your help!

Since you only have 2 switches running multicast, go with the KISS principle. Configure Static RP instead of Auto-RP.

I also recommend Anycast RP instead of having different RPs.

http://www.cisco.com/en/US/docs/ios/12_4/ip_mcast/configuration/guide/mcbbasic.html#wp1145000
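
For illustration, a rough sketch of a static Anycast RP setup (10.20.20.100 and Loopback1 are made-up placeholders; the MSDP peering uses the existing Loopback0 addresses from your config so the two cores share source information, and the new loopback needs to be advertised in EIGRP so it is reachable):

! On both Core1 and Core2 - the shared RP address:
interface Loopback1
 ip address 10.20.20.100 255.255.255.255
 ip pim sparse-mode
!
! Core1 - MSDP peering to Core2's unique Loopback0:
ip msdp peer 10.20.20.2 connect-source Loopback0
ip msdp originator-id Loopback0
!
! Core2 - MSDP peering to Core1's unique Loopback0:
ip msdp peer 10.20.20.1 connect-source Loopback0
ip msdp originator-id Loopback0
!
! On every PIM router, point statically at the shared RP address:
ip pim rp-address 10.20.20.100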

HTH,

__

Edison.

There are actually more than just those two switches participating in multicast; just about all the switches in our network need to receive multicast data (the diagram I provided is just a very basic one with only the most important components for this issue).

I can probably still configure a static RP instead of Auto-RP...let me see if I understand this right.

I have to run this command on every VLAN and L3 interface that needs multicast data:

ip pim rp-address rp-address [access-list]

I have to switch every interface from sparse-dense mode to sparse mode only.

And I have to configure both cores' Lo0 interfaces with the same IP address and a /32 mask?
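
In other words, something like this on each multicast-enabled switch (a rough sketch; Vlan100 and 10.20.20.100 are just placeholders for one of my SVIs and whatever shared RP address we pick):

ip pim rp-address 10.20.20.100
!
interface Vlan100
 ip pim sparse-mode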

I would still prefer to use Auto-RP, as it would require a lot less work. Any reason why the config I provided ends up flooding ports with unwanted multicast traffic?

Thanks!

Going with Auto-RP is fine if you have a lot of switches and other L3 devices participating in your multicast domain.

I believe your config is fine up to the point where you have ip pim sparse-dense mode on the interfaces.

It will use sparse mode when the group has been mapped to an RP and dense mode when the RP isn't announcing that group. You have an ACL on the announcement, so any multicast group not included in it will run in dense mode; therefore you aren't gaining much from the RP configuration.

I recommend configuring sparse mode on all interfaces along with the config you provided, and seeing if the behavior you experienced before returns.

Just a note: you need ip pim autorp listener on all multicast devices in order to receive the .39 and .40 dense-mode groups (224.0.1.39 and 224.0.1.40) used by Auto-RP.
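
As a sketch of that combination (Vlan100 is a placeholder; the same interface change would apply to every PIM-enabled VLAN and L3 link):

ip pim autorp listener
! lets the 224.0.1.39/224.0.1.40 Auto-RP groups flood in dense mode
! even though the interfaces run sparse mode
!
interface Vlan100
 ip pim sparse-mode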

HTH,

__

Edison.

The only multicast traffic on our network is in the 230.x.x.x and 231.x.x.x ranges. When this problem happened (the multicast flood on all interfaces), I ran a packet capture on one of the servers and I was receiving a lot of 230.x.x.x packets even though that machine never subscribes to any multicast groups, and that range was running in sparse mode (since I had the ACL there).

Right now I'm running that config without 230.x.x.x in the ACL (only 231.x.x.x), and I DO get the 231.x.x.x packets flooded to interfaces that don't need them, but the 230.x.x.x groups are working fine (except for the software-routing problem that is causing the high CPU utilization). There is very, very little 231.x.x.x traffic, so the flooding to all ports isn't really causing a problem (it's a test group right now).

I can try sparse mode only on all interfaces. I'm just worried that, come production hours, multicast packets will flood all the ports and it will take me a while to push the old configs back out to all the switches. I guess I can try to find a way to simulate a lot of multicast traffic so I can test it off hours.

It's kind of late, I'm basically typing in my sleep so hopefully this post made sense. :)

Make sure any layer 2 access switches receiving multicast are set up with IGMP snooping to help with the multicast flooding on all interfaces.

IGMP snooping can very effectively reduce multicast traffic from streaming and other bandwidth-intensive IP applications. While a switch that does not understand multicast will flood multicast traffic to all the ports in a broadcast domain (a LAN), a switch using IGMP snooping will forward multicast traffic only to the hosts interested in that traffic. This reduction in multicast traffic reduces packet processing at the switch (at the cost of additional memory to hold the multicast tables) and also reduces the workload on the end hosts, since their network cards (or operating systems) will not have to receive and filter all the multicast traffic generated on the network.
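
For reference, a quick way to check this on an IOS-based access switch:

show ip igmp snooping
! confirms snooping is enabled globally and per VLAN
show ip igmp snooping groups
! lists the groups and the ports that actually joined them
ip igmp snooping
! global config command to re-enable it if it was turned off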

Also, on your VLANs (VLAN 8, I think) you have PIM sparse-dense mode. Like ediortiz said, you should use an RP so you can run sparse mode; it's much better than dense mode.

You might also want to put an ACL on your VLAN 8 interface to limit the multicast groups your switches are routing. For example, you could use an ACL on Vlan8 to accept multicast traffic only for the address ranges you actually use, 230.x.x.x and 231.x.x.x.
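
One way to do that (a rough sketch using ip multicast boundary; ACL number 30 is arbitrary):

access-list 30 permit 224.0.1.39
access-list 30 permit 224.0.1.40
! keep Auto-RP's control groups if you stay with Auto-RP
access-list 30 permit 230.0.0.0 0.255.255.255
access-list 30 permit 231.0.0.0 0.255.255.255
!
interface Vlan8
 ip multicast boundary 30
 ! multicast traffic for groups the ACL doesn't permit is blocked on this interface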

Good luck

Franco

Yeah, the switch that was flooding the ports with multicast is a 6509 and already has IGMP snooping enabled, so I don't know why it did that. I'm going to try again this weekend and see if I can figure out what was going on; I'll let you know if it happens again.

Thanks for your help!

Ok so here is what I did to test:

I set up RPs as described above for the 231.x.x.x groups and left the interfaces in sparse-dense mode so any other group would use dense mode. I then ran a test: listening to multicast traffic on group 231.1.1.1 on the access switch plugged into the distribution switches, and publishing to the same group from a server plugged into Core2.

As soon as I start the test, the CPU on Core1 hits 100% and stays there until I end the test. I captured the packets hitting the CPU and they were all multicast packets for the 231.1.1.1 group. I verified the group was in sparse mode and that there were no 231.x.x.x groups in dense mode. I ran the same test on another group in dense mode and did not get the CPU problem.

Any ideas why those packets hit the CPU like that?
