I have a large network made up of 2950s at the Access layer, 3548XLs at the Distribution layer and 6500s at the Core. The 6500s are the VTP servers and the rest of the switches are VTP clients. We are running PIM sparse-mode at the Core. The problem I found today is that the 2950s are being flooded with IGMP/CGMP group messages for all VLANs. This traffic is overwhelming the switches. I think that if I could prune the VLANs that IGMP/CGMP need to listen to, performance should go back to normal. I have attached configs for the Core, Distribution and one of the 2950s, as well as a debug of IGMP snooping on the 2950. Any help would be appreciated. Thanks!
I honestly don't see that VTP is relevant to this problem. VTP is normally quite lightweight, has relatively infrequent advertisements, and is only propagated on trunks anyway.
It might be the CGMP that is killing it. CGMP has to be flooded to all ports on the VLAN, times the number of clients that are sending in IGMPs, times the number of groups. Not very scalable.
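To put rough numbers on that scaling, here is a minimal Python sketch of the simple model described above: one CGMP message per (client, group) pair, each flooded to every port in the VLAN. The function name and all the figures are hypothetical, chosen only for illustration; they are not measurements from this network.

```python
def cgmp_flood_copies(ports_in_vlan: int, reporting_clients: int, groups: int) -> int:
    """Frame copies delivered per reporting cycle under a simplified model:
    one CGMP message per (client, group) pair, flooded to every port
    in the VLAN. This is an illustration, not a protocol simulation."""
    return ports_in_vlan * reporting_clients * groups


# Hypothetical figures, for illustration only.
small = cgmp_flood_copies(ports_in_vlan=24, reporting_clients=10, groups=2)
large = cgmp_flood_copies(ports_in_vlan=48, reporting_clients=100, groups=5)
print(small, large)  # 480 24000
```

The product grows with every factor at once, which is why the load climbs quickly as a VLAN gains ports, receivers and groups.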
I wish I could get rid of the CGMP on my LAN, but I still have CatOS 4000s, and they do not support IGMP snooping. Your 2950s support IGMP snooping, but your 3548XLs do not. It's unfortunate you have them at the distribution layer.
However, even so I would not expect this sort of traffic to overwhelm the switch, so I suspect there is something else going on. I have a single VLAN of over 1000 hosts, and I see CGMP background traffic of about 7 packets per second, but the switches take it in their stride.
Thanks for the reply, Kevin. I should have been a little more specific in my post. Since I'm running VTP, the switch sees close to 100 VLANs. Even though most of those are pruned at the Core, it is still running IGMP snooping on all of the VLANs. When I ran debug ip igmp snooping it brought the switch to its knees. I had to disconnect it from the backbone and wait a couple of minutes before it would accept the undebug command. I have seen 125+ packets/sec of background IGMP traffic. Maybe I have a misconfiguration on the Core. I do have a TAC case open on this but am still looking for anyone else who might have had this problem. Also, my 3548XLs actually seem to be working fine; they are not affected by whatever problem is happening on the 2950s.
OK, I see what you mean now.
If these 2950s are at the edge, if I were you I would manually remove from the uplink trunks any VLANs that are not needed by that particular switch. (I would also do that at the downlink on the distribution layer.)
The access switch would still know of the existence of all the VLANs, but would no longer be tempted to run a Spanning Tree for those that it does not need. You could then dispense with all those dangerous-looking no spanning-tree commands, because the switch would only run a Spanning-Tree instance for those VLANs it actually had an interest in.
I was going to say that maybe you would get the same effect with automatic VTP pruning, but I would have been wrong; automatic VTP pruning does not affect the Spanning-Tree topology.
Of course, the object of the exercise is to cut down on the unnecessary IGMP traffic. At least you would only be getting the IGMPs for those VLANs the switch is actually interested in.
I went ahead and pruned the VLANs on the uplink and downlinks. I had also done the no spanning-tree for the VLANs not in use.
The switches are still getting hammered with multicast traffic (on the gi0/1 interface). It still looks like CGMP/IGMP leaves and joins. I have attached a sniffer trace and some show commands. I had to export the trace to a txt file to attach it here; you should be able to import it into Wireshark and it should look fine.
Problem has been solved. After working with TAC, they found that the 2950s are running IGMP snooping and the 3548XLs are running CGMP. After disabling IGMP snooping on the 2950s the network stabilized. Here is the official word from TAC: "Your access layer switches (2950) are connected to XL switch which is EOL switch and can only understand cgmp while your access and core layer switches can understand igmp. There is communication issue between your distribution and access layer switches as XL switch can only understand cgmp while 2950's can understand igmp. CGMP messages comes to 2950s and sent to cpu as hardware doesn't understand them and later cpu drops 'em.
I disabled the igmp snooping on the 2950 and we noticed that via extended pings that there was no packet loss from your core to access layer. Please try to remove this XL switch from your network as this is the source of your network performance degradation. Hope this helps and let me know if you have any more questions."
This was not happening before we upgraded our XLs and 2950s to the latest IOS release a couple of weeks ago. My guess is there is an IOS bug in the XLs, but they are probably not going to look into it since they are EOL. Thanks for all the input.
That's interesting. Sure, the 3548XL will pass all the CGMP down to the access layer. CGMP gets flooded to all ports anyway. And sure the CPU in the 2950 drops them because it does not understand them. But I still don't really see why that should kill the CPU.
As for the IGMP, I don't really see why that should be a problem either. Sure the 3548XL does not snoop the IGMP, but it should pass it through transparently. After all, IGMP is not really intended for the switch at all; it is a protocol between receiver (host) and (multicast) router. Some switches choose to snoop it so they can limit the scope of multicast streams based on the IGMPs.
And for disabling the IGMP snooping on the 2950, what does that do? Well, instead of using the IGMP to control where the multicast streams go, it floods the multicasts everywhere. It's a bit like turning the switch into a hub as far as multicast is concerned.
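The difference can be sketched in a few lines of Python. This is only a toy model of the forwarding decision for one multicast group, not how any particular switch implements it; the function name and the port numbers are hypothetical.

```python
def egress_ports(all_ports, joined_ports, router_ports, ingress, snooping):
    """Ports a multicast frame for one group is sent out of (toy model).

    With snooping on, the frame goes only to ports where a receiver has
    joined, plus ports leading to a multicast router. With snooping off,
    it is flooded to every port in the VLAN, like a broadcast.
    The ingress port is always excluded.
    """
    if snooping:
        targets = set(joined_ports) | set(router_ports)
    else:
        targets = set(all_ports)
    return sorted(targets - {ingress})


# Hypothetical 8-port VLAN: receivers on ports 3 and 5, uplink on port 8.
ports = range(1, 9)
with_snooping = egress_ports(ports, {3, 5}, {8}, ingress=8, snooping=True)
without_snooping = egress_ports(ports, {3, 5}, {8}, ingress=8, snooping=False)
print(with_snooping)     # [3, 5]
print(without_snooping)  # [1, 2, 3, 4, 5, 6, 7]
```

In this example, turning snooping off means the stream reaches five ports that never asked for it.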
Unless ... unless the 2950 implements its IGMP snooping in software. If that is the case, then IGMP snooping will kill the CPU because not only does the CPU have to process the IGMPs, but also each packet of the multicast stream itself too (just in case it happens to be an IGMP). But I thought the 2950 had a hardware assist for IGMP snooping. Maybe it doesn't.
I agree with you. Even though I'm not 100% satisfied with their answer to turn off IGMP snooping, the other alternative was to replace the 3548XLs with 3750s or 6500s. I don't quite have the magic or the funds to do that ASAP.
They said it was a bad design. Well, Cisco designed that for us a few years back.
According to TAC, the 2950s have always treated multicast like broadcast in our environment since the 3548XLs are running CGMP and not IGMP. I thought the communication was transparent too. Guess I'll find out once we have to re-image a bunch of labs (I work at a college) at the end of the semester.
Guess I was hoping for a little more after the 100k+ Smartnet we pay every year :)
FWIW, I don't see that your 2950s have always treated multicast like broadcast. I think IGMP snooping was working all along in the 2950. As I said, IGMP is strictly between the host and the router. The switch only observes. If the 3548XL were intervening, the multicast would not work at all.
Instead, I think the IGMP snooping was simply too much for the 2950. If you do IGMP snooping in software, the CPU has to examine every multicast frame, not just the IGMPs. That is because it does not know whether it is an IGMP or not, until it has examined it.
A switch that handles IGMP snooping properly would sort this all out in hardware. The ASIC would decide whether the multicast packet was an IGMP, and would interrupt the CPU only for genuine IGMPs. Or ... it might not even need to interrupt the CPU at all ... it might do the whole lot in hardware.
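As a back-of-the-envelope illustration of that argument, here is a small Python sketch. It models the hypothesis only; it does not describe confirmed 2950 behavior. The function name is hypothetical, the 125 packets/sec IGMP figure is the one reported earlier in this thread, and the 5000 packets/sec stream rate is an assumed number.

```python
def cpu_punted_pps(igmp_pps: int, multicast_data_pps: int, hardware_classify: bool) -> int:
    """Packets per second the CPU must look at (simplified model).

    If the hardware can tell IGMP apart from ordinary multicast data,
    only the IGMP messages reach the CPU. If classification is done in
    software, every multicast frame has to be examined by the CPU.
    """
    if hardware_classify:
        return igmp_pps
    return igmp_pps + multicast_data_pps


# 125 pps of IGMP is from the thread; 5000 pps of stream data is assumed.
in_hardware = cpu_punted_pps(125, 5000, hardware_classify=True)
in_software = cpu_punted_pps(125, 5000, hardware_classify=False)
print(in_hardware, in_software)  # 125 5125
```

Under these assumptions the software case hands the CPU about forty times as many packets, and the gap widens with the stream rate rather than with the amount of IGMP.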
So my diagnosis would have been different. I would have said that it is your 2950 that needs upgrading, not your 3548XLs. (Although upgrading both would not come amiss.)
What version of the IOS is the 2950 running?
That would make sense, so maybe a bug in the 2950 code? I didn't see any listed in the Bug Toolkit. We recently upgraded our 2950s from 12.1-22.EA6 to 12.1-22.EA11
Guess I could always downgrade to a different release? We were not having this problem until after upgrading the IOS on our 2950s, 3548xls and 6500s. (Access->Distribution->Core)
It might be interesting to backgrade just one 2950 and see if that can take the heat. If it can, you would have a good case to put before the TAC.