I have a pair of 6509s with msfcs in them. They are set up in an HSRP configuration, with one active for all vlans and the other as backup.
We have a monitoring server that pings each interface on the msfc. Lately I've been getting random failures on these interfaces. The vast majority of failures are on the primary 6509. It only fails once and then it's okay.
The primary msfc is running at about 33% cpu utilization, with peak traffic on the switch backplane showing at 16%. If I ping any interface on the primary msfc I get response times between 0.500 ms and 7.0 ms, with anywhere from 4 to 7 percent packet loss. This is consistent across all of the interfaces. Pinging any interface on the backup msfc gives me a consistent 0.500 ms response with 0% packet loss. These two switches are connected to each other via redundant gigabit links.
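To compare the two msfcs on the same footing, a small script can reduce each ping run to loss percentage and min/max response time. This is only a sketch: the function name and the sample values below are made up for illustration (they are not the poster's measurements), and a timed-out probe is assumed to be recorded as None.

```python
from typing import Optional, Sequence


def summarize_pings(rtts_ms: Sequence[Optional[float]]) -> dict:
    """Summarize one ping run; None marks a probe that timed out."""
    replies = [r for r in rtts_ms if r is not None]
    sent = len(rtts_ms)
    lost = sent - len(replies)
    return {
        "sent": sent,
        "lost": lost,
        "loss_pct": 100.0 * lost / sent if sent else 0.0,
        "min_ms": min(replies) if replies else None,
        "max_ms": max(replies) if replies else None,
    }


# Hypothetical samples, invented for this example only.
primary = [0.5, 7.0, None, 0.6, 1.2, 0.5, 3.4, 0.5, 0.9, 0.5,
           0.5, 2.1, 0.5, 0.7, 0.5, 0.5, 4.8, 0.5, 0.6, 0.5]
backup = [0.5] * 20

if __name__ == "__main__":
    print("primary:", summarize_pings(primary))
    print("backup: ", summarize_pings(backup))
```

Running the same summary against both routers from the same source makes it easier to show that the loss follows the primary msfc rather than the path.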
I notice that when I telnet directly to the primary msfc, I get noticeably slower response on the console than if I telnet to the switch first and then connect to the msfc via session 15. Is this console performance disparity something that might be indicative of a bottleneck somewhere? It doesn't matter where I source the ping from; even from the backup msfc I get the same (bad) ping performance from the primary.
The cpu load and backplane traffic levels wouldn't seem to indicate a cpu or memory bottleneck, but the fact that all interfaces show the same crummy response and reliability, as well as the slow console, makes me think it's something system wide that's getting overloaded. I just can't see what it is.
Are you losing packets and seeing slow connections only when you try to reach the addresses configured on the switch, or is this also affecting user traffic that is passing through the switch?
If all traffic is affected, this might sound funny, but have you tried rebooting the switch since the problem started happening?
Actually, if you have a 2nd Sup/MSFC card in the switch, just fail over to the standby Sup and check the outcome.
The symptoms you mentioned above don't indicate any sort of resource utilization or network traffic problem.
I only seem to be losing echo requests when they are directed at an interface on the primary msfc. If I ping any other device I get solid replies with no drops. The path to these devices is through the primary msfc so user traffic appears to be okay. My monitoring software also pings the servers in the data center and I'm not seeing the random timeouts on any of the servers. Just the router.
33% is somewhat high for the msfc. You must have a lot of traffic that needs to be process switched for some reason. Most traffic should be hardware switched and never hit the cpu. It might do you well to get a sniffer on there and see what the traffic is doing. Possible client infections somewhere?
Please post a list of processes sorted by CPU usage, show int and show int stats for the monitored interface, and the OS version running on the switch.
Checking the output, it does seem that the CPU is high due to IP Input, which means a large portion of your traffic is hitting the CPU. I think we can see that in the "show int stat" output too, and it is not a good thing.
By the way, pings destined to the router itself will be processed by the CPU.
This high IP INPUT could indicate:
Interrupt switching is disabled on an interface (or interfaces) that has (have) a lot of traffic
Fast switching on the same interface is disabled
Fast switching on an interface providing policy routing is disabled
Traffic that cannot be interrupt-switched arrives, which can be caused by a long list of things:
Packets for which there is no entry yet in the switching cache.
IP packets with options
Packets that require protocol translation
Multilink Point-to-Point Protocol (supported in Cisco Express Forwarding switching)
Packets destined for the router
So, I think we will need to identify why you are getting all this traffic process switched.
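One way to narrow this down is to work out, per interface, what share of inbound packets went through the Processor path in the "show int stats" output. The script below is a rough sketch only: the SAMPLE text is invented for illustration (it is not the poster's output), the column layout is assumed and can differ between platforms and releases, and processor_share is a hypothetical helper name. The counters are also cumulative since they were last cleared, so clear them first if you want a recent picture.

```python
import re

# Invented sample in the assumed "show interfaces stats" layout.
SAMPLE = """\
Vlan11
          Switching path    Pkts In   Chars In   Pkts Out  Chars Out
               Processor     900000   90000000     100000   10000000
             Route cache     100000   10000000     900000   90000000
                   Total    1000000  100000000    1000000  100000000
Vlan12
          Switching path    Pkts In   Chars In   Pkts Out  Chars Out
               Processor       1000     100000       1000     100000
             Route cache      99000    9900000      99000    9900000
                   Total     100000   10000000     100000   10000000
"""

# An indented row: a text label followed by four integer counters.
ROW = re.compile(
    r"^\s+(?P<path>[A-Za-z ]+?)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s*$"
)


def processor_share(text):
    """Return {interface: percent of inbound packets on the Processor path}."""
    counts = {}
    iface = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():
            # An unindented line starts a new interface section.
            iface = line.strip()
            counts[iface] = {}
            continue
        m = ROW.match(line)
        if m and iface is not None:
            counts[iface][m.group("path")] = int(m.group(2))  # Pkts In
    shares = {}
    for name, paths in counts.items():
        total = paths.get("Total", 0)
        if total:
            shares[name] = 100.0 * paths.get("Processor", 0) / total
    return shares


if __name__ == "__main__":
    for name, pct in sorted(processor_share(SAMPLE).items()):
        print(f"{name}: {pct:.1f}% of inbound packets process switched")
```

An interface whose share stands out from the rest is the first place to look for one of the causes listed above.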
BTW, just as a piece of advice, if your IOS permits, try:
show proc cpu | e 0.00
show proc cpu sorted | e 0.00
to get a shorter output.
Also, show interfaces switching is a good command for troubleshooting these issues.
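If the output has already been saved to a text file, the same kind of trimming can be done offline. The sketch below keeps only processes with a non-zero five-second figure and sorts them, which is close to (but not identical to) what the exclude filter does on the router. It also splits the header line, where the first figure is total CPU and the second is the share spent at interrupt level. The SAMPLE text is invented for illustration (only the 33% total matches this thread), and the function names are hypothetical.

```python
import re

# Invented sample in the usual "show processes cpu" layout.
SAMPLE = """\
CPU utilization for five seconds: 33%/10%; one minute: 32%; five minutes: 31%
 PID Runtime(ms)   Invoked      uSecs   5Sec   1Min   5Min TTY Process
   1           0         1          0  0.00%  0.00%  0.00%   0 Chunk Manager
  23     1234567   7654321        161 21.50% 20.10% 19.80%   0 IP Input
  45        5000     60000         83  1.20%  1.00%  0.90%   0 ARP Input
  60         100      2000         50  0.00%  0.05%  0.02%   0 Net Background
"""

HEADER = re.compile(r"five seconds: (\d+)%/(\d+)%")
PROC = re.compile(
    r"^\s*(\d+)\s+\d+\s+\d+\s+\d+\s+"
    r"([\d.]+)%\s+([\d.]+)%\s+([\d.]+)%\s+\d+\s+(.+?)\s*$"
)


def cpu_split(text):
    """Split the five-second figure into interrupt and process level."""
    m = HEADER.search(text)
    if m is None:
        raise ValueError("CPU utilization header not found")
    total, interrupt = int(m.group(1)), int(m.group(2))
    return {"total": total, "interrupt": interrupt,
            "process": total - interrupt}


def busy_processes(text):
    """Return (name, five_sec_pct) for busy processes, busiest first."""
    rows = []
    for line in text.splitlines():
        m = PROC.match(line)
        if m and float(m.group(2)) > 0.0:
            rows.append((m.group(5), float(m.group(2))))
    return sorted(rows, key=lambda r: r[1], reverse=True)


if __name__ == "__main__":
    print(cpu_split(SAMPLE))
    for name, pct in busy_processes(SAMPLE):
        print(f"{pct:6.2f}%  {name}")
```

A large gap between the total and the interrupt-level figure, with IP Input at the top of the list, points at process-switched traffic rather than normal forwarding load.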
Hello, if you open the first attachment with WordPad, it will open with no problems (and be clear to read).
Could you provide a sample configuration for 1 of your vlans and physical interfaces?
Vlan 11 seems to be the one putting the load on the CPU. If you look at the show int stat and the show interface output, they show a lot of traffic hitting the switch processor, and you even have packets in the input queue, which means it is backing up somewhat. I think I would sniff vlan 11 and see what is going on. We recently had something like this and finally traced it down to one person ghosting some stations using broadcast mode. That one person was enough to bury a Sup 720 at between 90 and 100% cpu, and you basically couldn't do anything with the box while he was doing it.
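Once a capture from vlan 11 has been exported, a quick tally of broadcast frames per source address is usually enough to spot a single noisy station like that. This is a minimal sketch that assumes the capture has already been reduced to (source MAC, destination MAC) pairs; the frames list uses made-up documentation-range addresses and top_broadcast_sources is a hypothetical helper, not part of any sniffer tool.

```python
from collections import Counter

BROADCAST = "ff:ff:ff:ff:ff:ff"


def top_broadcast_sources(frames, n=3):
    """Count broadcast frames per source MAC and return the top n."""
    counts = Counter(
        src for src, dst in frames if dst.lower() == BROADCAST
    )
    return counts.most_common(n)


# Hypothetical (src, dst) pairs, invented for this example only.
frames = (
    [("00:00:5e:00:53:01", BROADCAST)] * 5
    + [("00:00:5e:00:53:02", BROADCAST)]
    + [("00:00:5e:00:53:03", "00:00:5e:00:53:04")] * 2
)

if __name__ == "__main__":
    for mac, count in top_broadcast_sources(frames):
        print(f"{mac}  {count} broadcast frames")
```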