4500 QoS troubleshooting

Unanswered Question
Nov 6th, 2009

Hi,

Can anybody provide any tips on troubleshooting bandwidth-related performance problems on a Cat 4500?

I have implemented QoS for a customer on a campus network, with the core devices hanging off a Cat 4507R with Sup V cards. However, the customer has experienced numerous occasions when traffic for some applications has shown signs of a significant reduction in bandwidth. Possibly it's down to the QoS, but I need to prove otherwise.

To troubleshoot this I have been looking at all the key interfaces and checking the four egress queues to see if there are drops.

There are no packet drops occurring, so it's proving difficult to blame this on QoS.

The core 4500 switches are running 12.2(40)SG and 12.2(25)EWA10.

Apart from show interface x/y counter detail, does anybody know of any other counters I should be checking for dropped packets?

Are there any known bugs where packets are dropped without being counted? I haven't found any.

The traffic in question is traversing GRE tunnels at different points in the infrastructure, so could this be relevant?

Some of the cards that are receiving the largest traffic flows are blocking cards with an 8:1 oversubscription ratio, so the individual queue allowance is only 31.25 Mbps. However, as I don't see any drops I can't see a need to raise the default values, and the larger traffic flows are actually incoming, not outgoing, on these blocking ports. As I understand it, the Rx side of the cards does not have 8:1 contention, only the Tx side.
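The 31.25 Mbps figure above can be sanity-checked with simple arithmetic; this is a sketch assuming a 1 Gbps port, 8:1 oversubscription, and an even split across the four egress queues (the actual sharing/shaping ratios on a Sup V may be configured differently):

```python
# Rough per-queue bandwidth estimate for an oversubscribed line card.
# Assumptions (not stated in the thread): 1 Gbps port speed, 8:1
# oversubscription, and bandwidth divided evenly across 4 egress queues.

def per_queue_bandwidth_mbps(port_mbps: float, oversub_ratio: int, num_queues: int) -> float:
    """Return the bandwidth share of a single egress queue in Mbps."""
    effective_port_mbps = port_mbps / oversub_ratio  # 8 ports share one uplink path
    return effective_port_mbps / num_queues

print(per_queue_bandwidth_mbps(1000, 8, 4))  # 31.25, matching the figure above
```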

Any pointers for something I may have missed would be most appreciated!

Jed

Joseph W. Doherty Fri, 11/06/2009 - 05:21

"The traffic in question is traversing GRE tunnels at different points in the infrastructure so could this be relevant? "

This might be relevant if there's ongoing fragmentation.
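To see why GRE can trigger fragmentation: encapsulation adds an outer IPv4 header (20 bytes) plus a GRE header (4 bytes, more with optional fields), so a full-size 1500-byte packet no longer fits a 1500-byte MTU. A minimal sketch of that arithmetic (header sizes assume plain GRE over IPv4 with no options):

```python
# Why GRE encapsulation can force fragmentation on a 1500-byte MTU link.
# Header sizes assume basic GRE over IPv4 with no optional GRE fields.

OUTER_IP_HEADER = 20   # bytes, outer IPv4 header added by the tunnel
GRE_HEADER = 4         # bytes, basic GRE header (no key/sequence options)
LINK_MTU = 1500        # bytes, typical Ethernet MTU

max_inner_packet = LINK_MTU - OUTER_IP_HEADER - GRE_HEADER
print(max_inner_packet)  # 1476: anything larger must be fragmented
```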

jedellerby Fri, 11/13/2009 - 08:39

Thanks for the pointer.

I took a look at the CPU-related drops to see if fragmentation was sending packets up to the CPU, and there is an element of this happening. It also seems that GRE packets themselves are process switched, so they have a bigger CPU impact. The CPU hits 80-90% quite consistently on the key switches.

However, the CPU counter stats don't show any significant drops against L3 forwarding, which is where I think IP fragments and GRE packets are processed. So, although CPU is high, I can't see any dropped packets.

I think the basic issue here is that some packets get dropped during peak periods, and following on from this the application can take tens of minutes to recover. The application is a web cache service (Squid), with local caches on the network talking to central peer cache servers. The network drops packets at times between the central servers and clients, which then results in a meltdown of web browsing for 10+ minutes.

I don't think the issue is QoS related; the network is overloaded and the Squid proxy peering setup doesn't react well to dropped packets. Perhaps there are lots of open TCP connections due to retries, and this is leading to an exhaustion of resources on the web cache.
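One way to see how a few drops can stall things for minutes: TCP doubles its retransmission timeout on each failed retry (RFC 6298), so a connection whose retransmissions keep getting dropped backs off for a long time before giving up. A rough illustration, assuming a 1 s initial RTO, a 60 s cap, and a dozen retries (real TCP stacks vary in all three values):

```python
# Rough illustration of TCP exponential backoff: total time a single
# connection can spend retrying before it gives up. Assumed values
# (not from the thread): 1 s initial RTO, 60 s cap, 12 retransmissions.

def total_retry_time(initial_rto: float = 1.0, max_rto: float = 60.0, retries: int = 12) -> float:
    """Sum the retransmission timeouts, doubling each time up to a cap."""
    total, rto = 0.0, initial_rto
    for _ in range(retries):
        total += rto
        rto = min(rto * 2, max_rto)
    return total

print(total_retry_time())  # 423.0 seconds -- minutes of stall per connection
```

With many client connections stuck in this state at once, connection-table exhaustion on the cache servers would be plausible even after the network congestion clears.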

The only issue with this theory is that the customer says he can't see an issue on the web cache servers :(

Jed
