Cat6500 Sup 720: Slow on Traffic to Itself, Fast on Routing/Switching

Feb 6th, 2009

Hi there,

We have a problem with our core routers: Cat 6506 with Sup 720, running s72033-advipservicesk9_wan-mz.122-18.SXF15a:

If we ping the router, about 5% of all packets have long response times or even time out. On SSH sessions to the router, we see some duplicate ACKs (DUP ACKs).

However, traffic that passes through these routers is not delayed at all. If we ping a switch or server behind the router, response times are <= 1 ms. SSH sessions to simple Cat 3560 switches behind the core show no DUP ACKs. Only traffic to the Cat 6506 itself has problems...

Other findings:

- CPU utilization is constant at 5-10%.

- 'Normal' traffic is never affected, whether it is switched, routed between VLANs, or routed between point-to-point routed links.

- It does not matter which interface we address on the router. If we ping more than one address on the same router, every interface times out at the same time.

- The Cat 6506 switches with Sup32 in our server access layer (running s3223-advipservicesk9_wan-mz.122-18.SXF15a, switching only) show no timeouts or DUP ACKs, although they sometimes have response times > 10 ms. For now, we are not including them in our troubleshooting...

- There are no log entries, even with logging at level debugging.

- Downgrading to 122-18.SXF12 did not change anything.

- The more traffic a router carries, the more timeouts occur. One router, which is neither HSRP active nor Spanning Tree root for any VLAN, has no timeouts at all and only very few response times > 10 ms.
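To compare targets the way the findings above do (pings to the router itself versus pings to hosts behind it), it can help to reduce each ping run to a loss rate and a "slow reply" rate. The following is a minimal Python sketch, assuming the round-trip times have already been collected as lists of milliseconds with `None` marking a timeout. The target names and sample values are hypothetical, not measurements from this thread; the 10 ms threshold simply mirrors the one used in the findings.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class PingSummary:
    target: str
    sent: int
    lost: int   # pings that timed out (None)
    slow: int   # replies slower than the threshold

    @property
    def loss_pct(self) -> float:
        return 100.0 * self.lost / self.sent if self.sent else 0.0

    @property
    def slow_pct(self) -> float:
        return 100.0 * self.slow / self.sent if self.sent else 0.0


def summarize(target: str, rtts_ms: Sequence[Optional[float]],
              slow_ms: float = 10.0) -> PingSummary:
    """Count timeouts (None entries) and replies slower than slow_ms."""
    lost = sum(1 for r in rtts_ms if r is None)
    slow = sum(1 for r in rtts_ms if r is not None and r > slow_ms)
    return PingSummary(target, len(rtts_ms), lost, slow)


if __name__ == "__main__":
    # Hypothetical samples (20 pings each), for illustration only.
    samples = {
        "core-router-loopback": [0.8] * 17 + [None, 25.0, 40.0],
        "server-behind-core": [0.6] * 20,
    }
    for name, rtts in samples.items():
        s = summarize(name, rtts)
        print(f"{s.target}: loss {s.loss_pct:.1f}%, slow {s.slow_pct:.1f}%")
    # core-router-loopback: loss 5.0%, slow 10.0%
    # server-behind-core: loss 0.0%, slow 0.0%
```

Running the same summary against several router interfaces and several hosts behind the router makes the control-plane versus data-plane difference easy to see side by side.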

Since normal traffic does not seem to be affected, this is not a catastrophe, but we don't like the smell of it.

Does anybody have an idea where we should continue our diagnosis? We are sort of out of ideas...

Thanks in advance and greetings from Switzerland,

Stefan Mueller

Edison Ortiz Fri, 02/06/2009 - 13:47

Hi Stefan,

You mentioned these routers are the core of your network. Are the interfaces you are pinging used as the gateway for user VLANs, or are they management IP addresses?

I find it odd that routing through these switches works fine while pings to the very interfaces performing the routing time out.

If the interfaces in question are loopbacks or non-routing interfaces, you may have duplicate IP addresses in your network.

HTH,

__

Edison.

mullzkBern_2 Fri, 02/06/2009 - 14:02

Hi Edison,

The timeouts occur on all interfaces: user VLANs, the management VLAN, loopbacks, and routed interfaces directly connected to the other core routers. On the VLANs we use HSRP, and both the router-specific address and the HSRP address time out.

The SSH sessions were connected to the loopback addresses.

And yes, I find it odd as well, and I just don't know where to start troubleshooting...

PS: What I forgot to mention in my previous post: we run a fairly simple core. We use only OSPF and static routes, and there are no ACLs set up except for switch management...

Greetings,

Stefan

Edison Ortiz Fri, 02/06/2009 - 14:23

I believe the best course of action is to open a TAC case and provide show tech-support output for further troubleshooting.

What you are experiencing isn't normal and should be corrected.

HTH,

__

Edison.

james.mathieson Tue, 03/29/2011 - 13:10

Hi,

I'm just wondering whether you've resolved this yet?

I have EXACTLY the same problem and have done numerous tests.

The most interesting one was shutting down one of the cores (the HSRP standby): all of a sudden the pings returned to normal. As soon as the core was back up, still as standby, the pings started dropping again.

I've disabled HSRP and put the HSRP address directly on the VLAN interface, and even then the pings drop.

If, on the other hand, I shut down a VLAN interface on the HSRP standby, the pings are fine again.

I've even changed the HSRP standby group number on one VLAN to rule out duplicate MAC addresses, which HSRP can produce.
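The reasoning behind changing the group number is that the default HSRP virtual MAC is derived from the group number: HSRPv1 uses 0000.0c07.acXX (XX is the group in hex, groups 0-255) and HSRPv2 uses 0000.0c9f.fXXX (groups 0-4095). Reusing a group number on several VLANs therefore reuses the same virtual MAC. A minimal Python sketch for listing such reuse, assuming you have a mapping of VLAN to HSRP group (the VLAN and group numbers in the example are hypothetical):

```python
from typing import Dict, List


def hsrp_virtual_mac(group: int, version: int = 1) -> str:
    """Default HSRP virtual MAC for a group, in Cisco dotted notation."""
    if version == 1:
        if not 0 <= group <= 255:
            raise ValueError("HSRPv1 groups are 0-255")
        return f"0000.0c07.ac{group:02x}"
    if version == 2:
        if not 0 <= group <= 4095:
            raise ValueError("HSRPv2 groups are 0-4095")
        return f"0000.0c9f.f{group:03x}"
    raise ValueError("version must be 1 or 2")


def duplicate_virtual_macs(groups_by_vlan: Dict[int, int],
                           version: int = 1) -> Dict[str, List[int]]:
    """Return {virtual MAC: [VLANs]} for every MAC used on more than one VLAN."""
    by_mac: Dict[str, List[int]] = {}
    for vlan, group in sorted(groups_by_vlan.items()):
        by_mac.setdefault(hsrp_virtual_mac(group, version), []).append(vlan)
    return {mac: vlans for mac, vlans in by_mac.items() if len(vlans) > 1}


if __name__ == "__main__":
    # Hypothetical layout: VLANs 10 and 20 both use group 1.
    print(duplicate_virtual_macs({10: 1, 20: 1, 30: 2}))
    # {'0000.0c07.ac01': [10, 20]}
```

This only shows where virtual MACs are shared; whether sharing one across VLANs causes trouble depends on the rest of the network, which is exactly what the test above was meant to rule out.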

This is a really confusing one.

James

james.mathieson Fri, 04/01/2011 - 03:48

Problem resolved.

A device on the network was causing a DHCP broadcast storm, which drove the CPU on the 6509s to 99% and caused many issues elsewhere.
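One way to find a device like this is to count DHCP client broadcasts per source MAC address in a packet capture taken on an affected VLAN. Clients send DISCOVER/REQUEST messages to UDP port 67 at the broadcast MAC ff:ff:ff:ff:ff:ff. A minimal Python sketch follows, assuming the capture has already been reduced to (timestamp, source MAC, destination MAC, UDP destination port) records. The `Frame` record format, the one-second window, and the threshold of 50 packets per second are hypothetical choices, not values from this thread.

```python
from collections import Counter
from typing import Dict, Iterable, NamedTuple, Tuple

BROADCAST_MAC = "ff:ff:ff:ff:ff:ff"
DHCP_SERVER_PORT = 67  # DHCP clients send to UDP port 67


class Frame(NamedTuple):
    ts: float        # capture timestamp in seconds
    src_mac: str
    dst_mac: str
    udp_dport: int   # hypothetical convention: 0 for non-UDP frames


def dhcp_storm_sources(frames: Iterable[Frame],
                       max_per_sec: int = 50) -> Dict[str, int]:
    """Return {src MAC: peak DHCP broadcasts in one second} above max_per_sec."""
    per_bucket: Counter = Counter()
    for f in frames:
        if f.dst_mac.lower() == BROADCAST_MAC and f.udp_dport == DHCP_SERVER_PORT:
            key: Tuple[str, int] = (f.src_mac.lower(), int(f.ts))
            per_bucket[key] += 1
    peaks: Dict[str, int] = {}
    for (mac, _second), count in per_bucket.items():
        peaks[mac] = max(peaks.get(mac, 0), count)
    return {mac: peak for mac, peak in peaks.items() if peak > max_per_sec}


if __name__ == "__main__":
    # Hypothetical capture: one chatty client, one normal client.
    frames = [Frame(0.001 * i, "00:11:22:33:44:55", BROADCAST_MAC, 67)
              for i in range(200)]
    frames.append(Frame(0.5, "00:aa:bb:cc:dd:ee", BROADCAST_MAC, 67))
    print(dhcp_storm_sources(frames))
    # {'00:11:22:33:44:55': 200}
```

The peak per-second count is used rather than an average so that a short, intense burst still stands out.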
