Network performance and RTA

Unanswered Question
Apr 26th, 2007

Our network management system consistently reports a recurring performance degradation every night. Our core/distribution layer is two 6506E switches and the access layer is 2960Gs. Between 10:20 pm and 10:30 pm CST, RTA (round-trip average) for the same set of devices increases dramatically and then immediately returns to normal baseline levels. Any thoughts on what could be the root cause of such a consistent occurrence? Would this be symptomatic of a broadcast storm? Could Spanning Tree be the root cause? What is the simplest way to determine the root cause?

See examples below:

***** Nagios *****

Notification Type: PROBLEM

Service: PING

Host: Chicago WAN Level3/Wiltel (secondary)

Address: 172.17.11.12

State: CRITICAL

Date/Time: Wed Apr 25 22:16:35 CDT 2007

Additional Info:

PING CRITICAL - Packet loss = 0%, RTA = 2131.27 ms

***** Nagios *****

Notification Type: RECOVERY

Service: PING

Host: Chicago WAN Level3/Wiltel (secondary)

Address: 172.17.11.12

State: OK

Date/Time: Wed Apr 25 22:20:57 CDT 2007

Additional Info:

PING OK - Packet loss = 0%, RTA = 1.25 ms
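To quantify the two notifications above, here is a minimal Python sketch that pulls the state, packet loss, and RTA out of the "Additional Info" line and computes how long the alert lasted. The function names and the regular expression are illustrative assumptions, not part of Nagios; the input strings are copied from the notifications above.

```python
import re
from datetime import datetime

# Matches the "Additional Info" line of a Nagios PING notification.
PING_RE = re.compile(
    r"PING (?P<state>\w+) - Packet loss = (?P<loss>\d+(?:\.\d+)?)%, "
    r"RTA = (?P<rta>\d+(?:\.\d+)?) ms"
)


def parse_ping_info(line):
    """Return (state, loss_percent, rta_ms) from a PING status line."""
    m = PING_RE.search(line)
    if m is None:
        raise ValueError(f"not a PING status line: {line!r}")
    return m.group("state"), float(m.group("loss")), float(m.group("rta"))


def parse_nagios_time(text):
    """Parse 'Wed Apr 25 22:16:35 CDT 2007', ignoring the zone token."""
    parts = text.split()
    del parts[4]  # drop the time-zone abbreviation (e.g. CDT)
    return datetime.strptime(" ".join(parts), "%a %b %d %H:%M:%S %Y")


problem = parse_ping_info("PING CRITICAL - Packet loss = 0%, RTA = 2131.27 ms")
recovery = parse_ping_info("PING OK - Packet loss = 0%, RTA = 1.25 ms")
start = parse_nagios_time("Wed Apr 25 22:16:35 CDT 2007")
end = parse_nagios_time("Wed Apr 25 22:20:57 CDT 2007")

duration_s = (end - start).total_seconds()
rta_ratio = problem[2] / recovery[2]
print(f"alert lasted {duration_s:.0f} s, RTA ratio {rta_ratio:.0f}x")
```

Running the same parsing over several nights of notifications would show whether the start time and duration really are the same each night.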

sachinraja Thu, 04/26/2007 - 17:47

Hello Michael,

Are these components in the same building, as in a typical core-edge architecture? A latency of 2131.27 ms is far too high if that is the case. Does this happen only at a specific time, or more often? We need to look at a few things during this period:

1) CPU usage of the core switches.

2) Broadcast storms, if any, as you suggested. Check the switch logs to see whether there are any STP loops as well.

3) Set up a syslog server and see whether there are any attacks or unnecessary traffic flooding during this time. If so, analyze and troubleshoot accordingly.

4) Troubleshoot hop by hop between the network management station and the core switches, and check for errors on the trunks, switchports, etc.
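For step 4, one way to spot a problem link is to record interface counters before and after the nightly window and compare them. The sketch below is only an illustration in Python: the interface names, counter names, and values are hypothetical, and how the snapshots are collected (CLI output, SNMP, etc.) is left open.

```python
def counter_deltas(before, after):
    """Return {interface: {counter: increase}} for counters that grew."""
    grown = {}
    for ifname, counters in after.items():
        base = before.get(ifname, {})
        changes = {
            name: value - base.get(name, 0)
            for name, value in counters.items()
            if value > base.get(name, 0)
        }
        if changes:
            grown[ifname] = changes
    return grown


# Hypothetical snapshots taken before and after the nightly window.
before = {
    "Gi1/1": {"input_errors": 0, "crc": 0, "output_drops": 12},
    "Gi1/2": {"input_errors": 3, "crc": 3, "output_drops": 0},
}
after = {
    "Gi1/1": {"input_errors": 0, "crc": 0, "output_drops": 4512},
    "Gi1/2": {"input_errors": 3, "crc": 3, "output_drops": 0},
}

suspects = counter_deltas(before, after)
for ifname, changes in suspects.items():
    print(ifname, changes)
```

An interface whose drop or error counters jump only during the window is a reasonable place to start looking; counters that stay flat help rule a hop out.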

Hope this helps. Do let us know what you find. All the best, and please rate replies if you find them useful.

Raj

Iain Thu, 04/26/2007 - 18:43

The name of your host leads me to think that this is a WAN link. Is this the case?

I would recommend setting up a SPAN session and capturing some packets while the problem is occurring.

Who knows what it is. Perhaps some backup job is saturating the line?

For future reference, you might want to consider implementing a NetFlow collector to monitor traffic on your core routers. Such a system can greatly simplify bandwidth management.

http://www.crannog-software.com/index.php?go=Product.ShowDetail&ProductID=1
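To illustrate what a collector makes possible, here is a rough Python sketch that sums bytes per source/destination pair for flows starting inside a time-of-day window, so a backup job or similar bulk transfer would stand out. The record layout, the `top_talkers` helper, and the sample flows (using documentation addresses) are all assumptions for the example, not the export format of any particular product.

```python
from collections import Counter
from datetime import datetime, time


def top_talkers(flows, window_start, window_end, n=3):
    """Sum bytes per (src, dst) for flows that start inside the window."""
    totals = Counter()
    for flow in flows:
        if window_start <= flow["start"].time() <= window_end:
            totals[(flow["src"], flow["dst"])] += flow["bytes"]
    return totals.most_common(n)


# Hypothetical flow records; real ones would come from the collector.
flows = [
    {"start": datetime(2007, 4, 25, 22, 17), "src": "192.0.2.10",
     "dst": "192.0.2.200", "bytes": 900_000_000},
    {"start": datetime(2007, 4, 25, 22, 18), "src": "192.0.2.10",
     "dst": "192.0.2.200", "bytes": 600_000_000},
    {"start": datetime(2007, 4, 25, 22, 19), "src": "192.0.2.11",
     "dst": "192.0.2.50", "bytes": 2_000_000},
    {"start": datetime(2007, 4, 25, 14, 0), "src": "192.0.2.12",
     "dst": "192.0.2.50", "bytes": 5_000_000_000},
]

talkers = top_talkers(flows, time(22, 10), time(22, 30))
for (src, dst), nbytes in talkers:
    print(f"{src} -> {dst}: {nbytes} bytes")
```

The window here is set a little wider than the reported alert times so that a transfer starting just before the latency spike is not missed.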

HTH - pls rate helpful posts
