7206VXR NPE-G1 Gigabit Interface Overruns

Unanswered Question
May 15th, 2008

I just upgraded my 7206 from an NPE-400 to an NPE-G1. It is a WAN router connected to a core switch (6500). The connection between them (as a point to point routed link) was 100/full, and is now 1000/full over copper. The Gigabit interface on the router is now reporting overruns, and I am seeing packet loss in an IP Video application that runs over the WAN. I didn't see that with the NPE-400.


In the TAC case collection I found this:

http://www.ciscotaccc.com/core/showcase?case=K18225106


It states: "On some interface types, this chipset and packet buffer cannot handle a long burst of frames. Such interfaces are meant to provide connectivity to a certain network type, and not to switch packets at line rate. The line rate of these interfaces is often higher than the switching capacity of the router."


A solution would be to slow down the traffic coming from the 6500, but I would like some suggestions on how to do it. QoS on the 6500 isn't an option at this point. Would enabling flow control on the switch help?


I just can't believe I have problems on a gigabit interface that is only passing about 35 Mbps.

paolo bevilacqua Thu, 05/15/2008 - 09:03
User Badges:
  • Super Gold, 25000 points or more
  • Hall of Fame, Founding Member

Which exact IOS release are you running? If it is not the latest, a later release may contain a fix.

Joseph W. Doherty Thu, 05/15/2008 - 09:23
User Badges:
  • Super Bronze, 10000 points or more

"A solution would be to slow down the traffic coming from the 6500, but I would like some suggestions on how to do it."


If both sides of the link are triple speed copper, and since you note normal traffic is only about 35 Mbps, you might try setting port speeds to 100 Mbps.


Don't know if this will help, but I've seen 7204 and 7304 drop inbound packets during CPU processing spikes (in my instance, it was the BGP scanner). A partial solution was to increase the inbound hold queues. (Believe there is a whitepaper somewhere on the Cisco site from where I obtained this recommendation. Recall the hardware saved the inbound packet in the inbound queue but relied on the CPU being available to drain the inbound queue.)
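
A minimal sketch of raising the input hold queue on the router, assuming the affected port is GigabitEthernet0/1 and using 1000 purely as an illustrative depth (the usual IOS default is 75 packets). Neither the interface name nor the value comes from this thread. Overruns are counted separately from input queue drops, so this may reduce queue drops without changing the overrun counter.

```
configure terminal
 interface GigabitEthernet0/1
  ! illustrative depth only; size it to your traffic and memory
  hold-queue 1000 in
 end
! compare input queue drops and overruns before and after the change
show interfaces GigabitEthernet0/1 | include Input queue|overrun
```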


You might also ensure that the other performance-related configuration options are active, e.g. CEF, compiled ACLs, flow cache.
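
As a rough sketch, the three options named above could be enabled as follows. The interface name is an assumption, and "ip route-cache flow" is the older form of the NetFlow command (later releases use "ip flow ingress"), so check what your IOS release accepts.

```
configure terminal
 ip cef
 ! Turbo ACL (compiled access lists)
 access-list compiled
 interface GigabitEthernet0/1
  ! flow cache on the input interface
  ip route-cache flow
 end
! confirm that CEF is running and the ACLs were compiled
show ip cef summary
show access-list compiled
```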

d.poppleton Thu, 05/15/2008 - 10:27

I wanted to have more than 100Mbps on this, because we will be putting more traffic on this eventually (I have an ATM OC-3 on this router.)


CPU utilization appears to be low, and I have CEF/flow cache on and just some QoS coloring on the input interface. I've thought about increasing the buffers, but was hoping for something a little easier. Thanks!

paolo bevilacqua Thu, 05/15/2008 - 10:31

Buffers are of fundamental importance; check whether you have any buffer failures. I recommend "buffers tune automatic". It works really well.
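
A minimal sketch of that check and the recommended command, assuming an IOS release that supports automatic buffer tuning. In the "show buffers" output, the "misses" and "failures" counters for each pool are the ones to watch.

```
! look for non-zero misses and failures per buffer pool
show buffers
configure terminal
 buffers tune automatic
 end
! re-check after traffic has run for a while
show buffers
```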

Joseph W. Doherty Thu, 05/15/2008 - 15:19

A bit ugly, but if you had to, perhaps activate two 100 Mbps LAN side interfaces. Load balance using L2 or L3. This should allow the OC-3 to become the bottleneck but would also avoid gig Ethernet slamming the router's interfaces.

paolo bevilacqua Thu, 05/15/2008 - 15:22

Joseph, when I was at the place where the 7200 is designed, our pride and joy was to make it work and work well.

Nobody should be forced to go backward with a $$$ router board, I'm positive TAC or someone can help him get it right in gigabit mode.

Joseph W. Doherty Thu, 05/15/2008 - 16:02

Still sometimes handy to have a "work around" option.

Giuseppe Larosa Fri, 05/16/2008 - 07:59
User Badges:
  • Super Silver, 17500 points or more
  • Hall of Fame, Founding Member

Hello Darren,

Are you using one of the three GE ports on the NPE-G1 itself, or a PA-GE in a bay slot?


I suppose you are using one of the 3 GE ports.


Two and a half years ago I ran performance tests on the NPE-G1, including IP and MPLS forwarding, using instruments like the Spirent SmartBits and Agilent Router Tester, which send frames at a fixed rate, much like a video stream.


The results were far better than 35 Mbps.


I would tune the input hold queues as suggested above, and I would also enable flow control on the link between the 7200 GE port and the Catalyst 6500 port, in the hope that the 7200 will signal the 6500 to slow down by sending IEEE 802.3x pause frames.
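
On the 6500 side, a sketch of what that could look like, assuming the port facing the router is GigabitEthernet1/1 (a hypothetical name) and that the line card supports receiving pause frames:

```
configure terminal
 interface GigabitEthernet1/1
  ! honor pause frames sent by the router
  flowcontrol receive on
 end
! check the operational state and the pause frame counters
show interfaces GigabitEthernet1/1 flowcontrol
```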


This could well be a software problem. Is the release you are using the same on all your 7200 routers, whatever NPE model they use?

It could be ok for the NPE-400 and not so good for the NPE-G1. This could explain why you didn't see the problem before with the NPE-400.


Hope this helps,

Giuseppe

Gregman380 Thu, 05/29/2008 - 05:46

I am having a similar problem with a 7206VXR NPE-G1 Gig interface: lots of overruns, no ignored packets showing up. The CPU does not seem overloaded. It just seems like the interface buffer cannot handle the rate. I see that flow control was mentioned, and I am wondering whether IOS 12.3(11)T5 supports it. The NPE-G1 is supposed to handle 1 million packets per second, and to be honest I just don't see that rate. What about a port channel between the 7206 and the 3560 where the data originates? Also, since the NPE-G2 has double the capacity of the G1, would that solve this problem, or will that interface fail as well?

d.poppleton Wed, 06/25/2008 - 10:15

We have been running the Gig interfaces at 100 Mbps, and we are still getting overruns. I enabled flow control on the 6500, but that didn't help. We have migrated a second 7206 from the NPE-400 to the NPE-G1, and it gets overruns as well. I am still working with TAC on the issue. I find it really hard to believe that the gigabit ports can't handle 100 Mbps, but that is what it looks like. If the NPE-G2 has a better chipset for the Ethernet controllers, then it might be worth a look.

paolo bevilacqua Wed, 06/25/2008 - 10:25

Guys, I was with Cisco at the time the G1 was introduced, and I can assure you that it would not have been released with any standing performance issue.

I don't know which IOS you are running, but I think the best chances would be with 12.2 SB, which is "made" for it.

Gregman380 Wed, 06/25/2008 - 11:34

We wound up ordering an NPE-G2 in the hope that it will help. I think the problem is related to the receive ring's ability to hand off packets to the CPU. I never get to this level of troubleshooting on a normal day. Cisco did tell us to upgrade to 12.4(19) because of a known issue with flow control in earlier versions. We were not able to complete the upgrade to see whether it worked. Hopefully the G2 will solve our problem.

d.poppleton Wed, 06/25/2008 - 13:12

Would you happen to have a bug ID for the flow control issue?

Gregman380 Thu, 06/26/2008 - 03:36

Sorry, I do not have that information. This is what Cisco TAC sent me: "There are several bugs with flow control in 12.3T, so please upgrade to 12.4(19) to get all of them."

d.poppleton Mon, 07/14/2008 - 13:25

Did you have an input service-policy on the interface? I just found this interesting bug (I have QoS coloring inbound on my interface, and I am thinking of trying to move that to the 6500). The bug is marked as terminated, with no fixed-in version.


CSCsh62765 Bug Details

Overruns seen on interfaces of NPE-G1 after applying QOS

Symptom: Customer is seeing input overruns on NPE-G1 native gigabit ethernet. Traffic flow is less than 1 mb in both directions.

Conditions: Only seen on native interface when input service-policy is applied.

Workaround: n/a



d.poppleton Mon, 07/28/2008 - 10:50

I just wanted to update that moving the QoS coloring down to the 6500 did appear to resolve the issue. I am down to


I appreciate all the replies and suggestions from everybody!
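
For anyone retracing this, a sketch of the router-side half of that change, assuming the port is GigabitEthernet0/1 and the inbound policy is called MARK-IN (both names are hypothetical; use whatever "show policy-map interface" reports). The marking itself then has to be recreated on the 6500, which is not shown here.

```
! identify the policy currently attached inbound
show policy-map interface GigabitEthernet0/1
configure terminal
 interface GigabitEthernet0/1
  ! MARK-IN is a placeholder policy name
  no service-policy input MARK-IN
 end
! watch whether the overrun counter keeps climbing
show interfaces GigabitEthernet0/1 | include overrun
```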

bcoverstone Fri, 06/12/2015 - 17:08

I can confirm that this is fixed by enabling flow control.  If you run "show controller gi0/1", you can see the rx_overflow and rx_int_drop values will precisely follow the input errors.  Unfortunately you can't increase the rx_ring, as it is hard coded at 128.

 

What you can do is turn on flow control.  This can be done by making sure that either the speed or duplex are set to AUTO.  When you do a "show int gi0/1", you will see this line:

 

 output flow-control is XON, input flow-control is XON

 

If you see "output flow-control is unsupported" then it is not on.  Check the speed/duplex settings again, and make sure they are AUTO.
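
A sketch of the router-side settings described above, assuming the native copper port is GigabitEthernet0/1:

```
configure terminal
 interface GigabitEthernet0/1
  speed auto
  duplex auto
 end
! per the description above, this should show XON rather than "unsupported"
show interfaces GigabitEthernet0/1 | include flow-control
```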

 

Now, before this will be fixed, you will need to turn on flow control on your switch port.  It's enabled using the following command on a 2960/3560.

 

int gi0/1
 flowcontrol receive desired

 

That's it.  No more errors.  You can also verify that you are getting pause frames from the router by running the following:

 

show flowcontrol int gi0/1

Port       Send FlowControl  Receive FlowControl  RxPause TxPause
           admin    oper     admin    oper
---------  -------- -------- -------- --------    ------- -------
Gi0/1      Unsupp.  Unsupp.  desired  on          8       0

 
