7206VXR NPE-G1 Gigabit Interface Overruns

May 15th, 2008

I just upgraded my 7206 from an NPE-400 to an NPE-G1. It is a WAN router connected to a core switch (6500). The connection between them (as a point to point routed link) was 100/full, and is now 1000/full over copper. The Gigabit interface on the router is now reporting overruns, and I am seeing packet loss in an IP Video application that runs over the WAN. I didn't see that with the NPE-400.

In the TAC case collection I found this:

http://www.ciscotaccc.com/core/showcase?case=K18225106

It states: "On some interface types, this chipset and packet buffer cannot handle a long burst of frames. Such interfaces are meant to provide connectivity to a certain network type, and not to switch packets at line rate. The line rate of these interfaces is often higher than the switching capacity of the router."

A solution would be to slow down the traffic coming from the 6500, but I would like some suggestions on how to do it. QoS on the 6500 isn't an option at this point. Would enabling flowcontrol on the switch help?

I just can't believe I have problems on a gigabit interface that is only passing about 35 Mbps.

Paolo Bevilacqua Thu, 05/15/2008 - 09:03

Which exact IOS are you using? If it is not the latest, a later release might contain a fix.

Joseph W. Doherty Thu, 05/15/2008 - 09:23

"A solution would be to slow down the traffic coming from the 6500, but I would like some suggestions on how to do it."

If both sides of the link are triple speed copper, and since you note normal traffic is only about 35 Mbps, you might try setting port speeds to 100 Mbps.
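A rough sketch of that change on the router side follows. The interface name GigabitEthernet0/1 is an assumption, and the matching 6500 port would need the same speed and duplex settings so the two ends agree.

```
interface GigabitEthernet0/1
 ! assumed interface name; use the port that faces the 6500
 speed 100
 duplex full
```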

Don't know if this will help, but I've seen 7204 and 7304 drop inbound packets during CPU processing spikes (in my instance, it was the BGP scanner). A partial solution was to increase the inbound hold queues. (Believe there is a whitepaper somewhere on the Cisco site from where I obtained this recommendation. Recall the hardware saved the inbound packet in the inbound queue but relied on the CPU being available to drain the inbound queue.)
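As a sketch of the hold-queue change, assuming the interface is GigabitEthernet0/1 and picking 1024 purely as an illustrative value (the default input hold queue is 75 packets):

```
interface GigabitEthernet0/1
 ! raise the input hold queue from the default of 75 (1024 is only an example)
 hold-queue 1024 in
```

The "Input queue" line in "show interfaces GigabitEthernet0/1" reports the current size, the maximum, and the drops, which is one way to see whether the change has any effect. This targets input queue drops, so it may or may not reduce overruns.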

You might also ensure that all other optimum configuration options are active, e.g. CEF, compiled ACLs, flow cache.

d.poppleton Thu, 05/15/2008 - 10:27

I wanted to have more than 100Mbps on this, because we will be putting more traffic on this eventually (I have an ATM OC-3 on this router.)

CPU utilization appears to be low, and I have CEF/flow cache on and just some QoS coloring on the input interface. I've thought about increasing the buffers, but was hoping for something a little easier. Thanks!

Paolo Bevilacqua Thu, 05/15/2008 - 10:31

Buffers are of fundamental importance, so check whether you have any buffer failures. I recommend "buffers tune automatic". It works really well.
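For reference, a minimal sketch of checking for failures and then enabling automatic tuning. The command is entered in global configuration mode and needs an IOS release recent enough to support it, which is an assumption about the router in question.

```
! look for non-zero "failures" or "misses" counters in the buffer pools
show buffers
!
configure terminal
 buffers tune automatic
end
```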

Joseph W. Doherty Thu, 05/15/2008 - 15:19

A bit ugly, but if you had to, perhaps activate two 100 Mbps LAN side interfaces. Load balance using L2 or L3. This should allow the OC-3 to become the bottleneck but would also avoid gig Ethernet slamming the router's interfaces.

Paolo Bevilacqua Thu, 05/15/2008 - 15:22

Joseph, when I was at the place where the 7200 is designed, our pride and joy was to make it work and work well.

Nobody should be forced to go backward with a $$$ router board, I'm positive TAC or someone can help him get it right in gigabit mode.

Giuseppe Larosa Fri, 05/16/2008 - 07:59

Hello Darren,

Are you using one of the three GE ports on the NPE-G1 itself, or a PA-GE in a bay slot?

I suppose you are using one of the 3 GE ports.

Two and a half years ago I did performance tests on the NPE-G1, including IP and MPLS forwarding, using instruments like the Spirent SmartBits and the Agilent Router Tester, which send frames at a fixed rate like the video stream.

The results were far better than 35 Mbps.

I would tune the input hold queues as Bevilacqua has suggested, and I would also enable flow control on the link between the 7200 GE port and the Catalyst 6500 port, in the hope that the 7200 will signal the 6500 to slow down by sending an IEEE pause frame.

This could well be a software problem. Is the release you are using the same on all your 7200 routers, whatever NPE model they use?

It could be OK for the NPE-400 and not so good for the NPE-G1. This could explain why you didn't see the problem before with the NPE-400.

Hope this helps,

Giuseppe

Gregman380 Thu, 05/29/2008 - 05:46

I am having a similar problem with a 7206VXR NPE-G1 Gig interface. Lots of overruns, no ignored showing up. The CPU does not seem overloaded. It just seems like the interface buffer cannot handle the rate. I see that flowcontrol was mentioned and I am wondering if IOS 12.3(11)T5 supports flowcontrol. The NPE-G1 is supposed to handle 1 million packets per second and I just don't see that rate, to be honest. What about a port channel between the 7206 and the 3560 where the data originates? Also, since the NPE-G2 has double the capacity of the G1, would that solve this problem, or will that interface fail as well?

d.poppleton Wed, 06/25/2008 - 10:15

We have been running the Gig interfaces at 100 Mbps, and we are still getting overruns. I enabled flowcontrol on the 6500, but that didn't help. We have migrated a second 7206 from the NPE-400 to the NPE-G1, and it gets overruns as well. I am still working with TAC on the issue. I find it really hard to believe that the gigabit ports can't handle 100 Mbps, but that is what it looks like. If the NPE-G2 has a better chipset for the Ethernet controllers, then it might be worth a look.

Paolo Bevilacqua Wed, 06/25/2008 - 10:25

Guys, I was with Cisco at the time the G1 was introduced and I can assure you that it would not have been released with any standing performance issue.

I don't know which IOS you are running, but I think the best chances would be with 12.2 SB, which is "made" for it.

Gregman380 Wed, 06/25/2008 - 11:34

We wound up ordering an NPE-G2 in the hope that it will help. I think the problem is related to the receive ring's ability to hand off packets to the CPU. I never get to this level of troubleshooting on a normal day. Cisco did tell us to upgrade to 12.4(19) because of a known issue with flowcontrol on earlier versions. We were not able to get the upgrade done to see whether it worked or not. Hopefully the G2 will solve our problem.

Gregman380 Thu, 06/26/2008 - 03:36

Sorry, I do not have that information. This is what Cisco TAC sent me: "There are several bugs with flow control in 12.3T, so please upgrade to 12.4(19) to get all of them."

d.poppleton Mon, 07/14/2008 - 13:25

Did you have an input service-policy on the interface? I just found this interesting bug (I have QoS coloring inbound on my interface, and I am thinking of trying to move that to the 6500). The bug is terminated, with no fixed-in version.

CSCsh62765 Bug Details

Overruns seen on interfaces of NPE-G1 after applying QOS

Symptom:

Customer is seeing input overruns on NPE-G1 native gigabit ethernet. Traffic flow is less than 1 mb in both directions.

Conditions:

Only seen on native interface when input service-policy is applied.

Workaround:

n/a
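To check whether an interface matches this condition, something like the following could be used. The interface name and the policy name MARK-IN are hypothetical placeholders, not values from the bug report.

```
! list any service-policy attached in the input direction
show policy-map interface GigabitEthernet0/1 input
!
! to test without the input policy (MARK-IN is a placeholder name)
configure terminal
 interface GigabitEthernet0/1
  no service-policy input MARK-IN
end
```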

d.poppleton Mon, 07/28/2008 - 10:50

I just wanted to update that moving the QoS coloring down to the 6500 did appear to resolve the issue. I am down to

I appreciate all the replies and suggestions from everybody!

bcoverstone Fri, 06/12/2015 - 17:08

I can confirm that this is fixed by enabling flow control. If you run "show controller gi0/1", you can see that the rx_overflow and rx_int_drop values precisely follow the input errors. Unfortunately you can't increase the rx_ring, as it is hard coded at 128.

What you can do is turn on flow control. This can be done by making sure that either the speed or the duplex is set to AUTO. When you do a "show int gi0/1", you will see this line:

 output flow-control is XON, input flow-control is XON

If you see "output flow-control is unsupported" then it is not on. Check the speed/duplex settings again, and make sure they are AUTO.
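A sketch of the router side, assuming the port is GigabitEthernet0/1 and runs over copper:

```
interface GigabitEthernet0/1
 ! flow control is reported as unsupported when speed and duplex are both hard set
 speed auto
 duplex auto
```

Afterwards, "show int gi0/1" should report the flow-control state as XON rather than "unsupported".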

 

Now, before this will be fixed, you will need to turn on flow control on your switch port. It's enabled using the following commands on a 2960/3560:

int gi0/1
 flowcontrol receive desired

That's it. No more errors. You can also verify that you are getting pause frames from the router by running the following:

show flowcontrol int gi0/1

Port       Send FlowControl  Receive FlowControl  RxPause TxPause
           admin    oper     admin    oper
---------  -------- -------- -------- --------    ------- -------
Gi0/1      Unsupp.  Unsupp.  desired  on          8       0

 
