Overrun Errors on ASA 5550

Unanswered Question
Feb 23rd, 2012

I have been getting overrun errors on 3 different ASA 5550 HA pairs with traffic rates of less than 100 Mbps total.  One TAC engineer told me to split the traffic between the two slots, so that traffic comes in one and exits the other, to maximize throughput, because the 5550 was designed to work that way.  Another TAC engineer told me to enable Ethernet flow control to alleviate the overrun errors because the traffic is bursty, but this doesn't seem to address the root cause of the problem either.  TCP traffic is bursty by nature and has its own flow control mechanism.  I can't seem to find any detailed info on why traffic needs to be split at 100 Mbps when the marketing throughput number is 1.2 Gbps.  Is this a design flaw or limitation?  Is there a way to alleviate overrun errors?

Marcin Latosiewicz Fri, 02/24/2012 - 05:18

Good place to start:

http://www.cisco.com/en/US/docs/security/asa/quick_start/5500/5500_quick_start.html#wp35995

Shaping/flow control/other mechanisms can normally alleviate some of the overflow issues.

For the rest, you need to check what is causing the overflows; NetFlow/Wireshark/syslog analysis is a good place to start.

One additional note: if you don't have it configured already, enable unicast RPF on all interfaces :-)
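
For reference, that's one line per interface on the ASA; e.g., for an interface named "inside" (adjust the name to yours):

ip verify reverse-path interface inside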

HTH,

M.

pntran1 Fri, 02/24/2012 - 09:30

Thanks so much for the info, Marcin.  The quick start doc basically said the same thing as the TAC person: "For maximum throughput, configure the ASA so that traffic is distributed equally between the two buses. Lay out the network so that traffic enters through one bus and exits through the other."  But do you know the reason why that is necessary?  Can you please elaborate on how syslog can reveal what is causing the overflow?  How does enabling RPF help with alleviating overflow?

-Peter

Patrick0711 Fri, 02/24/2012 - 09:32

Are you seeing 'no buffer' counters on the interfaces as well?  What is the low count for the 1550-byte memory block in the output of 'show blocks'?

The TAC engineer suggested that you split traffic between the different interface modules because they both have their own internal backplane interfaces (Internal-Data0/0 and Internal-Data1/0).  However, this will only help if you're actually overrunning the internal interfaces.

Flow control and overrunning an interface are unrelated.  TCP flow control is determined by the client's and server's receive buffers/windows, not by intermediary devices.  Additionally, flow control only dictates how much data can be sent without an ACK; it doesn't specify the rate at which the data is sent.
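
To put rough numbers on that (illustrative figures, not measurements from this network): a host with a 64 KB receive window and a 20 ms RTT averages at most 64 KB / 20 ms ≈ 26 Mbps, but nothing stops it from transmitting that entire 64 KB window back-to-back at line rate, which is a burst of roughly 0.5 ms at 1 Gbps.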

My concern is that there are numerous cases where overruns occur even though there are plenty of 1550-byte memory blocks available and the interface doesn't show any 'no buffer' counts.  I can't imagine why an interface would be overrun in this scenario, but I have never been able to find a conclusive answer from Cisco.

pntran1 Fri, 02/24/2012 - 09:45

Thanks for your reply, Patrick.  The 'no buffer' counter is 0, and the 1550-byte block counts look normal.

asa00k/pri/act# show interface gig 1/1
Interface GigabitEthernet1/1 "inside", is up, line protocol is up
  Hardware is VCS7380 rev01, BW 1000 Mbps, DLY 10 usec
        Auto-Duplex(Full-duplex), Auto-Speed(1000 Mbps)
        Media-type configured as RJ45 connector
        MAC address 0025.4538.83cd, MTU 1500
        IP address 10.174.1.253, subnet mask 255.255.255.0
        131235122531 packets input, 66053054633431 bytes, 0 no buffer
        Received 147575117 broadcasts, 0 runts, 0 giants
        34201765 input errors, 0 CRC, 0 frame, 34201765 overrun, 0 ignored, 0 abort
                                               ^^^^^^^^^^^^^^^^
        0 L2 decode drops
        214442064068 packets output, 47534402931048 bytes, 0 underruns
        0 output errors, 0 collisions, 0 interface resets
        0 late collisions, 0 deferred
        0 input reset drops, 0 output reset drops
        0 rate limit drops
        input queue (blocks free curr/low): hardware (0/0)
        output queue (blocks free curr/low): hardware (0/0)
  Traffic Statistics for "inside":
        131189361555 packets input, 63641462929613 bytes
        214411432424 packets output, 43673527445793 bytes
        331334100 packets dropped
      1 minute input rate 7085 pkts/sec,  3171622 bytes/sec
      1 minute output rate 12327 pkts/sec,  1481679 bytes/sec
      1 minute drop rate, 1 pkts/sec
      5 minute input rate 7797 pkts/sec,  3358887 bytes/sec
      5 minute output rate 13731 pkts/sec,  1683791 bytes/sec
      5 minute drop rate, 1 pkts/sec

asa00k/pri/act# show blocks
  SIZE    MAX    LOW    CNT
     0   1450   1386   1447
     4    100     99     99
    80    400    336    400
   256   1612   1497   1612
  1550   7296   6013   7037
  2048   2100   1577   2100
  2560    164    164    164
  4096    100    100    100
  8192    100    100    100
 16384    110    110    110
 65536     16     16     16


pntran1 Sun, 02/26/2012 - 19:28

Thanks for your reply.  Can you please elaborate on how hardcoding the speed and duplex will help in this case?  I see no CRC errors and no collisions. 

pntran1 Tue, 02/28/2012 - 11:24

Thanks so much for the link - lots of useful info.  Unfortunately, I did check speed and duplex and they matched on both sides, and I also saw no CRC errors and no collisions, so I ruled that out.  CPU utilization is also very low, so enabling RPF probably won't help either.  The 'no buffer' count is 0, so I am not sure why we are getting overrun errors at very low traffic rates.  We saw overrun errors at rates as low as 5 Mbps with a 1-minute sampling interval.  That doesn't tell us whether the traffic is highly bursty at some instant, but how bursty can it be with 5 Mbps worth of traffic?  And how much burstiness can the ASA tolerate?  I have been trying to get a logical explanation for the overrun errors from Cisco TAC for about 2 weeks, but they just danced around the question.  I believe there is an inherent design limitation or software bug that causes this.  I am afraid that without getting to the root cause, turning on flow control just masks the problem.

Borman Bravo Wed, 03/07/2012 - 08:24

Hi, were you able to resolve this issue?  I'm having the same problem.  Thanks

pntran1 Wed, 03/07/2012 - 08:51

Hi, I am getting a lot closer, but not quite there yet.  It appears that the ASA 5550 cannot tolerate highly bursty traffic even for a very short period of time (we saw errors for less than 10 Mb worth of bursty traffic).  Turning on flow control will clear up the input errors, but I am still trying to gather the data to understand the full impact of flow control on performance.  I am very surprised that the input buffers cannot handle such a low level of bursty traffic.

Just got off the phone with the TAC engineer; he was very helpful in providing me with the performance test data.  It looks like turning on flow control is the only option.

Borman Bravo Tue, 03/13/2012 - 12:03

Thanks for sharing this information.  I tried looking up the commands to turn on flow control on the ASA but can't find instructions for 8.3.  Would you mind sharing them as well?  I appreciate it.

pntran1 Tue, 03/13/2012 - 14:20

You need to be on 8.2.5 or later, or 8.4.3 or later; the "feature" is not supported in 8.3.  The commands are:

On the ASA:

interface GigabitEthernet0/2
 flowcontrol send on

On the Ethernet switch (if you are using Cisco):

interface gigabitEthernet 0/2
 flowcontrol receive on

Make sure you do it during a maintenance window or when nobody is looking, as this will reset all of your connections.
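
To confirm it took effect, the pause counters appear in the interface statistics once flow control is enabled; something like this (counter names as they appear in the show interface output elsewhere in this thread):

show interface GigabitEthernet0/2 | include pause
        0 pause output, 0 resume output

A climbing "pause output" counter means the ASA is actually asking the switch to back off.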

Patrick0711 Wed, 03/14/2012 - 08:09

Maybe someone from Cisco can actually chime in to better explain how 'bursty' traffic can overrun an interface even though there are sufficient 1550-byte memory blocks, a 'no buffer' count of 0, low throughput, and low CPU utilization.

I've seen this same issue many times in the past and have always received the same response about bursty traffic and flow control.  Obviously, there's a limiting factor somewhere in the ASA architecture.  Why should I need to use flow control and have the switch buffer the data if, supposedly, there is plenty of buffer space and memory blocks on the ASA?

Borman Bravo Tue, 03/27/2012 - 06:36

Cisco TAC is now recommending finding the hosts that are generating the bursty traffic; there is nothing to be done on the ASA to deal with this issue.  I'd also like to share the Cisco engineer's answer on the ASA's handling of traffic and buffering:

"I would like to confirm that the input queue on an interface has a capacity that varies. Depending upon the configuration and speed and duplex settings on both sides it can be 13 packets or a maximum of 75 packets. At the same time the output queue could be as little as 2 packets and as much as 40 packets"

If anyone has any other recommendations, I would appreciate it.

pntran1 Tue, 03/27/2012 - 09:17

Thanks for sharing the info.  Did they say what to do with those bursty hosts once you find them?  I don't have the full context of the reasoning behind that suggestion, but it doesn't seem to make any sense.  I have told TAC repeatedly that traffic on the network will always be bursty.  The only time you may not have bursty traffic is when you are streaming video at a constant bit rate.  TCP traffic is bursty by nature.  Even if you only have UDP traffic on the network, the cumulative effect of multiple hosts transmitting at the same time still makes the traffic bursty.

The queue size is not as important as how fast the ASA can pull packets out of the queue and forward them.  It seems it cannot do that very fast, so the queue fills up very quickly and overflows.
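
To put the engineer's numbers in perspective (rough line-rate math, not measurements): minimum-size 64-byte frames arrive back-to-back at about 1.49 million frames/sec on a gigabit link, so an undrained 75-packet input queue fills in roughly 75 / 1,488,000 ≈ 50 microseconds.  Even full-size 1518-byte frames, arriving at ~81,000 frames/sec, fill 75 slots in under a millisecond.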

Have you had a chance to try turning on flow control?  Did it help?

My experience dealing with TAC is that different TAC engineers may give you completely different answers, sometimes contradicting each other.  If an answer doesn't make sense, you can ask for an escalation.

Borman Bravo Tue, 03/27/2012 - 13:13

Hi, I escalated this ticket and am waiting for support.  I can't turn on flow control; it's not supported on the 8.3 version I'm running.  I will let you know if I find a resolution from Cisco TAC.  Thanks.

Farhad.Valiyev Sat, 07/07/2012 - 20:10

Hi guys,

Did you figure this out with the TAC?

I seem to have the same situation with my 5520s and I am wondering if enabling flow control is the only solution to the problem.

Thanks!

Patrick0711 Fri, 07/27/2012 - 13:00

Bringing this back from the dead...

Interface GigabitEthernet1/0 "inside", is up, line protocol is up
  Hardware is VCS7380 rev01, BW 1000 Mbps, DLY 10 usec
        Full-Duplex(Full-duplex), 1000 Mbps(1000 Mbps)
        Input flow control is unsupported, output flow control is unsupported
        Media-type configured as RJ45 connector
        MAC address 1cdf.0f66.35da, MTU 1500
        IP address x.x.x.x, subnet mask 255.255.255.0
        2136842533 packets input, 2505600752697 bytes, 0 no buffer
        Received 6118 broadcasts, 0 runts, 0 giants
        3519301 input errors, 0 CRC, 0 frame, 3519301 overrun, 0 ignored, 0 abort
        0 L2 decode drops
        1231746618 packets output, 186031413394 bytes, 0 underruns
        0 pause output, 0 resume output
        0 output errors, 0 collisions, 0 interface resets
        0 late collisions, 0 deferred
        0 input reset drops, 0 output reset drops
        0 rate limit drops
        input queue (blocks free curr/low): hardware (0/0)
        output queue (blocks free curr/low): hardware (0/0)
  Traffic Statistics for "inside":
        2132528034 packets input, 2462177894354 bytes
        1231761102 packets output, 163876265191 bytes
        69112 packets dropped
      1 minute input rate 15279 pkts/sec,  17668962 bytes/sec
      1 minute output rate 8978 pkts/sec,  1071206 bytes/sec
      1 minute drop rate, 0 pkts/sec
      5 minute input rate 17170 pkts/sec,  20191846 bytes/sec
      5 minute output rate 9932 pkts/sec,  1170989 bytes/sec
      5 minute drop rate, 0 pkts/sec

This platform has an ASA 5510 Security Plus license.

sh blocks
  SIZE    MAX    LOW    CNT
     0    400    383    400
     4    100     99     99
    80    400    284    399
   256   3148   2928   3148
  1550   2285   1807   2025
  2048   2100   1422   2100
  2560    164    164    164
  4096    100     98    100
  8192    100     99    100
 16384    100     99    100
 65536     16     16     16

The interface is being overrun, there are sufficient 1550-byte memory blocks, and the 'no buffer' count is 0.  Several years on, I am still trying to find an explanation for this type of overrun.  If it were 'bursty' traffic, wouldn't the interface buffer fill before being overrun, which would show an incrementing 'no buffer' counter?

mronayne Tue, 01/08/2013 - 14:28

There is a FIFO buffer present on the NIC hardware where packets are first received, before getting DMA'd to a receive buffer in the ASA RAM that is assigned to this interface. An overrun is counted when a packet cannot be received by the NIC because the FIFO is full.

There are several possible reasons why the FIFO may fill up, but the two most obvious ones are:

1. A traffic burst fills up the FIFO very quickly. This is entirely possible even with a relatively low aggregate throughput on the interface measured over minutes.

2. The NIC cannot find a free receive buffer assigned to this interface to DMA the packet to. There are many possible causes of this but most relate to the ASA simply not being able to keep up with the amount of traffic it is receiving.

Note that neither of these cases will increment the "no buffer" counter in recent ASA code. AFAIK that counter is incremented when the driver attempts and fails to assign a new receive buffer to the interface.

Since your interface does not seem to be oversubscribed I believe that you are most likely hitting case 1.

Implementing flow control on the interface should help with this.
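
For a sense of scale (using the 32-48 KB FIFO sizes quoted later in this thread, which I haven't independently confirmed): at gigabit line rate the wire delivers 125 MB/s, so an undrained 48 KB FIFO fills in 48 KB / 125 MB/s ≈ 0.4 ms.  A burst only has to outpace the DMA drain by a few dozen full-size frames to start dropping, which is consistent with seeing overruns at a few Mbps of average load.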

ppokorny25 Wed, 11/21/2012 - 07:50

Hi,

Before I set up a new thread, I would like to ask here.

I have installed a 5550, and my input/output queues look like this:

GigabitEthernet0/0
  input queue (blocks free curr/low): hardware (255/255)
  output queue (blocks free curr/low): hardware (255/254)
GigabitEthernet0/1
  input queue (blocks free curr/low): hardware (255/230)
  output queue (blocks free curr/low): hardware (255/0)
GigabitEthernet0/2
  input queue (blocks free curr/low): hardware (255/255)
  output queue (blocks free curr/low): hardware (255/255)
GigabitEthernet0/3
  input queue (blocks free curr/low): hardware (255/230)
  output queue (blocks free curr/low): hardware (254/97)
GigabitEthernet1/0
  input queue (blocks free curr/low): hardware (0/0)
  output queue (blocks free curr/low): hardware (0/0)
GigabitEthernet1/1
  input queue (blocks free curr/low): hardware (0/0)
  output queue (blocks free curr/low): hardware (0/0)
GigabitEthernet1/2
  input queue (blocks free curr/low): hardware (0/0)
  output queue (blocks free curr/low): hardware (0/0)
GigabitEthernet1/3
  input queue (blocks free curr/low): hardware (0/0)
  output queue (blocks free curr/low): hardware (0/0)

Do you have any idea why the 4GE module in the 5550 shows no buffers?  Software is 8.4.3.

I am also experiencing the above-mentioned problems with dropped traffic, and my searching brought me here.

Thank you

shayatin1 Mon, 03/04/2013 - 17:24

I too am bringing this back from the dead.

I have this same problem, and I acted on TAC's request to turn on flow control.  I now find myself with another related issue, though.

While I no longer see any input errors on the ASA interfaces, I am seeing about 10,000 pause frames a minute being sent from the ASA to my upstream 6500. While this seems like a TON of pauses to me, TAC seems not too concerned about it.

TAC is suggesting that the problem lies in too many small / bursty packets.

I'm running about 300 Mbps / 60,000 pps / 700 connections on a 5550 in transparent mode.

I will update as I have more info.

pntran1 Tue, 03/05/2013 - 10:25

Thanks so much for the URL to the very useful info, Steve.  Did you experience any performance issues after enabling flow control?

shayatin1 Tue, 03/05/2013 - 11:11

Well, I only implemented the flow control on Sunday night, so it's only been about 36 hours now.

I think I'm well over 4 million pause frames sent back to my 6500, and I've actually seen no performance issues.

It just seems like a TON of pauses, but TAC says that's nothing for a 6500 to handle, and the overruns have not incremented at all on the ASA, so I guess there has been no packet loss.

Between that link and my TAC engineer, I've got a really good idea as to where the bottleneck lies.  Basically, something on my network is sending traffic that fills the FIFO queue on the NIC before the next packet can be put on the receive ring.  Most likely this is caused by lots of very small packets (maybe) coming in very bursty fashion.  I say maybe because if my receive ring is full to the point that the ASA has to ask for a pause 300 times a second, then it's probably a fairly constant stream of little packets, or basically a continual burst.

The question now will be to track down where these little guys are coming from.

In searching the internet about this issue, it seems really common, and the bottleneck seems to be the FIFO queue on the NICs in these units.  The best info I've found says the queue is 32-48 KB depending on your model.  I really wonder if the new 55x5-X models use a larger buffer for the NIC queue.  Unfortunately, we only recently purchased these 5550s, so upgrading won't make the bosses happy; better to address the root cause.
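
To put that 32-48 KB in perspective (rough math based on the buffer sizes above, not vendor-confirmed numbers): at 1518 bytes per full-size frame, that's only about 21 to 32 frames of headroom, and even minimum-size 64-byte frames only get you roughly 500 to 750 slots.  Any back-to-back run longer than that which the DMA can't drain in time will overrun.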

Hope that helps.

ricky.eng Fri, 06/28/2013 - 20:58

Apparently, this is a product architecture issue due to the single-core CPU used across the ASA 5500 series; the overruns are tied to CPU hogs.  I encountered the same issue and confirmed that CPU hogs were detected.  TAC likewise suggested making use of both PCI buses, and I agree that will help mitigate the overruns, but if the CPU keeps hogging because of the number of software features enabled on the ASA, my feeling is that the overruns will continue.  Flow control will help, but it just shifts the bottleneck upstream and slows down the traffic.  This box is a long way from its 1.2 Gbps spec.

http://www.cisco.com/en/US/products/ps6120/products_tech_note09186a0080bfa10a.shtml
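
If you want to check your own box for CPU hogs, the hog history can be viewed with the following (command name from memory; verify it on your version):

show process cpu-hog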

Mikulasik Tue, 11/05/2013 - 12:06

I have the same issue on 5550s, 5510s, and 5520s that we have in production.  Low CPU, moderate traffic, etc.  I think it comes down to the NICs, as they are old Intel NICs with 16 KB of buffer.  I found that I receive no overruns on the 5550's add-on card, which makes me think it is these junky Intel NICs that Cisco uses, or the PCI bus.  We do have a 5515-X in production with no overruns, so maybe it is fixed on the newer boxes since they use PCIe rather than PCI for the NICs.

shayatin1 Tue, 11/05/2013 - 12:22

Yes this is the NIC buffer on the 55x0 models.

I’ve gone round and round with TAC, and gotten two of their CCIEs on the case (both of whom were super knowledgeable btw), and they have proven to me that this is simply a hardware shortcoming with these models.

The 55x5-x models use a different NIC with a larger buffer, and should not have this issue.

Bummer for those of us with a bursty traffic profile.

We are moving to these new units Q1 2014.

Mikulasik Tue, 11/05/2013 - 14:47

It is pretty sad, though; the traffic isn't really that bursty at some of our sites and they still fail to hold up.  I would recommend replacing any older ASA you have if you plan to do an internet upgrade above 50 Mbit.  The ASAs were built for a different traffic profile, based on the internet that existed in 2005.
