Netflow TCAM table full Cisco 7609 3BXL

Aug 20th, 2010

Hello


We have recently enabled netflow on one of our Cisco 7609 routers with a Sup720 3BXL supervisor card.  I am running the capture on a 10Gb transit link with around 1.5G of traffic at peak.  The TCAM table is constantly showing at 100% used and I would like to know if anyone has any ideas of how to get the usage down.




4    4  CEF720 4 port 10-Gigabit Ethernet      WS-X6704-10GE
5    2  Supervisor Engine 720 (Active)         WS-SUP720-3BXL


Netflow Resources
  TCAM utilization:   Module   Created   Failed   %Used
                      5        262018    0        100%
                      6        262018    0        100%



ip flow-cache timeout inactive 30
ip flow-cache timeout active 1

mls aging fast time 16 threshold 16
mls aging long 64
mls aging normal 32
mls flow ip destination-source
no mls flow ipv6
mls nde sender version 5
mls sampling packet-based 64 8192

flow-sampler-map netflow-sampler
 mode random one-out-of 5000

TE Interface:
ip flow ingress
mls netflow sampling flow-sampler netflow-sampler
Thanks in advance
Paul
Giuseppe Larosa Fri, 08/20/2010 - 04:45

Hello Paul,

What really counts is the variety of IP flows and the IP flow mask setting.

You can check the flow mask with:

sh mls netflow flowmask

Hope to help
Giuseppe
paulhughes5 Fri, 08/20/2010 - 04:51

The box was originally on full-interface but I have reduced it down to destination-source and TCAM is still full.

I have also increased the flow-sampler from 1 out of 500 to 1 out of 5000 with no difference to the table.

When I run show flow-sampler it seems to indicate that no packets are being matched.



Sampler : netflow-sampler, id : 1, packets matched : 0, mode : random sampling mode
  sampling interval is : 5000

Robert Taylor (Cisco Employee) Fri, 08/20/2010 - 06:22

Sampling does not reduce the TCAM utilization, because the actual sampling is done after the flow is put in the TCAM table.


The flow-sampler you have turned on applies only to traffic punted up to the RP.


The 7600 does not support flow-sampler for hardware-forwarded traffic.
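
A toy model of the point above, written as a minimal Python sketch and not as a description of the actual Sup720 hardware: every packet installs or refreshes a flow entry first, and sampling only decides afterwards which packets get reported. The function name, the synthetic traffic, and the use of a plain set for the table are all illustrative assumptions.

```python
import random

def table_entries_used(packets, sample_one_in):
    """Toy model: every packet installs its flow key in the table;
    sampling only picks which packets are reported afterwards."""
    table = set()   # stands in for the NetFlow table (assumption)
    sampled = 0
    for flow_key in packets:
        table.add(flow_key)                        # entry is created first
        if random.randrange(sample_one_in) == 0:   # sampling happens after
            sampled += 1
    return len(table), sampled

random.seed(1)
# 100,000 synthetic packets spread over up to 20,000 distinct flow keys
packets = [random.randrange(20_000) for _ in range(100_000)]

entries_500, _ = table_entries_used(packets, 500)
entries_5000, _ = table_entries_used(packets, 5000)
print(entries_500, entries_5000)  # same number of entries either way
```

In this model, changing the sampling rate from 1-in-500 to 1-in-5000 changes only how many packets are reported, never how many entries the table holds, which matches what Paul observed.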

paulhughes5 Fri, 08/20/2010 - 06:37

Ah right ok.  Don't remember reading that in the docs but I must have missed it.  Is there anything else that we can do then to try and reduce the load on the table?

Robert Taylor Fri, 08/20/2010 - 06:41

Well, that depends on what you are using the information for ...


Can you explain a little more about what you are doing?  Do you have ip flow ingress turned on for all your L3 interfaces, or just a few?

Did you turn on mls sampling because you were seeing high CPU?


Can you get the output of

show mls netflow table-contention detail

and

show mls netflow aging


Also, Sampled netflow on sup720 is done by software. To make the algorithm work, aggressive aging (along with other agings) is disabled when sampled netflow is enabled.


So, I would suggest removing sampling altogether and monitoring, capturing the above output after making that change.


Rob

paulhughes5 Fri, 08/20/2010 - 07:04

Thanks for the advice.

I am running netflow on just one L3 interface.  The sampling was turned on as a precaution more than anything else.  I am away from the office now for a week but will try your suggestions as soon as I'm back and post my results.

Robert Taylor Fri, 08/20/2010 - 07:07

Ok ..


Give it a try, but you may just have SOOO many flows coming in (more than 250k at any time), that it may be impossible to reduce this ...


Hard limitation for sup720 is 250k netflow entries.  Hopefully though, fast aging with sampling turned off will fix it for you.
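
A rough way to reason about whether aging alone can help, as a back-of-the-envelope Python sketch: in steady state the number of table entries is roughly the rate of new flows multiplied by how long each entry lives. The table size of 256K (262,144) entries is an assumption, chosen because it is close to the 262,018 created entries reported earlier in the thread; the lifetimes are simply aging timeouts in seconds, and the 10,000 flows per second figure is hypothetical.

```python
TABLE_CAPACITY = 256 * 1024  # assumed table size: 262,144 entries

def steady_state_entries(new_flows_per_sec, avg_lifetime_sec):
    """Approximate entries in use: arrival rate times time spent in the table."""
    return new_flows_per_sec * avg_lifetime_sec

def max_new_flow_rate(capacity, avg_lifetime_sec):
    """Highest new-flow rate the table can absorb before it fills."""
    return capacity / avg_lifetime_sec

print(max_new_flow_rate(TABLE_CAPACITY, 32))   # 8192.0 new flows/sec
print(max_new_flow_rate(TABLE_CAPACITY, 16))   # 16384.0 new flows/sec

# Hypothetical load: 10,000 new flows/sec, each held for 32 seconds
print(steady_state_entries(10_000, 32) > TABLE_CAPACITY)  # True
```

Under these assumptions, halving the entry lifetime doubles the new-flow rate the table can absorb, but a link that creates flows faster than that will fill the table regardless of the timers.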

paulhughes5 Sat, 08/21/2010 - 04:50

I think it's possible that there are more than 256k flows.  At peak I think there's 500k pps.  It's just a shame the sampling doesn't work, because I'm prepared to accept I can't capture all 500,000.  That said, if Cisco are going to put a 10Gb card in a router, I'd at least hope to be able to capture the information on one of the ports, if not all 4.

Do the 12000 series boxes or above have this same limitation?

Giuseppe Larosa Mon, 08/23/2010 - 05:22

Hello Paul,

By using a less specific flow mask and tuning the aging timers you can improve the situation, but you will not be able to solve it.


see


http://www.cisco.com/en/US/docs/switches/lan/catalyst6500/ios/12.2SX/configuration/guide/netflow.html#wp1147986


the idea is to be aggressive in order to leave space for other flows.


C12000 behaviour depends on the specific linecard in use because of its distributed architecture, but NetFlow, if supported, should have a bigger table available.


The C6500/C7600 suffer from this limitation: they are able to route/switch high traffic volumes, but it is difficult to avoid misses when using NetFlow on them.


Even the RSP720 does not offer a bigger NetFlow table.



Hope to help

Giuseppe

jakewilson Sat, 08/21/2010 - 19:28

I think Robert Taylor could be right about the aggressive aging.  There is a post on lovemytool.com about NetFlow Overflow with TCAM Tables.  Perhaps it can help you. It discusses a problem with the Catalyst 6513 and using an "enable MLS fast aging" tactic.


Hope this helps.

paulhughes5 Tue, 08/31/2010 - 03:31

Hello.


I've re-enabled netflow after my week off with everyone's suggestions and I'm still maxing out the table.  I think I just have too many flows for it to handle.


The output requested is below:



pcl-gw02#sh mls netflow aging
             enable timeout  packet threshold
             ------ -------  ----------------
normal aging true       32         N/A
fast aging   true       15         2
long aging   true       64         N/A

pcl-gw02#sh mls netflow table-contention detailed
Earl in Module 5
Detailed Netflow CAM (TCAM and ICAM) Utilization
================================================
TCAM Utilization             :   100%
ICAM Utilization             :   7%
Netflow TCAM count           :   262025
Netflow ICAM count           :   9
Netflow Creation Failures    :   2510567
Netflow CAM aliases          :   0


I have set the ageing times fairly low.  Assuming I can't take a random 1-out-of-x selection of the incoming packets, is there any harm in the box being at 100% usage all the time, other than the lack of netflow details?


Thanks

adorins Thu, 01/05/2017 - 05:53

I believe the problem is with the fast aging settings: the packet threshold is too low. Try the default threshold value and set the timeout below 10 seconds.

As I understand it, fast aging is meant for small flows with a small packet count. The rule is that a flow should switch at least <threshold> packets within the <timeout> period. If a flow has a smaller packet count during those <timeout> seconds, it is exported and deleted from the cache.

Your setting requires a flow to have at least 2 packets in 15 seconds. I believe almost any flow will have 2 packets in 15 seconds, so the flow stays in the cache and is not timed out by fast aging.

I had the same problem with an RSP720-3CXL; mls aging fast time 8 did the trick for me in similar traffic conditions.
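
The fast-aging rule as described in this post can be sketched in a few lines of Python. This follows the poster's own description ("as I understand it"), not verified hardware behaviour; the function name and the packet counts are hypothetical. The threshold of 2 comes from the show output above and 16 from the configuration originally posted.

```python
def fast_aging_removes(packets_in_window, threshold):
    """True if a flow is aged out under the rule described above:
    it switched fewer than `threshold` packets within the fast-aging timeout."""
    return packets_in_window < threshold

# Hypothetical packet counts seen per flow during one fast-aging window
flows = [1, 2, 5, 40]

removed_at_2 = [p for p in flows if fast_aging_removes(p, 2)]
removed_at_16 = [p for p in flows if fast_aging_removes(p, 16)]

print(removed_at_2)   # [1]
print(removed_at_16)  # [1, 2, 5]
```

With a threshold of 2, only single-packet flows are removed early, so under this reading most entries wait for the normal aging timer instead.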

