We have recently enabled netflow on one of our Cisco 7609 routers with a Sup720 3BXL supervisor card. I am running the capture on a 10Gb transit link with around 1.5G of traffic at peak. The TCAM table is constantly showing at 100% used and I would like to know if anyone has any ideas of how to get the usage down.
Mod Ports Card Type                              Model
--- ----- -------------------------------------- ------------------
  4     4 CEF720 4 port 10-Gigabit Ethernet      WS-X6704-10GE
  5     2 Supervisor Engine 720 (Active)         WS-SUP720-3BXL

Netflow Resources
  TCAM utilization:     Module    Created    Failed    %Used
                        5         262018     0         100%
                        6         262018     0         100%

ip flow-cache timeout inactive 30
ip flow-cache timeout active 1
What really counts is:
the variety of IP flows
the IP flow mask setting
You can check the current flow mask with:
sh mls netflow flowmask
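To make the flow mask point concrete, here is a minimal Python sketch of how a coarser mask collapses several packets into fewer table entries. The `Packet` type, the `MASK_FIELDS` mapping and the sample addresses are hypothetical and only illustrate the idea; the field lists are a simplification of what the hardware actually keys on.

```python
from typing import Iterable, NamedTuple


class Packet(NamedTuple):
    in_if: str
    src: str
    dst: str
    proto: int
    sport: int
    dport: int


# Simplified, assumed mapping of flow mask name -> fields used as the flow key.
MASK_FIELDS = {
    "destination": ("dst",),
    "destination-source": ("src", "dst"),
    "full": ("src", "dst", "proto", "sport", "dport"),
    "interface-full": ("in_if", "src", "dst", "proto", "sport", "dport"),
}


def entries_needed(packets: Iterable[Packet], mask: str) -> int:
    """Count the distinct flow keys (table entries) produced under a mask."""
    fields = MASK_FIELDS[mask]
    return len({tuple(getattr(p, f) for f in fields) for p in packets})


# Made-up traffic: one client opening three connections, plus a second client.
packets = [
    Packet("Te4/1", "10.0.0.1", "192.0.2.10", 6, 40000, 80),
    Packet("Te4/1", "10.0.0.1", "192.0.2.10", 6, 40001, 80),
    Packet("Te4/1", "10.0.0.1", "192.0.2.10", 6, 40002, 443),
    Packet("Te4/1", "10.0.0.2", "192.0.2.10", 6, 40000, 80),
]

for mask in MASK_FIELDS:
    print(mask, entries_needed(packets, mask))
```

A coarser mask only helps when many flows share the same addresses. On a transit link with a very wide spread of source and destination pairs, the reduction can be small.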
The box was originally on the interface-full flow mask, but I have reduced it to destination-source and the TCAM is still full.
I have also changed the flow-sampler from 1 out of 500 to 1 out of 5000, with no difference to the table usage.
When I run show flow-sampler it seems to indicate that no packets are being matched.
Sampler : netflow-sampler, id : 1, packets matched : 0, mode : random sampling mode
  sampling interval is : 5000
Sampling does not reduce the TCAM utilization, because the actual sampling is done after the flow is put in the TCAM table.
The flow-sampler you have turned on only applies to traffic punted up to the RP.
The 7600 does not support flow-sampler for hardware-forwarded traffic.
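As a rough illustration of why changing the sampling rate makes no difference to the table, here is a toy Python model of the ordering described above: entries are created first, and sampling is applied afterwards. It is not a model of the actual PFC hardware; the `simulate` function and the generated addresses are made up for the example.

```python
import random


def simulate(flow_keys, sample_one_in, seed=0):
    """Return (table_entries, exported_records) for a toy model."""
    rng = random.Random(seed)
    # Step 1: every distinct flow gets a table entry, sampled or not.
    table = set(flow_keys)
    # Step 2: sampling is applied afterwards and only thins what is exported.
    exported = sum(1 for _ in table if rng.randrange(sample_one_in) == 0)
    return len(table), exported


# 10,000 distinct (source, destination) pairs, purely illustrative.
flows = [("10.0.0.%d" % (i % 250), "192.0.2.%d" % (i // 250)) for i in range(10_000)]

for rate in (500, 5000):
    entries, exported = simulate(flows, rate)
    print(f"1 in {rate}: {entries} table entries, {exported} exported")
```

In this model the number of table entries is the same for both rates. Only the number of exported records changes.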
Ah right ok. Don't remember reading that in the docs but I must have missed it. Is there anything else that we can do then to try and reduce the load on the table?
Well, that depends on what you are using the information for ...
Can you explain a little more about what you are doing? Do you have ip flow ingress turned on on all your L3 interfaces, or just a few?
Did you turn on mls sampling because you were seeing high CPU?
Can you get the output of
show mls netflow table-contention detail
show mls netflow aging
Also, sampled NetFlow on the Sup720 is done in software. To make the algorithm work, aggressive aging (along with the other aging mechanisms) is disabled when sampled NetFlow is enabled.
So I would suggest taking sampling off altogether and monitoring, capturing the above data after making that change.
Thanks for the advice.
I am running netflow on just one L3 interface. The sampling was turned on as a precaution more than anything else. I am away from the office now for a week but will try your suggestions as soon as I'm back and post my results.
Give it a try, but you may just have SOOO many flows coming in (more than 250k at any time), that it may be impossible to reduce this ...
The hard limit for the Sup720-3BXL is 256K NetFlow entries. Hopefully, though, fast aging with sampling turned off will fix it for you.
I think it's possible that there are more than 256k flows. At peak I think there's 500k pps. It's just a shame the sampling doesn't work, because I'm prepared to accept that I can't capture all 500,000. That said, if Cisco are going to put a 10Gb card in a router, I'd at least hope to be able to capture the information on one of the ports, if not all 4.
Do the 12000 series boxes or above have this same limitation?
Using a less specific flow mask and tuning the aging timers will improve things, but will not solve the problem completely.
The idea is to age flows out aggressively in order to leave space for new flows.
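A back-of-the-envelope way to see what aggressive aging buys you: in steady state, table occupancy is roughly the rate of new flows multiplied by how long each entry stays in the table. The Python sketch below assumes a capacity of 262,144 entries (in line with the 262018 shown in the utilization output) and uses the 32 and 15 second timeouts from this thread plus an 8 second one for comparison. The function names are made up for the example.

```python
TABLE_ENTRIES = 256 * 1024  # assumed capacity: 262,144 entries


def max_new_flows_per_second(table_entries, residence_seconds):
    """Highest new-flow rate the table can absorb without filling up."""
    return table_entries / residence_seconds


def steady_state_entries(new_flows_per_second, residence_seconds):
    """Approximate occupancy: arrival rate times time spent in the table."""
    return new_flows_per_second * residence_seconds


for timeout in (32, 15, 8):
    rate = max_new_flows_per_second(TABLE_ENTRIES, timeout)
    print(f"{timeout:>2}s residence: table fills above ~{rate:,.0f} new flows/s")
```

Under these assumptions, entries that live 32 seconds fill the table at about 8,192 new flows per second, while entries that live 8 seconds leave room for about 32,768 new flows per second. If the link creates new flows faster than that, no timer setting will keep the table below 100%.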
C12000 behaviour depends on the specific linecard in use, because of its distributed architecture, but NetFlow, where supported, should have a bigger table available.
The C6500/C7600 suffer from this limitation: they can route and switch high traffic volumes, but it is difficult to avoid misses when using NetFlow on them.
Even the RSP720 does not offer a bigger NetFlow table.
Hope to help
I think Robert Taylor could be right about the aggressive aging. There is a post on lovemytool.com about NetFlow overflow with TCAM tables. Perhaps it can help you. It discusses a problem with the Catalyst 6513 and an "enable MLS fast aging" tactic.
Hope this helps.
I've re-enabled netflow after my week off with everyone's suggestions and I'm still maxing out the table. I think I just have too many flows for it to handle.
The output requested is below:
pcl-gw02#sh mls netflow aging
             enable timeout  packet threshold
             ------ -------  ----------------
normal aging true       32         N/A
fast aging   true       15          2
long aging   true       64         N/A
I have set the ageing times fairly low. Assuming I can't take a random 1 out of x selection of the incoming packets, is there any harm in the box being at 100% usage all the time, other than the lack of netflow details?
I believe the problem is with the fast aging settings. The packet threshold is too low. Try going with the default threshold value and setting the timeout below 10 seconds.
As I understand it, fast aging is for small flows with a low packet count. The rule is that a flow should switch at least <threshold> packets within the <timeout> period. If a flow has a smaller packet count during the timeout period, it is exported and deleted from the cache.
Your setting requires a flow to have at least 2 packets in 15 seconds. I believe almost any flow will have 2 packets in 15 seconds, so those flows stay in the cache and are not timed out by fast aging.
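The rule as I have described it can be sketched in a few lines of Python. This follows my reading above (a flow is removed if it has seen fewer than the threshold number of packets once the timeout has passed), not a verified description of the hardware. The `FlowEntry` type is hypothetical, and the threshold of 100 is just an illustrative larger value.

```python
from dataclasses import dataclass


@dataclass
class FlowEntry:
    created_at: float  # seconds, when the entry was installed
    packets: int       # packets switched so far


def fast_aged(flow: FlowEntry, now: float, timeout: float, threshold: int) -> bool:
    """True if the entry would be removed by the fast-aging rule as described."""
    return (now - flow.created_at) >= timeout and flow.packets < threshold


# A short 3-packet flow, checked 15 seconds after it was created.
short_flow = FlowEntry(created_at=0.0, packets=3)

print(fast_aged(short_flow, now=15.0, timeout=15, threshold=2))    # False: 3 >= 2, stays
print(fast_aged(short_flow, now=15.0, timeout=15, threshold=100))  # True: 3 < 100, removed
```

With a threshold of 2, only single-packet flows are ever removed by fast aging, which is why the table stays full.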
I had the same problem with an RSP720-3CXL. mls aging fast time 8 did the trick for me in similar traffic conditions.