Cisco7604 TCAM Utilization logged

rwaqanitoga · ‎07-22-2014

Hi Everyone,

I have recently noticed the log messages below on our gateway BGP router and would like to know how to fix the underlying issue so that they stop appearing.

Jul 22 11:26:47.739: %EARL_NETFLOW-SP-4-TCAM_THRLD: Netflow TCAM threshold exceeded, TCAM Utilization [93%]
Jul 22 11:39:54.119: %EARL_NETFLOW-SP-4-TCAM_THRLD: Netflow TCAM threshold exceeded, TCAM Utilization [92%]
Jul 22 11:57:30.482: %EARL_NETFLOW-SP-4-TCAM_THRLD: Netflow TCAM threshold exceeded, TCAM Utilization [95%]

I would really appreciate it if someone could help fix this issue.

Best Regards,

Waqanitoga

Akash Agrawal · ‎07-22-2014

Hi,

It is normal for the TCAM to become full and then rapidly age out entries when it does. When "service internal" is enabled, which is a hidden command for Cisco troubleshooting, it will generate a log message every time this happens. Turning off service internal will stop these messages from appearing and there will be no impact to the switch.

Explanation on Netflow TCAM utilization
=======================================

Netflow is a feature used to collect statistics on the traffic traversing a switch. The statistics are stored in the Netflow table until they are exported by Netflow Data Export (NDE). There is a Netflow table on the PFC and on each DFC. Some features, such as NAT, require the flow to be processed in software initially and then hardware-accelerated. The Netflow tables on the PFC and DFCs collect statistics for traffic that is hardware-accelerated or flow-switched.

Some features, such as NAT and QoS, use Netflow. NAT uses Netflow to make forwarding decisions, while QoS can use Netflow to monitor flows for micropolicing. Using Netflow Data Export (NDE), we can export these statistics to an external Netflow collector for further analysis of network behavior.

When the Netflow TCAM table reaches a certain warning threshold, the switch generates an error message:
%EARL_NETFLOW-4-TCAM_THRLD: Netflow TCAM threshold exceeded, TCAM Utilization [99%]

This message indicates that the NetFlow ternary content addressable memory (TCAM) is almost full. Aggressive aging will be temporarily enabled.
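If you want to see how often and how high the utilization goes, the log lines can be pulled apart with a short script. This is only a sketch for offline analysis of a saved log. The function name `tcam_utilization` is made up for this example, and the regular expression assumes the message format shown in this thread (with or without the `-SP` suffix).

```python
import re

# Matches the TCAM threshold message as it appears in this thread.
# The optional "-SP" covers messages logged by the switch processor.
TCAM_THRLD = re.compile(
    r"%EARL_NETFLOW(?:-SP)?-4-TCAM_THRLD:.*TCAM Utilization \[(\d+)%\]"
)

def tcam_utilization(line):
    """Return the utilization percentage from a log line, or None if absent."""
    match = TCAM_THRLD.search(line)
    return int(match.group(1)) if match else None

# The three lines posted at the top of this thread.
log = [
    "Jul 22 11:26:47.739: %EARL_NETFLOW-SP-4-TCAM_THRLD: Netflow TCAM threshold exceeded, TCAM Utilization [93%]",
    "Jul 22 11:39:54.119: %EARL_NETFLOW-SP-4-TCAM_THRLD: Netflow TCAM threshold exceeded, TCAM Utilization [92%]",
    "Jul 22 11:57:30.482: %EARL_NETFLOW-SP-4-TCAM_THRLD: Netflow TCAM threshold exceeded, TCAM Utilization [95%]",
]

readings = [u for u in map(tcam_utilization, log) if u is not None]
print(readings, max(readings))  # prints: [93, 92, 95] 95
```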
The Supervisor Engine 720 polls how full the NetFlow table is at each poll interval and activates aggressive aging when the table reaches a set threshold. When the table is nearly full, new active flows cannot be created because there is no space left in the TCAM. At that point it makes sense to age out the less active or inactive flows more aggressively in order to make room for new flows. Nothing stops an aged-out flow from being reinserted into the table, as long as it meets the configured timeout and packet threshold values.
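The idea behind aggressive aging can be pictured with a toy model. This is not how the supervisor implements it. The function `age_flows` and the numbers used here (the 90% threshold and the 300 s and 30 s timeouts) are assumptions chosen only to show the principle: below the threshold, flows are aged with the normal timeout, and at or above it a much shorter timeout is applied.

```python
def age_flows(table, now, capacity,
              threshold=0.9, normal_timeout=300.0, aggressive_timeout=30.0):
    """Return the flows that survive one aging pass.

    table maps a flow key to the time (in seconds) the flow was last seen.
    All thresholds and timeouts are illustrative, not platform defaults.
    """
    utilization = len(table) / capacity
    # Switch to the shorter timeout only when the table is nearly full.
    timeout = aggressive_timeout if utilization >= threshold else normal_timeout
    return {key: seen for key, seen in table.items() if now - seen < timeout}

# Nine flows: five idle for 90 s, four idle for 10 s.
table = {f"idle-{i}": 10.0 for i in range(5)}
table.update({f"busy-{i}": 90.0 for i in range(4)})

# Table of 10 entries is 90% full, so the idle flows are aged out.
nearly_full = age_flows(table, now=100.0, capacity=10)

# Table of 100 entries is only 9% full, so nothing is aged out yet.
mostly_empty = age_flows(table, now=100.0, capacity=100)

print(len(nearly_full), len(mostly_empty))  # prints: 4 9
```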

Reducing TCAM Utilization:
--------------------------
We can take actions in software and/or in hardware to reduce the Netflow TCAM utilization below a recommended amount (enough to stop the errors).

In Software:
Netflow Masks:
Netflow uses the concept of masks. The Netflow mask allows the user to control the volume and granularity of the statistics collected. This allows the user to control the impact on the Supervisor Engine processors. The more specific the mask we use, the more Netflow table entries we will use.

For example, if we kept statistics on flows per source IP address (Source-only mask), we would use fewer entries than if we kept flows per source and destination IP address pair (Destination-source mask).
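To make the effect of the mask concrete, here is a simplified model that counts how many distinct table entries the same traffic would need under different masks. The packet fields, the sample addresses, and the helper `entries_needed` are all made up for illustration. The real hardware hashes flows into the TCAM and does not work like a Python set, and the interface-based masks are left out because the sample packets carry no interface field.

```python
from collections import namedtuple

Packet = namedtuple("Packet", "src dst proto sport dport")

# Which packet fields form the flow key under each mask (simplified).
MASKS = {
    "source-only": ("src",),
    "destination": ("dst",),
    "destination-source": ("src", "dst"),
    "full": ("src", "dst", "proto", "sport", "dport"),
}

def entries_needed(packets, mask):
    """Count the distinct flow keys that the given mask produces."""
    fields = MASKS[mask]
    return len({tuple(getattr(p, f) for f in fields) for p in packets})

# Hypothetical traffic sample using documentation address ranges.
packets = [
    Packet("10.0.0.1", "192.0.2.1", "tcp", 1000, 80),
    Packet("10.0.0.1", "192.0.2.1", "tcp", 1001, 80),
    Packet("10.0.0.1", "192.0.2.2", "tcp", 1002, 443),
    Packet("10.0.0.2", "192.0.2.1", "udp", 5000, 53),
]

for mask in MASKS:
    print(mask, entries_needed(packets, mask))
# prints:
# source-only 2
# destination 2
# destination-source 3
# full 4
```

The same four packets need two entries with the Source-only mask but four with the Full mask, which is why a more specific mask fills the table faster.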

Mask types listed in order from least specific to most specific:
- Source-only
- Destination
- Destination-source
- Destination-source-interface
- Full
- Full-interface
- Non-interface full*

* The non-interface full mask is an internal mask used specifically for PAT. In this case (PAT) the software will need to differentiate between non-fragmented and fragmented flows in the Netflow table. Non-fragmented packets will drive the interface full flow mask. Fragmented packets will be punted to the MSFC for software processing.

If you change the NetFlow mask to Full or Full-interface mode, the NetFlow TCAM can overflow, depending on how many interfaces have it enabled. Issue the 'show mls netflow ip count' command to check this. Even though changing masks is an option, most customers will want to use Interface-full mode so that they get the most granular statistics, including Layer 2, Layer 3, and Layer 4 information.

Please check whether you have configured "interface-full". If so, you can change it to Full or Destination-source-interface.

Please don't forget to rate the post if it has been helpful.

Regards,

Akash

rwaqanitoga · ‎07-29-2014

Hi Akash,

Thank you for this explanation and the fix you have outlined.

So far (more than a week now), we have reduced the log buffer from 100000 bytes to 8192 bytes, and after a reboot the error seems to have gone away. We are still monitoring this, and if the error pops up again we will try the fix you have outlined.

Regards,

Ruveni