Netflow on SUP720B

Unanswered Question
Apr 17th, 2008

Hi,

we've got a problem with our 6500 SUP720B: the NetFlow table (the hardware NetFlow table on the PFC) is too small. It fills up in 1-2 seconds (on the 720XL, in 3-5 seconds), but the smallest export time is 8 seconds (the normal aging timer), which means we are losing too much information and can't use the NetFlow data for analysis.

I can see the software-switched packets (those going via the MSFC) in the NetFlow data without any problem, because the MSFC uses its own software NetFlow table. Normally, though, only a small part of the traffic goes via the MSFC, ~0.01% or so: GRE traffic, management, and so on.

I found that there was a command, "flow-sampler <>", to fill the NetFlow table by random sampling (as on the GSR 12000), but the release notes for 12.2SXF and SXH state that this feature is not supported.

I'm asking myself: what am I doing wrong?

and

How can it be that on a device recommended for use in a Service Provider environment, the NetFlow feature can't be used?

Can anyone help?
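
To put numbers on the "losing too much information" complaint above, here is a minimal Python sketch. It assumes (and this is only an assumption, not something stated in the post) that once the hardware table is full no new flows can be created until the next export empties it. The function name `blind_fraction` is made up for illustration; the fill times and the 8-second interval are the figures quoted in the question.

```python
def blind_fraction(fill_seconds: float, export_interval_seconds: float) -> float:
    """Fraction of each export interval during which the table is already full.

    Assumes the table is emptied only at export time and that no new
    flows can be created while it is full (a simplification).
    """
    if fill_seconds >= export_interval_seconds:
        return 0.0
    return 1.0 - fill_seconds / export_interval_seconds


EXPORT_INTERVAL = 8.0  # seconds, the smallest export time quoted in the post

# Fill times quoted in the post for the two supervisor variants.
for label, fill in [
    ("SUP720B, fills in 1 s", 1.0),
    ("SUP720B, fills in 2 s", 2.0),
    ("720XL, fills in 3 s", 3.0),
    ("720XL, fills in 5 s", 5.0),
]:
    fraction = blind_fraction(fill, EXPORT_INTERVAL)
    print(f"{label}: table full for {fraction:.1%} of each interval")
```

Under that assumption the SUP720B table would be full for roughly three quarters or more of every export interval, which is consistent with the data being unusable for analysis.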

Jan Nejman Thu, 04/17/2008 - 00:52

Hello,

it is very strange. What is your active timeout configuration?

Could you send me the output of the following commands:

show mls statistics

show mls nde

show mls netflow table-contention summary

show module

Kind regards

Jan Nejman

Caligare, Co.

http://www.caligare.com/

Konstantin Dunaev Thu, 04/17/2008 - 00:59

What do you mean by "strange"?

Timers are not used in our config because we're using "sampled" NetFlow export:

edge1#sh run | i mls|interface

no mls acl tcam share-global

mls netflow interface

mls flow ip interface-destination-source

mls nde sender version 5

mls sampling packet-based 1024 8192

...

interface TenGigabitEthernet7/3

mls netflow sampling

interface TenGigabitEthernet7/4

mls netflow sampling

...

edge1#show mls statistics

Statistics for Earl in Module 5

L2 Forwarding Engine

Total packets Switched : 386844003296

L3 Forwarding Engine

Total packets L3 Switched : 383760512826 @ 323954 pps

Total Packets Bridged : 2148150644

Total Packets FIB Switched : 383760512824

Total Packets ACL Routed : 0

Total Packets Netflow Switched : 2

Total Mcast Packets Switched/Routed : 505840

Total ip packets with TOS changed : 712153900

Total ip packets with COS changed : 727811

Total non ip packets COS changed : 0

Total packets dropped by ACL : 581367

Total packets dropped by Policing : 1131950

Total packets exceeding CIR : 1131950

Total packets exceeding PIR : 1131950

Errors

MAC/IP length inconsistencies : 0

Short IP packets received : 0

IP header checksum errors : 0

TTL failures : 7463268

MTU failures : 0

Total packets L3 Switched by all Modules: 386843309298 @ 326557 pps

edge1#show mls nde

Netflow Data Export enabled

Exporting flows to 172.16.253.51 (4711)

Exporting flows from 172.18.254.129 (49249)

Version: 5

Layer2 flow creation is disabled

Layer2 flow export is disabled

Include Filter not configured

Exclude Filter not configured

Total Netflow Data Export Packets are:

540313 packets, 0 no packets, 15668993 records

Total Netflow Data Export Send Errors:

IPWRITE_NO_FIB = 0

IPWRITE_ADJ_FAILED = 0

IPWRITE_PROCESS = 0

IPWRITE_ENQUEUE_FAILED = 0

IPWRITE_IPC_FAILED = 0

IPWRITE_OUTPUT_FAILED = 0

IPWRITE_MTU_FAILED = 0

IPWRITE_ENCAPFIX_FAILED = 0

Netflow Aggregation Disabled

edge1#show module

Mod Ports Card Type Model Serial No.

--- ----- -------------------------------------- ------------------ -----------

5 2 Supervisor Engine 720 (Active) WS-SUP720-3B

7 4 CEF720 4 port 10-Gigabit Ethernet WS-X6704-10GE

Mod MAC addresses Hw Fw Sw Status

--- ---------------------------------- ------ ------------ ------------ -------

5 0016.c847.9874 to 0016.c847.9877 5.2 8.4(2) 12.2(33)SXH Ok

7 0016.47e4.5d14 to 0016.47e4.5d17 2.3 12.2(14r)S5 12.2(33)SXH Ok

Mod Sub-Module Model Serial Hw Status

---- --------------------------- ------------------ ----------- ------- -------

5 Policy Feature Card 3 WS-F6K-PFC3B 2.3 Ok

5 MSFC3 Daughterboard WS-SUP720 2.5 Ok

7 Centralized Forwarding Card WS-F6700-CFC 2.1 Ok

Mod Online Diag Status

---- -------------------

5 Pass

7 Pass

edge1#show mls netflow table-contention summary

Earl in Module 5

Summary of Netflow CAM Utilization (as a percentage)

====================================================

TCAM Utilization : 32%

ICAM Utilization : 0%

Netflow Creation Failures : 0

Netflow CAM aliases : 0

Right now the NetFlow table is ~40% full, because we've switched to the source-dest-int flow mask and we currently have only ~40% of our maximum traffic.

Edited: it starts from 0%, then grows up to 40% over 8 seconds (the export interval in MLS sampled NetFlow), and then the NetFlow table becomes empty again.
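
The growth described above can be turned into a rough entry-creation rate. This is only a sketch: it assumes a nominal table size of 128K entries (the figure quoted later in this thread for the Sup720B), linear growth over the 8-second interval, and linear scaling with traffic volume. The names `fill_rate` and `projected_utilization` are hypothetical helpers.

```python
NOMINAL_CAPACITY = 128 * 1024  # assumed nominal table size, in entries


def fill_rate(utilization: float, interval_seconds: float,
              capacity: int = NOMINAL_CAPACITY) -> float:
    """New table entries per second, assuming linear growth from empty."""
    return utilization * capacity / interval_seconds


def projected_utilization(utilization: float, traffic_fraction: float) -> float:
    """Utilization at full traffic, assuming entries scale linearly with traffic."""
    return utilization / traffic_fraction


rate = fill_rate(0.40, 8.0)               # ~40% full after the 8 s export interval
peak = projected_utilization(0.40, 0.40)  # measured at ~40% of maximum traffic

print(f"approx. new entries per second: {rate:.0f}")
print(f"projected utilization at peak traffic: {peak:.0%}")
```

If those assumptions hold, even the coarser flow mask would bring the table close to full within one export interval at peak traffic.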

Jan Nejman Thu, 04/17/2008 - 01:12

Hello,

By "strange" I mean that the flow cache is full in 3 seconds. I've never seen a cache fill so quickly. We are also using 4x10GE cards; our cache also fills up, but not so quickly...

Could you change the flow mask and test what happens to TCAM utilization (show mls netflow table-contention summary)? I'm not sure whether changing the flow mask will increase TCAM utilization. I think Cisco internally stores all fields in the TCAM.

Could you decrease the active flow timeout to 1-2 minutes? I recommend setting the inactive timeout to between 8 and 16 seconds. Are you using "mls netflow usage notify" to log when the cache is heavily loaded?

Kind regards,

Jan

Konstantin Dunaev Thu, 04/17/2008 - 01:19

Thanks for participating :)

I don't think it's strange, because at peak we have ~500K concurrent sessions (I can see this in our load balancer statistics), which means NetFlow would have to hold at least 500K entries concurrently.

If I change the flow mask to full, the table gets pretty full:

edge1#show mls netflow table-contention detailed

Earl in Module 5

Detailed Netflow CAM (TCAM and ICAM) Utilization

================================================

TCAM Utilization : 61%

ICAM Utilization : 0%

Netflow TCAM count : 80266

Netflow ICAM count : 0

Netflow Creation Failures : 0

Netflow CAM aliases : 0

In 2-3 hours it will be 100% full, with hundreds of thousands of "Netflow Creation Failures", and at peak time the table fills up in 2 seconds.

Also, right now we've activated NetFlow statistics only for outgoing traffic, but we need incoming as well, which doubles the size.
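
As a cross-check of the output above, the reported 61% can be compared with the entry count. The sketch assumes a nominal table size of 128K (131072) entries, which is the figure quoted later in this thread and not something printed by the command itself; the helper names are hypothetical.

```python
NOMINAL_CAPACITY = 128 * 1024  # assumed nominal table size, in entries


def utilization_percent(entries: int, capacity: int = NOMINAL_CAPACITY) -> float:
    """Table utilization as a percentage of the assumed capacity."""
    return 100.0 * entries / capacity


def headroom(entries: int, capacity: int = NOMINAL_CAPACITY) -> int:
    """Entries still free before the assumed capacity is reached."""
    return capacity - entries


tcam_count = 80266  # "Netflow TCAM count" from the output above

print(f"computed utilization: {utilization_percent(tcam_count):.1f}%")
print(f"entries left before the table is full: {headroom(tcam_count)}")
```

The computed value truncates to the 61% shown by the switch, which suggests the percentage is reported against the nominal 128K size, leaving roughly 50K entries of headroom at that moment.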

>Could you decrease the active flow timeout to 1-2 minutes? I recommend setting the inactive

>timeout to between 8 and 16 seconds. Are you using "mls netflow usage notify" to log when the cache is heavily loaded?

If you mean the command "ip flow-cache timeout active 1", I already have it configured, but it works only for MSFC traffic, which is not important for us.

I used to use "mls netflow usage notify", but it became too chatty and flooded our logging :). I know without that command that our NetFlow table is overfilled.

Jan Nejman Thu, 04/17/2008 - 23:39

Hello,

That is interesting, because the Sup720B is able to store only 128k entries, and the Sup720BXL only 256k entries... For the complete table see:

http://support.caligare.com/kb/entry/61/

Are you accounting for bridged VLANs (ip flow ingress layer2-switched vlan)?

PS: did you configure MLS aging? Please send me the output of "show mls netflow aging".

Jan

Konstantin Dunaev Thu, 04/17/2008 - 23:57

We don't use L2 accounting.

edge1#sh mls netflow aging

Netflow aging is disabled for sampled netflow

As we're using "sampled" export, aging doesn't play any role. If we disable "sampled" export, we can use aging, but the smallest aging time is 32 seconds, which is too long, as the NetFlow table fills up in 2-3 seconds.

Jan Nejman Fri, 04/18/2008 - 04:32

Hello,

Do you know what kind of traffic fills the cache? Is it many NATed connections or some kind of flooding? The Sup720B is able to store about 117k entries, so there must be more than 40k unique connections per second on your network. How many customers are you connecting? Can you borrow a Sup720-3BXL? Note, though, that the BXL has capacity for only 230k entries.
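
The estimate in the post above can be reproduced with simple division. This sketch uses the figures from the thread (about 117k usable entries on the Sup720B, about 230k on the BXL, table full in roughly 3 seconds) and assumes new flows arrive at a constant rate; the function names are hypothetical.

```python
def new_flows_per_second(capacity: int, fill_seconds: float) -> float:
    """Flow-creation rate needed to fill a table of `capacity` in `fill_seconds`."""
    return capacity / fill_seconds


def seconds_to_fill(capacity: int, flows_per_second: float) -> float:
    """Time for a table of `capacity` entries to fill at a constant creation rate."""
    return capacity / flows_per_second


SUP720B_USABLE = 117_000    # approx. usable entries, as quoted in the post
SUP720BXL_USABLE = 230_000  # approx. usable entries, as quoted in the post

rate = new_flows_per_second(SUP720B_USABLE, 3.0)
bxl_fill = seconds_to_fill(SUP720BXL_USABLE, rate)

print(f"implied new flows per second: {rate:.0f}")
print(f"time for the BXL table to fill at that rate: {bxl_fill:.1f} s")
```

At that rate the larger BXL table would only buy a few extra seconds, still short of the 8-second export interval mentioned at the start of the thread.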

Jan

Konstantin Dunaev Fri, 04/18/2008 - 04:42

As I said, we have ~500K concurrent connections, mostly HTTP traffic.

No NAT.

Edited: and ~40-50K sessions per second.

It doesn't really matter how fast (connections/sec) they are arriving, because a single HTTP session normally lasts quite a long time, maybe 15-30 seconds: the user opens a page, starts navigating it, then maybe closes it or goes to another page. If the session is not closed by the user, it stays open until a timeout (web server or TCP) kicks in.
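
The relationship between session rate, session lifetime, and concurrent sessions can be sketched with Little's law (concurrent = arrival rate x mean lifetime). Applying Little's law here is an assumption added for illustration, not something the posters used; the input figures (~500K concurrent, 40-50K sessions per second, about 117k usable table entries) come from the thread, and the function names are hypothetical.

```python
def implied_lifetime(concurrent: float, arrivals_per_second: float) -> float:
    """Mean session lifetime in seconds implied by Little's law."""
    return concurrent / arrivals_per_second


def concurrent_sessions(arrivals_per_second: float, lifetime_seconds: float) -> float:
    """Steady-state concurrent sessions for a given arrival rate and mean lifetime."""
    return arrivals_per_second * lifetime_seconds


CONCURRENT = 500_000      # peak concurrent sessions, from the load balancer
TABLE_ENTRIES = 117_000   # approx. usable Sup720B entries, as quoted in the thread

for rate in (40_000, 50_000):
    lifetime = implied_lifetime(CONCURRENT, rate)
    print(f"{rate} sessions/s -> mean lifetime of about {lifetime:.1f} s")

# How many times larger the table would need to be to hold every session once.
print(f"sessions per available table entry: {CONCURRENT / TABLE_ENTRIES:.1f}")
```

Either way the point made in the post stands: with sessions living for many seconds, the number of entries that must be held at once is several times what the table can store, regardless of how the arrival rate is measured.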

Jan Nejman Fri, 04/18/2008 - 05:13

;-( That is bad. I don't think you will be able to monitor this on the Cisco. Maybe on a CRS-1, but it is too expensive and I'm not sure whether the CRS-1 has a bigger cache. I may have another solution for you; try visiting http://www.invea-tech.com/. This company develops hardware-based (FPGA) transparent NetFlow probes, so you would be able to see all traffic. It can be a fully transparent solution (if you use optical T splitters). I tried their probe with 1GE ports, but they also offer 10GE cards...

Jan
