4500 switch high CPU - unicast ARP flooding

cco4nsdnet
Level 1

Hi all,

I recently had a problem on a LAN where a 4500 switch acts as the L3 switch. A blade center server had a network card driver bug that flooded the 4500, driving its CPU to 100% and stopping the whole production network.

The supervisor card is the following: 1     2  Supervisor IV 1000BaseX (GBIC)         WS-X4515           JAE0935K45J

Chassis: cisco WS-C4506 (MPC8245) processor (revision 7) with 524288K bytes of memory.

During the outage I was still able to run show platform health and show platform cpu packet statistics, and got the following output. Note the large number of L2 Fwd Low packets.

The problem was stopped by identifying which server was flooding (shutting down the VLAN interfaces one at a time, then watching the servers on the VLAN identified) and then stopping that server.

Now the main point for me is to contain such a problem in the future, for example by limiting CPU utilisation if any server on the LAN "ARP floods" the core switch for any reason. I have heard about DAI, but it seems to be used with DHCP snooping and to build a table in the configuration, and I have too many servers in this network (hundreds). I will also configure storm control, but the 4500s here are old and unicast limiting does not exist on them, only broadcast (and multicast in the latest IOS version).

Someone told me about MLS rate limiting, but I don't know this feature.

Can someone give me some guidance on a command that would prevent the core switch from reaching 100% CPU, for example by limiting ARP requests? That is what I need, or any other good idea. Thank you for taking the time to read this post.


show platform cpu packet statistics
Packets Dropped In Hardware By CPU Subport (txQueueNotAvail)

CPU Subport  TxQueue 0       TxQueue 1       TxQueue 2       TxQueue 3
------------ --------------- --------------- --------------- ---------------
0                          0               0               0          281812
2                          0          311220               0               0
7                          0               0               0       682198047


RkiosSysPacketMan:
Packet allocation falures: 0
Packet Buffer(Software Common) allocation falures: 0
Packet Buffer(Software ESMP) allocation falures: 0
Packet Buffer(Software EOBC) allocation falures: 0
IOS Packet Buffer Wrapper allocation falures: 0

Packets Dropped In Processing Overall

Total                5 sec avg 1 min avg 5 min avg 1 hour avg
-------------------- --------- --------- --------- ----------
                  64         0         0         0          0

Packets Dropped In Processing by CPU event

Event             Total                5 sec avg 1 min avg 5 min avg 1 hour avg
----------------- -------------------- --------- --------- --------- ----------
Input Acl                           25         0         0         0          0
SA Miss                             15         0         0         0          0

Packets Dropped In Processing by Priority

Priority          Total                5 sec avg 1 min avg 5 min avg 1 hour avg
----------------- -------------------- --------- --------- --------- ----------
Normal                              24         0         0         0          0
Medium                              39         0         0         0          0
High                                25         0         0         0          0

Packets Dropped In Processing by Reason

Reason             Total                5 sec avg 1 min avg 5 min avg 1 hour avg
------------------ -------------------- --------- --------- --------- ----------
SrcAddrTableFilt                      2         0         0         0          0
L2DstDrop                            13         0         0         0          0
NoDstPorts                           24         0         0         0          0
NoFloodPorts                         25         0         0         0          0

Total packet queues 16

Packets Received by Packet Queue

Queue                  Total           5 sec avg 1 min avg 5 min avg 1 hour avg
---------------------- --------------- --------- --------- --------- ----------
Esmp                            387108       200       191       157         52
Control                           3585         0         0         0          0
Host Learning                    36405         0         0         0          0
L3 Fwd High                      12462         4         2         0          0
L3 Fwd Medium                      765         0         0         0          0
L3 Fwd Low                       91200        56        33        27         10
L2 Fwd Low                    13773746      7809      7870      6478       2012
L3 Rx Low                        10409         3         2         0          0
ACL fwd(snooping)               131185        68        60        48         16
ACL sw processing                    1         0         0         0          0

Packets Dropped by Packet Queue

Queue                  Total           5 sec avg 1 min avg 5 min avg 1 hour avg
---------------------- --------------- --------- --------- --------- ----------
Host Learning                   210028         0         0         0         25
L2 Fwd Low                   620345446    359549    355739    289024      90541
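
To put the queue counters above in perspective, a minimal Python sketch can compute how much of the CPU-bound traffic arrived on each queue and what fraction of it was dropped. The totals are copied from the output above; the function names are illustrative, and the drop calculation assumes the "Received" and "Dropped" counters are disjoint (a packet is counted in one or the other), which the output itself does not confirm.

```python
# Totals copied from "Packets Received by Packet Queue" above.
RECEIVED = {
    "Esmp": 387108,
    "Control": 3585,
    "Host Learning": 36405,
    "L3 Fwd High": 12462,
    "L3 Fwd Medium": 765,
    "L3 Fwd Low": 91200,
    "L2 Fwd Low": 13773746,
    "L3 Rx Low": 10409,
    "ACL fwd(snooping)": 131185,
    "ACL sw processing": 1,
}

# Totals copied from "Packets Dropped by Packet Queue" above.
DROPPED = {
    "Host Learning": 210028,
    "L2 Fwd Low": 620345446,
}


def receive_share(queue):
    """Fraction of all CPU-received packets that arrived on this queue."""
    return RECEIVED[queue] / sum(RECEIVED.values())


def drop_ratio(queue):
    """Fraction of packets offered to this queue that were dropped.

    Assumption: received and dropped counters do not overlap.
    """
    received = RECEIVED.get(queue, 0)
    dropped = DROPPED.get(queue, 0)
    offered = received + dropped
    return dropped / offered if offered else 0.0


if __name__ == "__main__":
    # Print queues from busiest to quietest.
    for queue in sorted(RECEIVED, key=RECEIVED.get, reverse=True):
        print(f"{queue:<20} share={receive_share(queue):6.1%} "
              f"dropped={drop_ratio(queue):6.1%}")
```

Under that assumption, L2 Fwd Low accounts for roughly 95% of the packets the CPU received and dropped about 98% of what was offered to it, which is consistent with a flood of L2-forwarded traffic being punted to the CPU.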



Giuseppe Larosa
Hall of Fame

Hello,

DAI might help. According to the documentation, ARP ACLs can be used in non-DHCP environments like yours.

An ARP ACL is needed per SVI, listing all the permitted IP/MAC pairs of the servers in that IP subnet.

http://www.cisco.com/en/US/docs/switches/lan/catalyst4500/12.2/54sg/configuration/guide/dynarp.html#wp1056092

With hundreds of servers it looks like a long job, but it should be feasible with one ACL per IP subnet.
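
With hundreds of servers, typing the ARP ACL by hand is error-prone, so it may be worth generating it from an inventory of IP/MAC pairs. The following Python sketch is one way to do that. The ACL name, VLAN number, addresses, and function names are hypothetical placeholders, and the emitted lines follow the `arp access-list` and `ip arp inspection filter` syntax described in the DAI chapter linked above. Check the generated commands against the configuration guide for your IOS release before applying them.

```python
import ipaddress
import re


def cisco_mac(mac):
    """Normalize a MAC address to Cisco dotted notation (hhhh.hhhh.hhhh)."""
    digits = re.sub(r"[.:\-]", "", mac.strip()).lower()
    if not re.fullmatch(r"[0-9a-f]{12}", digits):
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    return ".".join(digits[i:i + 4] for i in range(0, 12, 4))


def arp_acl(name, pairs):
    """Build the ARP ACL lines for a list of (ip, mac) pairs."""
    lines = [f"arp access-list {name}"]
    for ip, mac in pairs:
        ipaddress.IPv4Address(ip)  # raises ValueError on a malformed address
        lines.append(f" permit ip host {ip} mac host {cisco_mac(mac)}")
    return lines


def dai_filter(name, vlan):
    """Enable DAI on a VLAN and attach the ARP ACL to it."""
    return [
        f"ip arp inspection vlan {vlan}",
        f"ip arp inspection filter {name} vlan {vlan}",
    ]


if __name__ == "__main__":
    # Hypothetical inventory for one server subnet.
    servers = [
        ("10.0.10.5", "00:1A:2B:3C:4D:5E"),
        ("10.0.10.6", "00-1a-2b-3c-4d-5f"),
    ]
    for line in arp_acl("SERVERS-V10", servers) + dai_filter("SERVERS-V10", 10):
        print(line)
```

Feeding the function one inventory list per subnet yields one ACL per IP subnet, as suggested above.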

With default settings, each untrusted port is limited to 15 ARP packets per second.
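
The 15 packets per second default applies per untrusted port, so a port that carries many hosts (for example a blade chassis uplink or a hypervisor) may need a higher limit. The Python sketch below generates a per-interface `ip arp inspection limit rate` command. Scaling the limit linearly with the number of hosts behind the port is only an illustrative assumption, not a Cisco sizing rule, and the interface names, host counts, and the 2048 pps cap used here are assumptions to verify against the command reference for your release.

```python
DEFAULT_PPS = 15  # DAI default for an untrusted port
MAX_PPS = 2048    # assumed upper bound of the rate argument; verify for your release


def arp_limit_config(interface, hosts, pps_per_host=DEFAULT_PPS):
    """Return interface config lines that raise the DAI ARP rate limit.

    Assumption: the limit is scaled linearly with the number of hosts
    (physical or virtual) behind the port, then capped at MAX_PPS.
    """
    if hosts < 1:
        raise ValueError("hosts must be at least 1")
    rate = min(hosts * pps_per_host, MAX_PPS)
    return [
        f"interface {interface}",
        f" ip arp inspection limit rate {rate}",
    ]


if __name__ == "__main__":
    # Hypothetical port list: interface name and hosts behind it.
    ports = [("GigabitEthernet3/1", 30), ("GigabitEthernet3/2", 12)]
    for interface, hosts in ports:
        print("\n".join(arp_limit_config(interface, hosts)))
```

Keep in mind that this limit only counts ARP packets, so it does not throttle other traffic that reaches the CPU.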

Besides this, the configuration guide reports that enabling DAI increases CPU usage; see this note:

>>"When you enable DAI, all ARP packets are forwarded by CPU (software forwarding, the slow path). With this mechanism, whenever a packet exits through multiple ports, the CPU must create as many copies of the packet as there are egress ports. The number of egress ports is a multiplying factor for the CPU. When QoS policing is applied on egress packets that were forwarded by CPU, QoS must be applied in the CPU as well. (You cannot apply QoS in hardware on CPU generated packets because the hardware forwarding path is turned off for CPU generated packets.) Both factors can drive the CPU to a very high utilization level."

http://www.cisco.com/en/US/docs/switches/lan/catalyst4500/12.2/54sg/configuration/guide/dynarp.html#wp1055749

Before attempting this big job, I would open a TAC service request to ask whether DAI is a viable tool in your environment.

There is a chapter about CoPP (Control Plane Policing), but I don't see ARP covered for your platform in that chapter; see:

http://www.cisco.com/en/US/docs/switches/lan/catalyst4500/12.2/54sg/configuration/guide/cntl_pln.html

>> ARP policing is not supported on either the classic series supervisor engines or fixed configuration switches. It is supported on the Catalyst 4900M and 4948E switches, Supervisor Engine 6-E, and Supervisor Engine 6L-E.

http://www.cisco.com/en/US/docs/switches/lan/catalyst4500/12.2/54sg/configuration/guide/cntl_pln.html#wp1179750

So there is no CoPP for ARP in your case, since you have a Sup IV.

Hope this helps

Giuseppe

Giuseppe, first of all I would like to thank you for your response. It is kind of you to look into my issue.

Have you had the opportunity to implement DAI on any networks? If so, how complex is the implementation in practice, and did you see any unwanted behavior?

This customer runs a lot of blade centers connected to the Cisco distribution switches, and also some ESX hosts running many virtual machines. The risk would be setting a threshold for broadcast packet counts that is too low and throttling normal traffic, as those devices each run between 12 and 30 images.

If I understand the implementation, a generic ACL per subnet matching the traffic would be configured first, and then DAI has to be configured per port with a specific threshold. Am I correct?

The main cause of my initial issue was a unicast flood between the server and the core switch gateway. Do you think DAI can provide protection against unicast floods as well as broadcast floods?

A ticket is currently being opened with Cisco TAC by the support team to discuss this and get their recommendations as well.

Have a nice day Giuseppe
