Troubleshooting Dynamic ARP Inspection - port security: excessive port shutdowns
I have about a dozen 3560X closet switches running IOS 15.2(1)E. These switches have been configured with DHCP snooping, DAI and port security since they went into production roughly a year ago. Their predecessors (3560 switches) ran a similar configuration for several years.
Historically, we'd see a couple of ports go down most weeks. About 2 months ago this increased to 6 to 10 a day. About half the shutdowns happen overnight when nobody is working on the PCs. Most switchports have a Cisco 79xx phone and a PC attached (PC connects through the phone).
Ports log "%PM-4-ERR_DISABLE: arp-inspection error detected on Gi0/35, putting Gi0/35 in err-disable state" when they go down.
I am trying to figure out exactly what packets are triggering these port shutdowns. Six months ago we had a rash of these in one department that seemed to be caused by a lot of Dropbox LAN Sync discovery packets, but that doesn't seem to be the case this time.
I enabled "smart logging" on the switches with:
ip arp inspection log-buffer entries 256
ip arp inspection log-buffer logs 25 interval 1
ip arp inspection smartlog
But it looks like NetFlow also needs to be configured before this will work, since no logs have been created despite multiple shutdowns since smartlog was configured.
What would the minimal NetFlow config for this look like on a 3560X? Do I need to configure every port for NetFlow? I don't have a NetFlow collector either. I'm wondering if I could just fake one with netcat listening for UDP packets on port 2055 and redirecting to /dev/null... I'm assuming the smartlog buffer on the switch will also log the trigger packets. Is that what happens?
If I can't simulate a netflow collector with netcat, can anyone recommend a freeware collector that can be set up quickly?
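If netcat feels too blind, a small Python listener can stand in for a collector and at least show whether the switch is exporting anything. This is a minimal sketch, assuming the smartlog exports arrive as NetFlow v9 (RFC 3954) datagrams on UDP 2055, the port from the question. It decodes only the fixed 20-byte v9 header and does not parse templates or the captured packet data, so it confirms that exports are arriving rather than showing what tripped DAI. The function names are hypothetical.

```python
import socket
import struct

# NetFlow v9 packet header (RFC 3954): version, count, sysUpTime,
# UNIX seconds, sequence number, source ID; 20 bytes, network byte order.
NFV9_HEADER = struct.Struct("!HHIIII")


def parse_nfv9_header(datagram: bytes) -> dict:
    """Decode the fixed NetFlow v9 header; raise ValueError if too short."""
    if len(datagram) < NFV9_HEADER.size:
        raise ValueError("datagram shorter than a NetFlow v9 header")
    version, count, uptime_ms, unix_secs, seq, source_id = NFV9_HEADER.unpack_from(datagram)
    return {
        "version": version,
        "count": count,  # total records (template, options and data) in the packet
        "sys_uptime_ms": uptime_ms,
        "unix_secs": unix_secs,
        "sequence": seq,
        "source_id": source_id,
        "payload_len": len(datagram) - NFV9_HEADER.size,
    }


def listen(host: str = "0.0.0.0", port: int = 2055) -> None:
    """Print one summary line per received datagram (a netcat stand-in)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    while True:
        data, (src_ip, _src_port) = sock.recvfrom(65535)
        try:
            hdr = parse_nfv9_header(data)
        except ValueError:
            print(f"{src_ip}: {len(data)} bytes (not NetFlow v9?)")
            continue
        print(f"{src_ip}: v{hdr['version']} seq={hdr['sequence']} "
              f"records={hdr['count']} payload={hdr['payload_len']} bytes")


if __name__ == "__main__":
    listen()
```

Run it on the host the exporter points at; each datagram prints one line. If you want the records themselves decoded, capturing the same UDP port in Wireshark (which has a NetFlow/CFLOW dissector) is an alternative.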
I have just seen this; here are some possibilities:
1. Did you check the DHCP snooping binding database when the ARP inspection errors occurred? DAI depends on a complete DHCP snooping database, so it is worth confirming that yours is complete.
2. Did you configure the switches to trust the uplinks for DAI, as you would for DHCP snooping?
3. Finally, DAI will also trip if the rate of ARP packets is too high, so you may have a host or two sending ARP requests too quickly. The threshold can be changed in interface configuration mode on the receiving interface with the command
ip arp inspection limit rate [rate]
where the numerical value is the rate in packets per second (the default on untrusted interfaces is 15 pps).
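To reason about how close a port is to that threshold, the rate limit can be modelled as a counter over a sliding window. This is a simplified sketch, not Cisco's implementation: it assumes the port is err-disabled once more than rate x burst-interval ARP packets are received within any one burst interval (the interval corresponds to the optional burst interval keyword of the limit command, 1 second by default). The class name is hypothetical.

```python
from collections import deque


class ArpRateLimiter:
    """Simplified model of a per-interface DAI rate limit (hypothetical, not IOS code).

    Assumption: the port err-disables when more than rate_pps * burst_s ARP
    packets arrive within a sliding burst_s-second window. Timestamps passed
    to on_arp() must be non-decreasing.
    """

    def __init__(self, rate_pps: int = 15, burst_s: float = 1.0):
        self.limit = rate_pps * burst_s
        self.burst_s = burst_s
        self.times: deque = deque()
        self.err_disabled = False

    def on_arp(self, t: float) -> bool:
        """Record an ingress ARP packet at time t (seconds); return err-disable state."""
        self.times.append(t)
        # Drop timestamps that have fallen out of the sliding window.
        while self.times and t - self.times[0] >= self.burst_s:
            self.times.popleft()
        if len(self.times) > self.limit:
            self.err_disabled = True  # latched, like err-disable until recovery
        return self.err_disabled
```

For example, with the default 15 pps, fifteen ARP packets spread over 0.7 s leave the modelled port up, and a sixteenth inside the same second trips it.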
The ports won't shut down from DHCP snooping problems; they just block traffic from the bad IP/MAC address combos. DHCP snooping has been trouble-free for more than five years now.
Logs indicate that DAI is tripping the port because of excessive ARP requests. I know I can raise the limit, but since this is a security feature, I need to know that the excessive ARP requests aren't coming from unknown or unauthorized processes. If I just raise the limit without knowing that, I could be helping the bad guys.
However, I am not convinced that DAI is correctly identifying the packets as ARP, since the earlier problem seemed to be caused by other (UDP) broadcast traffic. I did manage to capture a port's traffic overnight while it went down, but there was no traffic on the port at all when it went down.
Yep. The only reason I mentioned DHCP snooping is that the DHCP snooping database is used by the DAI process: DAI compares the ARP packets it inspects against the IP-to-MAC bindings stored in the DHCP snooping database. I just thought it was worth making sure the DHCP snooping information was still available to DAI when you were having problems.
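The binding comparison can be sketched as a lookup: on an untrusted port, an ARP packet passes only if its sender IP and sender MAC match a DHCP snooping binding in the same VLAN. This is a simplified model under stated assumptions: the bindings dictionary and the helper names are hypothetical (the entries could be hand-copied from show ip dhcp snooping binding), and ARP ACLs and the optional ip arp inspection validate checks are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArpPacket:
    sender_mac: str
    sender_ip: str
    vlan: int


def normalize_mac(mac: str) -> str:
    """Accept aa:bb:.., aa-bb-.. or Cisco aabb.ccdd.eeff; return 12 lowercase hex digits."""
    digits = "".join(ch for ch in mac.lower() if ch in "0123456789abcdef")
    if len(digits) != 12:
        raise ValueError(f"bad MAC address: {mac!r}")
    return digits


def dai_permits(pkt: ArpPacket, bindings: dict, trusted: bool) -> bool:
    """Simplified DAI check: trusted ports pass; untrusted ports need a matching binding.

    bindings maps (vlan, ip) -> MAC address string.
    """
    if trusted:
        return True
    bound_mac = bindings.get((pkt.vlan, pkt.sender_ip))
    return bound_mac is not None and normalize_mac(bound_mac) == normalize_mac(pkt.sender_mac)
```

An ARP packet whose sender IP has no binding, or whose MAC differs from the bound one, is rejected on an untrusted port, which is why a stale or incomplete snooping database shows up as DAI drops.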
I take your point about the switch possibly misidentifying ARP traffic, but if, as in your last example, you see no traffic at all on a port that is shutting down with a DAI error, it may be time to open a TAC case.
I managed to get a packet capture with Wireshark showing the traffic right up to the point that the port shut down. There are a bunch of ARP request packets just before the port goes to "ERROR DISABLED", but the requests are asking for the MAC addresses of devices on the port; they are not coming from the devices on the about-to-be-shut-down port.
A few things are odd about these ARP requests:
First - They are coming from MAC addresses that are not in the ARP tables of any routers in the network.
Second - The source MACs belong to the uplink ports of all the other closet switches: not devices connected to the uplink ports, but the ports themselves. These are not Layer 3 interfaces, so why are they sending ARP requests?
Third - The ARP requests are addressed directly to the MAC of the device on the about-to-go-down port, indicating that the requesting device already knows the address it's asking for.
Fourth - It looks like all my closet switches are sending these requests at about the same time. They all arrive at the target within about a second, and this repeats about every 28 seconds.
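One way to quantify this pattern is to export the ARP frames' timestamps and source MACs from the capture and measure the busiest second and the spacing between bursts. This is a sketch, assuming an export along the lines of tshark -r capture.pcap -Y arp -T fields -e frame.time_relative -e eth.src (capture.pcap is a placeholder), which writes one tab-separated time and source MAC per frame; the 5-second silence gap used to split bursts is an arbitrary assumption. DAI rate-limits ARP packets received on the port, so to count only what the phone or PC sends in, narrow the display filter to its source MAC (for example arp && eth.src == <its MAC>).

```python
from __future__ import annotations

from collections import defaultdict
from statistics import median


def parse_tshark_fields(text: str) -> list[tuple[float, str]]:
    """Parse 'time<TAB>src_mac' lines, as written by tshark -T fields."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        t, mac = line.split("\t")[:2]
        records.append((float(t), mac.strip().lower()))
    return records


def busiest_second(records: list[tuple[float, str]]) -> tuple[int, int, int]:
    """Return (second, ARP frames, distinct source MACs) for the busiest 1-s bucket."""
    per_sec: dict[int, list[str]] = defaultdict(list)
    for t, mac in records:
        per_sec[int(t)].append(mac)
    second, macs = max(per_sec.items(), key=lambda kv: len(kv[1]))
    return second, len(macs), len(set(macs))


def burst_period(times: list[float], gap_s: float = 5.0) -> float | None:
    """Median spacing between bursts; a new burst starts after >= gap_s of silence."""
    starts: list[float] = []
    prev = None
    for t in sorted(times):
        if prev is None or t - prev >= gap_s:
            starts.append(t)
        prev = t
    if len(starts) < 2:
        return None
    return median(b - a for a, b in zip(starts, starts[1:]))
```

Comparing the busiest-second count with the port's configured limit shows whether these periodic bursts alone could account for the trips, and the distinct-source count confirms how many switches take part in each burst.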
Time to hunt for bugs in c3560e-universalk9-mz.152-1.E.bin...