VLAN leaking and ARP flood

Apr 1st, 2009

Hi

I have a network with two 6509s (Sup720-3BXL) in the core. The 6509s are connected to each other over a trunk with all VLANs allowed. Further down there are top-of-rack switches, each connected to both 6509s over trunks. Only the VLANs configured on the top-of-rack switches are allowed down these trunks, plus VLAN 101 (IP range 172.21.1.0/24), which is the management VLAN for all the networking equipment and some of the infrastructure servers (monitoring, syslog, etc.). I have noticed recently that ARP requests destined for other VLANs get sent down VLAN 101. Here is part of the output from one of the switches:


Apr 1 10:16:21.428 gmt: IP ARP req filtered src 80.68.48.2 001b.0dec.59c0, dst 80.68.48.12 0000.0000.0000 wrong cable, interface Vlan101
Apr 1 10:16:21.436 gmt: IP ARP req filtered src 80.68.48.2 001b.0dec.59c0, dst 80.68.48.12 0000.0000.0000 wrong cable, interface Vlan101
Apr 1 10:16:25.328 gmt: IP ARP req filtered src 80.68.48.3 001b.0dec.5ac0, dst 80.68.48.25 0000.0000.0000 wrong cable, interface Vlan101
Apr 1 10:16:25.328 gmt: IP ARP req filtered src 80.68.48.3 001b.0dec.5ac0, dst 80.68.48.25 0000.0000.0000 wrong cable, interface Vlan101


Here 80.68.43.2 and 80.68.43.3 are the IPs of the core 6509s in VLAN 303.


There are also HSRP groups (with different IPs) for each VLAN configured on the 6509s.


I will be happy to provide more information, including diagrams, if needed.


Please help! :) My brain is going into meltdown already.


Regards

Giuseppe Larosa Wed, 04/01/2009 - 02:40

Hello Victor,

Someone has joined the two broadcast domains, VLAN 101 and VLAN 303.


This can be caused by:

two access ports, one in VLAN 101 and one in VLAN 303, connected together


or a native VLAN ID mismatch on an 802.1Q trunk port.


or a device from VLAN 303 moved to a switch port in VLAN 101.
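A minimal Python model of the native VLAN case, as a sketch only: on an 802.1Q trunk, frames in the native VLAN are sent untagged, and the receiver places every untagged frame into its own native VLAN, so a native VLAN mismatch silently moves traffic from one VLAN into another. The function names are hypothetical; the VLAN numbers come from this thread.

```python
from typing import Optional


def egress_tag(vlan: int, native_vlan: int) -> Optional[int]:
    """Return the 802.1Q VLAN ID put on the wire, or None if sent untagged."""
    # Frames belonging to the native VLAN leave the trunk without a tag.
    return None if vlan == native_vlan else vlan


def ingress_vlan(tag: Optional[int], native_vlan: int) -> int:
    """Return the VLAN the receiving switch assigns to an incoming frame."""
    # Untagged frames are placed in the receiver's own native VLAN.
    return native_vlan if tag is None else tag


def cross_trunk(vlan: int, sender_native: int, receiver_native: int) -> int:
    """VLAN a frame ends up in after crossing one trunk link."""
    return ingress_vlan(egress_tag(vlan, sender_native), receiver_native)


# Matching native VLANs: VLAN 303 traffic stays in VLAN 303.
print(cross_trunk(303, sender_native=303, receiver_native=303))  # 303
# Mismatch: untagged VLAN 303 frames are received into VLAN 101.
print(cross_trunk(303, sender_native=303, receiver_native=101))  # 101
# Tagged VLANs are not affected by the mismatch.
print(cross_trunk(301, sender_native=303, receiver_native=101))  # 301
```

Only the native VLAN is exposed this way, which is why a mismatch tends to join exactly two broadcast domains.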



Can you reach IP address 80.68.48.3?

Use the MAC address to find the port(s) where it is seen.
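Cisco IOS prints MAC addresses in dotted form (001b.0dec.59c0), while Wireshark and most host tools use the colon-separated form. A small hypothetical Python helper, not part of any tool, converts between the two so the same address can be searched for everywhere:

```python
import re


def normalize_mac(mac: str) -> str:
    """Strip separators and lowercase; raise ValueError if not 48 bits of hex."""
    digits = re.sub(r"[.:\-]", "", mac).lower()
    if not re.fullmatch(r"[0-9a-f]{12}", digits):
        raise ValueError(f"not a MAC address: {mac!r}")
    return digits


def to_cisco(mac: str) -> str:
    """Format as Cisco dotted notation, e.g. 001b.0dec.59c0."""
    d = normalize_mac(mac)
    return ".".join(d[i:i + 4] for i in range(0, 12, 4))


def to_colon(mac: str) -> str:
    """Format as IEEE colon notation, e.g. 00:1b:0d:ec:59:c0."""
    d = normalize_mac(mac)
    return ":".join(d[i:i + 2] for i in range(0, 12, 2))


print(to_colon("001b.0dec.59c0"))     # 00:1b:0d:ec:59:c0
print(to_cisco("00:1B:0D:EC:59:C0"))  # 001b.0dec.59c0
```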


Hope to help

Giuseppe



VictorAKur Wed, 04/01/2009 - 02:48

It's not just VLAN 303, though :( It seems ARP traffic for other VLANs sometimes gets through to 101 too. Also, in case it makes any difference (I didn't think it did), the 6509s use the same MAC address on every VLAN.

VictorAKur Wed, 04/01/2009 - 06:24

A bit of an update on this one.


One (and only one) of my core switches seems to be generating (or passing through?) frames with 0000.0000.0000 MAC addresses. These same frames are then seen in the wrong VLAN on all the other switches.


SW1:
Apr 1 14:17:13.988 gmt: IP ARP: sent req src 80.68.43.2 001b.0dec.59c0, dst 80.68.43.81 0000.0000.0000 Vlan303
Apr 1 14:17:13.992 gmt: IP ARP: sent req src 80.68.48.34 001b.0dec.59c0, dst 80.68.48.48 0000.0000.0000 Vlan202

SW2:
Apr 1 14:17:16.556 gmt: IP ARP req filtered src 80.68.43.2 001b.0dec.59c0, dst 80.68.43.81 0000.0000.0000 wrong cable, interface Vlan101
Apr 1 14:17:16.556 gmt: IP ARP req filtered src 80.68.48.2 001b.0dec.59c0, dst 80.68.48.26 0000.0000.0000 wrong cable, interface Vlan101


Plus I get a lot of unicast ARP flooding in VLAN 303, even though I have set the CAM and ARP timers to the same value.


Any idea?



Giuseppe Larosa Wed, 04/01/2009 - 08:55
User Badges:

Hello Victor,

The issue is more complex than I thought at the beginning.


A basic note: an ARP request should be sent out with the broadcast destination address ffff.ffff.ffff. I'm concerned about these frames having an all-zeroes destination MAC address.
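For reference, here is a minimal sketch of an Ethernet ARP request laid out as in RFC 826, packed with Python's struct module. It shows two separate fields: the Ethernet destination, which is the broadcast ffff.ffff.ffff, and the target hardware address inside the ARP payload, which a request normally leaves as zeros because that is the unknown being asked for. Which of the two fields the IOS debug prints as 0000.0000.0000 is worth confirming with a capture. The addresses come from the debug output above; the function names are illustrative.

```python
import ipaddress
import struct

BROADCAST = b"\xff" * 6
ZERO_MAC = b"\x00" * 6
ETHERTYPE_ARP = 0x0806


def mac_bytes(cisco_mac: str) -> bytes:
    """Convert Cisco dotted notation (001b.0dec.59c0) to 6 raw bytes."""
    return bytes.fromhex(cisco_mac.replace(".", ""))


def arp_request(src_mac: str, src_ip: str, target_ip: str) -> bytes:
    """Build an Ethernet II frame carrying an ARP request (RFC 826 layout)."""
    sha = mac_bytes(src_mac)
    # Ethernet header: destination, source, EtherType.
    eth_header = BROADCAST + sha + struct.pack("!H", ETHERTYPE_ARP)
    arp_payload = struct.pack(
        "!HHBBH6s4s6s4s",
        1,                                    # hardware type: Ethernet
        0x0800,                               # protocol type: IPv4
        6,                                    # hardware address length
        4,                                    # protocol address length
        1,                                    # opcode: 1 = request
        sha,                                  # sender hardware address
        ipaddress.IPv4Address(src_ip).packed,  # sender protocol address
        ZERO_MAC,                             # target hardware address: unknown
        ipaddress.IPv4Address(target_ip).packed,  # target protocol address
    )
    return eth_header + arp_payload


frame = arp_request("001b.0dec.59c0", "80.68.48.2", "80.68.48.12")
print(frame[0:6].hex())    # Ethernet destination: ffffffffffff
print(frame[32:38].hex())  # ARP target hardware address: 000000000000
```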


I would do the following:

I would check whether the source IP addresses and source MAC addresses belong to legitimate users of the network.


For example, I notice that all frames have the same source MAC:


src 80.68.43.2 001b.0dec.59c0

src 80.68.48.2 001b.0dec.59c0


And the OUI 001b0d belongs to Cisco Systems.


It is probably a MAC address of your core switch, and the switch is misbehaving.


One of the two devices is not working correctly.
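The same check can be scripted against saved debug output. The Python sketch below uses a regular expression written for the `IP ARP` debug lines quoted in this thread (an assumption: other IOS releases may format them differently). It groups the source IPs by source MAC and reports the OUI, which makes it obvious when one MAC is sourcing ARP for several subnets:

```python
import re
from collections import defaultdict

# Matches the "src <ip> <mac>" part of lines such as:
# IP ARP req filtered src 80.68.48.2 001b.0dec.59c0, dst 80.68.48.12 ...
ARP_LINE = re.compile(
    r"IP ARP.*?src (?P<sip>\d+\.\d+\.\d+\.\d+) "
    r"(?P<smac>[0-9a-f]{4}\.[0-9a-f]{4}\.[0-9a-f]{4})"
)


def ips_by_source_mac(lines):
    """Map each source MAC to the set of source IPs seen with it."""
    seen = defaultdict(set)
    for line in lines:
        m = ARP_LINE.search(line)
        if m:
            seen[m.group("smac")].add(m.group("sip"))
    return dict(seen)


def oui(cisco_mac: str) -> str:
    """First 24 bits of the MAC address, e.g. 001b0d."""
    return cisco_mac.replace(".", "")[:6]


log = [
    "IP ARP: sent req src 80.68.43.2 001b.0dec.59c0, dst 80.68.43.81 0000.0000.0000 Vlan303",
    "IP ARP req filtered src 80.68.48.2 001b.0dec.59c0, dst 80.68.48.26 0000.0000.0000 wrong cable, interface Vlan101",
]
for mac, ips in ips_by_source_mac(log).items():
    print(mac, oui(mac), sorted(ips))
# 001b.0dec.59c0 001b0d ['80.68.43.2', '80.68.48.2']
```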


Hope to help

Giuseppe



VictorAKur Thu, 04/02/2009 - 02:12

Giuseppe


80.68.43.2 and 80.68.48.2 are IP addresses on one of the two core switches; they belong to interfaces VLAN 303 and VLAN 301, respectively.

Each of those VLANs also has an HSRP group configured on it (with different group IDs), with virtual IPs 80.68.43.1 and 80.68.48.1.


The IPs are legitimate.


I am quite sure that I do not have any misconfigured access ports or improperly formed dot1q trunks.


The other problem is that the core switch in question seems to flood ARP requests to these IP ranges regularly. I have already changed the MAC and CAM table timeouts to 14400, but it did not help. So at the moment every device that has access to the management VLAN 101 gets hit by this traffic.

Giuseppe Larosa Thu, 04/02/2009 - 02:37

Hello Victor,

Using .1 for the VIP and .2 and .3 on the core switches is really classic; we do the same.
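As a quick illustration of that convention (.1 as the HSRP virtual IP, .2 and .3 as the physical SVI addresses on the two core switches), here is a hypothetical Python helper built on the standard ipaddress module:

```python
import ipaddress


def hsrp_plan(subnet: str) -> dict:
    """Return the conventional VIP/.2/.3 addressing for an IPv4 subnet."""
    net = ipaddress.ip_network(subnet)
    hosts = net.hosts()  # iterator over usable host addresses, lowest first
    return {
        "vip": str(next(hosts)),    # HSRP virtual IP (.1)
        "core1": str(next(hosts)),  # first core switch SVI (.2)
        "core2": str(next(hosts)),  # second core switch SVI (.3)
    }


print(hsrp_plan("80.68.43.0/24"))
# {'vip': '80.68.43.1', 'core1': '80.68.43.2', 'core2': '80.68.43.3'}
```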


I would examine the L2 topology with:


sh spanning-tree vlan 101


sh spanning-tree vlan 303

sh spanning-tree vlan 301


looking for anything unusual.


If this confirms there is no problem at layer 2, I would consider:

reloading the misbehaving switch

looking for ARP-related bugs that could apply to your device.


If a reload doesn't fix it either and you don't find known bugs that match what you see, I would open a TAC service request.


Hint:

Have you enabled or disabled ip proxy-arp on the involved SVIs?


You use different HSRP IDs: how many groups are you using?


A C6500 should be fine even with a hundred different groups, but this is not true for less powerful platforms.


Hope to help

Giuseppe



VictorAKur Thu, 04/02/2009 - 03:34

Giuseppe


Thank you.

I will go through L2 again in case I missed something.


no ip proxy-arp is set on all routed interfaces.

There are currently 7 HSRP groups, so it is way below the limit. In any case, I would expect a 6509E with a Sup720-3BXL and 1 GB of RAM to be able to cope with it.


Any idea why it would generate so many ARP retransmissions in a particular IP range?


Regards,

Giuseppe Larosa Thu, 04/02/2009 - 04:05
User Badges:
  • Super Silver, 17500 points or more
  • Hall of Fame,

    Founding Member

Hello Victor,


>> Any idea why it would generate so many ARP retransmissions in a particular IP range?


No, it looks wrong.


Another aspect to verify is the service modules installed in the chassis.


We had a problem with a CSM pair that, under heavy traffic load, was leaking random MAC addresses, causing a MAC flooding attack on the access-layer switches.

We opened a case; it was a bug, and we solved it with a CSM software upgrade.


So the problem could also be in a service module, if you have any FWSM, ACE, CSM or others.


Hope to help

Giuseppe


VictorAKur Thu, 04/02/2009 - 04:11

We have ACE appliances connected to both core switches, but I have shut down the interfaces to them for now.


I have noticed that my 6509s share one MAC address across all the VLANs. As far as I understand this is normal, but do you think it may confuse the downstream switches? They are Catalyst 3560G-24TS.

Giuseppe Larosa Thu, 04/02/2009 - 04:33
User Badges:
  • Super Silver, 17500 points or more
  • Hall of Fame,

    Founding Member

Hello Victor,


Using the same MAC address on all VLANs is not a problem for other devices.
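One way to see why: a Catalyst learns MAC addresses per VLAN, so its forwarding table is effectively keyed by the (VLAN, MAC) pair. The toy Python model below is not the switch's actual data structure, and the uplink port name Gi0/25 is hypothetical; it shows the same router MAC coexisting in several VLANs without conflict:

```python
from typing import Dict, Optional, Tuple

CamKey = Tuple[int, str]  # (VLAN ID, MAC address)


class CamTable:
    """Toy MAC address table keyed by (VLAN, MAC), as with per-VLAN learning."""

    def __init__(self) -> None:
        self._entries: Dict[CamKey, str] = {}

    def learn(self, vlan: int, mac: str, port: str) -> None:
        self._entries[(vlan, mac)] = port

    def lookup(self, vlan: int, mac: str) -> Optional[str]:
        """Return the port for this VLAN/MAC, or None (flooded within the VLAN)."""
        return self._entries.get((vlan, mac))


cam = CamTable()
router_mac = "001b.0dec.59c0"
# The same SVI MAC is learned in three VLANs via the same (hypothetical) uplink.
for vlan in (101, 301, 303):
    cam.learn(vlan, router_mac, "Gi0/25")

print(cam.lookup(301, router_mac))  # Gi0/25
print(cam.lookup(202, router_mac))  # None: not learned in VLAN 202
```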


The messages are sourced from IP addresses of one switch, as we noted in previous posts.


I guess you have external stand-alone ACE appliances.


Hope to help

Giuseppe


lamav Thu, 04/02/2009 - 04:42
User Badges:
  • Blue, 1500 points or more

Leaky vlans?


I hate when that happens.


You use ACE?


Great, ACE Hardware makes a great bucket. Rubberized and rugged to hold all that yuckiness from the leaking VLANs.

VictorAKur Thu, 04/02/2009 - 04:53

lamav


That sounds helpful, but I can't quite figure out what to do with your advice :)


I have disconnected both ACE 4710s, by the way, so they are not on the network at the moment.


Regards,

Giuseppe Larosa Fri, 04/03/2009 - 09:52

Hello Victor,


>> Can HSRP be responsible for cross-vlan talk?


No, it should stay confined to the originating VLAN.

HSRP messages are sent as UDP to the multicast destination 224.0.0.2 (all routers on this subnet) with TTL 1.

So a correct implementation shouldn't be able to cross-talk between different VLANs.
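The reasoning can be checked with Python's ipaddress module: 224.0.0.2 falls in 224.0.0.0/24, the local network control block, which routers do not forward, and with a TTL of 1 a single routing hop would drop the packet anyway. The `would_be_routed` helper is a simplified, hypothetical model of those two rules, not router code:

```python
import ipaddress

LOCAL_CONTROL_BLOCK = ipaddress.ip_network("224.0.0.0/24")
HSRP_V1_GROUP = ipaddress.ip_address("224.0.0.2")  # all routers on this subnet


def would_be_routed(dst: str, ttl: int) -> bool:
    """Simplified model: could a router forward this IPv4 packet to another VLAN?"""
    addr = ipaddress.ip_address(dst)
    if addr in LOCAL_CONTROL_BLOCK:
        return False  # link-local multicast is never forwarded off the subnet
    return ttl > 1    # forwarding decrements TTL; a packet arriving with TTL 1 expires


print(HSRP_V1_GROUP.is_multicast)       # True
print(would_be_routed("224.0.0.2", 1))  # False
```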


One thing you could try is to leave only one core router connected to the access-layer switches and see if anything changes.


Hope to help

Giuseppe





VictorAKur Tue, 04/07/2009 - 11:35

Giuseppe


and anyone who has had a look at it so far,


an answer to this question may or may not shed some light on the problem.


Why would a 6509 with a Sup720-3BXL and IOS 12.2(33)SXH send ARP broadcasts regularly, particularly on two VLANs? The broadcasts look like a typical scan, as in every IP address in the range gets ARPed. Even though there are only 6 active IPs in this /24 network, all the other IPs get ARPed very often. I have set mac-address-table aging to 14400 globally.

But it did not help.


Any idea?

Giuseppe Larosa Tue, 04/07/2009 - 11:55

Hello Victor,

The ARP timeout is a different timer; you have increased the timer of the L2 CAM table.


To check your ARP timer settings you can use


sh ip int vlan XX


and look for the ARP timeout line.


The default value is 4 hours.
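Why the two timers matter together: the router keeps an ARP entry for a silent host for the ARP timeout, but the switch forgets the host's MAC after the CAM aging time. In between, traffic routed to that host goes out to a MAC the switch no longer knows, so it is flooded in the VLAN. The Python model below is a simplified sketch that assumes the host sends nothing after time 0 and ignores ARP refreshes; 14400 s is the 4-hour ARP default, and 300 s is the usual Catalyst MAC aging default (treat that as an assumption for your platform):

```python
def flood_window(arp_timeout_s: int, cam_aging_s: int) -> int:
    """Seconds during which routed unicast to a silent host is flooded.

    Simplified model: the host last transmitted at t=0, so its CAM entry
    expires at cam_aging_s and the router's ARP entry at arp_timeout_s.
    While the ARP entry is still valid but the CAM entry is gone, the
    router sends unicast to a destination MAC the switch no longer knows,
    and the switch floods it to every port in the VLAN.
    """
    return max(0, arp_timeout_s - cam_aging_s)


print(flood_window(14400, 300))    # 14100
print(flood_window(14400, 14400))  # 0
```

In this model, setting the two timers equal closes the flooding window entirely.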


You said you have isolated the ACE appliances, so there shouldn't be any paranoid device performing checks (I don't mean to say that ACE would do this).


Using a sniffer on a PC in one of these VLANs, you can check the source MAC address of these frequent ARP requests.

For example, you can use Wireshark.


(Probably you have already done this.)
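If you save the capture, a display filter narrows it down to ARP requests from one sender. The hypothetical Python helper below only builds the filter string; `arp.opcode`, `eth.src` and `arp.src.proto_ipv4` are standard Wireshark display-filter fields:

```python
from typing import Optional


def arp_request_filter(src_mac: str, src_ip: Optional[str] = None) -> str:
    """Build a Wireshark display filter for ARP requests from one sender.

    src_mac may be in Cisco dotted form (001b.0dec.59c0); Wireshark expects
    colon-separated bytes, so it is converted first.
    """
    digits = src_mac.replace(".", "").replace(":", "").lower()
    colon_mac = ":".join(digits[i:i + 2] for i in range(0, 12, 2))
    terms = ["arp.opcode == 1", f"eth.src == {colon_mac}"]  # opcode 1 = request
    if src_ip is not None:
        terms.append(f"arp.src.proto_ipv4 == {src_ip}")
    return " && ".join(terms)


print(arp_request_filter("001b.0dec.59c0", "80.68.43.2"))
# arp.opcode == 1 && eth.src == 00:1b:0d:ec:59:c0 && arp.src.proto_ipv4 == 80.68.43.2
```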


Hope to help

Giuseppe



VictorAKur Tue, 04/07/2009 - 12:53

Yep,


done Wireshark on all VLANs.


It seems to be the two core switches that generate all this traffic.


I should probably say again that it is the management VLAN that gets hit by traffic destined for other VLANs, not any other VLAN.

We have a VMware system where interfaces on the ESX hosts are connected to both the management VLAN and the data VLAN. We also have a number of Linux-based systems that use VLAN tagging. I have got to the point where I am considering disconnecting racks from the core in order to isolate the issue.



VictorAKur Tue, 04/07/2009 - 12:58

Well, I am assuming the ARP timeout is set to 4 hours, as I can't see anything about the ARP timeout in the sh ip int vlan output:


sh ip int vlan 303

Vlan303 is up, line protocol is up
Internet address is x.x.x.x/24
Broadcast address is 255.255.255.255
Address determined by setup command
MTU is 1500 bytes
Helper address is not set
Directed broadcast forwarding is disabled
Multicast reserved groups joined: 224.0.0.2 224.0.0.5 224.0.0.6
Outgoing access list is not set
Inbound access list is not set
Proxy ARP is disabled
Local Proxy ARP is disabled
Security level is default
Split horizon is enabled
ICMP redirects are always sent
ICMP unreachables are always sent
ICMP mask replies are never sent
IP fast switching is enabled
IP Flow switching is disabled
IP CEF switching is enabled
IP CEF switching turbo vector
IP Null turbo vector
IP multicast fast switching is enabled
IP multicast distributed fast switching is disabled
IP route-cache flags are Fast, CEF
Router Discovery is disabled
IP output packet accounting is disabled
IP access violation accounting is disabled
TCP/IP header compression is disabled
RTP/IP header compression is disabled
Probe proxy name replies are disabled
Policy routing is disabled
Network address translation is disabled
BGP Policy Mapping is disabled
Output features: IP Post Routing Processing, HW Shortcut Installation
Post encapsulation features: MTU Processing, IP Protocol Output Counter, IP Sendself Check, HW Shortcut Installation
Sampled Netflow is disabled
IP Routed Flow creation is disabled in netflow table
IP Bridged Flow creation is disabled in netflow table
WCCP Redirect outbound is disabled
WCCP Redirect inbound is disabled
WCCP Redirect exclude is disabled
IP multicast multilayer switching is disabled

Giuseppe Larosa Tue, 04/07/2009 - 22:24

Hello Victor,

Sorry, I was out of the office. The command to see the ARP timeout setting is:


sh int vlan xx


For example:


sh int vlan100 | inc ARP

Encapsulation ARPA, loopback not set
ARP type: ARPA, ARP Timeout 04:00:00
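To compare this value with the MAC aging time, which is configured in seconds, a small hypothetical Python helper can convert the hh:mm:ss form printed by sh int:

```python
import re

ARP_TIMEOUT = re.compile(r"ARP Timeout (\d+):(\d{2}):(\d{2})")


def arp_timeout_seconds(show_output: str) -> int:
    """Extract 'ARP Timeout hh:mm:ss' from show interface output, in seconds."""
    m = ARP_TIMEOUT.search(show_output)
    if m is None:
        raise ValueError("no ARP Timeout line found")
    hours, minutes, seconds = (int(g) for g in m.groups())
    return hours * 3600 + minutes * 60 + seconds


output = "Encapsulation ARPA, loopback not set\nARP type: ARPA, ARP Timeout 04:00:00\n"
arp_s = arp_timeout_seconds(output)
mac_aging_s = 14400  # the MAC aging value configured in this thread
print(arp_s, arp_s == mac_aging_s)  # 14400 True
```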


The VMware ESX systems and the Linux boxes able to process VLAN-tagged frames could play a role here.


One suggestion is to isolate one core switch, to see whether the problem is caused by both switches being on the same VLANs.

If the abnormal ARP activity stops, the problem is between the two core switches.


If the remaining switch still sends out a lot of ARP requests, the problem may be in some of the VMware or Linux boxes.


Hope to help

Giuseppe



VictorAKur Wed, 04/08/2009 - 02:27

Giuseppe


It is 4 hours everywhere.


Still, what can cause a 6509 to do such regular ARP broadcasts?


I am going to shut one of the boxes down to try what you suggested. I will have to do it next week, though, so I may not have any updates till then.


Thank you for your help.

VictorAKur Thu, 04/09/2009 - 02:36

Another question


Could anyone please explain where the leak may be happening if the core switch sends traffic out of one VLAN and I see it hitting an access switch on another VLAN? There is one cable between the core switch and the access switch, and only these two VLANs are allowed across the trunk on both ends.


Clutching at straws here...

VictorAKur Thu, 04/09/2009 - 06:27

We have found a DRAC port that was causing a lot of STP topology changes (every 20-30 seconds). As we are running RSTP, each one was causing an immediate MAC address table flush and the resulting unicast floods. That is the theory, anyway.
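That theory can be illustrated with a toy Python model (all names hypothetical, and the flush simplified to clearing the whole table, whereas RSTP spares the port that received the topology change): each topology change empties the MAC table, and until a host transmits again, unicast frames addressed to it are flooded.

```python
from typing import List, Set, Tuple

Event = Tuple[str, str]  # (kind, mac)


def count_floods(events: List[Event]) -> int:
    """Count unicast frames flooded because their destination MAC is unknown.

    "learn"   - a host transmits, so the switch learns its MAC
    "send_to" - a unicast frame arrives for that MAC
    "tc"      - an RSTP topology change flushes the MAC table (mac unused)
    """
    cam: Set[str] = set()
    floods = 0
    for kind, mac in events:
        if kind == "learn":
            cam.add(mac)
        elif kind == "send_to":
            if mac not in cam:
                floods += 1  # unknown unicast: sent out every port in the VLAN
        elif kind == "tc":
            cam.clear()
    return floods


quiet = [("learn", "A"), ("send_to", "A"), ("send_to", "A")]
flapping = [("learn", "A"), ("tc", ""), ("send_to", "A"), ("send_to", "A"),
            ("learn", "A"), ("send_to", "A")]
print(count_floods(quiet))     # 0
print(count_floods(flapping))  # 2
```

A port flapping every 20-30 seconds repeats the "tc" step constantly, so silent hosts keep being flooded to.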


In practice it did not resolve the original problem, but it did reduce the number of VLAN leaks per unit of time.


I still get this on the core switches, though, even though both the ARP and CAM timeouts are the same and set to 14400. Any idea what can cause it?


Apr 9 15:21:01.225 gmt: IP ARP: rcvd req src 80.68.43.2 001b.0dec.60c0, dst 80.68.43.4 Vlan303
Apr 9 15:21:01.229 gmt: IP ARP: rcvd req src 80.68.43.2 001b.0dec.60c0, dst 80.68.43.6 Vlan303
Apr 9 15:21:01.229 gmt: IP ARP: rcvd req src 80.68.43.2 001b.0dec.60c0, dst 80.68.43.7 Vlan303
Apr 9 15:21:01.233 gmt: IP ARP: rcvd req src 80.68.43.2 001b.0dec.60c0, dst 80.68.43.9 Vlan303
Apr 9 15:21:01.237 gmt: IP ARP: rcvd req src 80.68.43.2 001b.0dec.60c0, dst 80.68.43.13 Vlan303
Apr 9 15:21:01.257 gmt: IP ARP: rcvd req src 80.68.62.2 001b.0dec.59c0, dst 80.68.62.19 Vlan201
Apr 9 15:21:01.269 gmt: IP ARP: rcvd req src 80.68.43.2 001b.0dec.60c0, dst 80.68.43.22 Vlan303
Apr 9 15:21:01.269 gmt: IP ARP: rcvd req src 80.68.43.2 001b.0dec.60c0, dst 80.68.43.23 Vlan303
Apr 9 15:21:01.273 gmt: IP ARP: rcvd req src 80.68.43.2 001b.0dec.60c0, dst 80.68.43.24 Vlan303
Apr 9 15:21:01.277 gmt: IP ARP: rcvd req src 80.68.43.2 001b.0dec.60c0, dst 80.68.43.25 Vlan303
Apr 9 15:21:01.277 gmt: IP ARP: rcvd req src 80.68.43.2 001b.0dec.60c0, dst 80.68.43.28 Vlan303
Apr 9 15:21:01.277 gmt: IP ARP: rcvd req src 80.68.43.2 001b.0dec.60c0, dst 80.68.43.26 Vlan303
Apr 9 15:21:01.281 gmt: IP ARP: rcvd req src 80.68.43.2 001b.0dec.60c0, dst 80.68.43.30 Vlan303
Apr 9 15:21:01.281 gmt: IP ARP: rcvd req src 80.68.43.2 001b.0dec.60c0, dst 80.68.43.29 Vlan303
sw2.ahf#
Apr 9 15:21:01.281 gmt: IP ARP: rcvd req src 80.68.43.2 001b.0dec.60c0, dst 80.68.43.27 Vlan303
Apr 9 15:21:01.321 gmt: IP ARP: rcvd req src 80.68.43.2 001b.0dec.60c0, dst 80.68.43.31 Vlan303
Apr 9 15:21:01.325 gmt: IP ARP: rcvd req src 80.68.43.2 001b.0dec.60c0, dst 80.68.43.32 Vlan303
Apr 9 15:21:01.329 gmt: IP ARP: rcvd req src 80.68.43.2 001b.0dec.60c0, dst 80.68.43.33 Vlan303
Apr 9 15:21:02.029 gmt: IP ARP: sent req src 80.68.43.3 001b.0dec.5ac0, dst 80.68.43.81 0000.0000.0000 Vlan303
Apr 9 15:21:02.217 gmt: IP ARP: rcvd req src 80.68.43.2 001b.0dec.60c0, dst 80.68.43.34 Vlan303
Apr 9 15:21:02.229 gmt: IP ARP: rcvd req src 80.68.43.2 001b.0dec.60c0, dst 80.68.43.35 Vlan303
Apr 9 15:21:02.237 gmt: IP ARP: rcvd req src 80.68.43.2 001b.0dec.60c0, dst 80.68.43.36 Vlan303
Apr 9 15:21:02.237 gmt: IP ARP: rcvd req src 80.68.43.2 001b.0dec.60c0, dst 80.68.43.37 Vlan303
Apr 9 15:21:02.241 gmt: IP ARP: rcvd req src 80.68.43.2 001b.0dec.60c0, dst 80.68.43.38 Vlan303
Apr 9 15:21:02.245 gmt: IP ARP: rcvd req src 80.68.43.2 001b.0dec.60c0, dst 80.68.43.39 Vlan303


It seems it is always the same VLAN that gets hit.
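To quantify which VLAN is being swept, the debug lines can be tallied per VLAN. The Python sketch below assumes the `IP ARP: rcvd req ... dst <ip> Vlan<n>` format shown in this thread (other IOS releases may differ) and reports how many distinct target addresses each VLAN saw; the sample lines are copied from the log above:

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Set

RCVD_REQ = re.compile(
    r"IP ARP: rcvd req src \S+ \S+, dst (?P<dst>\d+\.\d+\.\d+\.\d+) Vlan(?P<vlan>\d+)"
)


def targets_per_vlan(lines: Iterable[str]) -> Dict[int, Set[str]]:
    """Collect the distinct ARP target IPs per VLAN from debug output."""
    targets: Dict[int, Set[str]] = defaultdict(set)
    for line in lines:
        m = RCVD_REQ.search(line)
        if m:
            targets[int(m.group("vlan"))].add(m.group("dst"))
    return dict(targets)


log = [
    "IP ARP: rcvd req src 80.68.43.2 001b.0dec.60c0, dst 80.68.43.4 Vlan303",
    "IP ARP: rcvd req src 80.68.43.2 001b.0dec.60c0, dst 80.68.43.6 Vlan303",
    "IP ARP: rcvd req src 80.68.62.2 001b.0dec.59c0, dst 80.68.62.19 Vlan201",
    "IP ARP: rcvd req src 80.68.43.2 001b.0dec.60c0, dst 80.68.43.7 Vlan303",
]
for vlan, ips in sorted(targets_per_vlan(log).items()):
    print(vlan, len(ips))
# 201 1
# 303 3
```

Run against a full capture, a count close to the size of the subnet (rather than the handful of active hosts) is what distinguishes a sweep from normal ARP refreshes.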

ROBERT THOMSON Sat, 07/18/2009 - 04:49

I have experienced the same problem of broadcast traffic and some unicast traffic leaking across VLANs on 3750 switches when the ports are configured with a voice VLAN for VoIP.


