This is going to be a little detailed, please bear with me.
We are having a problem with workstation connectivity, primarily stations we are rebuilding. The rebuild process involves a PXE boot, which then pulls down a Windows Vista mini system (Microsoft SCCM - Lite Touch) - then the rebuild of the system would occur from there. It had been working for quite some time and then stopped in mid-December.
The system boots and PXE completes, the tftp download runs, and the Vista mini-system loads. At that point it seemingly has no network connectivity (no ping in or out), but it can do an ipconfig /release and ipconfig /renew successfully.
On subsequent reboots, both PXE and tftp fail.
We went 'round and 'round with the application side, testing tftp, dhcp, etc. and they all seem to be working fine.
Today I took a look at the switching side and found something very unusual.
In the above scenario when the system boots for the first time and PXE completes successfully, everything looks good. Once the vista mini-system kicks in, the switch suddenly reports flaps between the port the system is plugged into and the trunk line feeding the switch. All communications stop at that point.
1. sh mac addr addr xxxx.xxxx.xxxx shows the mac on the trunk port (as dynamic)
2. clear mac addr dyn addr xxxx.xxxx.xxxx will NOT clear the mac
3. The mac does not seem to time out (it has been in the table for two days now)
4. The next-hop switch does not show the mac at all with sh mac addr addr xxxx.xxxx.xxxx (I thought maybe a spoofed ip)
Essentially, this has made this workstation unusable (and a number of others) and I am at a loss to explain what is going on. The symptoms have affected a few standalone Vista systems as well as a few 2008 and a single linux server - ipv6 issues? (we aren't running ipv6, but that seems to be the common denominator amongst the systems with problems).
Any suggestions, pointers, or voodoo priestesses gratefully accepted.
It sounds to me like you have a layer 2 loop. Either the switch connecting to the workstation is multihomed, or the workstation has multiple ways of communicating, i.e. wireless and LAN.
The only way a switch can populate its bridge table is by observing the source MAC of incoming frames, so if the MAC shows up on the trunk, frames carrying it are arriving on the trunk. That tells me that somehow you have a loop. I doubt you have a duplicate MAC issue.
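To illustrate the learning behaviour described above, here is a minimal sketch of a bridge table in Python. It is a toy model and not how any particular switch is implemented; the class name, the MAC address, the VLAN number and the port labels are all made up for illustration. It only shows why the same source MAC arriving on two different ports produces repeated moves (flaps) in the table.

```python
from dataclasses import dataclass, field


@dataclass
class LearningSwitch:
    """Toy bridge table: maps (vlan, source MAC) to the port it was last seen on."""
    table: dict = field(default_factory=dict)
    flaps: list = field(default_factory=list)

    def receive(self, vlan: int, src_mac: str, port: str) -> None:
        key = (vlan, src_mac)
        previous = self.table.get(key)
        if previous is not None and previous != port:
            # Same source MAC seen on a different port: record a move.
            self.flaps.append((src_mac, vlan, previous, port))
        # Learn (or re-learn) the port for this source MAC.
        self.table[key] = port


sw = LearningSwitch()
# Hypothetical workstation transmits on its access port.
sw.receive(1, "aaaa.bbbb.cccc", "access")
# The same source MAC arrives on the trunk (looped or reflected upstream).
sw.receive(1, "aaaa.bbbb.cccc", "trunk")
# The workstation transmits again on its access port.
sw.receive(1, "aaaa.bbbb.cccc", "access")

for mac, vlan, old, new in sw.flaps:
    print(f"Host {mac} in vlan {vlan} moved from {old} to {new}")
```

In this model an entry only ever appears on the trunk if a frame with that source MAC actually came in on the trunk, which is why the trunk entry points at something upstream sending the workstation's frames back.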
I have had this issue in the past and it was discovered that 2 switches on different sides of the building had "rogue" wireless APs attached by "genius" employees. The Linksys type APs bridge the VLAN and caused the exact flap issue you are speaking of.
Also, is your switch using an etherchannel and/or multihomed to the core?
Spanning tree isn't spitting back at me and the workstation is quite happy to work wirelessly. The problem only occurs on the wired side.
The switch has a single gig link to the core and is not multihomed.
I would expect that if I had a loop (which was my first thought) that the next-hop switch would also show me the mac address (it doesn't)
The problem seems to only occur with ipv6 enabled (Vista/Linux/2008) machines, but that could be a red herring.
Once the station is 'broke', no matter where I plug it in on the network, it will generate the mac flaps between the local port and the trunk port.
Any log messages?
Is this a single switch or a switch stack of 3750s or 3560 with gigastacks?
If it's a 3750 stack could you paste the output of:
sh switch stack-ports
Do any ports go to error disable?
Do all the access ports facing the workstations have portfast on?
For the VLAN where the problem occurs, are there lots of topology change notifications (TCN)?
show spanning-tree vlan 100 detail
More than 1 or 2 TCNs will indicate a bridged network, especially if the TCNs are incrementing consistently.
You said the workstations can use wireless connections also. Is the wireless network in the same subnet or the same VLAN as the wired one?
Some wireless applications can borrow the MAC of the wired port.
The log messages are:
Jan 8 08:39:50: %SW_MATM-4-MACFLAP_NOTIF: Host 0012.3fda.5201 in vlan 18 is flapping between port Gi0/1 and port Gi0/10
Gi0/10 is a trunk to a 3560G (copper)
Gi0/1 is the Vista workstation
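If there are many of these messages, it can help to pull the host, VLAN and port pair out of each line and count which ports keep appearing. The following is a rough Python sketch that assumes the log lines have exactly the format pasted above; the function and variable names are hypothetical, and the regular expression is my own guess at a reasonable pattern, not anything supplied by Cisco.

```python
import re
from collections import Counter

# Pattern written to match the MACFLAP_NOTIF line as pasted in this thread.
MACFLAP_RE = re.compile(
    r"%SW_MATM-4-MACFLAP_NOTIF: Host "
    r"(?P<mac>[0-9a-f]{4}\.[0-9a-f]{4}\.[0-9a-f]{4}) "
    r"in vlan (?P<vlan>\d+) is flapping between port (?P<port_a>\S+) "
    r"and port (?P<port_b>\S+)",
    re.IGNORECASE,
)


def parse_macflap(line):
    """Return a dict describing one MAC flap message, or None if no match."""
    m = MACFLAP_RE.search(line)
    if m is None:
        return None
    return {
        "mac": m["mac"].lower(),
        "vlan": int(m["vlan"]),
        "ports": frozenset((m["port_a"], m["port_b"])),
    }


log_lines = [
    "Jan 8 08:39:50: %SW_MATM-4-MACFLAP_NOTIF: Host 0012.3fda.5201 "
    "in vlan 18 is flapping between port Gi0/1 and port Gi0/10",
]

events = [e for e in map(parse_macflap, log_lines) if e is not None]
# Count how often each port is involved across all flap messages.
port_counts = Counter(port for e in events for port in e["ports"])

for e in events:
    print(e["mac"], e["vlan"], sorted(e["ports"]))
```

A port that shows up in nearly every flap (here the trunk, Gi0/10) is the direction to look in for whatever is sending the frames back.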
Nothing has err-disabled, but I cannot get rid of the MAC(s) on the trunk port - and I cannot fathom where else they would come from.
All workstation ports have portfast on.
Number of topology changes 1 last change occurred 06:02:30 ago
Times: hold 1, topology change 35, notification 2
hello 2, max age 20, forward delay 15
Timers: hello 0, topology change 0, notification 0, aging 300
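For anyone who wants to watch the topology change counter over time rather than eyeball it, here is a small Python sketch that extracts the count and the age of the last change from output like the above. It assumes the "Number of topology changes N last change occurred hh:mm:ss ago" wording shown here; other age formats would not match, and the function name is hypothetical.

```python
import re

# Matches the topology change line in the form pasted in this thread.
TCN_RE = re.compile(
    r"Number of topology changes (?P<count>\d+) "
    r"last change occurred (?P<age>\d+:\d+:\d+) ago"
)


def topology_changes(output):
    """Return (change count, seconds since last change), or None if not found."""
    m = TCN_RE.search(output)
    if m is None:
        return None
    hours, minutes, seconds = (int(part) for part in m["age"].split(":"))
    return int(m["count"]), hours * 3600 + minutes * 60 + seconds


sample = """\
Number of topology changes 1 last change occurred 06:02:30 ago
Times: hold 1, topology change 35, notification 2
hello 2, max age 20, forward delay 15
Timers: hello 0, topology change 0, notification 0, aging 300
"""

count, age_seconds = topology_changes(sample)
print(count, age_seconds)
```

Running this against two captures taken a few minutes apart and comparing the counts would show whether the TCNs are incrementing; a single change six hours ago, as here, suggests they are not.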
Wireless network is completely isolated (separate VLAN etc. etc.)
Thanks for the input, it is appreciated.
The only thing I have found regarding the error code %SW_MATM-4-MACFLAP_NOTIF and PXE is that this can be caused by DHCP snooping with Option 82 enabled (which is the default if you enable DHCP snooping).
Basically the problem sounds like your issue. The PC boots once, gets an initial image, reboots and fails.
Is DHCP snooping enabled?
I added a second switch level and got some different results:
Switch1 - New Switch
Switch2 - Switch workstations were originally plugged into
Switch3 - Backbone
Switch1 <--trunk--> Switch2 <--trunk--> Switch3
I expected that when I removed the workstations from switch2 and plugged them into switch1 I would see that the trunk port would flap with the workstation port.
That didn't happen. Instead the two trunks on switch2 are flapping. The ports on switch1 look normal (i.e. sh mac addr addr [workstation mac] shows it on the proper port). It almost seems as if the MAC table gets permanent entries for these addresses that cannot be cleared.
Something interesting is happening with Switch 2. First, I would check to see if there is a static MAC in the config on switch 2. If nothing there, I would remove switch 2 if this is a lab type environment. If removing switch 2 resolves the issue, then we would need to look at the config of switch2.
The switches are acting like spanning tree isn't running on switch 3 or there is a loop between switch 2 and switch 3. I have seen this behavior when there is a switch stack with bad backplane uplink cables, etc. It seems that switch 1 is sending to switch 2 and then to switch 3, but switch 3 is immediately passing the packet back to switch 2 on the same port.
I know this is an old thread but wanted to see if you ever resolved your issue. I have a similar issue but I am also seeing duplicate multicast packets on the same trunk. I noticed that after I moved to a different sfp/port on the same switch, the issue was resolved. Possibly a bad sfp/X2 module.
From the Cisco Error Message Decoder:
%SW_MATM-4-MACFLAP_NOTIF: Host [enet] in [chars] [dec] is flapping between port [chars] and port [chars]
The switch found the traffic from the specified host flapping between the specified ports. [enet] is the host MAC address, [chars] [dec] is the switch ID, the first and second [chars] are the ports between which the host traffic is flapping.
Recommended Action: Check the network switches for misconfigurations that might cause a data-forwarding loop.