We have been experiencing some very strange issues on our network.
Switches all of a sudden started dropping off the management VLAN. On closer inspection of the switch logging buffers we saw the following message:
%PM-4-ERR_DISABLE: loopback error detected on Gi0/1, putting Gi0/1 in err-disable state
I know that this message is generated by the switch when it notices a loop through its keepalive messages, which are generated by the Ethernet Configuration Test Protocol, but what I do not understand is how it works.
We did eventually find the problem. Spanning tree for VLAN 1 had been disabled on 2 distribution switches, which must have caused a temporary loop on VLAN 1. We have since added "errdisable recovery cause loopback" to all access switch configs.
To further add to my confusion about how the protocol works, I set up a Fluke analyser on VLAN 1, started a capture, and rebooted a random switch that was trunking VLAN 1. Up until the reload of the switch I saw nothing on the VLAN apart from CDP and STP traffic, as expected. However, once the switch had reloaded I saw approximately 50 loopback Ethernet frames from a handful of switches across the campus.
It is as if the reloaded switch, by generating its own keepalive messages, caused other switches to do the same, but in apparent order and not all switches, just a random selection.
To make matters even stranger, each of these loopback frames had the same MAC address as both its source and its destination, so why was I seeing them on a capture that was being run on a totally different switch when the frames were not broadcast?
It appears that ECTP is generating a kind of broadcast on VLAN 1.
Does anyone have any idea how this protocol works? Or any links to documentation, as I cannot find any.
Thanks in advance.
The protocol sends a keepalive every 10 seconds on each of the switched ports. If a port receives back the keepalive it has sent, it is shut down with the message you saw.
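The check described above can be sketched in a few lines of Python. This is only an illustrative model, not Cisco's implementation: the Frame and Port classes and the MAC address are made up for the example, and the only protocol details assumed are that the keepalive carries the port's own MAC as both source and destination (as seen in the capture) and uses EtherType 0x9000, the value assigned to the Ethernet loopback protocol.

```python
from dataclasses import dataclass

# EtherType assigned to the Ethernet loopback (Configuration Test) protocol.
LOOPBACK_ETHERTYPE = 0x9000


@dataclass(frozen=True)
class Frame:
    """Hypothetical minimal Ethernet frame: addresses and type only."""
    src: str
    dst: str
    ethertype: int


class Port:
    """Hypothetical switch port that detects its own keepalive coming back."""

    def __init__(self, name: str, mac: str) -> None:
        self.name = name
        self.mac = mac
        self.err_disabled = False

    def build_keepalive(self) -> Frame:
        # Source and destination are both the port's own MAC address.
        return Frame(src=self.mac, dst=self.mac, ethertype=LOOPBACK_ETHERTYPE)

    def receive(self, frame: Frame):
        # A loopback frame carrying our own source MAC means the keepalive
        # we sent has returned to us, so there is a loop on this port.
        if frame.ethertype == LOOPBACK_ETHERTYPE and frame.src == self.mac:
            self.err_disabled = True
            return (f"%PM-4-ERR_DISABLE: loopback error detected on {self.name}, "
                    f"putting {self.name} in err-disable state")
        return None


# Example MAC addresses below are invented for illustration.
gi01 = Port("Gi0/1", "00:11:22:33:44:55")
gi02 = Port("Gi0/2", "00:11:22:33:44:66")

keepalive = gi01.build_keepalive()
ignored = gi02.receive(keepalive)   # another port's keepalive: no action
looped = gi01.receive(keepalive)    # own keepalive came back: err-disable
```

A keepalive arriving on a different port is not treated as an error by that port in this model; only the sending port reacts when its own frame returns.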
A bridge floods traffic with unknown destination MAC addresses. That's why this frame can be temporarily flooded in the network. As soon as the source MAC address of the keepalive message has been learnt by a neighboring switch, the forwarding of the keepalive through that switch will stop. This is because the keepalive's destination MAC address is the same as its source, and a switch does not forward a frame whose destination MAC address has been learnt on the port on which the frame was received. So in a stable state, the keepalives should be constrained to the Ethernet segment on which they are transmitted and should not be flooded across the whole bridged domain. When STP advertises a topology change in the network, the CAM tables are flushed, and this gives the keepalives another opportunity to be flooded until their addresses are learnt again. That's probably what is happening when you reboot the switch.
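To make the flood, learn, filter, flush sequence concrete, here is a toy learning-bridge model in Python. It is a sketch under stated assumptions, not how any particular switch is built: the LearningSwitch class, port names and MAC address are hypothetical, the destination lookup is assumed to happen before the source address is committed to the table (which is what allows the first keepalive to be flooded), and a topology change is modelled as an immediate clearing of the table.

```python
class LearningSwitch:
    """Hypothetical transparent bridge with a MAC (CAM) table."""

    def __init__(self, ports):
        self.ports = list(ports)
        self.cam = {}  # MAC address -> port on which it was learnt

    def handle(self, in_port, src, dst):
        """Return the list of ports the frame is forwarded out of."""
        # Assumption: look up the destination first, then learn the source.
        known_port = self.cam.get(dst)
        self.cam[src] = in_port

        if known_port is None:
            # Unknown destination: flood out of every other port.
            return [p for p in self.ports if p != in_port]
        if known_port == in_port:
            # Destination lives on the segment the frame came from: filter.
            return []
        return [known_port]

    def topology_change(self):
        # Simplified model of an STP topology change: forget all entries.
        self.cam.clear()


sw = LearningSwitch(["Gi0/1", "Gi0/2", "Gi0/3"])
mac = "00:11:22:33:44:55"  # invented keepalive source/destination address

first = sw.handle("Gi0/1", mac, mac)    # address unknown: flooded
second = sw.handle("Gi0/1", mac, mac)   # address learnt on Gi0/1: filtered
sw.topology_change()                    # CAM table flushed
third = sw.handle("Gi0/1", mac, mac)    # unknown again: flooded once more
```

In this model the keepalive escapes its segment only on the first pass and again right after the table is cleared, which matches the burst of loopback frames seen in the capture after the reload.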