I'll keep this as short as possible. I have a couple hundred 1131AG APs deployed with WiSM controllers. Controllers and APs are running 18.104.22.168. The environment has been stable for over a year. We recently had to make some changes which required us to redeploy some of the APs. The APs are located in ceiling tile enclosures. The APs are powered via PoE from 3750s.
When I moved the first group of APs, several of them failed to come back online in the new locations. The switchports were configured identically, and everything else was the same except for the cable from the AP to the switch and the switch itself, which was still a 3750 with the same port config. We ruled out the cable and the switchport after testing. After finding APs that worked correctly, I took the bad APs to a known-good drop where we have a temp AP installed and used that drop to test them. On every AP I tested, with the assistance of TAC, the problem was the same: "Ethernet port failed to initialize". Even after removing the config and restoring the AP to defaults, the Ethernet port still would not come up. I left them alone for a while to finish a couple of other projects and came back to the troubleshooting today (about 3 weeks after the failures). During the 3-week gap, 7 other APs failed with the same symptoms. TAC diagnosed the issue as a hardware failure (Ethernet port).
Today I powered up each AP on the same drop I used for testing before. Every AP came online, associated to a controller, and worked just fine. So I took an AP from the test site back to its proper location and installed it. It failed again. On the CLI, the AP goes into LWAPP discovery: it shows the LWAPP broadcast to 255.255.255.255, then the Ethernet port goes down, then up, then down, then up, continuously. The LED on the face of the AP cycles orange, green, and red repeatedly. I took the AP to the switch and cabled it in directly, bypassing the drop and using a different port, with the same results. I immediately took the AP downstairs, connected it back to the test drop, consoled in, and watched as it booted, pulled an IP, and immediately associated to the controller. I followed this procedure for every failed AP, all with the same results. Has anyone encountered anything like this?
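To keep the test matrix straight, the results above can be tallied per AP and per location. This is only a minimal sketch: the AP names and the `failure_rate_by` helper are placeholders made up for illustration, and the records mirror the pattern described above rather than real measurements.

```python
from collections import defaultdict

# Each record is (ap, location, came_online).
# AP names are hypothetical; outcomes follow the pattern described in the post.
observations = [
    ("ap-1", "test drop", True),
    ("ap-1", "new location via drop", False),
    ("ap-1", "new location direct to switch", False),
    ("ap-2", "test drop", True),
    ("ap-2", "new location via drop", False),
    ("ap-2", "new location direct to switch", False),
]


def failure_rate_by(records, key_index):
    """Return the fraction of failed tests, grouped by the chosen field."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for record in records:
        key = record[key_index]
        totals[key] += 1
        if not record[2]:
            failures[key] += 1
    return {key: failures[key] / totals[key] for key in totals}


by_ap = failure_rate_by(observations, 0)
by_location = failure_rate_by(observations, 1)

for location, rate in by_location.items():
    print(f"{location}: {rate:.0%} of tests failed")
```

If the failure rate is the same for every AP but splits cleanly by location, as it does here, the fault tracks the location (switch, power, or closet) rather than the AP hardware.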
I was saving the firmware upgrade as a last resort. :) But it's looking like I am going to be left with no other choices. I have a lunch meeting tomorrow with a Cisco engineer and will pick his brain as much as I can. I will post here if I get some good info out of the meeting. If not, code upgrade here I come.