Before you run into the problem we ran into: when one of our larger customers upgraded from 4.2 to 5.0, the controllers got caught in a loop during the upgrade and the entire WLAN went down.
Please be advised that Cisco recommends that you ISOLATE all lightweight access points from their controller during the upgrade of the controller from 4.2 to 5.0.
In part, this is due to a bug in 4.2.xx that starts the LWAPs downloading their new software images before the controller finishes updating and rebooting. When the controller reboots, the in-progress LWAP downloads are interrupted.
I have also heard of situations where the LWAPs get corrupted to such a degree that you may have to go out and physically access/reset the LWAPs that were interrupted during the upgrade process. I believe the worst case would be limited to no more than 10 LWAPs per controller (the number that can upgrade at a time).
In our case, the result was that the upgrade process hung up - with some LWAPs stuck, and all other LWAPs unable to pass traffic through the upgraded (ver 5.0) controller because they were at the previous revision (4.2).
In our case, this resulted in a system-wide WLAN outage.
As part of the call into Cisco TAC, the engineer asked, in effect, 'Why didn't you go beyond the release notes instructions and isolate your controller from the LWAPs during the upgrade?'
Our response was, we would have done so if it had been in the release notes (which it is not). So, after much cajoling by us, TAC finally agreed that it might be a good idea to include this additional step in the release notes.
Please reference the bug which requests an update to the 5.0 release note installation instructions: CSCso02420
TAC is aware of this problem and it is my understanding that there is discussion at Cisco regarding a maintenance release for 4.2 (since 4.2 is where the problem actually lies). The maintenance release would be applied to 4.2 controllers PRIOR to upgrading them to 5.0.
In the meantime, if you are still anxious to upgrade to 5.0, one workaround is to isolate the LWAPs from the controller either by:
1) Physically disconnecting the controller from the network during the upgrade and reconnecting it after its upgrade/reboot is complete.
2) Temporarily disabling the VLAN that carries the LWAP traffic between the LWAPs and the controller, and re-enabling the VLAN after the upgrade/reboot of the controller is complete.
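If it helps anyone, either option can be handled from the upstream switch. A rough sketch, assuming a Catalyst switch running IOS, where GigabitEthernet1/0/1 is the port facing the controller (the interface name is just a placeholder for your environment):

Switch# configure terminal
Switch(config)# interface GigabitEthernet1/0/1
Switch(config-if)# shutdown
(perform the controller upgrade and reboot)
Switch(config-if)# no shutdown

If the LWAPs reach the controller across a routed boundary, shutting down the SVI for their VLAN (interface Vlan<n>, then shutdown) accomplishes the same isolation.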
I hope that this is helpful for anyone out there who is about to upgrade to 5.0.
I hope your upgrade goes smoother than our customer's did.
Thanks for this very helpful update :)
I second the notion that an update to this forum on your progress will be eagerly anticipated. I rate it a "5".
I had something similar happen last night. While upgrading the WCS and WLCs to 5.0, the 1030B access points would not register to the WLCs with the new code. Later I found out that the 1000 series APs are not supported with this new version, so I had to downgrade the WLC and the WCS back to 4.2. I'm not sure why Cisco decided that the 1030 APs are no longer supported on the new versions. We recently purchased 6 APs for a new building, along with 2 new WLC 4402s, 1 WCS, and 1 APLOC.
Just wanted to get this off my chest. Thanks. Leo
Sorry to hear about your experience.
It sounds like Cisco may be trying to sunset those units sooner rather than later.
For what it's worth, I had a rare moment where I decided to live dangerously and went ahead and upgraded the WCS to 5.0.56, then upgraded our 2106 demo WLAN to 188.8.131.52 and 184.108.40.206-ER (emergency release).
In this particular instance (with only 4 LWAPs: 2 x 1131 (native LWAP from the factory) and 2 x 1231 (originally autonomous, upgraded to LWAP)), I did not encounter any problems.
(However, I did make sure that all of the LWAPs had upgraded their code before applying the 2nd upgrade to the controller).
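For anyone else doing this, a rough way to verify that from the controller CLI before the second upgrade is to check the controller code and confirm every AP has rejoined (exact command availability varies a bit by release; AP-NAME is a placeholder):

(Cisco Controller) > show sysinfo
(Cisco Controller) > show ap summary
(Cisco Controller) > show ap image AP-NAME

The per-AP image output should show each AP running the image that matches the controller before you proceed.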
In the case of our customer mentioned at the outset of this conversation, they have hundreds of access points installed across their campus, so maybe that is a factor in the premature LWAP download bug. Also, they are using 4404-100 WLCs. So... I don't know if these other factors have anything to do with the reported problems or not.
But I thought that, in the interest of "truth in advertising", I would share our successful experience as well.
For reference, our 2106 was originally running 4.2.61 and the WCS was running 4.2.62 prior to the upgrades.
This is getting really painful -- is this what managing a Cisco enterprise WLAN has come to? Why do we have to suffer these kludges? Several upgrades a year, ridiculous two-step, test-it-yourself procedures, and still no text-file configs or a true shell-based OS. This product is a far, far cry from the Cisco IOS software standard.
I am sorry for your hassles for sure. Note that this is documented in the Release Notes under "Access Point Changes", though it's not featured as prominently as it probably should be.
We announced EoL/EoS for the 1000 Series line back in September.
We're at a point where we're squeezing all of the functionality we can out of the hardware and we have to focus our resources on the future.
I think that we are suffering from a chronic case of too much "feature velocity" and too little testing.
This is for Jake -
While we are discussing kludges: the wireless sessions at Networkers (WBU) recommended an upgrade strategy of having enough controller capacity in the mobility group to allow each AP to be moved one at a time. You upgrade the secondary controller, turn off AP fallback, and change the primary/secondary controller entry in the AP. The AP switches controllers and upgrades itself. As long as there is overlapping RF coverage, the clients typically don't see an interruption. This is imperative in healthcare with 7921, Spectralink, Vocera, COWs, etc., and it allows for a slow migration instead of the "Hail Mary" upgrade: 600 APs and every controller (3 WiSMs and 5 4404s) hit at 2am. The controlled migration worked well for us until now.
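For reference, the per-AP move described above maps to roughly these controller CLI commands (the AP and controller names here are placeholders, and exact syntax can vary slightly by release):

(Cisco Controller) > config network ap-fallback disable
(Cisco Controller) > config ap primary-base WLC-NEW AP001
(Cisco Controller) > config ap secondary-base WLC-OLD AP001
(Cisco Controller) > config ap reset AP001

After the reset, the AP joins the upgraded controller and downloads its new image while the rest of the APs stay in service on the old controller.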
So we need to start heading to 5.0 for a bug fix and decided to hit 4.2.99 from 4.0.217 in the interim. Upgraded and everything is cool. We find a bug in the GUI, the WLAN session timer does not reflect the actual value, but that's OK, we can use CLI. Then we get calls that the guest anchor is down on the upgraded controller.
After setting up the guest anchor in the lab, we found the following:
4.0.217 can anchor to 4.1.185
4.0.217 could not anchor to 4.2 or 5.0
4.1.185 could not anchor to 4.2 or 5.0
No foreign controller with newer code than the anchor could build a data or control path either. What a design flaw. That is like saying OSPF will only peer between 12.3 and 12.3, not 12.4 or 12.2. After opening a TAC case, they confirmed that mobility is not officially supported between different code versions. Sounds like I was the only one who didn't know that. So much for the controlled upgrade. Looks like it's an all-nighter and hope everything works properly.
I might add that the 4.1 mobility guide, Chapter 10 (Guest Wireless), does not specify matching code between foreign and anchor controllers as a requirement, as far as I can find.
So Jake - What are your thoughts on this?? Guest anchor has backed us into a corner.
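In the meantime, a preflight check before trusting a guest anchor seems wise: compare code versions and exercise the mobility control and data paths from the controller CLI. Something like the following, where 10.10.10.5 stands in for the peer controller's management IP:

(Cisco Controller) > show sysinfo
(Cisco Controller) > show mobility summary
(Cisco Controller) > mping 10.10.10.5
(Cisco Controller) > eping 10.10.10.5

mping tests the mobility control path and eping tests the EoIP data path; if either fails, or the Product Version lines from show sysinfo differ between the peers, the anchor is unlikely to work.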
FYI - I noticed these messages in the message log on a 220.127.116.11 controller, sourced from 4.2 controllers.
Fri Mar 14 17:14:36 2008 [ERROR] ethoip.c 106: Eth-over-IP pkt too short; pkt size=82, expected min=66
Fri Mar 14 17:14:36 2008 [ERROR] mm_listen.c 4157: Mobility packet IP-message IP mismatch: pkt: 172.20.48.68 msg: 18.104.22.168
Fri Mar 14 17:14:33 2008 [ERROR] ethoip.c 305: ethoipSocketTask: ethoipRecvPkt returned error
Fri Mar 14 17:14:33 2008 [ERROR] ethoip.c 106: Eth-over-IP pkt too short; pkt size=82, expected min=66
Fri Mar 14 17:14:32 2008 [ERROR] ethoip.c 305: ethoipSocketTask: ethoipRecvPkt returned error
Fri Mar 14 17:14:32 2008 [ERROR] ethoip.c 106: Eth-over-IP pkt too short; pkt size=82, expected min=66
Fri Mar 14 17:14:32 2008 [ERROR] mm_listen.c 4157: Mobility packet IP-message IP mismatch: pkt: 172.20.48.6 msg: 22.214.171.124
Fri Mar 14 17:14:29 2008 [ERROR] ethoip.c 305: ethoipSocketTask: ethoipRecvPkt returned error
Fri Mar 14 17:14:29 2008 [ERROR] ethoip.c 106: Eth-over-IP pkt too short; pkt size=82, expected min=66
Fri Mar 14 17:14:26 2008 [ERROR] ethoip.c 305: ethoipSocketTask: ethoipRecvPkt returned error
Fri Mar 14 17:14:26 2008 [ERROR] ethoip.c 106: Eth-over-IP pkt too short; pkt size=82, expected min=66
Fri Mar 14 17:14:23 2008 [ERROR] ethoip.c 305: ethoipSocketTask: ethoipRecvPkt returned error
Fri Mar 14 17:14:23 2008 [ERROR] ethoip.c 106: Eth-over-IP pkt too short; pkt size=82, expected min=66
Fri Mar 14 17:14:22 2008 [ERROR] ethoip.c 305: ethoipSocketTask: ethoipRecvPkt returned error
Fri Mar 14 17:14:22 2008 [ERROR] ethoip.c 106: Eth-over-IP pkt too short; pkt size=82, expected min=66
Fri Mar 14 17:14:19 2008 [ERROR] ethoip.c 305: ethoipSocketTask: ethoipRecvPkt returned error
Fri Mar 14 17:14:19 2008 [ERROR] ethoip.c 106: Eth-over-IP pkt too short; pkt size=82, expected min=66
Fri Mar 14 17:14:16 2008 [ERROR] ethoip.c 305: ethoipSocketTask: ethoipRecvPkt returned error
Fri Mar 14 17:14:16 2008 [ERROR] ethoip.c 106: Eth-over-IP pkt too short; pkt size=82, expected min=66
Fri Mar 14 17:14:13 2008 [ERROR] mm_listen.c 4157: Mobility packet IP-message IP mismatch: pkt: 172.20.48.4 msg: 126.96.36.199
I'm told there is a bug ID.
Note from development:
"The DDTS for this is CSCsl50993. This fix is checked in Dcube MR1.0 only. We are looking at Franciscan, no confirmation from dev team yet."
Santa Maria! This is big news.
We are attempting to configure an anchor controller for guest access now - the 4.1 document is contradictory, to say the least. If anyone is willing to share how they set theirs up (we intend to place the controller outside the firewall), I would be most grateful. I am getting conflicting instructions, such as whether the foreign controller and anchor controller should both point to the management interface or not. Do you need to trunk a port on the switch even if there is only going to be one VLAN on the anchor? If so, what other trunks do they mean?
And I will make sure all devices are on the same code!
On the anchor controller, I have always made it as simple as possible by using the management interface. So basically you configure the anchor controller guest WLAN (SSID) to map to the management IP. Since the controller ports default to 802.1Q trunking and the management interface needs to be untagged, you would create a trunk and set the management VLAN as the native VLAN.
If you decide to create a separate dynamic interface on the anchor controller, then you will still have to create an 802.1Q trunk with the management VLAN as native, and allow the VLAN you used for the dynamic interface to pass.
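On the switch side, that trunk would look something like this on a Catalyst running IOS - the interface name and VLAN numbers (10 for management/native, 20 for the dynamic interface) are examples only:

Switch(config)# interface GigabitEthernet1/0/1
Switch(config-if)# switchport trunk encapsulation dot1q
Switch(config-if)# switchport mode trunk
Switch(config-if)# switchport trunk native vlan 10
Switch(config-if)# switchport trunk allowed vlan 10,20

(The encapsulation command is only needed on platforms that also support ISL.)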
Have you looked at the Enterprise Mobility 4.1 Guide? It has a section on the Guest WLC. Also take a look at the wired guest doc for how to configure the guest anchor controller.
For the anchor, we put Ethernet Port1 on the trusted network and Ethernet Port2 on the DMZ. Port1 is bound untagged to management and Port2 is bound to a dynamic interface untagged with a DMZ address.
For mobility groups, let's say you have a WiSM (foreign controllers) with both controllers peering with each other in a group "SiteA-mob". They would also both peer with the anchor controller in "dmz-mob", and the anchor would peer with the WiSMs. Then create the same SSID with identical settings on both the anchor and foreign controllers. The trick is: under the foreign SSID, anchor to the DMZ controller, but under the DMZ SSID, anchor to itself. At that point your EoIP tunnel should be active. DHCP and authentication are controlled by the DMZ controller.
One precaution I take on the foreign is to create a null dynamic interface with an unused tag and no IP address, and bind it to the foreign SSID. Cisco recommends using the management interface on the foreign, but in the lab, under certain circumstances when the tunnel fails, I've seen guest clients gain access via the management interface. Not a good idea for guest clients to hop onto your trusted net via a misconfig. With the null interface, they get nothing if the tunnel gets deleted or misconfigured.
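To put the anchoring steps above into CLI terms, here is a sketch assuming WLAN ID 2 for the guest SSID and 10.10.10.5 as the anchor's management IP (both placeholders), with the mobility group members already added on each controller. The WLAN must be disabled before changing its anchors. On the foreign controller:

(Cisco Controller) > config wlan disable 2
(Cisco Controller) > config wlan mobility anchor add 2 10.10.10.5
(Cisco Controller) > config wlan enable 2

On the DMZ/anchor controller, anchor the same WLAN to itself (its own management IP):

(Cisco Controller) > config wlan disable 2
(Cisco Controller) > config wlan mobility anchor add 2 10.10.10.5
(Cisco Controller) > config wlan enable 2

show mobility anchor should then list the anchor for that WLAN with the control and data paths up.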
Apparently this is also part of CSCsk76973, which we saw happen when pre-loading 4.0.217 code on a 188.8.131.52 controller to save some time as part of our journey to 4.2. We bypassed the reboot piece for an upcoming maintenance window, which is a documented safe practice. Nonetheless, APs apparently downloaded code that wasn't the primary image in flash.