The scenario is as follows:
Initially the customer had a single 7510 Flex WLC deployed servicing 605 APs at 60 sites. A second 7510 Flex WLC was purchased direct from Cisco, but it was an HA (no AP license) version. When the customer discovered they could not do AP SSO across two geographically separate locations, they proceeded to deploy the new WLC at their DR site, with the assurance that even without AP licenses, the HA WLC would accept AP registrations as a backup WLC for up to 90 days.
I configured the primary WLC's Global Configuration to define the new WLC as the Primary Backup WLC, and placed both WLCs into the same Mobility Group. I tested the failover by blocking traffic to the primary WLC, and the APs at one location failed over to the new WLC as expected.
However, they do not fail back to the Primary WLC when it comes back online.
Nothing was defined in the Primary, Secondary, and Tertiary fields on the individual APs, so as a test I defined the Primary and Secondary WLCs on one AP. After a reboot, that AP rejoined the Secondary WLC, even though the Primary WLC is available and reachable (5246 isn't blocked, etc.) and the AP had just been joined to it. I thought this might be a case of the AP choosing the least-loaded WLC in the Mobility Group, so I decoupled the mobility group and placed the DR WLC in its own mobility group. Rebooting the AP again yielded the same results.
For some reason it ignores the Primary WLC completely, even though it knows that is the WLC of choice, both through DHCP Option 43 and through the explicit Primary setting under the AP's High Availability tab.
I had the onsite resource do a “Clear Config” on an AP, reboot it to see if the behavior corrects itself, and capture the AP's console output during the reboot. This appears to have fixed the immediate issue, but doesn't explain why.
Any suggestions or comments are welcome.
Thanks in advance.
In general, you have to enable AP Fallback (Controller -> General) for an AP to go back to its Primary controller when it becomes available; otherwise it will stay on the Secondary controller. You can refer to the notes below for general AP failover.
I'm not too sure about the 7510 or specific HA failover scenarios.
Thank you very much for the reply, and the incredible link; that blog is a great storehouse of information.
I have confirmed that AP Fallback is enabled, so I'm still a little perplexed.
Even after clearing the config on an AP and giving it the Primary WLC address through both Option 43 and the Primary WLC field in the HA tab, it still stubbornly wants to associate to the Secondary. Something is pushing it there, and I don't see why.
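For reference when checking what the APs actually receive, Cisco documents a TLV layout for DHCP Option 43 on lightweight APs: suboption type 0xF1, a length byte equal to 4 times the number of controller addresses, then each management IPv4 address as four bytes. The sketch below builds that hex string; the helper name is illustrative, the example address is the Primary WLC management IP given in the thread (172.16.3.133), and how the string is entered depends on your DHCP server.

```python
import ipaddress


def cisco_option43_hex(controller_ips: list[str]) -> str:
    """Build the Option 43 hex string using the 0xF1 TLV layout:
    type 0xF1, length = 4 * number of controllers, then each IPv4 address."""
    value = b"".join(ipaddress.IPv4Address(ip).packed for ip in controller_ips)
    if len(value) > 255:
        raise ValueError("too many controllers for a single-byte TLV length")
    return bytes([0xF1, len(value)]).hex() + value.hex()


print(cisco_option43_hex(["172.16.3.133"]))  # f104ac100385
```

Listing only the primary address keeps discovery pointed at it; adding the DR address (172.22.3.133) would append `ac160385` and change the length byte to `08`.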
Some further info, assuming that the Mgmt IP of the Primary WLC is 172.16.3.133 and the Mgmt IP of the Secondary WLC is 172.22.3.133:
From a #show capwap client config:
Configured Switch 1 Addr 172.16.3.133
From a >show ap config general:
Primary Cisco Switch Name........................
Primary Cisco Switch IP Address.................. Not Configured
Secondary Cisco Switch Name......................
Secondary Cisco Switch IP Address................ Not Configured
Tertiary Cisco Switch Name.......................
Tertiary Cisco Switch IP Address................. Not Configured
From a >show advanced backup-controller:
AP primary Backup Controller .................... CISCO7500DR 172.22.3.133
AP secondary Backup Controller .................. 0.0.0.0
Nowhere is the Secondary controller listed, except as the backup controller, but the AP still wants to join it. The two WLCs aren't even in the same mobility group anymore, and the Primary is reachable.
Here is a snip from the AP's boot sequence:
*Mar 1 00:01:13.948: %CAPWAP-3-ERRORLOG: Selected MWAR 'CISCO7500DR'(index 0).
*Mar 1 00:01:13.948: %CAPWAP-3-ERRORLOG: Go join a capwap controller
*Oct 29 20:17:22.000: %CAPWAP-5-DTLSREQSEND: DTLS connection request sent peer_ip: 172.22.3.133 peer_port: 5246
*Oct 29 20:17:22.760: %CAPWAP-5-DTLSREQSUCC: DTLS connection created sucessfully peer_ip: 172.22.3.133 peer_port: 5246
*Oct 29 20:17:22.761: %CAPWAP-5-SENDJOIN: sending Join Request to 172.22.3.133
*Oct 29 20:17:22.763: %CAPWAP-3-ERRORLOG: Invalid event 10 & state 5 combination.
*Oct 29 20:17:22.763: %CAPWAP-3-ERRORLOG: CAPWAP SM handler: Failed to process message type 10 state 5.
*Oct 29 20:17:22.764: %CAPWAP-3-ERRORLOG: Failed to handle capwap control message from controller
*Oct 29 20:17:22.764: %CAPWAP-3-ERRORLOG: Failed to process encrypted capwap packet from 172.22.3.133
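When comparing boot captures from several APs, it can help to pull out which controller each AP selected and where it sent its DTLS and Join requests. A minimal Python sketch, with regexes written only against the console lines quoted above (other AP models or code versions may log differently, so treat the patterns as assumptions):

```python
import re

# Patterns written against the AP console lines quoted in this thread.
SELECTED_RE = re.compile(r"Selected MWAR '([^']+)'")
PEER_RE = re.compile(r"peer_ip: (\d+\.\d+\.\d+\.\d+)")
JOIN_RE = re.compile(r"sending Join Request to (\d+\.\d+\.\d+\.\d+)")


def summarize_capwap_join(log: str) -> dict:
    """Summarize which controller the AP picked and where it tried to join."""
    return {
        "selected_mwar": SELECTED_RE.findall(log),
        "dtls_peers": sorted(set(PEER_RE.findall(log))),
        "join_targets": JOIN_RE.findall(log),
        # Count CAPWAP processing failures reported during the join attempt.
        "failures": sum(1 for line in log.splitlines() if "Failed to" in line),
    }


SAMPLE = """\
*Mar 1 00:01:13.948: %CAPWAP-3-ERRORLOG: Selected MWAR 'CISCO7500DR'(index 0).
*Mar 1 00:01:13.948: %CAPWAP-3-ERRORLOG: Go join a capwap controller
*Oct 29 20:17:22.000: %CAPWAP-5-DTLSREQSEND: DTLS connection request sent peer_ip: 172.22.3.133 peer_port: 5246
*Oct 29 20:17:22.760: %CAPWAP-5-DTLSREQSUCC: DTLS connection created sucessfully peer_ip: 172.22.3.133 peer_port: 5246
*Oct 29 20:17:22.761: %CAPWAP-5-SENDJOIN: sending Join Request to 172.22.3.133
*Oct 29 20:17:22.763: %CAPWAP-3-ERRORLOG: Invalid event 10 & state 5 combination.
*Oct 29 20:17:22.763: %CAPWAP-3-ERRORLOG: CAPWAP SM handler: Failed to process message type 10 state 5.
*Oct 29 20:17:22.764: %CAPWAP-3-ERRORLOG: Failed to handle capwap control message from controller
*Oct 29 20:17:22.764: %CAPWAP-3-ERRORLOG: Failed to process encrypted capwap packet from 172.22.3.133
"""

print(summarize_capwap_join(SAMPLE))
```

On this capture the AP selected `CISCO7500DR` before any DTLS exchange, which is consistent with the DR controller being chosen up front rather than after a failed attempt on the primary.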
You can only use the HA WLC if your setup is N+1, not AP SSO. With AP SSO, only the active WLC has all the access points.
Yes, I had to deploy the second WLC as N+1 because it's in a separate location, so no AP SSO.
I've never used an HA WLC in N+1, and even though Cisco and the docs say you can, I wonder whether there are any bugs.
Then it should work... TAC will of course not support it, but I have had a client use the HA WLC this way too. It works as long as mobility is up between the two and AP Fallback is enabled; also note that the WLC name in the High Availability tab is case sensitive.
Interesting comment about TAC not supporting this; the docs show it as supported. Did they turn you away?
I set the Primary as Master Controller to see if that influences the AP behaviour at all.
The doc states that you can use the HA as a primary WLC? There is a reason for the nag screen after 90 days. I can just see them telling you that your APs need to be on the primary WLC and not the HA, as the HA is supposed to be used for failover.
Anyway, yes, it will work, but I can't say what TAC will tell you; I can only assume.
Configure the Fallback Feature on WLC
The last step is to configure the Fallback feature on the controller. This feature ensures that the APs return to the first WLC when that WLC comes back online. Complete these steps:
From the GUI, choose Controller > General.
A list of options appears on the General screen.
For the AP Fallback option, choose Enabled from the drop-down menu.
Note: It is sufficient to enable the Fallback feature on the secondary controller alone. But it is recommended to configure it on the primary WLC as well because it can be configured as a secondary controller for other access points.
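The conditions discussed in this thread for an AP to move back can be written down as a small decision function. This is a simplified model, not Cisco's actual join algorithm: it assumes fallback is governed by the WLC the AP is currently joined to (per the note above), that the AP needs a Primary name in its High Availability tab to fall back to, and that the name match is exact (case sensitive, as mentioned earlier in the thread). `WLC-PRIMARY` is a hypothetical name; `CISCO7500DR` is the DR WLC name from the thread.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class ApState:
    primary: Optional[str]  # Primary WLC name in the AP's HA tab (None = not configured)
    joined_to: str          # WLC the AP is currently joined to
    fallback_enabled: bool  # AP Fallback on the WLC the AP is currently joined to


def should_fall_back(ap: ApState, reachable: set[str]) -> bool:
    """Simplified model: return to the primary only if fallback is enabled,
    a primary is configured, the AP isn't already on it, and it is reachable.
    Names are compared exactly, so a case mismatch prevents fallback."""
    if not ap.fallback_enabled or ap.primary is None:
        return False
    return ap.joined_to != ap.primary and ap.primary in reachable


up = {"WLC-PRIMARY", "CISCO7500DR"}
print(should_fall_back(ApState("WLC-PRIMARY", "CISCO7500DR", True), up))  # True
print(should_fall_back(ApState(None, "CISCO7500DR", True), up))           # False
```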
To set up your WLC 7510s for redundancy, you have two options. Option 1.) You can configure them as a High Availability AP Stateful Switch Over (AP SSO) pair. With this option the WLCs are linked via their Redundancy Ports, only one is Active at a time, and should the Active fail, the Standby takes over as the Active. The APs communicate with a shared Management Interface IP on the WLCs and don't notice that the failover happened. Once this is set up, any configuration changes you make are made on the Active and are automatically synced to the Standby.
Option 2.) The WLCs are physically separate. Configuration changes made on one WLC are not automatically synced to the other. You would configure both WLCs with matching firmware and configurations, and configure 50% of your APs to use one WLC as their Primary WLC and the other 50% to do the reverse. Then, should either WLC fail, its APs will join the remaining WLC, and when the failed WLC recovers, its 50% of the APs will return to it.
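The 50/50 split in Option 2 amounts to alternating each AP's Primary/Secondary pair. A minimal Python sketch of producing that plan from an AP list; the AP and WLC names are hypothetical, and the actual Primary/Secondary values would still be entered per AP in each AP's High Availability tab. Note that this option assumes two controllers that are both meant to carry APs, which, per the reply above, is not the intended use of an HA unit.

```python
def split_primaries(aps: list[str], wlc_a: str, wlc_b: str) -> dict[str, tuple[str, str]]:
    """Alternate APs so half use wlc_a as Primary (wlc_b as Secondary)
    and the other half use the reverse."""
    plan: dict[str, tuple[str, str]] = {}
    for i, ap in enumerate(aps):
        plan[ap] = (wlc_a, wlc_b) if i % 2 == 0 else (wlc_b, wlc_a)
    return plan


for ap, (primary, secondary) in split_primaries(["AP-01", "AP-02", "AP-03"], "WLC-A", "WLC-B").items():
    print(f"{ap}: primary={primary} secondary={secondary}")
```

Alternating by position is the simplest choice; splitting by site instead would keep every AP at a given location on the same primary.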
After playing around with different configurations, this is what finally worked in a failure scenario where the primary WLC failed:
Primary WLC: 7510 with 800 AP licenses
Secondary WLC: 7510 with HA software (technically no AP licenses)
Primary Backup WLC: defined on the Primary WLC as the Secondary WLC
AP Fallback left as the default on both WLCs (default was enabled)
An identical Mobility Group was defined on each controller, and the mobility connection between the two was established (status: Up).
Each individual AP (a mix of 1142s and 1600s) was defined with the Primary WLC and the Secondary WLC under their High Availability tab.
Both controllers are running 7.4.100 code.
All APs are FlexConnect Local Switch/Central Auth for both the Corp and Guest WLANs.
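The working configuration above can be turned into a checklist to compare against other deployments. A minimal Python sketch, assuming the settings have already been collected by hand into these hypothetical structures (the field names, `WLC-PRIMARY`, `CORP-MG`, and the AP names are illustrative; `CISCO7500DR` and `7.4.100` come from the thread). It reports which items from the list differ, and does not read anything from a controller.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Wlc:
    name: str
    software: str
    mobility_group: str
    ap_fallback: bool
    primary_backup: Optional[str] = None  # global Primary Backup Controller name


def check_failover_setup(primary: Wlc, secondary: Wlc,
                         ap_ha: dict[str, tuple[Optional[str], Optional[str]]],
                         mobility_up: bool) -> list[str]:
    """Return the items from the working-config list that are not met."""
    problems = []
    if primary.primary_backup != secondary.name:
        problems.append("Primary Backup WLC on the primary is not the secondary WLC")
    for wlc in (primary, secondary):
        if not wlc.ap_fallback:
            problems.append(f"AP Fallback disabled on {wlc.name}")
    if primary.mobility_group != secondary.mobility_group:
        problems.append("mobility group names differ")
    if not mobility_up:
        problems.append("mobility connection between the WLCs is not Up")
    if primary.software != secondary.software:
        problems.append("controllers run different code versions")
    for ap, (p, s) in sorted(ap_ha.items()):
        if (p, s) != (primary.name, secondary.name):
            problems.append(f"{ap}: HA tab is not {primary.name}/{secondary.name}")
    return problems


pri = Wlc("WLC-PRIMARY", "7.4.100", "CORP-MG", True, primary_backup="CISCO7500DR")
sec = Wlc("CISCO7500DR", "7.4.100", "CORP-MG", True)
print(check_failover_setup(pri, sec, {"AP-1142-01": ("WLC-PRIMARY", "CISCO7500DR")}, True))  # []
```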
With all of this configured, when the Primary WLC was "down" the APs found the Secondary WLC, associated to it, and clients could connect and operate as normal to both SSIDs.
When the Primary WLC came back online, the APs performed their Fallback as expected.
While I'm glad it works, what bothers me is all the extra config beyond what the Failover config section calls for. I have three other projects going where all I had to do was define the Primary Backup WLC, and the failover happened with no problems.
The only difference in this specific scenario is using the HA-7510 as the Secondary (Failover/Backup) WLC. This is the first time I've used a WLC with no real AP licenses, and I just wonder if there's some quirk or bug.
I very much appreciate everyone's feedback and comments. If someone else has an identical scenario, I'd love to hear about it. I think I'll open a TAC case, see if it goes anywhere, and share the results.