Cisco Support Community

V2V of virtualized ACS appliances partially broke wireless authentication

I thought it'd be a good idea to share this with the community in case somebody else runs into this problem.

We have two virtualized ACS 5.3 appliances set up in PRIMARY and SECONDARY roles. Late last week the server team V2V'd these systems to move them to our UCS environment. On Sunday I was unable to connect to wireless: I was not authenticating and could not get an IP address. One of our network admins, while troubleshooting, rebooted one of the wireless LAN controllers, and then I was able to connect via wireless.

Starting on Monday we began receiving sporadic reports that people were having issues connecting to wireless. We were not able to establish a pattern. It wasn't specific to a WLAN controller, a site or even an SSID within a site. I got my hands on an affected laptop, started snooping around in the event logs and discovered this error message in the System log (on a Windows 7 machine):

Source: Schannel

Event ID: 36887

Level: Error

User: SYSTEM

OpCode: Info

I looked in the WLC logs and found this over and over again:

*dot1xMsgTask: Oct 17 10:39:42.506: %DOT1X-3-MAX_EAP_RETRIES: 1x_auth_pae.c:3136 Max EAP identity request retries (3) exceeded for client <mac>
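To check whether messages like this cluster on particular clients, the log lines can be tallied per MAC address. Here is a minimal Python sketch, assuming the WLC log has been exported as text and that the client MAC appears colon-separated at the end of the line as in the message above. The function name `count_eap_retry_failures` and any sample MAC addresses are made up for illustration, not something from the WLC itself.

```python
import re
from collections import Counter

# Matches the message format quoted above. Assumption: the client MAC is
# written as colon-separated hex, e.g. 00:11:22:33:44:55.
PATTERN = re.compile(
    r"%DOT1X-3-MAX_EAP_RETRIES:.*exceeded for client\s+"
    r"((?:[0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2})"
)

def count_eap_retry_failures(lines):
    """Return a Counter mapping client MAC -> number of MAX_EAP_RETRIES messages."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            # Lowercase so the same client is not counted under two spellings.
            counts[match.group(1).lower()] += 1
    return counts
```

Calling `count_eap_retry_failures(open("wlc.log"))` (the file name is hypothetical) and looking at `.most_common()` would show whether a handful of clients or nearly everyone is hitting the retry limit.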

Then I started snooping around in one of our ACS servers and found a bunch of "PEAP handshake failed" messages.

At this point we started putting things together and suspected that this was related to the V2V.  From the CLI of both the primary and the secondary I issued this command:

FNB-ACS-Pri/admin#show application status acs

and each system showed:

ACS role: PRIMARY

In the web UI, under System Administration -> Operations -> Distributed System Management, we could see that the ACS system which is normally SECONDARY was showing an online status of 'x'. Thinking about what might have changed in the V2V and how ACS might implement basic security between the primary and secondary servers, I clicked on the link for the secondary instance (from the primary's perspective) and wrote down the MAC address. Then I issued:

show inventory

from the CLI of the secondary and found that the two MAC addresses were different. The V2V had changed the MAC address of the secondary. I suspect that this bothered the PRIMARY ACS server.
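The comparison is easy to get wrong by eye, since different tools may print the same MAC with different separators or letter case. Here is a small Python sketch of that check, assuming you have copied the two addresses out by hand. The helper names `normalize_mac` and `macs_match` are hypothetical and not part of ACS.

```python
import re

def normalize_mac(mac):
    """Reduce a MAC address to 12 lowercase hex digits, ignoring separators."""
    digits = re.sub(r"[^0-9A-Fa-f]", "", mac).lower()
    if len(digits) != 12:
        raise ValueError(f"not a MAC address: {mac!r}")
    return digits

def macs_match(registered, actual):
    """True if the MAC the primary has on record equals the one the VM reports."""
    return normalize_mac(registered) == normalize_mac(actual)
```

For example, `macs_match("00:50:56:AB:CD:EF", "0050.56ab.cdef")` treats the two spellings as the same address, while any differing digit makes it return `False`.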

We shut down the secondary, changed its MAC address to what the primary thought it was supposed to be and powered it back up. Within about 10 minutes everything looked normal. The primary reported a PRIMARY role and the secondary reported a SECONDARY role. The 'Online Status' for the secondary in the web UI changed from a red 'x' to a green check mark. We stopped seeing the PEAP errors and verified that laptops affected by this issue were now able to authenticate and acquire an IP address.

UPDATE: We did a WebEx with TAC just to make sure things were OK, but they were not. Even though things looked good in the UI and users were connecting, there were issues with replication (two replication instances). The TAC engineer deregistered the secondary ACS and reregistered it (and may have done some other things in between; my colleague worked with TAC).

The correct way to approach this would have been to treat it as a hardware replacement and follow the steps in the 5.3 admin guide. It would have been better to deregister the secondary from the primary and delete it, then V2V, and then add it back.
