Currently running 4 ACS 5.1 servers to load balance our 802.1x requests, but am getting an error message which is a bit unhelpful in terms of details.
The error message is
11051 RADIUS packet contains invalid state attribute
and only seems to occur on one port on one particular switch. I'm not quite sure what is causing this for this port, as all the other ports on the same switch are not having any issues.
The troubleshooting says
The state attribute in the RADIUS packet did not match any active session.
Do the following: Check the network device or AAA Client for hardware problems or known RADIUS compatibility issues; Check the network that connects the device to ACS for hardware problems.
The network device is set up the same as at least 50 other switches in ACS.
The switch port is nominally designated as being for a client PC.
Does anyone have any ideas what is causing this error?
I would need to know which switch this is and what software version it runs. The idea behind the message is:
When initiating an Access-Request with ACS, the NAD should not include
any State attribute in its Access-Request. The State attribute is something
that the AAA server creates and updates in its responses.
There are a few bugs on different devices that may cover this.
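To illustrate the rule described above, here is a minimal Python sketch of how a NAD is expected to handle the State attribute: omit it on the first Access-Request and echo it back only after the server has sent one in an Access-Challenge. The function name and the dictionary layout are hypothetical, chosen for illustration; this is not switch or ACS code.

```python
def build_access_request(user, last_reply=None):
    """Return the attributes a NAD would put in its next Access-Request.

    `last_reply` is a hypothetical dict describing the server's previous
    reply, or None when the conversation is just starting.
    """
    attrs = {"User-Name": user}
    # Echo State only if the server handed one out in an Access-Challenge.
    if (
        last_reply is not None
        and last_reply.get("code") == "Access-Challenge"
        and "State" in last_reply
    ):
        attrs["State"] = last_reply["State"]
    return attrs


# First request of a conversation: no State attribute.
first = build_access_request("alice")

# The server replies with a challenge carrying a State it created.
challenge = {"code": "Access-Challenge", "State": b"\x01\x02"}

# Follow-up request: the same State is sent back unmodified.
second = build_access_request("alice", challenge)
```

A device that sends a State the server never issued, or one left over from an earlier conversation, would be the kind of request the 11051 message complains about.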
I tried turning on PEAP Session Resume:
Select System Administration > Configuration > Global System Options > PEAP Settings.
Will monitor whether this solves my issue.
In my case the AAA client is a Wireless Lan Controller (Cisco).
I've applied patch 4 of 184.108.40.206 and activated PEAP session resume, but I still have the error...
Has anyone solved the problem?
I got the same message with Cisco phones and switches. The phones use EAP-MD5 for authentication.
I'm using ACS 5.2.
Has anybody found the solution?
In my case it’s happening only on WLC with version 220.127.116.11 (no problem on version 18.104.22.168).
Logging from ACS22.214.171.124.4 :
Failure Reason > Authentication Failure Code Lookup
Generated on:December 20, 2010 10:01:55 AM CET
Logging from WLC(126.96.36.199)
Mon Dec 20 10:03:58 2010
RADIUS server 188.8.131.52:1812 activated on WLAN 5
Mon Dec 20 10:03:58 2010
RADIUS server 184.108.40.206:1812 deactivated on WLAN 5
Mon Dec 20 10:03:58 2010
RADIUS server 220.127.116.11:1812 failed to respond to request (ID 241) for client 00:1f:3c:d0:98:6f / user 'unknown'
Mon Dec 20 10:03:22 2010
RADIUS server 18.104.22.168:1812 failed to respond to request (ID 223) for client 00:1f:3c:d0:98:6f / user 'unknown'
Is it just a cosmetic accounting bug? None of our clients are complaining.
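To see how often each RADIUS server is being deactivated or failing to respond, the WLC log lines quoted above can be tallied with a short script. This is a rough Python sketch that assumes the log format is exactly as pasted in this post; the sample addresses are documentation placeholders, not the real servers.

```python
import re
from collections import Counter

# Matches lines such as:
#   RADIUS server <ip>:<port> deactivated on WLAN 5
#   RADIUS server <ip>:<port> failed to respond to request (ID 241) ...
LINE = re.compile(
    r"RADIUS server (\S+?):(\d+) (activated|deactivated|failed to respond)"
)


def summarize(log_text):
    """Count (server, event) pairs found in a WLC message log."""
    counts = Counter()
    for match in LINE.finditer(log_text):
        server, _port, event = match.groups()
        counts[(server, event)] += 1
    return counts


# Placeholder sample in the same shape as the log above.
sample = """\
RADIUS server 192.0.2.20:1812 activated on WLAN 5
RADIUS server 192.0.2.10:1812 deactivated on WLAN 5
RADIUS server 192.0.2.10:1812 failed to respond to request (ID 241) for client 00:00:5e:00:53:01 / user 'unknown'
RADIUS server 192.0.2.10:1812 failed to respond to request (ID 223) for client 00:00:5e:00:53:01 / user 'unknown'
"""

counts = summarize(sample)
for (server, event), n in sorted(counts.items()):
    print(f"{server}: {event} x{n}")
```

If one server accounts for nearly all of the "failed to respond" and "deactivated" entries, that points at a server-side problem rather than a purely cosmetic one.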
We had the same problem on wireless & the same exact behavior you are seeing... we have opened multiple TAC cases trying to get to the bottom of this...
There was a bug in WLC code prior to 22.214.171.124. Once we updated all WLC's to the same version, the error went away... temporarily......
All of our WLC's are on v126.96.36.199 now... We needed 7.0.98 b/c we have some 3502 AP's... To date, this is the only version of 7 code for the WLC's.. I'm beginning to wonder if the bug was fixed in 188.8.131.52 - 184.108.40.206 but not in v7.0.98???
We have a mix of 2106-06, 4402-12 - 50's & 5508-12 - 50's. Interestingly enough.. our WISM blades are not throwing this error... I can't help but notice the 5508 WLC's send this message way less often... it appears to be the 2100 & 4400's that just hammer the ACS with this error... Also, we had the error several weeks ago, turned off RADIUS accounting & rebooted the primary instance & the problem went away.. until today.. Nothing has changed.. so something happens to trigger all of the WLC's to cause this message over & over...
As I briefly mentioned, we had RADIUS accounting turned on... TAC said the WLC was sending a packet the ACS didn't receive fully or didn't know what to do with.. hence the error... I have since confirmed that all sites have RADIUS accounting turned off.. In reality, RADIUS accounting for wireless devices provides almost no useful information... Accounting should be reserved for device administration so you can track commands...
I'm subscribing to the thread so if anyone finds a fix... please post a response.. I have opened a new TAC case this morning.. once we figure it out, I will post the answer..
Well... I'm learning this 11051 - invalid state attribute has more than one root cause...
In our particular scenario... We have a primary & secondary ACS defined in our WLCs... For some reason, one of the ACS instances stopped authenticating users... The ACS wasn't down, it just stopped authenticating users... It was logging plenty of the 11051 messages, but the closer I looked at the passed authentications, none were coming from that instance.
So the WLC starts the EAP conversation with the "failed" ACS. Since the ACS didn't respond with the next part of the EAP conversation, the WLC thought that ACS was down (we would see "Radius server x.x.x.x deactivated for WLAN x" in the WLC logs)... The WLC would try to continue that same conversation w/ the second ACS but had the old failed ACS info in the packet... hence the packet was not valid... causing the error...
Bottom line, we had to restart the "failed" instance of ACS (stopping & starting the adclient service didn't resolve the issue)... after the reboot.. both instances of ACS started authenticating users & the problem went away...
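The failover behaviour described above can be modelled in a few lines of Python. This is a toy simulation under the assumption that each ACS instance keeps its own independent table of active sessions keyed by the State value; the function names are hypothetical and nothing here is ACS or WLC code.

```python
import secrets


def start_session(session_table):
    """Server side: open a new conversation and issue a fresh State."""
    state = secrets.token_hex(8)
    session_table.add(state)
    return state


def continue_session(session_table, state):
    """Server side: accept a follow-up only if the State is known here."""
    return state in session_table


# Two ACS instances, each with its own session table (assumption).
primary_sessions = set()
secondary_sessions = set()

# The WLC starts the EAP conversation with the primary.
state = start_session(primary_sessions)

# The primary stops answering, so the WLC fails over but re-sends the
# State it got from the primary. The secondary has never seen it.
accepted_by_secondary = continue_session(secondary_sessions, state)

# Starting a fresh conversation with the secondary works normally.
fresh_state = start_session(secondary_sessions)
accepted_after_restart = continue_session(secondary_sessions, fresh_state)

print("stale State accepted by secondary:", accepted_by_secondary)
print("fresh State accepted by secondary:", accepted_after_restart)
```

In this model the 11051 entries on the secondary are a symptom: the real fault is whatever made the primary stop responding in the middle of conversations.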
So to date... for us...
Thanks for the update. Using your post as a checklist;
Are you saying that a reboot of both ACS servers will bring back the service? I have been told that upgrading to the latest patch (5) will resolve this issue.
I'll update with how I get on with my issue.
Ok, so this is now resolved for me. For some reason the AD accounts for both of the servers were disabled. These are now enabled and the users are authenticating ok.
The fact that the domain was reachable but showing as disconnected should have been the give away I suppose.
I'm still seeing these 11051 errors on one of my secondaries ... seems similar to what grnetcomss was seeing. I corrected the NTP and timezone issues, but one ACS appliance does not appear to be doing any successful authentications and is only logging the 11051 errors in the RADIUS authentication log. Distributed System Management says it's OK and replication status is "UPDATED". I tried reloading it, but it made no difference. Has anyone else managed to get to the bottom of this??
Finally think I've sorted it: The secondary in question had become disconnected from AD. I had to unregister it from the primary, clear the AD config, reconnect it to AD, then factory default it (it maintains its IP config so it's OK), re-enter the license, then re-associate it with the primary. Now it's all green :-).
I found that my clock had reset to UTC time and therefore was not communicating with Active Directory. Once I reset the clock and NTP server, everything worked again.
I also had this problem ... Found that a couple of the secondaries had wrong timezone and no NTP set. From the command line:
clock timezone Australia/Brisbane
You then need to restart ACS.
It's all good now ... Thanks for pointing me in the right direction :-)
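As a rough illustration of why a wrong timezone can break the AD connection, the Python sketch below compares an appliance clock against a domain controller clock. It assumes the AD join is Kerberos-based and uses the common default tolerance of five minutes; the function name, the sample date, and the fixed UTC+10 offset for Brisbane are choices made for this example.

```python
from datetime import datetime, timedelta, timezone

# Assumed default Kerberos clock-skew tolerance.
MAX_SKEW = timedelta(minutes=5)

# Australia/Brisbane as a fixed offset (UTC+10, no daylight saving).
BRISBANE = timezone(timedelta(hours=10))


def skew_ok(appliance_time, dc_time, max_skew=MAX_SKEW):
    """True if the two timezone-aware clocks are within the tolerance."""
    return abs(appliance_time - dc_time) <= max_skew


# The wall clock on the appliance reads 10:00 on this date.
wall = datetime(2010, 12, 20, 10, 0)

# The domain controller knows it is 10:00 Brisbane time.
dc_time = wall.replace(tzinfo=BRISBANE)

# Appliance with the correct timezone agrees with the DC.
correct = wall.replace(tzinfo=BRISBANE)

# Appliance that treats the same wall-clock reading as UTC is 10 hours off.
wrong = wall.replace(tzinfo=timezone.utc)

print("correct timezone within tolerance:", skew_ok(correct, dc_time))
print("wrong timezone within tolerance:", skew_ok(wrong, dc_time))
```

With NTP configured the appliance keeps correct UTC time regardless of the display timezone, which is why setting both the timezone and an NTP server is the safer fix.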
Old thread, but I'm going to go ahead and provide an update on what solved this issue for me in hopes that it will help someone out. I had a case on this issue with TAC for 4 days with no resolution. It turns out that the "server" service on one of my Windows Domain Controllers (2008 R2) was stopped. This server held the FSMO role of "PDC Emulator", so authentications coming from my ACS server to AD failed. This was a difficult issue to find, but nonetheless that is what solved it for me.