ASK THE EXPERTS - WIRELESS LAN CONTROLLERS

Jul 30th, 2010

Welcome to the Cisco Networking Professionals Ask the Expert conversation. This is an opportunity to learn how to configure and troubleshoot Cisco Wireless LAN Controllers with Lee Johnson. Lee is a wireless specialist for the Technical Assistance Center at Cisco. He is the co-author of "Deploying and Troubleshooting Cisco Wireless LAN Controllers." Lee has been troubleshooting wireless networks, including both autonomous and controller-based infrastructures, since 2006 in Cisco customer networks around the world.

Remember to use the rating system to let Lee know if you have received an adequate response.

Lee might not be able to answer each question due to the volume expected during this event. Our moderators will post many of the unanswered questions in other discussion forums shortly after the event. This event lasts through August 13, 2010. Visit this forum often to view responses to your questions and the questions of other community members.

Average Rating: 5 (27 ratings)
Craiglebutt_2 Fri, 07/30/2010 - 14:57

HI

Here's the layout: 18 x 4404 controllers running v5.2.193, and 2 ACS servers (1 appliance and 1 software), both running v4.2 with all the latest patches.

The problem

The appliance is the primary box and the Windows server is the secondary. From what I've read and been told, users will only authenticate via the primary box. What I have found is that users are being authenticated via either ACS. Surely this isn't correct, is it?

Any ideas?

Cheers

Craig

leejohns Sun, 08/01/2010 - 05:11

Craig,

The WLCs are going to use the RADIUS servers in the following manner....

If you have multiple RADIUS servers defined, the WLC will use the first RADIUS server in the list until that server fails to respond. It will then start using the second RADIUS server until that server does not respond and then, should there be a third RADIUS server, it would move to that one. Should the third RADIUS server stop responding, it will try the second one again and, as long as that server is responding, it will continue to use it.
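
The server order described above can be sketched as a toy model. This is only an illustration of the behavior as Lee describes it, not the WLC's actual code; the function names and server labels are made up for the example.

```python
def next_server(current: int, count: int) -> int:
    """Return the index of the server to try after `current` stops responding."""
    if count < 2:
        return current  # nothing to fail over to
    if current < count - 1:
        return current + 1  # move down the configured list
    return current - 1  # last server failed: step back to the previous one


def walk(servers, failures):
    """Replay `failures` consecutive 'current server stopped responding' events."""
    index = 0
    used = [servers[index]]
    for _ in range(failures):
        index = next_server(index, len(servers))
        used.append(servers[index])
    return used


# Hypothetical labels: first, second, third server in the WLC list.
print(walk(["acs-first", "acs-second", "acs-third"], 3))
```

With three servers and three failures in a row, the model visits first, second, third, and then goes back to the second, which matches the sequence in the post.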

Now, with that being said, unless you have aggressive radius failover disabled, it only takes one failed client to cause the WLC to switch over to the second RADIUS server.  When the controller switches radius servers, you should see a trap log
indicating that it was deactivated.  This would happen if the WLC did
not receive a response after 5 attempts for a single client.  If you
want to mitigate the criteria for a WLC to switch radius servers, you
can do the following:

The default timeout in the WLC for hearing back from the RADIUS server is 2
seconds.  This is set in the WLC > Security > authentication-server.
The maximum is 30 seconds; you might wish to make this larger, e.g. 5 seconds.

You can also disable aggressive radius failover from the CLI:

config radius aggressive-failover disable

If this is set to 'enable', which is the default, the WLC will
go to the next server after 5 retransmissions for 'a client'.
Since we sometimes see 'a client' clobbering the radius-server, having
config radius aggressive-failover disable can be more forgiving because
it doesn't trigger a move or consider the current radius server dead
unless there are 3 consecutive tries for 3 different users (i.e. the
radius-server is unresponsive for multiple users).
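
To make the difference between the two modes concrete, here is a minimal sketch of the failover decision. It assumes one reading of the description above (aggressive: 5 unanswered requests in a row for any one client; non-aggressive: a run of unanswered requests covering 3 different clients). The function and its parameters are hypothetical and do not reflect the controller's real implementation.

```python
def should_fail_over(events, aggressive=True, retries=5, distinct_clients=3):
    """Decide whether the current RADIUS server would be marked dead.

    events: ordered (client_id, answered) tuples, one per RADIUS request.
    """
    per_client = {}         # consecutive unanswered requests per client
    silent_clients = set()  # clients in the current run of unanswered requests
    for client, answered in events:
        if answered:
            # Any response shows the server is alive, so reset the counters.
            per_client[client] = 0
            silent_clients.clear()
            continue
        per_client[client] = per_client.get(client, 0) + 1
        silent_clients.add(client)
        if aggressive and per_client[client] >= retries:
            return True
        if not aggressive and len(silent_clients) >= distinct_clients:
            return True
    return False


one_bad_client = [("aa:aa", False)] * 5
print(should_fail_over(one_bad_client, aggressive=True))   # single client is enough
print(should_fail_over(one_bad_client, aggressive=False))  # single client is not enough
```

In this model a single misbehaving client triggers a switch only when aggressive failover is enabled, which is the flip-flopping scenario described in the reply.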

So my thought is that you have a misbehaving client or very busy ACS servers and that is causing the WLCs to flip-flop between them and that is why you are not seeing a consistent authentication to a single RADIUS server.

Thanks,

Lee

Scott Pickles Mon, 08/02/2010 - 09:10

I've had a lot of requests for RADIUS load balancing.  Anyone know if this will ever be a feature that is implemented? As the number of APs on a controller continues to increase, combined with the push for Enterprise Class secure access (i.e. EAP methods), it's reasonable to assume that the RADIUS server could become a bottleneck.  Trouble is, I don't know what a reasonable number of authentications per second for ACS or Windows actually is.

Regards,
Scott

leejohns Mon, 08/02/2010 - 09:28

Scott,

I have not heard of that feature being on the roadmap for WLCs. Unless you have 1000s of wireless clients all trying to authenticate at once, then I would hope that the RADIUS server could handle that without blinking. If for some reason the primary RADIUS server was slow or didn't respond, the WLC would start using the secondary RADIUS server. I don't know the specifics as I am not an ACS expert, but I know those servers can handle 100s of authentications per second if not more in the newer releases.

Thanks,

Lee

huangedmc Mon, 08/02/2010 - 09:47

A follow-up to Scott's question about RADIUS, what's Cisco's recommended best practice for AAA?

Should we have a set of ACS just for user/wireless authentication, and a separate set for internal IT authentication for network devices?

One of my colleagues raised a concern over utilizing the same ACS appliances for both purposes.

The concern is:

If malicious users start flooding auth requests via 802.1x/wireless against our ACS environment as a DoS attack, we may not even be able to log into network devices to stop the attack, since WLCs, switches, routers, and firewalls all authenticate against ACS.

My initial thought is that's what the client-exclusion feature is for on the WLCs.

However, we've had to disable that feature because in certain scenarios, legitimate clients may not be able to connect to wireless when that feature is enabled.

Lee, I know you may not be an ACS expert, but I still wanted to toss this question out to see whether you or someone else may have the answer.

leejohns Mon, 08/02/2010 - 10:00

I am not aware of any official best practice where wireless is concerned to have dedicated wireless authentications to one RADIUS server and management devices using another. I would say that if that is a concern, and it makes good sense to me, then you should split up the responsibilities and have other network resources using one ACS server and wireless authentications using another. Unless you have problematic clients, e.g. outdated drivers or improper wireless profiles, having client exclusion enabled should not be causing valid clients connectivity issues.

Thanks,

Lee

leejohns Mon, 08/02/2010 - 10:29

I also meant to mention that the WLCs have an IDS component that, along with an IPS, will prevent DoS attacks like Auth floods, Assoc floods, EAPOL floods, etc., by shunning the source client. These mechanisms are independent of each other (client exclusion is L2 and the IPS is used for L3-L7), and even if client exclusion is disabled on the WLAN, if an IPS tells the WLC to shun the client, it will block it.

If you don't have an IPS, then using multiple RADIUS servers is probably the best approach.

Thanks,

Lee

Aaron Leonard Mon, 08/02/2010 - 12:52

Scott, RADIUS load balancing is inherently a difficult proposition for any installation that is using EAP (as almost any serious WLAN deployment does.)  This is because, when doing EAP, your RADIUS transactions are no longer atomic, but are within the context of a potentially protracted EAP handshake dialog between the RADIUS server and the client supplicant.  If the WLC RADIUS client code were to decide suddenly to "load balance" from one server to another, this would completely mess up any EAP handshakes that were in progress at that time.

I.e., for RADIUS load balancing really to work tolerably on an EAP authenticator, the authenticator's RADIUS client code would have to be EAP session aware.  I think this would be quite a bit of rework.
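
As a rough illustration of what "EAP session aware" would mean, the sketch below pins every packet of one authentication session to the server that received its first packet. This is a hypothetical model written for this discussion, not a WLC feature; the class name, session identifiers, and server labels are all assumptions.

```python
from itertools import cycle


class StickyBalancer:
    """Round-robin across servers, but never move a session mid-handshake."""

    def __init__(self, servers):
        self._next = cycle(servers)
        self._by_session = {}

    def server_for(self, session_id):
        # Only the first packet of a session picks a new server.
        if session_id not in self._by_session:
            self._by_session[session_id] = next(self._next)
        return self._by_session[session_id]

    def finish(self, session_id):
        # Forget the session once the EAP exchange has completed or failed.
        self._by_session.pop(session_id, None)


balancer = StickyBalancer(["acs-1", "acs-2"])
print(balancer.server_for("client-a"))  # first packet picks a server
print(balancer.server_for("client-b"))  # next session goes to the other server
print(balancer.server_for("client-a"))  # same session stays where it started
```

The extra state (a table of in-progress sessions and its cleanup) is the kind of rework the RADIUS client code would need.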

Your best bet, for scalability and load balancing, is to have multiple WLCs, and configure them with different primary and backup RADIUS servers.

Hth,

Aaron

Scott Pickles Mon, 08/02/2010 - 13:22

Aaron -

I concur regarding the handshake and the possibility that switching servers mid-shake would cause catastrophic failure.  I was indeed implying that each EAP session would begin and end on the same server.  What I didn't realize was the amount of work necessary to make that happen.  We are currently implementing load balancing in the way that you describe, where we place the IP addresses in a different priority order on each WLC.
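
The per-WLC priority ordering described here can be planned with a few lines of code. This is a minimal sketch that simply rotates the server list for each controller; the controller names and the 192.0.2.x addresses are placeholders, not values from this thread.

```python
def rotated_priorities(controllers, servers):
    """Give each WLC the same servers, starting from a different one."""
    plan = {}
    for position, wlc in enumerate(controllers):
        shift = position % len(servers)
        plan[wlc] = servers[shift:] + servers[:shift]
    return plan


plan = rotated_priorities(
    ["wlc-1", "wlc-2", "wlc-3"],     # hypothetical controller names
    ["192.0.2.10", "192.0.2.11"],    # placeholder RADIUS server addresses
)
for wlc, order in plan.items():
    print(wlc, order)
```

Each controller still fails over through the full list, but the primary differs from one WLC to the next, so the steady-state load is spread across the servers.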

Thanks.

huangedmc Mon, 08/02/2010 - 07:08


We're currently running 6.0.196.0 on numerous WiSMs and WLCs, and ran into two nasty bugs:

CSCtd28542 - WiSM crashed & reloaded when we were just changing the IP & hostname of the AP's
CSCte08161 - clients unable to obtain DHCP leases

Between 6.0.199.0 & 7.0.98.0, which version should we upgrade to?

6.0.199.0 has an AssureWave ribbon next to it, but I can't find any information about that version on AW website: http://www.cisco.com/en/US/netsol/ns779/networking_solutions_program_category_home.html

Also, it appears CSCtd28542 is fixed in both new versions, but CSCte08161 is only resolved in 7.0.98.0.

Could you please confirm if that's the case?

According to the Bug ID tool for CSCte08161, clients cannot get IP address from server if key-management is "wpa optional".

On the WiSM's, how do I even change this behavior?

The workaround listed in the Bug Toolkit appears to be for Autonomous AP's only.

leejohns Mon, 08/02/2010 - 07:53

Hi,

Both the 6.0.199.0 and the 7.0.98.0 have the fixes for the major issues seen in the early 6.0 code releases.  Regardless of the AW ribbon being there or not, you absolutely don't want to run any 6.0 version aside from the 6.0.199.0 release. I did check the link you sent and I do see the testing results for the 6.0.199.0 release, so I am not sure if there was a glitch when you checked it last.

As to which version you should upgrade to, that is really up to you.  If you move to 7.0 and have a WCS server, then you would need to upgrade WCS as well to be compatible.  If you are running WCS 6.0.181.0 or higher, then you would not need to upgrade WCS if you went to the 6.0.199.0 code.

For CSCte08161, there is no "migration mode" on the WLCs.  On the WLCs, you would have to have separate SSIDs, one for WEP clients and one for the WPA clients, so you would not have to worry about that scenario.  But the issue is resolved in both the 6.0.199.0 and 7.0.98.0 IOS releases for the APs.

Thanks,

Lee

huangedmc Mon, 08/02/2010 - 08:41

hi Lee,

Thanks for the prompt response.

I just checked the AW site again, and 6.0.199 is now there after a refresh...my browser must have kept the old page in cache.

In your response you said, "If you are running WCS 6.0.181.0 or higher, then you would not need to upgrade WCS if you went to the 6.0.199.0 code."

We'll probably upgrade to 6.0.199.0 because of this, and also the AssureWave status.

Although when I reviewed the release notes for WCS 6.0.181, WLC 6.0.199 isn't listed under supported versions.

The latest version it says it supports is 6.0.196. Perhaps the document just needs to be updated?

See:

http://www.cisco.com/en/US/customer/docs/wireless/wcs/release/notes/WCS_RN6_0_181.html#wp44440


We actually opened a TAC case (614787683) for our DHCP client issue, and the engineer said we're affected by CSCte08161, which is why I'm taking it into consideration.

Anyways, you said this bug is resolved in 6.0.199, but it's not listed in the release notes as a resolved caveat:

http://www.cisco.com/en/US/docs/wireless/controller/release/notes/crn6_0_199.html#wp614071

Could you please confirm if this is indeed resolved in 6.0.199?

I apologize if I sound like I doubt what you're saying.

Every time we upgrade code for over 1000 AP's though we have to schedule outage windows for nearly 100 different locations (running H-REAP), so I wanted to make sure the version we decide to go for will be stable for a good while.

leejohns Mon, 08/02/2010 - 09:08

No problem at all.  Since I don't know the specifics of that one TAC case, I really cannot comment on why they said you were running into that bug, but the "integrated into" section of the bug states that it was resolved in the 12.4(21a)JHA version of code, and the IOS image an AP runs when joined to a 6.0.199.0 WLC is 12.4(21a)JHB.  The reason it does not show as resolved in the 6.0.199.0 release notes is that there is no actual 6.0 code listed for the bug, just the autonomous AP codes and the 7.0 release.  Again, I don't see how you could have been affected by this bug, as there is no "wpa optional" setting on WLCs, so this particular bug should not be of concern to you with the controllers.

For WCS, the latest 6.0 release is the recommended version, but the 6.0.181.0 version does work. There is a documentation bug, CSCth85796 - WCS 6.0.181.0 Release Notes Need to Cite Support for WLC Ver 6.0.199.0, filed to get that added to the appropriate release notes.

carock@epconline.com Mon, 08/02/2010 - 08:40

I have 2 Cisco 4400 LAN controllers in two physical buildings.

Both running software: 6.0.182.0

This is my first experience with them and they seem to be working fine since installation several months ago.

I am getting these messages constantly each day from both controllers. Are these nuisance messages (normal) or are they something I need to look into?

-----

WCS 10.1.6.28 has detected one or more alarms of category AP and severity Critical
for the following items:


802.11b/g interface of AP AP-1 is down: Controller 10.2.254.253 Reason: Unknown. - Controller Name: EPC-Controller-2


E-mail will be suppressed up to 30 minutes for these alarms.

------

Another from the other controller

------

WCS 10.1.6.28 has detected one or more alarms of category AP and severity Critical
for the following items:


802.11b/g interface of AP AP-8 is down: Controller 10.1.6.26 Reason: Unknown. - Controller Name: EPC-Controller-1 (2 times)
-------

Not always the same AP either.
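
One way to judge whether alarms like these are noise or a pattern is to tally them per controller and AP. The sketch below assumes the alarm lines look exactly like the two quoted in this post (including the optional "(N times)" suffix); the regular expressions and function name are written for this example and are not part of WCS.

```python
import re
from collections import Counter

# Patterns based only on the two alarm lines quoted in this post.
AP_DOWN = re.compile(
    r"interface of AP (?P<ap>\S+) is down: .*Controller Name: (?P<wlc>\S+)"
)
REPEAT = re.compile(r"\((\d+) times\)")


def tally_down_alarms(text):
    """Count radio-down alarms per (controller name, AP name)."""
    counts = Counter()
    for line in text.splitlines():
        match = AP_DOWN.search(line)
        if match is None:
            continue
        repeat = REPEAT.search(line)
        counts[(match.group("wlc"), match.group("ap"))] += (
            int(repeat.group(1)) if repeat else 1
        )
    return counts


mail = """\
802.11b/g interface of AP AP-1 is down: Controller 10.2.254.253 Reason: Unknown. - Controller Name: EPC-Controller-2
802.11b/g interface of AP AP-8 is down: Controller 10.1.6.26 Reason: Unknown. - Controller Name: EPC-Controller-1 (2 times)
"""
for key, count in tally_down_alarms(mail).items():
    print(key, count)
```

If the same few APs dominate the tally over several days, that points at those APs or their RF environment rather than at routine radio resets.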

I haven't had time to submit a TAC case. Still thinking that may be overkill if this is just normal since I don't have any complaints from the users about usability.

Thanks,

Chuck

leejohns Mon, 08/02/2010 - 09:12

Chuck,

The very first thing you will want to do is upgrade the WLCs to the 6.0.199.0 version of code.  There is a software advisory (http://www.cisco.com/web/software/Wireless/Deferral/Software_Advisory_6_0_196_0.html) out for all the previous 6.0 releases.  There are numerous fixes for several different issues.  Once you have upgraded, see if you continue to see these error messages.  As mentioned earlier, if you do happen to have a WCS server, you will want to make sure it is running 6.0.181.0 at least in order to support the new WLC code.

Thanks,

Lee

Scott Pickles Mon, 08/02/2010 - 11:17

Lee -

While GUIs have made life a lot simpler for many of us, I find that when things become too automated I get lazy.  I like to know the nuts and bolts of how to get information out of my wireless networks from the CLI as much as I can.  It helps me to stay sharp, and really comes in handy when I'm troubleshooting remotely or can't access a GUI for some reason.  With that in mind, one bit of information I'd like to get from an AP is its percent utilization.  WCS has a report that I can simply run to get that, but I don't see how I can get that information from the CLI.  I suspect it's the comparison of a few data bytes that come from a couple of commands.  Any chance you know how I can get an AP to give up its percent utilization from the CLI?

Regards,
Scott

leejohns Mon, 08/02/2010 - 12:00

Hey Scott,

What AP utilizations in particular are you interested in, channel, CPU, memory, etc?

Thanks,

Lee

Scott Pickles Mon, 08/02/2010 - 12:32

Lee -


Those would be the big three to start with.  I would assume that whatever information and/or tasks I learn from obtaining the output for those would allow me to understand how to get the others.

Regards,
Scott

leejohns Mon, 08/02/2010 - 13:02

Scott,

There are a few different ways to get this information from the AP, but I will say if you want to see trends, WCS reports are the way to go. If you are familiar with the aIOS commands like 'sh process cpu', then these are the same commands you would run for the individual APs. Although they are running a lightweight image, they still have an IOS running on them.

So you can run commands like:

Show process cpu

Show process memory

Show controller d0 (to see the 2.4GHz radio information)

Depending on your WLC code, you can enable telnet or SSH on the APs and then run them directly, or you can enable remote debugs on the AP from the WLC.

From the WLC:

debug ap enable

If you enabled telnet/SSH on the AP from the Advanced tab, then you could telnet directly to the AP, log in, and run 'show process cpu', etc.

Thanks,

Lee

leejohns Mon, 08/02/2010 - 13:07

Seems like part of the last post was cut off, as the actual ap command from the WLC CLI is missing, so here it is again:

So once you have enabled remote debugs then:

Debug ap command "

Lee

leejohns Mon, 08/02/2010 - 13:15

OK, no idea what is going on there. So hopefully the 3rd time is the charm.

So once you have enabled remote debugs then:

Debug ap command ""

So if you wanted to see the cpu information:

Debug ap command "show process cpu"
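
Once the 'show process cpu' output has been captured, the percentages can be pulled out of its header line with a short script. This is a sketch that assumes the usual IOS header format; the sample line is illustrative and was not captured from an AP in this thread, and the function name is made up for the example.

```python
import re

# Matches the summary line that IOS prints at the top of 'show process cpu'.
CPU_HEADER = re.compile(
    r"CPU utilization for five seconds: (\d+)%/(\d+)%; "
    r"one minute: (\d+)%; five minutes: (\d+)%"
)


def parse_cpu_header(output):
    """Return the CPU percentages from 'show process cpu' output, or None."""
    match = CPU_HEADER.search(output)
    if match is None:
        return None
    total, interrupt, one_minute, five_minutes = (int(v) for v in match.groups())
    return {
        "five_seconds": total,       # total CPU over the last five seconds
        "interrupt": interrupt,      # portion spent at interrupt level
        "one_minute": one_minute,
        "five_minutes": five_minutes,
    }


# Illustrative sample line only.
SAMPLE = "CPU utilization for five seconds: 5%/0%; one minute: 6%; five minutes: 6%"
print(parse_cpu_header(SAMPLE))
```

Collecting this value at intervals gives a crude trend, although the WCS reports remain the simpler route for that.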

Lee

Scott Pickles Mon, 08/02/2010 - 13:24

Lee -

Got it.  I knew of those methods already, so no worries there about your post getting cut off.  I just didn't know what command WCS targets when running those reports to get the percentages.  I just want to make sure I'm doing the same thing so that I get meaningful output.

Regards,

Scott

leejohns Mon, 08/02/2010 - 13:37

OK, good. That was frustrating me. Yeah, to my knowledge there are no hidden targets that WCS would poll to gather that information.

Lee

huangedmc Tue, 08/03/2010 - 06:36

hi Lee,

I'm still scrubbing through the bugs for 6.0.199.0, and came across this one.

Could you please tell me how it affects H-REAP clients, and under what circumstances?

Looked it up via Bug Toolkit, and was told it contains proprietary info so no info is given.

CSCth12916

A wireless client is not getting an IP address when it is associated to the H-REAP access point.

CSCth12916            Bug Details

Dear valued Cisco Bug Toolkit customer, the bug ID CSCth12916 you searched contains proprietary information that cannot be disclosed at this time; therefore, we are unable to display the bug details. Please note it is our policy to make all externally-facing bugs available in Bug Toolkit to best assist our customers. As a result, the system administrators have been automatically alerted to the problem.

While we are working to resolve this issue, we invite you to reach out to the experts on the Bug Toolkit Support Community.  You may find answers there to your Bug Toolkit questions, or post your feedback on our forum as well.  Thank you.

Note: Some product enhancement requests and documentation bugs may not be available in Bug Toolkit.

===

This one's categorized as 1 - catastrophic

Once again, not much info is available via Bug Toolkit. How can this version of code be AW certified w/ a Sev1 bug?

The conditions say: Normal use.

What does that mean? The WLC would just crash w/ no reason?

CSCth43447            Bug Details

WiSM crash Task Name:spamReceiveTask
Symptom:
WLC crash Task Name:spamReceiveTask
Conditions:
Normal use.
Workaround:
None

===

This one's listed as an open caveat in the release notes, but when looked up in Bug Toolkit, it says it's fixed in 6.0.199.

CSCtd62937

The show ap summary command does not show the access point names.

Fixed-In
5.2(194.0)
7.0(98.0)
7.0(94.143)
6.0(189.4)
6.0(199.0)
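
Bug Toolkit writes releases as 6.0(199.0) while the release notes use 6.0.199.0, so comparing the two by eye is error-prone. The helper below is a small sketch that normalizes both spellings before checking a release against the Fixed-In list quoted above; it only tests for an exact match and does not try to infer whether later builds inherit a fix. The function names are hypothetical.

```python
import re


def normalize(version):
    """Turn '6.0(199.0)' or '6.0.199.0' into a tuple such as (6, 0, 199, 0)."""
    return tuple(int(part) for part in re.findall(r"\d+", version))


# Fixed-In list for CSCtd62937 as quoted in this post.
FIXED_IN = ["5.2(194.0)", "7.0(98.0)", "7.0(94.143)", "6.0(189.4)", "6.0(199.0)"]


def listed_as_fixed(release, fixed_in=FIXED_IN):
    """True only if `release` appears verbatim in the Fixed-In list."""
    return normalize(release) in {normalize(v) for v in fixed_in}


print(listed_as_fixed("6.0.199.0"))
print(listed_as_fixed("6.0.196.0"))
```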

I couldn't care less about the AP names, but this brings up a question:

How up to date is the release note for 6.0.199.0, or in general?

I'm sorry if I don't sound friendly in these questions, but it's frustrating not being able to find code that runs a stable wireless network.

What can I do as a customer to help make things better?

Should I ask our account team to voice our concerns?

Would the BU actually care or listen?

leejohns Tue, 08/03/2010 - 06:58

Hi,

There is not a lot of information on this bug, but the issue was seen when the clients were switching between the different WLANs being serviced by the H-REAP AP. There is no more information on it and the issue was reproducible using an Intel based client and was almost never seen when testing with the Cisco ADU client. So this might be a client side issue, but again, there is not enough information on this bug to say one way or the other. Unless you expect clients to be flip-flopping between SSIDs I would not be overly concerned.

In fact, there are no customer cases linked to the bug, which indicates that it has not been seen in the wild.

Thanks,

Lee

huangedmc Tue, 08/03/2010 - 07:01

thanks.

what about CSCth43447?

leejohns Tue, 08/03/2010 - 07:11

For CSCth43447, again, there are no customer linked cases, so it has never been seen outside Cisco's internal testing. From what I can tell, the issue was fixed in a pre-6.0.199.0 release, but it looks like some of the bug attributes have not been updated, so the Release Notes would reflect it as not resolved. Since it was a Severity 1 issue, it should have been included, as all Sev1 bugs are supposed to be resolved before a new release can come out. I just need to verify this with some folks, but they will not be in until later today since they are on the West Coast. I will let you know and will get a documentation bug filed against the 199 release notes if that is necessary.

Thanks,

Lee

leejohns Tue, 08/03/2010 - 13:17

Sorry,

I missed the bottom of this post about the release notes and did not address that concern.  We are working on a better procedure for having bugs marked correctly so that when it comes time to write the release notes, they are as accurate as they can be concerning open bugs, etc.  A lot of this depends on the developer correctly updating the flags on the bug so that when the document writers parse the bugs themselves, they can tell what is resolved and what is not.

So in the case of CSCtd62937, it is entirely possible that when the release notes were being compiled, the flags for this bug had not yet been updated to reflect the fix in the 6.0.199.0 code, and therefore it is shown as not resolved when it really is.  So a doc bug needs to be filed to correct that issue.

As with any issues, I would recommend that you talk to your local Cisco Account team when you meet with them to voice your concerns so that they can pass them along to the Business Units.

Thanks,

Lee

peter121987 Tue, 08/03/2010 - 07:49

Hi, I am working on the design of a WLAN in a building. It's critical to know about the concurrence of the different Cisco access points, but I can't find this in the datasheets of the products. Please, somebody help me! Thanks


leejohns Tue, 08/03/2010 - 07:57

Pedro,

I am not sure I understand what you are referring to when you say "concurrence" in regard to the APs. Can you elaborate on that?

Thanks,

Lee

peter121987 Tue, 08/03/2010 - 10:50

By concurrence I mean the number of users that can be connected simultaneously to the AP. Thanks for the answer.

Pedro

leejohns Tue, 08/03/2010 - 11:05

Pedro,

That is a loaded question, and the answer is "It depends". Although you can have up to 254 clients associated with a single AP (128 per radio if AP has both b/g and a radio), because wireless is half-duplex, a single 802.11b client can only realize about 5.6 Mbps throughput at best. For 802.11g or 802.11a only, a client can realize only around 28 Mbps. If you are mixing 802.11b and 802.11g clients, the 802.11g clients are forced to use protection mechanisms that essentially cut the throughput in half. You also need to consider the type of client traffic you are expecting. The situation is much different for clients just surfing the web as opposed to 200 clients trying to download MP3 files at the same time. For best results, I would say you would plan for no more than 25-30 clients on a single AP radio.
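
To see why 25-30 clients per radio is a sensible planning figure, the numbers above can be turned into a rough per-client estimate. This is a back-of-the-envelope sketch that assumes all clients are active at once and share the radio equally, and it ignores contention overhead, so real results will be lower; the function name is made up for the example.

```python
def per_client_mbps(radio_mbps, clients, protection=False):
    """Rough share of radio throughput per active client.

    radio_mbps: best-case throughput of the radio (e.g. 28 for 802.11g/a).
    protection: True when 802.11b clients force protection on 802.11g,
                which roughly halves the throughput.
    """
    if clients <= 0:
        raise ValueError("clients must be positive")
    usable = radio_mbps / 2 if protection else radio_mbps
    return usable / clients


print(round(per_client_mbps(28, 25), 2))                   # 802.11g only, 25 clients
print(round(per_client_mbps(28, 25, protection=True), 2))  # mixed b/g, 25 clients
print(round(per_client_mbps(28, 200), 2))                  # 200 busy clients
```

At 25 clients each one still gets on the order of 1 Mbps in the best case, while at 200 simultaneously busy clients the share drops to a small fraction of that.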

Thanks,

Lee

peter121987 Tue, 08/03/2010 - 12:15

Thanks, Lee, for your answer, but this information (concurrence) does not appear in the datasheet of the access point.

kmiller1634 Tue, 08/03/2010 - 12:40

Pedro,

A big reason that Cisco doesn't define a maximum number of concurrent connections in their datasheets is that there are way too many variables that come into play when trying to determine that value: things like interference, protocol, available data rates, client devices, modulation, traffic type, and duty cycle. All these factors affect it. What Lee provided was a good best-effort practical value.


Thanks.

leejohns Tue, 08/03/2010 - 12:52

Kayle,

You are completely correct. In addition to the hardware limitations on the number of clients that can associate to a single AP radio, the other factors you mentioned will come into play as well.  So the updated data sheets would be listing those numbers based on hardware alone.  I would also hope that there would be a note explaining that, with the very reasons you mention.

Thanks!

Lee

leejohns Tue, 08/03/2010 - 12:42

Pedro,

Yes, that is correct. Documenting information like that on the data sheets is the responsibility of the product managers. I can tell you that TAC is working on having this information verified and getting it up on the data sheets, but this is not going to happen overnight.

The maximum number of clients is dependent upon several factors:

- per radio/AP limits (i.e. there may be a limit per radio, and an overall limit per AP)

- AP hardware (the 16MB APs have a lower limit than the 32MB+ APs)

- lightweight vs aIOS

Even if the hardware limit was 500 clients, in reality, this is never really going to work as the clients are not going to be able to pass any traffic since wireless is a shared, half-duplex medium (at least with the current technology).

So hopefully in the near future, this information will be on the datasheets so that there is no longer any confusion about this.

Thanks,

Lee

cthrasher123 Tue, 08/03/2010 - 08:26

Hi Lee,

I have an issue which I would like some advice on. I have a test setup configured at my desk with 2 APs, and I am doing H-REAP (I know you all know me and my H-REAP at Wireless TAC). Anyway, I am sending 2 SSIDs to these test APs. All is working great except sometimes I cannot see the WLAN, and I remedy the situation by going into my wireless client, deleting the WLAN, and re-creating it, and then bam, things work fine.

I don't want to broadcast the SSID, so I set up my adapter to connect to the non-broadcasting SSID. But like today, I came into my office and tried to see the WLAN and it was not there. So I blew it away from my adapter and re-created it, and it automatically connected me. Why do you think I'd have to do this (occasionally)?

P.S. I am using WiSM, controller code 6.0.182, managed by WCS code 6.0.132.

Thanks-

Cat

leejohns Tue, 08/03/2010 - 08:56

Cat,

There are several H-REAP related issues with the early 6.0 code. You will want to run 6.0.199.0 and see if those issues remain.

Thanks,

Lee

mbroberson1 Wed, 08/04/2010 - 07:35

Hi Lee,

We have 3 5508 controllers running a mix of 1230's (upgraded to 1231's) and 1242's APs A/B/G with 5959 ant. Currently running B & G only. Running code 7.0.98 all 3 controllers, the controllers are each port-channeled to a Nexus 7010 running the latest code.

Issue - We hear complaints from end users (at any location) who get disassociated and have to re-associate. We have two (2) remote sites (out of approx 15 sites) where users cannot stay connected to the WLAN. These two specific sites seem to be having more issues than any others. Could this maybe be a hybrid-reap issue?

Any thoughts on where to look for any of these issues?

Thanks,

Brandon

leejohns Wed, 08/04/2010 - 08:19

Brandon,

That is a possibility. Are the clients actually getting disconnected, or are they associating/authenticating fine but not getting IPs, etc.? It is hard to say exactly what might be going on without some more information. It could be an RF issue, with these sites experiencing some sort of interference that is causing them problems. Since they are b/g only, that is definitely a possibility. I would check the client driver versions, as it is almost always a good idea for them to be running the latest available, especially since the WLCs are running the latest code. You could also run 'debug client xx:xx:xx:xx:xx:xx' from the WLC CLI, logging to a file (no session timeout on the WLC), against a particularly problematic client and have the end user make note of the date and time they were disconnected, and then check the debug. You can also syslog the WLC logs if you are not already and look for any RF related traps around that time as well, like channel or power changes. If you see those around the same time the clients have issues, then that would suggest interference.
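
Matching the times users report against the RF traps can be done with a short script once both are in hand. The sketch below assumes the traps have already been parsed into (timestamp, text) pairs; the sample timestamps and trap descriptions are invented for illustration and do not reflect any real WLC log format.

```python
from datetime import datetime, timedelta


def traps_near(disconnects, traps, window_minutes=5):
    """Map each reported disconnect time to the traps logged close to it."""
    window = timedelta(minutes=window_minutes)
    return {
        when: [text for stamp, text in traps if abs(stamp - when) <= window]
        for when in disconnects
    }


# Hypothetical data: one user report and two already-parsed trap entries.
reported = [datetime(2010, 8, 4, 10, 0)]
trap_log = [
    (datetime(2010, 8, 4, 9, 58), "channel change"),
    (datetime(2010, 8, 4, 11, 30), "power change"),
]
print(traps_near(reported, trap_log))
```

If most reported disconnects line up with channel or power changes inside the window, that supports the interference theory; if none do, the client debug is the better place to look.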

Thanks,

Lee

rramlal@fj-icl.com Fri, 08/06/2010 - 09:17

Hi

I have upgraded to that same version and right now I am getting some disconnects being reported. The WLCs are installed at two sites, and the strange thing is that only one site is reporting the issue, which points in the direction of interference. The 7.0 version comes with CleanAir technology. If this is enabled, will it prevent interference? If so, the issue of disconnects might be related to something else.

Also, one of the users reported that he is experiencing full disconnects, as in the wireless connection goes dead with the red x seen, and then it reconnects and authenticates.

I have been troubleshooting disconnects with this customer for a while, as the previous version was 6.0.196, which had the bug of shutting down the beacon after a couple of seconds. Now that I have upgraded, it seems like the issue is back, but this time the disconnects are different: previously the connection would remain with the AP, but now the AP is dropping connections.

Any ideas what to do? I have asked the customer to run the debugs and send me the traps from the WLC. Is there anything else to try?

leejohns Fri, 08/06/2010 - 09:43

The clean air feature in the 7.0 release will not prevent interference. It is more like having a spectrum analyzer within the WLC to allow you to see the state of the RF environment. You also need the 3500 series APs to be able to use it.

For the issue at hand, along with the debugs and trap logs:

1. Check the APs at this location and see if interference profiles are failing (MONITOR>Access Points>802.11b/g/n | 802.11a/n).

2. Are rogue APs/clients being detected by the APs at this site? If so, and RLDP is enabled on the WLC for all APs, then these APs might be going offline to try and attach to the rogue to see if it is on the wired network.

3. Use a spectrum analyzer like Cisco Spectrum Expert to see what else is out there in the RF. It might be that when this happens, someone is using a poorly shielded microwave and wrecking the RF.

4. Check the driver versions for the clients and make sure that they are up to date.

Thanks,

Lee

rramlal@fj-icl.com Mon, 08/09/2010 - 06:09

Hi

I have attached the WLC configs. I am uncertain if the wireless connection dropped at that time. However, from the attached configs I am seeing the rogue AP, and shortly after that all clients are deauthenticated. These clients have the latest drivers according to the release notes.

I have not verified the two things you mentioned. However, from looking at the configs, can you say if it is in fact the rogue causing it, and whether disabling RLDP will resolve this?

I am going to do some further troubleshooting this afternoon and will update further.

Besides the two things you asked about, I will do a debug on the client to see the series of events that takes place. Is there anything else you think I should verify?

leejohns Mon, 08/09/2010 - 06:22

I cannot tell from this information if RLDP is causing your issue or not. The reason code is 1, unspecified, so I can't tell anything from that. You could run RLDP debugs (debug dot11 rogue enable, debug dot11 rldp enable) to see if the APs are going offline to try to attach to the rogue AP. A quick test is to just disable RLDP temporarily and see if the issue goes away.

Thanks,

Lee

shh5455 Wed, 08/04/2010 - 09:30

Ever since the early days of LWAPP I have never been able to keep APs registered with a certain controller. With current versions I set the primary/secondary/tertiary WLC name and IP address for all APs. Then I leave the customer. Invariably I come back a few months later and the APs are all over the place. I can't find anyone at Cisco who recognizes this behavior (possibly not in the field enough), but my coworkers see it all the time. We say managing the APs is like "herding cats." I am trying to make a good design and avoid intercontroller roaming, but it's impossible when the APs don't behave like the documentation says. There's no rhyme or reason to when they leave. Even with AP fallback on, they still don't fall back. Any thoughts on that?

Thanks.

leejohns Wed, 08/04/2010 - 09:51

The only reason for APs to drop off a particular controller that is up and running fine is if the AP misses packets. This could be because heartbeats are lost, configuration packets are lost, the AP crashed, a bug such as the early CAPWAP issue where the WLC had problems reassembling out of order packets, etc. The root cause of why the packets are missed could be anything like network congestion (packets need to arrive within a certain time frame), fragmentation, improper physical deployments like LAG on the WLC, but no port-channel on the switch, improper etherchannel load-balancing (should be src-dst-ip) on the switch with LAG, multiple distribution ports connected (no LAG), but only a single AP manager. All of these things can disrupt proper traffic flow into and out of the WLCs and cause AP instability.

The interesting thing is that it takes 3 missed heartbeats between an AP and controller before it will drop, but should a single configuration packet (think RRM updates, etc) not be ACK'd, the AP will consider the controller gone and try to rejoin it and if it can't will move on to the secondary, tertiary, etc.
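The asymmetry described above can be sketched as a small Python model. This is a toy illustration, not how the AP firmware is implemented. The event names are hypothetical, and it assumes the three missed heartbeats must be consecutive, which the post does not state.

```python
MAX_MISSED_HEARTBEATS = 3  # figure taken from the post above

def ap_drop_index(events):
    """Walk a sequence of events and return the index at which the AP
    would give up on the controller, or None if it stays joined.

    Hypothetical event names:
      'hb_ok'     heartbeat answered
      'hb_miss'   heartbeat lost
      'cfg_ack'   configuration packet ACK'd
      'cfg_noack' configuration packet never ACK'd
    """
    missed = 0
    for index, event in enumerate(events):
        if event == "hb_miss":
            missed += 1
            if missed >= MAX_MISSED_HEARTBEATS:
                return index
        elif event == "hb_ok":
            missed = 0  # assumption: only consecutive misses count
        elif event == "cfg_noack":
            return index  # one lost config packet is enough
    return None

print(ap_drop_index(["hb_miss", "hb_miss", "hb_ok", "hb_miss"]))
print(ap_drop_index(["hb_miss", "hb_miss", "hb_miss"]))
print(ap_drop_index(["hb_ok", "cfg_ack", "cfg_noack"]))
```

The first sequence never drops because a good heartbeat arrives in between, the second drops on the third miss, and the third drops on the first un-ACK'd configuration packet. This is why occasional loss of a single config packet can cause far more AP churn than occasional heartbeat loss.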

So if, for example, the WLC is configured for LAG with multiple gig connections and the etherchannel load-balancing on the switch is set to src-mac, and a config packet from the AP is fragmented, it is possible for the different fragments to be sent down different ports in the etherchannel to the WLC. If this happens, the WLC would not be able to reassemble the packet correctly because it only supports src-dst-ip load-balancing. So the WLC will not ACK the packet (in its eyes it never got the packet), and the AP drops.

If you have APs that are not stable, then there is some underlying cause making them miss packets, and that is why they are dropping. Sometimes the only way to figure out exactly what is happening is to get concurrent wired packet captures from the WLC and AP switch ports. I had a case once for that very thing, and once we had the captures, we could see that the network was intermittently losing part of a fragmented packet from the AP and causing the disassociation.
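One way to compare the two concurrent captures is to pull the IP ID and fragment offset of each fragment from both and look for fragments that left the AP port but never reached the WLC port. A minimal Python sketch follows, assuming those fields have already been exported from the captures as (ip_id, fragment_offset) tuples. The function name and sample values are made up for illustration.

```python
def missing_fragments(ap_side, wlc_side):
    """Return fragments seen on the AP switch port capture that do not
    appear in the WLC switch port capture, sorted for readability.

    Each argument is an iterable of (ip_id, fragment_offset) tuples
    for traffic from one AP to the WLC.
    """
    return sorted(set(ap_side) - set(wlc_side))

# Made-up sample data: two packets, each split into two fragments.
ap_capture = [(100, 0), (100, 185), (101, 0), (101, 185)]
wlc_capture = [(100, 0), (100, 185), (101, 0)]

lost = missing_fragments(ap_capture, wlc_capture)
print(lost)  # [(101, 185)]
```

Keep in mind that IP IDs wrap and repeat over a long capture, so in practice you would compare short time windows around each AP drop rather than the whole file.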

Thanks,

Lee

Scott Pickles Wed, 08/04/2010 - 10:08

Lee -

How do you capture packets when the controller is in LAG mode?  I've never tried to span a port-channel, but I don't see why you couldn't.

Regards,
Scott

leejohns Wed, 08/04/2010 - 10:14

Scott,

Yes. That is exactly what you do. Set the source interface for the SPAN to the port channel.

Lee

shh5455 Wed, 08/04/2010 - 10:34

Thanks for the insight. I will keep that in mind.

leejohns Wed, 08/04/2010 - 10:48

No problem. Glad to help out.

Lee

This Discussion

Posted July 30, 2010 at 1:54 PM
