Hello everyone, I've been trying to track down a very stubborn problem that I've been having at 2 sites now; the client computers show repeated syslog entries of
Event 4201 - The system detected that the [XXX adapter name] was connected to the network, and has initiated normal operation over the network adapter. Each entry coincides with a brief network interruption lasting a few seconds. These interruptions and event log entries happen at seemingly random times, and do not coincide with each other across machines. They might happen once every minute or so for a few minutes, then nothing for an hour or more.
The AP in question is an AIR-AP1242G A-K9 V03 running F/W ver. 12.4(21a)JA1 (same at both sites). We are running WPA2 PSK for security, with no further measures (such as MAC address filtering).
Client adapter cards at site one are a mixture of Cisco SMB WMP200 and Linksys WMP54G, both on latest drivers. Clients are running XP Pro SP3 on a couple generations of Advantech industrial PCs.
Clients at site two are all Dell OptiPlex 380 running XP Pro SP3 and all are using Linksys WMP54G ver. 4.1 adapters on Linksys' latest drivers.
I have seen this issue at both sites, but it is most pronounced and best documented on the systems at site two.
I have a Metageek Wi-Spy that I've used to monitor for any crazy RF noise (nothing notable found) at both sites.
I would love to get this issue resolved as it has been an ongoing problem and a very difficult one to diagnose. Any assistance would be greatly appreciated.
Thanks in advance
Justin, does your wifi tool have the ability to see packets as well? You may want to see if there is a loss of beacons. At the default setting there should be about 10 beacons a second per SSID.
Hey Stephen, I just looked into it and no it does not! I'm trying to get my hands on a network adapter that will grab the 802.11 management packets. Will update ASAP.
Ok, it took some doing but I did get my hands on an adapter (& drivers) that will forward those management packets and I captured a minute or so worth of traffic at this site from a couple locations. We are using the same WAP and experiencing similar symptoms at both sites so I will assume that whatever is responsible for these events here will also apply at the Calgary site.
Now, I only counted for about 3 or 4 seconds worth (checked right beside the AP and at the farthest workstation) and confirmed that there are 10 beacons per second being broadcast. How much data should we be reviewing here? Do we need to be capturing packets and waiting for an error to occur to see if there is a lapse in beacons sent at the time of the event?
Please let me know how you have in mind to proceed.
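Counting beacons by eye works for a few seconds of capture, but not for hours. As a rough sketch only, the Python below assumes the beacon arrival times for one SSID have already been exported from the capture as a list of seconds, and it flags any stretch where several consecutive beacons are missing. The function name, the 3-beacon threshold, and the 100 TU (102.4 ms) beacon interval are assumptions for illustration; use the AP's configured beacon period if it differs.

```python
from typing import List, Tuple

# Assumed default: beacon interval of 100 TU, where 1 TU = 1024 microseconds.
BEACON_INTERVAL_S = 100 * 1024e-6  # 0.1024 s


def find_beacon_gaps(timestamps: List[float],
                     interval: float = BEACON_INTERVAL_S,
                     missed_threshold: int = 3) -> List[Tuple[float, float, int]]:
    """Return (gap_start, gap_end, beacons_missed) for each lapse.

    timestamps: beacon arrival times in seconds (hypothetical export format).
    missed_threshold: minimum number of consecutive missed beacons to report.
    """
    gaps = []
    ordered = sorted(timestamps)
    for prev, cur in zip(ordered, ordered[1:]):
        # Number of beacon slots that passed with nothing received.
        missed = round((cur - prev) / interval) - 1
        if missed >= missed_threshold:
            gaps.append((prev, cur, missed))
    return gaps
```

Any gap reported here can then be compared against the time of a 4201 entry on the affected client. A capture laptop will also miss the odd beacon on its own, so a small threshold keeps isolated misses out of the report.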
Interesting ... I made a video on the 4201 and 4202 codes not long ago. Glad to see others besides myself using it ...
You need a solid packet capture from a problem client and see who is telling who what. This is how you will get your hands around this issue.
What aps and clients are you using? Are the clients the same and on the same driver?
BTW check your client power save mode .. Make sure it's on CAM
As far as I am concerned ALL of my clients are problem clients! The frequency varies between them (esp. the ones running the Cisco WMP200 adapters, less frequent), but eventually all of them will report 4201 entries in their syslogs (not counting the event at startup of course). As I mentioned in the first post the machines are all running XP Pro (most are SP3) and the majority are running Linksys WMP54g adapters on Linksys/Cisco's latest driver. Applications are primarily GUI sessions connected to an offsite mainframe, followed by viewing PDF documents. Those mainframe clients don't like being disconnected, results in locked up DB records etc.
As for the power save mode, good suggestion but I've already checked that. The adapters are set to always on (do not sleep).
I'm not an expert in the wireless world by any means, but from what I've learned over the last couple of days, capturing 802.11 packets on Windows machines is not an easy task! In order to capture the samples so far I've had to load Linux on my laptop, source an adapter that is compatible with one of the known good drivers used with aircrack-ng, and capture using Wireshark. The reading I've done so far tells me that it's nearly pointless to attempt it in Windows, which is what's running on the problem clients.... That being said I suppose I could repeat the above process on one of the trouble machines and put it back into the wild, but that wouldn't do much good as I would have to use a different wireless adapter with an appropriate chipset which kind of defeats the purpose..... Though I suppose it would tell me that there's no problem with that adapter running under Linux ;-)
Please explain a little more what you mean by "You need a solid packet capture from a problem client and see who is telling who what". Particularly the part about a solid packet capture? At this time I don't have a reason to doubt the capture rig I've got set up and I could certainly leave it running longer if that's what you mean?
Thanks a lot for the suggestions
First, 4201s are not a bad thing. These will register in the event log when a roam occurs or when a station reassociates to the same access point.
When you see a 4201 event, what exactly is your client doing? Is it locking up, or is the app just losing its session ... what?
What access points are you using? Sorry, I see that you are using 1242s... Are these LWAPP or IOS?
In regards to the packet capture, Wireshark is perfect. Can you do a capture of a problem client while it is working and then when it breaks? This will be very helpful for troubleshooting. It can tell us where the communication breakdown occurs.
Also, when you did your capture, did you see frames called NULL?
BTW .. I can't seem to find the release notes on that REV of code for your AP. I did find that it's almost a year old. Are you opposed to upgrading to the latest release?
Agreed, event 4201's are not a bad thing as they are indicating a proper connection of a network adapter (Microsoft even has an article about it!). However in my case it's the frequency of the event, and more importantly the coincidence with a complete stop of network traffic that has my attention. As I write that it occurs to me that I can't say ALL network traffic as I don't have any type of packet capture running on the problem machines at the moment... For example, if I am connected to one of the machines using RDP my session will stop responding during the period and the machine will stop sending and answering pings. Once the session comes back I will find a new 4201 entry in the event log at the time of the blackout. I will set up Wireshark for Windows on one of the machines and leave it running and see what happens to the data packets during that blackout period....
Last time I checked that was the latest FW version for the WAP, but I will surely take a look if there is a newer one out now as you mentioned. I've updated my original posting with more info on the HW version (I think!) of the WAPs, so hopefully that was the info you were looking for.
We are running IOS on both APs, they are both standalone devices.
I did not find any Null frames in my two captures at this site, but there is probably only a total of 3 min. worth of data there. I will put the laptop back out on the floor and leave it recording for a longer period and check again.
As I mentioned above, I will install Wireshark on one of the stations and leave it recording for some time, with any luck it will have a problem in a short time and report the findings. Of course I will only see TCP traffic (no 802.11 control packets) as it will be running in Windows.
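Lining up ping loss against the event log by hand gets tedious over a long recording. A minimal sketch in Python, assuming you already have a list of (time, replied) ping samples and a list of 4201 event times taken from the event log. Both input formats and all function names are hypothetical, and the 10-second tolerance is only a guess to allow for clock differences between the machine sending the pings and the client.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

Outage = Tuple[datetime, datetime]


def outages_from_pings(samples: List[Tuple[datetime, bool]]) -> List[Outage]:
    """Collapse runs of unanswered pings into (first_lost, last_lost) outages.

    samples must be sorted by time; the bool is True when a reply arrived.
    """
    outages = []
    start = last = None
    for when, replied in samples:
        if not replied:
            if start is None:
                start = when
            last = when
        elif start is not None:
            outages.append((start, last))
            start = None
    if start is not None:  # capture ended while pings were still failing
        outages.append((start, last))
    return outages


def match_events_to_outages(event_times: List[datetime],
                            outages: List[Outage],
                            tolerance: timedelta = timedelta(seconds=10)
                            ) -> List[Tuple[datetime, Optional[Outage]]]:
    """Pair each 4201 event with the outage it falls in (within tolerance), or None."""
    results = []
    for event in event_times:
        hit = None
        for start, end in outages:
            if start - tolerance <= event <= end + tolerance:
                hit = (start, end)
                break
        results.append((event, hit))
    return results
```

Events that come back paired with None would be 4201 entries with no matching blackout, which is worth knowing too.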
It's normal for clients to lose a ping or 2 during a roam (4201) event. Are you losing more than 2 pings? How many 4201s will a typical client record standing still for a 10 minute period?
Yes, there is a newer version of FW for the APs.
What is the TX power on your access points?
You will need to run Wireshark and collect 802.11 frames from another station, not the station having the issue.
How many clients do you have per AP? Ballpark ...
Hey George, yes sometimes more than a couple pings are lost. Also, there should be no roaming as the clients are only configured with access to one AP though a few of them are in range of another in the office. Now for the number of events in a 10 min. period that really depends on the machine as some are worse than others. Worst offender might be two or three in a matter of a minute, then OK for a few minutes, then again. So worst I'd say would be easily 10+, best would be 0 within 10 minutes (but will occur within longer time period, maybe half an hour to a few hours).
Power for both APs is provided by PoE via Netgear switches, FS726TP in Calgary and FS752TPS at this site. Tx power setting on the AP itself is set to 100%.
Number of clients on the Calgary AP is 9, here it is 6.
I am currently running my laptop on the floor at this site and will have a good couple hours worth of data which should be more than enough to cover the span of some of the machines reporting events, though I will double check that at least one of them had a problem before I quit collecting.
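To put a number on "how many 4201s in 10 minutes" per machine rather than estimating, something like the following could be used. It is a sketch only, assuming the 4201 timestamps for one client have already been pulled out of its event log into a Python list; the function name and input format are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Dict, List


def events_per_window(event_times: List[datetime],
                      window: timedelta = timedelta(minutes=10)
                      ) -> Dict[datetime, int]:
    """Count events in fixed windows, starting at the earliest event.

    Returns {window_start: count}. Windows with no events are omitted.
    """
    if not event_times:
        return {}
    start = min(event_times)
    counts = Counter()
    for when in event_times:
        index = (when - start) // window  # whole windows since the first event
        counts[start + index * window] += 1
    return dict(sorted(counts.items()))
```

Taking the max of the returned counts gives the worst 10-minute period for that client, which makes machines easier to compare.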
This is good info ...
Is there a reason why you are only allowing a client to talk to 1 AP? Can you please remove this restriction and allow the clients to roam?
What are the AP channels? 1,6,11?
It never occurred to me to let the machines in range of the other AP roam as they have a solid signal from their intended AP, and the AP in the office is in my opinion a less reliable unit (Cisco SMB WAP200). Actually we originally used a WAP200 on our plant floor here, but we had more serious reliability issues (machines dropping clean off the network etc.) with those devices so they were demoted to simple boardroom duty for occasional use. I've also done side by side performance comparisons between the WAP200 and the AIR1242G and found the AIR1242G to provide a better transfer rate, and a much more solid link (WAP200 tx speeds bounce all over, and overall perform worse). So, long story short I didn't see any reason to allow the machines on the floor to roam to less reliable APs.
At our plant here the clients are really only in range of 1 AP which is the AIR1242G on the floor. We do have another WAP200 in the boardroom here as well, but it's out of useful range.
That being said, the AIR1242G is on channel 8 while the WAP200 is on channel 2. I used my WiSpy to confirm that there was no other traffic around those channels when I chose them.
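For reference, 2.4 GHz channel numbers are only 5 MHz apart while a transmission occupies much more than that, so "a few channels apart" does not always mean "clear". A small Python sketch of the arithmetic, assuming a 22 MHz occupied width (the commonly quoted figure, treated here as an assumption) and channels 1-13 only; the function names are made up for illustration.

```python
def center_mhz(channel: int) -> int:
    """Centre frequency of a 2.4 GHz channel (1-13) in MHz."""
    if not 1 <= channel <= 13:
        raise ValueError("only channels 1-13 are handled in this sketch")
    return 2407 + 5 * channel


def channels_overlap(a: int, b: int, width_mhz: float = 22.0) -> bool:
    """True if two channels of the assumed width share any spectrum."""
    return abs(center_mhz(a) - center_mhz(b)) < width_mhz


if __name__ == "__main__":
    for pair in [(2, 8), (1, 6), (8, 6), (8, 11)]:
        print(pair, "overlap" if channels_overlap(*pair) else "clear")
```

Under those assumptions channels 2 and 8 do not overlap each other, but channel 8 overlaps both 6 and 11, so it shares spectrum with any neighbour using the usual 1/6/11 plan.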
I've been doing this for 10+ years. I would get back to basics my friend, and let's start over. There seems to be "other stuff" going on and without being there it is difficult to take everything in with a few email exchanges.
1) Keep all the APs as the 1242s. Remove any other make or model to lessen client confusion
** I would also look at WDS config; check on Cisco site how to configure WDS. **
2) Don't restrict clients from roaming or lock them down to an AP. Almost no one does this and it can cause issues. Assume, for example, that the noise floor lifts and the client isn't allowed to roam to another AP, even one that is further away but could still maintain a connection
3) Use only channels 1,6,11 (not in-between channels)
4) AP power should be at 25 mW or 50 mW, not 100 mW (100%). Most clients can only transmit at 20-45 mW. You may have a hidden node issue
5) Update the firmware on the APs to the latest
Hey George, thank you for the to-do list. It will take me a little time to make the suggested changes as I have a few projects on the go at the moment. I will report back as soon as possible.
Ok, I've had a chance to review your last post.
1) Each site has an AIR-1242G on the plant floor and a WAP200 in the office area.
At our Calgary plant, the WAP200 is within range of a couple of the plant floor clients. In Langley a site survey from either WAP will show the other one, but there is way too much obstruction between them for any useful connection.
I haven't "restricted" the roaming of the shop floor clients in any fashion other than simply never joining them to the SSID for the WAP200.
I've read up on WDS and I don't see how it would apply to my situation as I effectively have only a single AP at each site. Perhaps I am missing something? Again, I don't consider the WAP200's as available to the plant floor clients. For testing purposes I could disable these devices leaving only the single AIR-1242G unit in range of the clients.
1 & 2) Perhaps I have not been clear about the clients; they are stationary computers on the plant floor, not mobile workstations. Roaming would only occur if the AP were down, for example, and of course that would only apply if there were a 2nd AP available. We do have a few notebooks for the office staff, but those are only joined to the WAP200 in the offices at the moment.
3) Change made.... I was always sure to provide ample distance between the channels used (was 2 and 8, for example), but I have now moved the devices to one of channels 1,6,11 as suggested (of course not on the same channel).
4) AP Power; I've confirmed that the APs are currently set at 20 dBm (100 mW), which as I understand it is the maximum allowable in North America. If I drop this I understand that I will improve the signal to noise ratio, but I will also be decreasing my range. I can do some experimentation, but I already have a few clients at each site that have auto-negotiated a 48Mb/s tx rate, telling me that they are already seeing a reduced signal strength. Will cutting 3 dB out of the signal not make a big hit on Tx rate for the clients at the edge of the cell?
5) Seems like a simple task but apparently I do not have any sort of service contract and therefore cannot D/L IOS updates..... I will look into the cost to make that available.
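Since the power figures in this thread switch between dBm, mW and percent, a quick conversion helps keep them straight. The two helpers below are just the standard dBm definition (power relative to 1 mW on a log scale); the function names are my own.

```python
import math


def dbm_to_mw(dbm: float) -> float:
    """Convert a power level in dBm to milliwatts."""
    return 10 ** (dbm / 10)


def mw_to_dbm(mw: float) -> float:
    """Convert a power level in milliwatts to dBm."""
    return 10 * math.log10(mw)


if __name__ == "__main__":
    for level in (20, 17, 14):
        print(f"{level} dBm is about {dbm_to_mw(level):.1f} mW")
```

So 20 dBm is 100 mW, and each 3 dB step down roughly halves the transmit power, which puts the suggested 50 mW and 25 mW settings at about 17 dBm and 14 dBm.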
I've attached a quick floorplan that I had on file for the layout at our Calgary plant. I don't have one for this plant at the moment but it could be done if it will be helpful.
I appreciate the time you've spent on this so far and your patience with a novice like myself.
It's been some time but I thought I would update:
Since my earlier posts I have captured 802.11 control packets and not found anything unusual. I have also purchased the support agreement from Cisco for access to IOS upgrades and updated both of the WAPs to the latest version with no change in my symptoms.
I've discovered that it has mostly or entirely to do with one particular NIC (the Linksys WMP54G) and the drivers it uses. I've found that replacing the card with another make & model (I've had good results with D-Link WDA-2320) will prevent the repeated 4201 entries and short, frequent network interruptions. I have also found that using drivers from Ralink for that RT61 (RT2561) chipset will reduce and nearly eliminate the problem. There is a tradeoff however with the Ralink drivers; I find that while general availability is much better and the number of 4201 entries is significantly reduced, the response time on a PING seems to increase over the drivers provided by Linksys. Overall though I would call it an improvement and have moved a few of the machines using that NIC over to the Ralink drivers for further monitoring.