Hello NetPro gurus!
I am currently troubleshooting an issue with our Guest (completely open) WLAN in which certain clients seem to lose their layer 3 connectivity while staying 'connected' to the LWAP(s). The affected clients lose their layer 3 configuration and cannot access internal or external resources until they disable/enable their wireless connection.
I specifically have this problem, and it's only on the Guest WLAN that this occurs. I am using a Lenovo T61 with an Intel 4965AG internal wireless chipset. I know this chipset is relatively new and I have tried multiple drivers, all with the same result. Not all machines have this issue. MacPro laptops do not seem to have this issue nor do machines with Intel Pro 2200BG chipsets. I tested with a Netgear PCMCIA card and did not have this issue either.
Here's some more background information:
We have 5 WLCs (2 WiSM blades each in a Catalyst 6509 and 1 WLC 4402) and 7 WLANs. The 4 WiSM controllers each have every WLAN configured, and the 4402 WLC only knows about one Guest wireless network (it is a completely open WLAN, i.e. no security). This is the particular network we see this issue with. We have approximately 200 LWAP 1131AGs (47 in one building, 154 in another), all broadcasting the Guest SSID. Our server core Catalyst 6509s each have separate VLANs (with Port-channels in them) for the WiSM blades. The Guest WLC 4402 is in the DMZ in its own VLAN. Each WLC is providing DHCP for each of the WLANs.
What we see during troubleshooting is that I lose all layer 3 connectivity. I stay "connected" to the AP and signal strength is excellent, but my continuous pings to the Guest WLC (192.168.0.x network) time out and I cannot get out to the Web. I noticed the following error on my laptop (Lenovo T61 w/ an Intel 4965AG wireless chipset) in the system event viewer:
The system detected that network adapter Intel(R)...Link 4965AG - Packet Scheduler Miniport was disconnected from the network, and the adapter's network configuration has been released. If the network adapter was not disconnected, this may indicate that it has malfunctioned. Please contact your vendor for updated drivers.
This occurred at the exact time I lost my layer 3 connectivity. A co-worker and I did some research and determined that this was exactly halfway through my 1-hour DHCP lease from the Guest WLC (the 4402). The DHCP leases are set to expire after 1 hour because a lot of clients come and go on the Guest WLAN, and we have only one network configured for it, with 229 available IPs to hand out. We were wondering if it was an issue with the DHCP renewal process from the WLC. This does not occur on the Internal WLANs, which are configured with strict authentication security.
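For anyone wondering why the halfway point made us suspect DHCP: by default (RFC 2131) a client tries to renew at 50% of the lease (T1) and to rebind at 87.5% (T2). A minimal sketch of that arithmetic follows; the function name is just for illustration, and a server can override these defaults, so treat the numbers as the standard defaults rather than what the WLC necessarily hands out.

```python
def dhcp_timers(lease_seconds):
    """Return the default (T1, T2) timers in seconds for a DHCP lease.

    Assumes the RFC 2131 defaults: T1 (renew) at 50% of the lease,
    T2 (rebind) at 87.5% of the lease.
    """
    t1 = lease_seconds * 0.5
    t2 = lease_seconds * 0.875
    return t1, t2


lease = 3600  # the 1-hour Guest lease described above
t1, t2 = dhcp_timers(lease)
print(f"T1 (renew) at {t1 / 60:.0f} min, T2 (rebind) at {t2 / 60:.1f} min")
```

With a 1-hour lease, the default renewal attempt lands at the 30-minute mark, which is when the connectivity loss showed up.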
We tested with a few machines, such as an Apple laptop, an older laptop with an Intel Pro 2200BG chipset, and even my same laptop with a Netgear PCMCIA WiFi card, none of which exhibited this problem. Connectivity at layer 3 was not interrupted on any of them. I have tried multiple drivers for the 4965AG as well, all with the same result.
Now, we are not sure if it is an issue with the WLC itself or a chipset issue. The Intel 4965AG chipset is rather new but we have a lot of WLAN clients with this chipset on the network. That also doesn't explain why this issue ONLY occurs on the Guest WLAN.
We were thinking of placing a small DHCP server on the network to take over DHCP responsibilities from the Guest WLC to see if that makes a difference. Another idea we had was to increase the DHCP scope to two contiguous Class C networks (18.104.22.168 - 22.214.171.124 /23), which would give us 510 hosts and let us extend the DHCP lease time.
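To sanity-check the host count for a /23 versus a single /24, here is a quick sketch using Python's standard `ipaddress` module. The 192.168.0.0 prefixes are placeholders picked for illustration, not our actual scope, and `usable_hosts` is a made-up helper name.

```python
import ipaddress


def usable_hosts(cidr):
    """Count usable host addresses in an IPv4 network.

    Subtracts the network and broadcast addresses, so this is only
    meaningful for prefixes of /30 or shorter.
    """
    net = ipaddress.ip_network(cidr, strict=True)
    return net.num_addresses - 2


# Placeholder prefixes, not the real Guest scope
single = usable_hosts("192.168.0.0/24")
double = usable_hosts("192.168.0.0/23")
print(f"/24: {single} hosts, /23: {double} hosts")
```

Any addresses reserved for gateways, controllers, or exclusions come out of that total, which is presumably why our current single network only yields 229 leasable IPs.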
I plan on doing further testing today by placing a few more machines on the Guest WLAN with multiple chipsets and taking note of which ones exhibit the problem.
Any and all help is MUCH appreciated. Thanks!
Well.. to complete a nice happy ending to my saga.. BUG FOUND!
I opened a TAC case and we came to this conclusion.
In the Advanced settings of the WLAN there is a client time-out with a default of 1800 seconds (30 minutes, which is why it lined up with the halfway point of the 1-hour lease).
According to the sniffer traces, the clients were disassociating due to inactivity, which kicked off the DHCP process again and put them back in the web_auth reqd state.
We set the time-out down to 60 seconds and watched the problem reproduce over and over.
I have now set it to the maximum allowed value of 65535 seconds (about 18 hours) as a workaround.
Cisco admitted there are bugs when setting this to 0, so they suggest using 65535 instead.
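For reference, a small sketch that converts the time-out values mentioned above into hours and compares them with the 1-hour Guest lease. The helper name and the 3600-second default are only for illustration.

```python
def describe_timeout(seconds, lease_seconds=3600):
    """Return (hours, fraction_of_lease) for a time-out given in seconds."""
    hours = seconds / 3600
    fraction = seconds / lease_seconds
    return hours, fraction


# Default, test value, and workaround value from this thread
for value in (1800, 60, 65535):
    hours, fraction = describe_timeout(value)
    print(f"{value:>5} s = {hours:.2f} h ({fraction:.2f} x lease)")
```

The 1800-second default is exactly half of the lease, so the time-out and the DHCP renewal point coincide, which is what sent us down the DHCP path first.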
Hope this info helps some of you out!!