Clients on Guest WLAN Losing Layer 3 Connectivity

s.clinard
Level 5

Hello NetPro gurus!

I am currently troubleshooting an issue we are having with our Guest (completely open) WLAN in which it seems certain clients are losing their layer 3 connectivity while staying 'connected' to the LWAP(s). These certain clients lose their layer 3 configuration and are not able to access internal or external resources until they disable/enable their wireless connection.

I see this problem myself, and only on the Guest WLAN. I am using a Lenovo T61 with an internal Intel 4965AG wireless chipset. I know this chipset is relatively new, and I have tried multiple drivers, all with the same result. Not all machines are affected: MacBook Pro laptops do not seem to have this issue, nor do machines with Intel PRO 2200BG chipsets. I also tested with a Netgear PCMCIA card and did not have the issue.

Here's some more background information:

We have 5 WLCs (2 WiSM blades, each in a Catalyst 6509 and each providing 2 controllers, plus 1 WLC 4402) and 7 WLANs. The 4 WiSM controllers each have every WLAN configured, and the 4402 WLC only knows about one Guest wireless network (a completely open WLAN, i.e. no security). This is the network we see the issue on. We have approximately 200 LWAPP 1131AG APs (47 in one building, 154 in another), all broadcasting the Guest SSID. Our server core Catalyst 6509s each have separate VLANs (with port-channels in them) for the WiSM blades. The Guest WLC 4402 is in the DMZ in its own VLAN. Each WLC provides DHCP for its WLANs.

What seems to happen is that, while I am troubleshooting, I lose all layer 3 connectivity. I stay "connected" to the AP and signal strength is excellent; however, my continuous pings to the Guest WLC (192.168.0.x network) time out and I cannot get out to the Web. I noticed the following error in the system event viewer on my laptop (Lenovo T61 w/ an Intel 4965AG wireless chipset):

Description:

The system detected that network adapter Intel(R)...Link 4965AG - Packet Scheduler Miniport was disconnected from the network, and the adapter's network configuration has been released. If the network adapter was not disconnected, this may indicate that it has malfunctioned. Please contact your vendor for updated drivers.

This occurred at the exact time I lost my layer 3 connectivity. A co-worker and I did some research and determined that this was exactly halfway through my 1-hour DHCP lease from the Guest WLC (the 4402). The DHCP leases are set to expire after 1 hour because we have a lot of clients on the Guest WLAN that come and go, and we have only one network configured for the Guest WLAN, with 229 available IPs to be handed out. We were wondering if it was an issue with the DHCP renewal process on the WLC. This does not occur on the internal WLANs, which are configured with strict authentication security.
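For reference, the "halfway through the lease" timing lines up with standard DHCP behaviour: RFC 2131 has the client attempt its first renewal at T1 (default 50% of the lease) and fall back to rebinding at T2 (default 87.5%). A minimal Python sketch of that arithmetic for a 1-hour lease (the function name is only for illustration, and a server can override these defaults):

```python
from datetime import timedelta

def dhcp_timers(lease_seconds: int) -> dict:
    """Return the default RFC 2131 renewal (T1) and rebinding (T2) offsets."""
    return {
        "T1_renew": timedelta(seconds=lease_seconds * 0.5),
        "T2_rebind": timedelta(seconds=lease_seconds * 0.875),
        "expiry": timedelta(seconds=lease_seconds),
    }

timers = dhcp_timers(3600)  # 1-hour lease, as on the Guest WLC
for name, offset in timers.items():
    print(f"{name}: {offset}")
```

So a renewal request 30 minutes into the lease is expected on its own; the question is what happens to the client right after it.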

We tested with a few machines, such as an Apple laptop, an older laptop with an Intel PRO 2200BG chipset, and even my same laptop with a Netgear PCMCIA WiFi card, none of which exhibited this problem. Connectivity at layer 3 was not interrupted. I have tried multiple drivers as well, all with the same result.

Now, we are not sure if it is an issue with the WLC itself or a chipset issue. The Intel 4965AG chipset is rather new but we have a lot of WLAN clients with this chipset on the network. That also doesn't explain why this issue ONLY occurs on the Guest WLAN.

We were thinking of placing a small DHCP server on the network to take over DHCP responsibilities from the Guest WLC to see if that makes a difference. Another idea we had was to increase the DHCP scope to two Class C networks (192.168.0.0 - 192.168.1.255, i.e. a /23, giving us 510 hosts) so that we can extend the DHCP lease time.
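The host count for the larger scope can be checked with Python's standard `ipaddress` module. This is only a sketch, and it assumes the intended range is the private 192.168.0.0/23 block (the Guest WLC is described above as being on the 192.168.0.x network):

```python
import ipaddress

# Assumption: the guest range is 192.168.0.0 - 192.168.1.255 (a single /23).
guest_net = ipaddress.ip_network("192.168.0.0/23")

# Subtract the network and broadcast addresses to get usable host addresses.
usable_hosts = guest_net.num_addresses - 2

print(guest_net.network_address, "-", guest_net.broadcast_address)
print("netmask:", guest_net.netmask)
print("usable hosts:", usable_hosts)
```

A few of those addresses would still go to the gateway and the WLC interfaces, so the pool actually handed out would be slightly smaller.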

I plan on doing further testing today by placing a few more machines on the Guest WLAN with multiple chipsets and taking note of which ones exhibit the problem.

Any and all help is MUCH appreciated. Thanks!

Shane

Accepted Solutions

Well.. to complete a nice happy ending to my saga.. BUG FOUND!

I opened a TAC case and we came to this conclusion.

In the Advanced settings of the WLAN there is a client time-out with a default of 1800 seconds (30 minutes).

According to the sniffer traces, the clients were disassociating due to inactivity, which kicked off the DHCP process and put them in the web_auth reqd state.

We set this down to 60 seconds and watched it happen over and over.

I have now set it to the maximum allowed value of 65535 (about 18 hours) as a workaround.

Cisco admitted there are bugs when setting this to 0, so they suggest 65535.
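To put those timeout values next to the 1-hour guest lease, here is a quick conversion sketch in Python. The labels are just descriptive names for the three values mentioned above, not controller settings:

```python
from datetime import timedelta

# Timeout values mentioned in this post, in seconds.
timeouts = {"default": 1800, "test value": 60, "workaround (max)": 65535}
lease = timedelta(hours=1)  # guest DHCP lease from the original post

for label, seconds in timeouts.items():
    duration = timedelta(seconds=seconds)
    # Dividing two timedeltas gives a plain float ratio.
    print(f"{label}: {seconds} s = {duration} ({duration / lease:.2f} x lease)")
```

The default works out to exactly half of the lease, which would explain why the drop looked like a DHCP renewal problem.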

Hope this info helps some of you out!!

50 Replies

Scott Fella
Hall of Fame

I have seen this issue with various other Intel cards as well. The fix was to change the driver or, if you are using a 3rd-party utility like Access Connections on your IBM, to update that too. I had to upgrade my driver a couple of times, and also the utility. Since you have narrowed it down to certain wireless cards, it's not your WLAN. Make sure you use the driver that the laptop manufacturer recommends and not the latest one from Intel; I have had issues using the latest Intel driver from their website. I use Access Connections now and have no problems at all. The fix before that was to remove the 3rd-party utility and just use Windows to configure the WiFi.

-Scott
*** Please rate helpful posts ***

Thanks for the information. Doesn't it seem a little odd that this only happens on our guest WLAN? I would think that if it indeed was a driver issue, I/we would see this on any of the WLANs that exist. For reasons I have yet to determine, I think it has something to do with the WLC controlling the guest WLAN, WLC 4402 running software version 4.2.130.0. Our WiSM blades are running 4.2.130.0 as well and the WCS is on version 5.0.56.2. I can paste any relevant configuration of the Guest WLAN if that might help.

Do you notice if it occurs only on certain ap's or on a certain wlc on the WiSM? Post your show run-config on the dmz and on your wlc. This way we can eliminate issues with configuration.

-Scott
*** Please rate helpful posts ***

It seems to occur when associated to any access point broadcasting the Guest SSID. I want to test my DHCP theory before submitting all of the configs, just to see if it is correct. I think the WLC is doing something funky, though debugs on the WLC didn't show anything interesting. I noticed that when the problem occurred just a little while ago (while testing), I could release my IP but it would not renew.

What DHCP are you using? The WLC would work best for DHCP on the guest controller in the DMZ.

-Scott
*** Please rate helpful posts ***

I was thinking of an external Linux DHCP server configured with the same scope on the same subnet. This, obviously, would not actually allow me to determine the true problem though. I'll have the configurations posted in a little while. I am not going to implement this today, so we can talk more.

Thanks for your time.

I've attached the run-config from the Guest WLC which is attached to our DMZ switch. Did you want to see the configuration of the DMZ switch as well?

Did you try to change the lease time or manually do "ipconfig /renew" on the client?

You can also run "debug client enable" on the controller and follow the DHCP requests.

Hello, thanks for the reply. When this issue occurs, I am able to release the IP binding but am not able to renew it unless I disable/enable the wireless card in the machine. So far, as far as I can tell, this occurs on only one Intel chipset. I am performing further testing at this time. It might be a WZC service issue, so I am trying to rule that out. I have enabled the client debugging as you suggested, and I will let everyone know what happens.

Thanks!

If you are using WZC, make sure you uninstall any other 3rd-party utility like Intel PROSet or IBM Access Connections. This has been an issue with the wireless card driver version. I actually use Access Connections to manage my wireless profiles. Upgrading the utility and the driver solved my issue. It took a while until I found a driver that actually fixed the issue. It seems like it corrupts the TCP stack. I wasn't able to even ping my own IP when I lost connectivity, but I still seemed connected.

-Scott
*** Please rate helpful posts ***

Can you tell me what version of driver you are using for the 4965AG? I've got 11.5 on my machine, but even version 11.1 exhibits the same problem.

I had issues with my 3945ABG, which is now on 11.5.0.36. I have some co-workers that have issues with their 4965AG... they just end up rebooting the machine. There is a new v12 release on Intel's website. I don't know if it is supported by the laptop manufacturer though.

-Scott
*** Please rate helpful posts ***

It doesn't look like Lenovo (T61) has posted any 12.x driver software yet. I was able to debug the problem on the WLC, though I have no idea what the messages really mean. It logged the BOOTREQUEST message when this particular client was halfway through its DHCP lease. No other clients I am testing with are having an issue. I went through and tested without using WZC, and had the same problem. I then uninstalled ALL third-party wireless software, re-enabled WZC and rebooted. I am testing again. The unfortunate part is that I have to wait 30 minutes until I can tell if there is a difference.

Well, it doesn't appear to matter what is controlling the wireless connection. I received the following debug message for both the 3945ABG and 4965AG wireless chipsets:

(WLC5) >Mon Aug 4 13:17:46 2008: 00:13:02:24:ca:77 DHCP received op BOOTREQUEST (1) (len 334, port 29, encap 0xec00)

Mon Aug 4 13:17:46 2008: 00:13:02:24:ca:77 DHCP selecting relay 1 - control block settings:

dhcpServer: 192.168.0.4, dhcpNetmask: 255.255.255.0,

dhcpGateway: 192.168.0.1, dhcpRelay: 192.168.0.4 VLAN: 0

Mon Aug 4 13:17:46 2008: 00:13:02:24:ca:77 DHCP selected relay 1 - 192.168.0.4 (local address 192.168.0.4, gateway 192.168.0.4, VLAN 0, port 29)

Mon Aug 4 13:17:46 2008: 00:13:02:24:ca:77 DHCP transmitting DHCP REQUEST (3)

Mon Aug 4 13:17:46 2008: 00:13:02:24:ca:77 DHCP op: BOOTREQUEST, htype: Ethernet, hlen: 6, hops: 1

Mon Aug 4 13:17:46 2008: 00:13:02:24:ca:77 DHCP xid: 0xce34deed (3459571437), secs: 0, flags: 0

Mon Aug 4 13:17:46 2008: 00:13:02:24:ca:77 DHCP chaddr: 00:13:02:24:ca:77

Mon Aug 4 13:17:46 2008: 00:13:02:24:ca:77 DHCP ciaddr: 192.168.0.164, yiaddr: 0.0.0.0

Mon Aug 4 13:17:46 2008: 00:13:02:24:ca:77 DHCP siaddr: 0.0.0.0, giaddr: 192.168.0.4

Mon Aug 4 13:17:46 2008: 00:13:02:24:ca:77 DHCP selecting relay 2 - control block settings:

dhcpServer: 192.168.0.4, dhcpNetmask: 255.255.255.0,

dhcpGateway: 192.168.0.1, dhcpRelay: 192.168.0.4 VLAN: 0

Mon Aug 4 13:17:46 2008: 00:13:02:24:ca:77 DHCP selected relay 2 - NONE

Mon Aug 4 13:17:46 2008: 00:13:02:24:ca:77 DHCP received op BOOTREPLY (2) (len 548, port 0, encap 0x0)

Mon Aug 4 13:17:46 2008: 00:13:02:24:ca:77 DHCP sending packet in EoIP tunnel to foreign 10.50.111.11 (len 346)

Mon Aug 4 13:17:46 2008: 00:13:02:24:ca:77 DHCP transmitting DHCP ACK (5)

Mon Aug 4 13:17:46 2008: 00:13:02:24:ca:77 DHCP op: BOOTREPLY, htype: Ethernet, hlen: 6, hops: 0

Mon Aug 4 13:17:46 2008: 00:13:02:24:ca:77 DHCP xid: 0xce34deed (3459571437), secs: 0, flags: 0

Mon Aug 4 13:17:46 2008: 00:13:02:24:ca:77 DHCP chaddr: 00:13:02:24:ca:77

Mon Aug 4 13:17:46 2008: 00:13:02:24:ca:77 DHCP ciaddr: 192.168.0.164, yiaddr: 192.168.0.164

Mon Aug 4 13:17:46 2008: 00:13:02:24:ca:77 DHCP siaddr: 0.0.0.0, giaddr: 0.0.0.0

Mon Aug 4 13:17:46 2008: 00:13:02:24:ca:77 DHCP server id: 2.2.2.2 rcvd server id: 192.168.0.4

Mon Aug 4 13:17:55 2008: 00:13:02:24:ca:77 Clearing Address 192.168.0.164 on mobile

Mon Aug 4 13:17:55 2008: 00:13:02:24:ca:77 192.168.0.164 RUN (20) Change state to DHCP_REQD (7) last state RUN (20)

Mon Aug 4 13:17:55 2008: 00:13:02:24:ca:77 192.168.0.164 DHCP_REQD (7) pemAdvanceState2 3850, Adding TMP rule

Mon Aug 4 13:17:55 2008: 00:13:02:24:ca:77 192.168.0.164 DHCP_REQD (7) Replacing Fast Path rule

type = Airespace AP - Learn IP address

on AP 00:00:00:00:00:00, slot 0, interface = 29, QOS = 0

ACL Id = 255, Jumbo Frames = NO, 802.1P = 0, DSCP = 0, TokenID = 5006

Mon Aug 4 13:17:55 2008: 00:13:02:24:ca:77 192.168.0.164 DHCP_REQD (7) Successfully plumbed mobile rule (ACL ID 255)

Mon Aug 4 13:17:55 2008: 00:13:02:24:ca:77 apfMmProcessCloseResponse (apf_mm.c:427) Expiring Mobile!

Mon Aug 4 13:17:55 2008: 00:13:02:24:ca:77 0.0.0.0 DHCP_REQD (7) Deleted mobile LWAPP rule on AP [00:00:00:00:00:00]

Mon Aug 4 13:17:55 2008: 00:13:02:24:ca:77 Deleting mobile on AP 00:00:00:00:00:00(0)

Mon Aug 4 13:17:55 2008: 00:13:02:24:ca:77 Set bi-dir guest tunnel for 00:13:02:24:ca:77 as in Export Anchor role

Mon Aug 4 13:17:55 2008: 00:13:02:24:ca:77 192.168.0.164 Added NPU entry of type 9

Mon Aug 4 13:17:55 2008: 00:13:02:24:ca:77 0.0.0.0 Removed NPU entry.

This occurs exactly halfway through the DHCP lease.
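To make the timing in the debug output easier to read, here is a small Python sketch that parses two of the lines above and measures the gap between the DHCP ACK and the client being pushed back to DHCP_REQD. The regular expression and the `parse` helper are my own illustration, not anything provided by the WLC:

```python
import re
from datetime import datetime

# Two lines copied from the debug output above.
log = """\
Mon Aug 4 13:17:46 2008: 00:13:02:24:ca:77 DHCP transmitting DHCP ACK (5)
Mon Aug 4 13:17:55 2008: 00:13:02:24:ca:77 192.168.0.164 RUN (20) Change state to DHCP_REQD (7) last state RUN (20)
"""

# timestamp, client MAC, then the rest of the message
LINE = re.compile(
    r"^(\w{3} \w{3} +\d+ \d\d:\d\d:\d\d \d{4}): "
    r"((?:[0-9a-f]{2}:){5}[0-9a-f]{2}) (.*)$"
)

def parse(text):
    """Return (timestamp, mac, message) tuples for lines that match."""
    events = []
    for raw in text.splitlines():
        m = LINE.match(raw.strip())
        if m:
            stamp = datetime.strptime(m.group(1), "%a %b %d %H:%M:%S %Y")
            events.append((stamp, m.group(2), m.group(3)))
    return events

events = parse(log)
ack = next(t for t, _, msg in events if "DHCP ACK" in msg)
drop = next(t for t, _, msg in events if "Change state to DHCP_REQD" in msg)
print("seconds between ACK and return to DHCP_REQD:", (drop - ack).total_seconds())
```

In this capture the renewal itself is ACKed, and the client's address is cleared on the controller 9 seconds later.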
