A WLC 4402-25 running 220.127.116.11, with some LAP1131 APs. A local DHCP range is configured on the WLC for the APs. The management interface and two ap-manager interfaces are in the same subnet. No LAG. Everything is working fine: the APs get DHCP addresses from the controller, join via LWAPP, happy.
Now I have tried to upgrade to 18.104.22.168. Controller upgrade went fine. After reboot the APs connect with their old existing IP to the controller, get the new software successfully and reboot. After reboot the APs try to get DHCP addresses from WLC. But controller log says:
Tue Mar 23 10:22:13 2010: 00:18:ba:75:a3:78 DHCP received op BOOTREQUEST (1) (len 584, port 1, encap 0xec00)
Tue Mar 23 10:22:13 2010: 00:18:ba:75:a3:78 DHCP dropping packet from AP 00:18:ba:75:a3:78 received on port 1, vlan 16455
I can't find VLAN 16455 anywhere in the configuration; the configured management VLAN is 71.
I have copied the original 22.214.171.124 config to another WLC in the lab (4402-12) and used the same AP as above. I had to change the IP addresses in the interface and DHCP configuration. No LAG, and only one active interface. The upgrade to 126.96.36.199 went fine, including the APs getting DHCP after reboot:
Tue Mar 23 14:48:31 2010: 00:18:ba:75:a3:78 DHCP received op BOOTREQUEST (1) (len 584, port 1, encap 0xec00)
Tue Mar 23 14:48:31 2010: 00:18:ba:75:a3:78 DHCP received a REQUEST on 'management' interface from AP -- bouncing to local DHCP server.
Tue Mar 23 14:48:31 2010: 00:18:ba:75:a3:78 DHCP sending to local dhcp server (0.0.0.0:68 -> 10.1xx.x.xxx:1067, len 302)
I have compared both resulting 188.8.131.52 configs, but found differences in MACs, IPs and timestamps only.
Why is this controller dropping the DHCP packets in the live environment after the upgrade? What is VLAN 16455?
I have repeated the upgrade several times, with and without intermediary steps (4.1.xxx) - no change with the resulting 4.2 software.
Do you have DHCP proxy enabled or disabled? If you are using the internal DHCP server on the controller, proxy has to be enabled. In 4.0, disabling proxy just meant that the DHCP server reported to the client was the actual DHCP server's IP instead of the controller's virtual interface IP. In 4.2, having proxy disabled means that the DHCP request is just broadcast out into the VLAN.
Yes, good idea. But no, didn't help.
DHCP proxy is enabled in both environments - the failing live controller and the working lab controller.
And neither disabling DHCP proxy nor re-enabling it changed anything on the live controller. The same error messages always occur.
If possible, try erasing the configuration, upgrading the firmware, and configuring the WLC manually instead of restoring from a backup. It could be a corrupt config or image.
Just wondering if you tried (maybe you did) extending the AP to the same VLAN / subnet as the management interface to see if it gets an IP?
I would also perhaps try LAG, just for testing purposes.
VLAN 16455 --- I don't think I've ever come across this one myself... I did a Google search as well and didn't see anything pertaining to it either.
@ Scott: A clean install is an idea, but I have 8 other WLCs around the country, and I don't want to risk falling into the same trap on them. First I want either a fix that doesn't require hands on the device, or a clear reason why this device is the only faulty one.
I'm relatively sure that the software image is not corrupt. It is working fine in the lab, and I have tried several times on the live controller without any change.
But I will try to do a clean install with 4.2 and compare the final configuration with the faulty one. Maybe then I can see any difference.
@ George: The APs are already in the same vlan and subnet. Trying LAG is another idea, yes.
Thanks so far, I will come back if I have more results
It doesn't mean you will have that issue on every WLC you upgrade, but the last-case scenario for troubleshooting is to load the code, configure the basics manually, and test. If it works, then you can configure everything else. Since you were able to take the config from one WLC to another and it worked in a lab scenario, it looks like it might just be that one controller, especially if nothing changed in the network except for the WLC upgrade.
Just for information - after contacting TAC we had a deep dive in several directions. Finally we found a new bug in the WLC software: CSCth31837.
The cause was that packets in this LAN carry a valid, non-zero CoS value together with a valid VLAN ID in a valid 802.1Q tag.
But the software calculated the VLAN ID from the whole 802.1Q tag, CoS bits included, which resulted in VLAN 16455.
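For anyone who wants to check the arithmetic: in a standard 802.1Q tag the 16-bit TCI field holds 3 priority (CoS) bits, 1 DEI bit and a 12-bit VLAN ID. The Python sketch below decodes 16455 on the assumption that the logged number is the raw TCI value. The function names are made up for illustration and have nothing to do with the WLC code, and the CoS value of 2 is derived from the number itself, not something stated in the log.

```python
# Field layout of the 16-bit Tag Control Information (TCI) in an 802.1Q tag.
PCP_SHIFT = 13      # 3-bit priority (CoS) lives in bits 15..13
DEI_SHIFT = 12      # 1-bit drop-eligible indicator lives in bit 12
VID_MASK = 0x0FFF   # 12-bit VLAN ID lives in bits 11..0


def decode_tci(tci):
    """Split a 16-bit TCI into (cos, dei, vlan_id)."""
    cos = (tci >> PCP_SHIFT) & 0x7
    dei = (tci >> DEI_SHIFT) & 0x1
    vlan_id = tci & VID_MASK
    return cos, dei, vlan_id


def whole_tag_as_vlan(tci):
    """Hypothetical faulty parse: read all 16 bits as the VLAN ID."""
    return tci & 0xFFFF


if __name__ == "__main__":
    # Assumption: the "vlan 16455" in the controller log is the raw TCI.
    cos, dei, vlan_id = decode_tci(16455)
    print(f"CoS={cos} DEI={dei} VLAN={vlan_id}")   # CoS=2 DEI=0 VLAN=71

    # Rebuilding the tag from CoS 2 and VLAN 71 gives the logged number.
    print(whole_tag_as_vlan((2 << PCP_SHIFT) | 71))  # 16455
```

So 16455 is simply VLAN 71 with CoS 2 sitting in the upper bits, which matches the configured management VLAN.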
The proven workaround for now is to set the CoS bits to zero for all packets travelling towards the WLC.
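A small Python sketch of why the workaround helps, again assuming the logged value is the raw 16-bit TCI of the 802.1Q tag. It only models the effect on the tag; it is not switch or WLC configuration, and `clear_cos` is a made-up helper name.

```python
PCP_MASK = 0xE000  # bits 15..13 of the TCI carry the CoS / priority value


def clear_cos(tci):
    """Return the TCI with the three priority bits forced to zero."""
    return tci & ~PCP_MASK & 0xFFFF


if __name__ == "__main__":
    tagged = 16455            # VLAN 71 with CoS 2, as seen in the log
    print(clear_cos(tagged))  # 71
```

With the priority bits at zero (and the DEI bit unset, as it is here), the whole tag and the VLAN ID are the same number, so even a parser that reads all 16 bits ends up with VLAN 71.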
Thanks again for your contribution.
So is there a recommended 4.2 version of code without the bug?
I am having DHCP issues running a 3750 Integrated WLC and will be upgrading from 4.0 to 4.2 soon.
After we identified the cause last year, the TAC engineer told me that Cisco was planning to fix it in version 7.0 and later only.
And checking the bug toolkit today - yes, there is only a 7.0 version listed as fixed.
I'm currently running a 6.0 version with the workaround of CoS = 0 for all packets sent to the WLC - as described above.