09-11-2009 07:56 AM - edited 03-06-2019 07:41 AM
We recently upgraded a Catalyst 4506 switch from
cat4000-is-mz.121-13.EW.bin to
cat4500-entservices-mz.122-53.SG.bin
The switch has multiple VLANs, with Catalyst 3550 switches connected to it.
We have numerous Linux machines, Windows machines, and printers that require DHCP to boot. The switch has a helper address to forward those requests to a machine running ISC's DHCP version 3.x server.
That worked previously and continues to work now. Devices on all VLANs, whether directly connected to the 4506 or connected to a 3550, continue to receive DHCP replies.
In addition, we have some embedded systems that also use DHCP, though it's possible they are technically sending BOOTP requests. Those systems no longer receive replies, even though other devices on the same network and the same 3550 secondary switch do.
We can see in the DHCP logs that the requests from these boards are received and that replies which have historically been valid are sent back.
Has there been some change in the 122-X train in how forwarding of DHCP or BOOTP replies (not requests, which work fine) is handled?
From what we can see, the boards are either not receiving the replies or the replies are wrapped in such a way that they can't be successfully unpacked.
This environment is very remote. We're working toward getting better information through packet dumps etc., but the time frame for accomplishing that is on the order of when we'd have to decide to revert the IOS.
Thank you very much in advance for any insight.
James Robnett
ps: I accidentally posted this in the WAN section originally. Apologies for the semi-dupe.
09-11-2009 09:19 AM
Is "service dhcp" enabled after upgrading?
09-11-2009 10:07 AM
No, it's not, at least not explicitly set. The switch only has a helper address, which is working fine. The DHCP server is receiving the request and sending the reply. Most devices continue to work just fine.
We have some evidence that the boards are actually receiving the replies but the netmask portion of the packet is munged.
Wireshark gives the following error for the reply the board received:
Option: (t=1,l=3) Subnet Mask - length isn't 4
and the Value is 06FF00
Everything else looks fine.
It appears that the switch is somehow munging the netmask portion. I say that because all other devices boot just fine.
I'm assuming this field is munged for all devices and only these boards actually care.
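For anyone wanting to run the same check on a capture without Wireshark, here is a minimal Python sketch that walks the options field of a raw BOOTP/DHCP UDP payload and flags a subnet mask option whose length isn't 4. It assumes the standard layout (236-byte fixed header, then the magic cookie 63 82 53 63, then code/length/value options). The function names `parse_options` and `check_subnet_mask` are made up for illustration, and the sample bytes are just the t=1, l=3, value 06FF00 option quoted above placed in an otherwise empty packet, not a real capture.

```python
import ipaddress

MAGIC_COOKIE = bytes([0x63, 0x82, 0x53, 0x63])
BOOTP_HEADER_LEN = 236  # fixed BOOTP header that precedes the options


def parse_options(payload: bytes) -> list:
    """Return (code, value) pairs from the options field of a UDP payload."""
    start = BOOTP_HEADER_LEN
    if payload[start:start + 4] != MAGIC_COOKIE:
        raise ValueError("missing DHCP magic cookie")
    i = start + 4
    options = []
    while i < len(payload):
        code = payload[i]
        if code == 0:        # pad option: a single byte, no length
            i += 1
            continue
        if code == 255:      # end option
            break
        if i + 1 >= len(payload):
            raise ValueError("option header truncated")
        length = payload[i + 1]
        value = payload[i + 2:i + 2 + length]
        if len(value) != length:
            raise ValueError("option value truncated")
        options.append((code, value))
        i += 2 + length
    return options


def check_subnet_mask(options: list) -> str:
    """Report whether option 1 (subnet mask) has the required 4-byte length."""
    for code, value in options:
        if code == 1:
            if len(value) != 4:
                return (f"subnet mask length is {len(value)}, expected 4 "
                        f"(value {value.hex().upper()})")
            return f"subnet mask OK: {ipaddress.IPv4Address(value)}"
    return "no subnet mask option present"


# Hypothetical packet: empty header plus the malformed option seen above.
bad = bytes(BOOTP_HEADER_LEN) + MAGIC_COOKIE + bytes([1, 3, 0x06, 0xFF, 0x00, 255])
print(check_subnet_mask(parse_options(bad)))
# prints: subnet mask length is 3, expected 4 (value 06FF00)
```

Running this against the reply as it leaves the DHCP server and again as it arrives at the board would show on which side of the switch the option changes.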
09-11-2009 10:13 AM
Can you compare whether there is any difference in the DHCP request packet between a working device and the embedded devices, especially the "giaddr" field?
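A rough Python sketch of that comparison, assuming you can get the raw UDP payload of each request out of a capture. It pulls giaddr, the hop count, and the broadcast flag from the fixed BOOTP header, and also checks for option 53 (DHCP message type), since a request without it is plain BOOTP rather than DHCP. The function names `summarize_request` and `has_option` are hypothetical, and the header offsets are the standard BOOTP ones (flags at bytes 10-11, giaddr at bytes 24-27, chaddr at byte 28).

```python
import ipaddress
import struct

MAGIC_COOKIE = bytes([0x63, 0x82, 0x53, 0x63])
BOOTP_HEADER_LEN = 236


def has_option(options: bytes, wanted: int) -> bool:
    """Walk code/length/value options and report whether `wanted` appears."""
    i = 0
    while i < len(options):
        code = options[i]
        if code == 255:      # end option
            return False
        if code == 0:        # pad option
            i += 1
            continue
        if code == wanted:
            return True
        if i + 1 >= len(options):
            return False
        i += 2 + options[i + 1]
    return False


def summarize_request(payload: bytes) -> dict:
    """Extract the header fields worth comparing between two clients."""
    if len(payload) < BOOTP_HEADER_LEN:
        raise ValueError("payload shorter than BOOTP header")
    op, htype, hlen, hops, xid, secs, flags = struct.unpack("!BBBBIHH", payload[:12])
    has_cookie = payload[236:240] == MAGIC_COOKIE
    options = payload[240:] if has_cookie else b""
    return {
        "hops": hops,
        "giaddr": str(ipaddress.IPv4Address(payload[24:28])),
        "broadcast": bool(flags & 0x8000),
        "client_mac": payload[28:28 + hlen].hex(":"),
        # DHCP messages carry option 53; plain BOOTP requests do not.
        "is_dhcp": has_option(options, 53),
    }
```

Calling `summarize_request` on a request from a working client and on one from an embedded board, both captured on the server side of the relay, and diffing the two dictionaries should show whether the boards differ in giaddr, the broadcast flag, or BOOTP versus DHCP.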
09-11-2009 10:53 AM
We see that these boards work just fine if they are on the same VLAN as the DHCP server. The implication is that the process of forwarding the DHCPOFFER is munging up the netmask field. When they're on the same VLAN the entire process is switched; no layer 3 routing/forwarding occurs.
I've been on the phone with Cisco TAC the entire time. They agree it appears to be in the IOS but are equally stumped.
We're still working on packet dumps from functioning DHCP clients that aren't one of these boards, but non-functioning boards can be made to function if moved to the DHCP server VLAN.
09-11-2009 04:33 PM
We now have a clear indication that it's the IOS. It's not accurate, though, to say the IOS munges up the various fields in the DHCP reply.
They're being modified in a way that's different from the older version. The Nucleus OS on these boards can't parse the final DHCP offer, but modern DHCP-spec-compliant OSes can.
It's unclear whether Cisco is not fully backwards compatible with older DHCP specs or whether the Nucleus OS was only partially compliant with the DHCP specs, such that it worked with the old IOS but not the new.
In any event, it's doubtful there's a simple config solution, so this post can simply die.