DHCP forwarding issue with 122-53.SG

jrobnett · ‎09-11-2009

We recently upgraded a Catalyst 4506 switch from

cat4000-is-mz.121-13.EW.bin to

cat4500-entservices-mz.122-53.SG.bin

The switch has multiple VLANs, with Catalyst 3550 switches connected to it.

We have numerous Linux machines, Windows machines, and printers that require DHCP to boot. The switch has a helper address to forward those requests to a machine running ISC's DHCP version 3.x server.

That worked previously and continues to work now. Devices on all VLANs whether directly connected to the 4506 or connected to a 3550 continue to receive DHCP replies.

In addition, we have some embedded systems that also use DHCP, though it's possible their requests are technically BOOTP. Those systems no longer receive replies, even though other devices on the same network and the same secondary 3550 switch do.

We can see in the DHCP logs that the requests from these boards are received and that replies which have historically been valid are sent back.

Has there been some change in the 122-X train in how forwarding of DHCP or BOOTP replies (not requests, which work fine) is handled?

From what we can see the boards are either not receiving the replies or the replies are wrapped in such a way that they can't be successfully unpacked.

This environment is very remote. We're working toward getting better information through packet dumps etc., but the time frame for accomplishing that is on the order of when we'd have to decide whether to revert the IOS.

Thank you very much in advance for any insight.

James Robnett

ps: I accidentally posted this in the WAN section originally. Apologies for the semi-dupe.

Yudong Wu · ‎09-11-2009

Is "service dhcp" enabled after upgrading?

jrobnett · ‎09-11-2009

No, it's not, at least not explicitly. The switch only has a helper address, which is working fine. The DHCP server is receiving the request and sending the reply. Most devices continue to work just fine.

We have some evidence that the boards are actually receiving the replies but the netmask portion of the packet is munged.

Wireshark gives the following error for the reply the board received:

Option: (t=1,l=3) Subnet Mask - length isn't 4

and the Value is 06FF00
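
For anyone who wants to check a captured reply without Wireshark, here is a minimal sketch of the same test. It assumes you have already extracted the raw DHCP options bytes (everything after the magic cookie) from the capture; the function names `parse_options` and `check_subnet_mask` are made up for illustration. It relies only on the fact that option 1 (subnet mask) must carry exactly 4 bytes.

```python
PAD = 0            # single-byte padding option, no length field
END = 255          # end-of-options marker
SUBNET_MASK = 1    # option 1, value must be exactly 4 bytes


def parse_options(data: bytes):
    """Walk a DHCP options field (the bytes after the magic cookie)
    and return a list of (tag, value) pairs."""
    options = []
    i = 0
    while i < len(data):
        tag = data[i]
        if tag == PAD:
            i += 1
            continue
        if tag == END:
            break
        if i + 1 >= len(data):
            raise ValueError("truncated option header")
        length = data[i + 1]
        value = data[i + 2 : i + 2 + length]
        if len(value) != length:
            raise ValueError("truncated option value")
        options.append((tag, value))
        i += 2 + length
    return options


def check_subnet_mask(options):
    """Return a list of problems found with the subnet mask option."""
    problems = []
    for tag, value in options:
        if tag == SUBNET_MASK and len(value) != 4:
            problems.append(
                f"subnet mask length is {len(value)}, expected 4 "
                f"(value {value.hex().upper()})"
            )
    return problems


if __name__ == "__main__":
    # Hypothetical option bytes mirroring the values reported above:
    # tag 1, length 3, value 06 FF 00, followed by the end marker.
    reply_options = bytes.fromhex("010306FF00FF")
    for problem in check_subnet_mask(parse_options(reply_options)):
        print(problem)
```

Running the same check against a reply captured on the DHCP server side and one captured next to the board would show whether the option is already malformed when it leaves the server or only after it has been relayed.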

Everything else looks fine.

It appears that the switch is somehow munging the netmask portion. I say that because all other devices boot just fine.

I'm assuming this field is munged for all devices and only these boards actually care.

Yudong Wu · ‎09-11-2009

Can you compare the DHCP request packets from a working device and from one of the embedded devices to see if there is any difference? Especially the "giaddr" field.
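
A rough sketch of how that comparison could be scripted is below. It assumes you have the raw UDP payload of each request as bytes (for example, exported from a capture); the helper names `summarize_bootp` and `diff_fields` are hypothetical. It reads the fixed BOOTP header, where giaddr sits at byte offset 24, and also reports whether the packet carries option 53 (DHCP message type), which a plain BOOTP request would not have.

```python
import socket
import struct

MAGIC_COOKIE = bytes([99, 130, 83, 99])  # marks the start of the options field
FIXED_HEADER_LEN = 236                   # BOOTP header length before the cookie


def _has_option(payload: bytes, wanted: int) -> bool:
    """Return True if the options field contains the given option tag."""
    if payload[236:240] != MAGIC_COOKIE:
        return False
    i = 240
    while i < len(payload):
        tag = payload[i]
        if tag == 255:          # end marker
            break
        if tag == 0:            # pad byte, no length field
            i += 1
            continue
        if tag == wanted:
            return True
        if i + 1 >= len(payload):
            break
        i += 2 + payload[i + 1]
    return False


def summarize_bootp(payload: bytes) -> dict:
    """Pull out the header fields most likely to differ between clients."""
    if len(payload) < FIXED_HEADER_LEN:
        raise ValueError("payload shorter than the BOOTP fixed header")
    op, htype, hlen, hops, xid, secs, flags = struct.unpack(
        "!BBBBIHH", payload[:12]
    )
    return {
        "op": op,
        "hops": hops,
        "broadcast": bool(flags & 0x8000),
        "ciaddr": socket.inet_ntoa(payload[12:16]),
        "giaddr": socket.inet_ntoa(payload[24:28]),
        "is_dhcp": _has_option(payload, 53),  # False suggests plain BOOTP
    }


def diff_fields(a: dict, b: dict) -> dict:
    """Return only the fields whose values differ, as (a, b) pairs."""
    return {key: (a[key], b[key]) for key in a if a[key] != b[key]}
```

Calling `diff_fields(summarize_bootp(working), summarize_bootp(embedded))` on the two payloads would list only the fields that differ, which keeps the comparison short when most of the header is identical.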

jrobnett · ‎09-11-2009

We see that these boards work just fine if they are on the same VLAN as the DHCP server. The implication is that the process of forwarding the DHCPOFFER is munging the netmask field. When they're on the same VLAN the entire exchange is switched; no layer 3 routing/forwarding occurs.

I've been on the phone with Cisco TAC the entire time. They agree it appears to be in the IOS but are equally stumped.

Still working on packet dumps from functioning DHCP clients that aren't one of these boards, but non-functioning boards can be made to function if moved to the DHCP server VLAN.

jrobnett · ‎09-11-2009

We now have a clear indication that it's the IOS. However, it's not accurate to say the IOS munges up the various fields in the DHCP reply.

They're being modified in a way that's different from the older version. The Nucleus OS on these boards can't parse the final DHCP offer, but modern, DHCP-spec-compliant OSes can.

It's unclear whether Cisco is not fully backwards compatible with older DHCP specs or whether the Nucleus OS was only partially compliant with the DHCP specs, such that it worked with the old IOS but not the new.

In any event it's doubtful there's a simple config solution so this post can simply die.