I have a Catalyst 6500 with a Sup720. Today a server had a NIC failure, so we used a spare NIC in the same server to get it back online. The same IP details were configured on the new NIC. This took about 30 minutes.
The problem was that when we connected the new NIC to the switch, the server was not pingable. When I checked the ARP table, it still had the old entry for the faulty NIC. I then cleared the ARP entry for this IP and everything started working. The ARP table correctly refreshed to point to the new NIC's MAC address.
Is this how the ARP process should work? What if I had a dual-connected server using only one IP for redundancy: would we still have to clear the ARP entry before the failover worked?
Your help will be greatly appreciated.
What is the aging time in your arp table ?
sh mac-address-table aging-time
As for the dual-connected server, it depends on how the failover is done, i.e. whether it uses a virtual MAC address, a virtual IP address, etc.
In a lot of these situations, when the server fails over it can send a gratuitous ARP, i.e. nobody asked for it; the server just re-announces its MAC address on the LAN segment, and switches/routers update with the new info.
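To make "gratuitous ARP" a bit more concrete, here is a minimal Python sketch that builds the bytes of one such frame. It assumes the common "announcement" form (an ARP request, sent to the Ethernet broadcast address, in which the sender IP and target IP are both the host's own address). The MAC and IP values and the function names are made up for illustration; actual NIC teaming drivers may format the frame slightly differently.

```python
import struct

def mac_to_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' to 6 raw bytes."""
    return bytes.fromhex(mac.replace(":", ""))

def ip_to_bytes(ip: str) -> bytes:
    """Convert dotted-quad IPv4 text to 4 raw bytes."""
    return bytes(int(part) for part in ip.split("."))

def build_gratuitous_arp(sender_mac: str, sender_ip: str) -> bytes:
    """Build an Ethernet frame carrying a gratuitous ARP request."""
    src = mac_to_bytes(sender_mac)
    ip = ip_to_bytes(sender_ip)
    broadcast = b"\xff" * 6
    # Ethernet header: destination, source, EtherType 0x0806 (ARP)
    ethernet = broadcast + src + struct.pack("!H", 0x0806)
    # ARP header: Ethernet (1), IPv4 (0x0800), hlen 6, plen 4, opcode 1 (request)
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)
    arp += src + ip            # sender MAC, sender IP
    arp += b"\x00" * 6 + ip    # target MAC unset, target IP = our own IP
    return ethernet + arp

# Hypothetical values, for illustration only
frame = build_gratuitous_arp("00:11:22:33:44:55", "10.1.1.1")
print(len(frame), frame.hex())
```

Any device that already has 10.1.1.1 in its ARP table and accepts the frame would overwrite the MAC for that IP with the sender MAC, which is why no clear is needed when the announcement actually reaches the router.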
As far as I know the default ARP cache timeout on Cisco devices is 4 hours. This can be seen with the "show interface
This value can of course be lowered, but you need to be aware of what you are doing as it will increase the frequency of ARP traffic across your switch.
Not sure how dual homing servers would act in the event of a NIC failure.
Sorry, the information above is for a router, not a switch. Please ignore.
Normally, the ARP aging time in a router or a layer-3 switch is 4 hours. You can see it by doing show int vlan nn, on line 7 where it says "ARP type: ARPA, ARP Timeout 04:00:00".
Usually, with a server failover scenario, the new card will issue a gratuitous ARP, which will force the change in the router's ARP table. (BTW, as you can imagine, this forcing of the ARP table with gratuitous ARP is quite a security hazard.)
Unlike the MAC forwarding table, the ARP entries are not refreshed by normal traffic. They will each time out after 4 hours, independently of the traffic. After which the router will do another ARP next time it has to send a packet. If you decrease the ARP timeout, you will increase the background noise on the VLAN, but you will find any new NIC faster. It's a trade-off.
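As a rough back-of-the-envelope illustration of that trade-off, the Python sketch below estimates ARP refresh traffic for a few timeout values. It assumes every host in the VLAN is re-resolved about once per timeout interval, and the host count of 200 is a made-up figure, so treat the numbers as orders of magnitude only.

```python
def arp_refreshes_per_hour(num_hosts: int, timeout_seconds: int) -> float:
    """Approximate ARP requests per hour if each entry is refreshed once per timeout."""
    return num_hosts * 3600 / timeout_seconds

HOSTS = 200  # hypothetical number of hosts in the VLAN

# 14400 s is the 4-hour default mentioned above; the others are example values
for timeout in (14400, 3600, 300, 60):
    rate = arp_refreshes_per_hour(HOSTS, timeout)
    print(f"timeout {timeout:>5} s: ~{rate:.0f} requests/hour, "
          f"stale entry lasts at most {timeout} s")
```

The second figure on each line is the other side of the trade-off: without a gratuitous ARP or a manual clear, a replaced NIC can stay unreachable for up to one full timeout.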
I would have expected the server to produce a gratuitous ARP as it was booted, and for that to do the trick in the router ARP cache. Was the server disconnected from the network when it was booted?
Also, do you have portfast on its switchport? Not having that could have prevented the switch from seeing the gratuitous ARP.
Yes, I confirm we have the default ARP table refresh timeout (4 hours). Also, we have portfast configured on all switchports to servers.
The server was never rebooted; we simply moved the cable from the faulty NIC to the other NIC and the link light came up.
The funny thing is we have dual-connected NICs in our network using only one IP for redundancy, and these have always worked fine. When one NIC fails, the secondary takes over and the ARP entry is refreshed within seconds.
Thanks for your help and input.
In that case, I reckon what happened was that when you unplugged the cable from the faulty NIC, the server tried to failover to the other NIC, and that NIC tried to generate a gratuitous ARP. But it couldn't, because it wasn't connected yet. Then you connected it, but it was too late ... it had already done its best for the gratuitous ARP.
If the standby NIC had been connected all the time but configured as a failover, it would have produced its gratuitous ARP and it would have failed over without any intervention.
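Here is a toy Python model of that explanation, just to show the two cases side by side. It is only a sketch of the hypothesis above, not a description of how any particular NIC driver behaves: it assumes the standby NIC sends exactly one gratuitous ARP at failover time and never retries. All class names, function names, and addresses are invented for the example.

```python
class RouterArpTable:
    """Very small stand-in for the router's ARP table (ip -> mac)."""
    def __init__(self):
        self.entries = {}

    def on_gratuitous_arp(self, ip, mac):
        # A gratuitous ARP overwrites whatever was cached for that IP
        self.entries[ip] = mac

class Nic:
    def __init__(self, mac, link_up):
        self.mac = mac
        self.link_up = link_up

    def send_gratuitous_arp(self, ip, router):
        # With no link the frame goes nowhere and is simply lost
        if not self.link_up:
            return False
        router.on_gratuitous_arp(ip, self.mac)
        return True

IP = "10.0.0.10"               # hypothetical server address
OLD_MAC = "00:00:00:00:00:01"  # faulty NIC
NEW_MAC = "00:00:00:00:00:02"  # standby NIC

def run(standby_cabled):
    router = RouterArpTable()
    router.entries[IP] = OLD_MAC          # learned before the failure
    standby = Nic(NEW_MAC, link_up=standby_cabled)
    standby.send_gratuitous_arp(IP, router)  # one attempt at failover time
    standby.link_up = True                # cable plugged in afterwards, no retry
    return router.entries[IP]

print("standby already cabled:", run(True))
print("cable moved afterwards:", run(False))
```

In the second case the router keeps the old MAC until the entry is cleared or times out, which matches what was seen on the 6500.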
Yes this makes sense to me.
I will maybe perform a test with the server team again to replicate what we did, but this time I will enable and disable the NIC to see if a gratuitous ARP is sent out.
Thanks a lot for everyone's help and input.
A small correction: when an entry in the ARP table times out the IOS does not wait till it has a packet to forward to refresh the ARP entry. The IOS will immediately send an ARP request when an entry times out, and if it receives a response (the device is still on line) then the IOS puts a new entry into the ARP table. You can verify this by running debug arp, wait till an entry times out, and you should see IOS immediately send the ARP request.
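The refresh-on-timeout behavior described here can be sketched as a small Python model. This is only an illustration of the idea, not IOS code: the `ArpCache` class, the `resolve` callback (standing in for "send an ARP request and wait for the reply"), and the addresses are all invented for the example.

```python
class ArpCache:
    def __init__(self, timeout, resolve):
        self.timeout = timeout    # seconds before an entry expires
        self.resolve = resolve    # callable: ip -> mac, or None if no reply
        self.entries = {}         # ip -> (mac, time learned)

    def learn(self, ip, mac, now):
        self.entries[ip] = (mac, now)

    def tick(self, now):
        """Expire old entries and immediately try to re-resolve each one."""
        for ip, (mac, learned_at) in list(self.entries.items()):
            if now - learned_at >= self.timeout:
                del self.entries[ip]
                answer = self.resolve(ip)   # ARP request sent right at expiry
                if answer is not None:      # host still online: new entry
                    self.learn(ip, answer, now)

    def lookup(self, ip):
        entry = self.entries.get(ip)
        return entry[0] if entry else None

# Hypothetical host table: what would answer an ARP request right now
hosts = {"10.0.0.10": "00:00:00:00:00:01"}
cache = ArpCache(timeout=4 * 3600, resolve=hosts.get)
cache.learn("10.0.0.10", hosts["10.0.0.10"], now=0)

hosts["10.0.0.10"] = "00:00:00:00:00:02"   # NIC swapped, no gratuitous ARP seen

cache.tick(now=1800)
print("after 30 min:", cache.lookup("10.0.0.10"))   # still the old MAC
cache.tick(now=4 * 3600)
print("after 4 hours:", cache.lookup("10.0.0.10"))  # refreshed to the new MAC
```

In this model the swapped NIC from the original post would have been picked up without any manual clear, but only once the 4-hour timer for that entry ran out.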
OK, thanks Rick. I didn't know that. I guess my site has such dense traffic from every host that I didn't notice the difference.
I guess it makes sense, because otherwise every 4 hours you would lose a packet or a few. I notice that a router will discard outgoing packets while it is waiting for an ARP response.
In fact, to go further, I guess it actually does the ARP request slightly before the existing entry times out so as not to risk any gaps. Is that right?
My memory is that IOS times the entry out and then sends the ARP request. It would be logical to send the request before the timeout, but I do not think that is what IOS does.
OK, so there must be a few milliseconds hole during which traffic could be discarded, and during which the entry will be flagged as "Incomplete".
I have one more question on this issue.
Even though the switch ARPs for the IP addresses in the cache, it only retains the MAC addresses of its own interfaces. It does not populate the MAC address for other entries.
I tried out in the following setup.
2950(.2 , IOS -12.1-22EA10)---(.1)PC
I issued debug arp and clear arp. The switch arps for both 1 and 2 but only retains entry for 10.1.1.2.
Can you help me in understanding this behavior ?
Can you clarify some things about your situation? Can you be specific about the address of the switch, the address of the PC, and whether the switch has attempted access to the PC (ping to the PC or something like that)?
When you are dealing with layer 2 switches like the 2950 there is sometimes confusion about the ARP table on the switch and the mac-address-table on the switch. The switch will learn the MAC address of every connected device and put it into the mac-address-table (so that it can forward at layer 2 to the device). But a layer 2 switch will not put the device IP address and MAC address into its ARP table unless the switch has made an attempt to contact the device. In this respect the layer 2 switch is quite different in behavior from a layer 3 router. The router learns MAC addresses from every device in the subnet and puts them into the ARP table (so that it can do layer 3 forwarding). But the ARP table on a layer 2 switch is only for devices which the management interface of the switch has attempted to communicate with.
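The distinction between the two tables can be illustrated with a small Python model. It is a deliberately simplified sketch and not how the 2950 is implemented: the class and method names are invented, the MAC addresses are made up, and only the 10.1.1.1 / 10.1.1.2 addressing and the Fa0/1 port are taken from this thread.

```python
class Layer2Switch:
    def __init__(self, mgmt_ip, mgmt_mac):
        self.mgmt_ip = mgmt_ip
        self.mgmt_mac = mgmt_mac
        self.mac_table = {}                    # mac -> port, layer 2 forwarding
        self.arp_table = {mgmt_ip: mgmt_mac}   # ip -> mac, management plane only

    def forward_frame(self, port, src_mac):
        # Every frame crossing the switch teaches it the source MAC
        self.mac_table[src_mac] = port

    def ping_from_switch(self, port, host_ip, host_mac):
        # The management interface talks to the host, so it must resolve the IP
        self.forward_frame(port, host_mac)
        self.arp_table[host_ip] = host_mac

    def clear_arp(self):
        # Modeled as: only the switch's own interface entry remains
        self.arp_table = {self.mgmt_ip: self.mgmt_mac}

switch = Layer2Switch("10.1.1.2", "00:00:00:00:00:02")
PC_IP, PC_MAC, PC_PORT = "10.1.1.1", "00:00:00:00:00:01", "Fa0/1"

switch.forward_frame(PC_PORT, PC_MAC)            # ordinary PC traffic
print("after PC traffic:", PC_IP in switch.arp_table, PC_MAC in switch.mac_table)

switch.ping_from_switch(PC_PORT, PC_IP, PC_MAC)  # switch pings the PC
print("after ping:      ", PC_IP in switch.arp_table, PC_MAC in switch.mac_table)

switch.clear_arp()
print("after clear arp: ", PC_IP in switch.arp_table, PC_MAC in switch.mac_table)
```

In the model, the PC's MAC stays in the mac-address-table throughout, while its IP only shows up in the ARP table after the switch itself has talked to it.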
Steps I followed in my setup.
1.Switch 2950 - 10.1.1.2--Vlan 1 interface
PC -- 10.1.1.1-connected to fa 0/1 of switch which is configured for access vlan 1
2.It created ARP entry for VLAN 1 SVI
3.Ping PC from the switch
4.It added ARP entry for PC
5.Turned on debugging (debug arp)
6.Cleared the ARP table (clear arp)
7.Switch arps for both vlan 1 SVI and PC ip address.
8.Retains ARP entry for vlan 1 SVI.
Your explanation answers for not adding ARP entry for the PC on a layer 2 switch.
Thanks for your time
Rick has already explained the reason. The PC needs to talk to the switch's VLAN 1 SVI to have its IP displayed again in the switch's ARP table.
When you clear the ARP table in step 6, the entry is gone.
If you ping the PC's IP address from the switch, or VLAN 1's IP from the PC, you will see the ARP entry reappear. The 2950 is an L2 switch and will behave as explained by Rick.
But you will always see the MAC address of the PC in the switch's mac-address-table for as long as the PC is active. That is the responsibility of an L2 switch; keeping the ARP table up to date is the responsibility of an L3 switch/router.