I am currently using a dual core setup with HSRP between two 6509 switches. When Core A goes down, everything fails over to Core B quite gracefully. When Core A comes back online, my Edge and WAN switches lose all of their EIGRP routes with the Core routers. If I do a shut / no shut on the interfaces on the Edge and WAN, EIGRP begins communicating again. Any ideas? Thanks
Core A has a higher priority
Using multiple VLAN interfaces
WAN and Edge switches are 4948 switches
I don't know for certain, but I would check that the 4948s' default gateways point at the virtual IP that your HSRP is using. You may also want to check spanning tree to see whether your roots are where you think they should be and whether any blocked ports are where you expect them. Is HSRP in the correct state, and do you have the preempt commands on the interfaces? Maybe you could post the configs and how they are connected together.
I believe the HSRP virtual IP will be used as the default gateway for users / PCs, and the Edge and WAN switches use EIGRP to communicate with each other.
In the normal case, Core A is the preferred path to the WAN / Edge, but Core B still maintains its routing table. If Core A goes down, traffic arriving at Core B follows that routing table and continues to reach the WAN / Edge.
When Core A recovers, HSRP may switch back to Core A, but I suspect the routing table between Core A, the WAN and the Edge may not yet be fully built.
Can you advise the failover time in HSRP?
Please provide the configs of Core A/B, WAN and Edge, and a network diagram showing how they connect to each other.
Is there a direct connection between the two cores? It would be useful: if the links between Core A and the WAN/Edge are not ready when HSRP falls back to Core A, traffic can still pass via Core B to Core A and on to the user.
Hope this helps.
I am a little hesitant to send my configs and diagram for security reasons. I have checked things like standby preempt, spanning tree and my connection between the two cores. Everything looks good, but I notice my EIGRP routes use the IP addresses of the cores, not the standby address, although spanning tree blocks the correct ports. Does this sound correct? Should EIGRP have the interfaces' IP addresses and not the standby address? Here's an example:
D 192.168.101.0/24 [90/3072] via 192.168.111.142, 10:08:43, Vlan501
[90/3072] via 192.168.111.141, 10:08:43, Vlan501
instead of the standby address: 192.168.111.137.
I assume that is because spanning tree is blocking the ports. By the way, it is MST spanning tree.
Here is the syslog message I receive when this problem occurs. It only happens on the Edge and WAN switches; adjacencies between the cores are fine:
11/9/2006 2:18:29 PM 5 Notice %DUAL-5-NBRCHANGE: IP-EIGRP 100: Neighbor 192.168.111.138 (Vlan500) is down: retry limit exceeded
11/9/2006 2:17:09 PM 5 Notice %DUAL-5-NBRCHANGE: IP-EIGRP 100: Neighbor 192.168.111.138 (Vlan500) is up: new adjacency
11/9/2006 2:17:07 PM 5 Notice %DUAL-5-NBRCHANGE: IP-EIGRP 100: Neighbor 192.168.111.138 (Vlan500) is down: retry limit exceeded
11/9/2006 2:15:47 PM 5 Notice %DUAL-5-NBRCHANGE: IP-EIGRP 100: Neighbor 192.168.111.138 (Vlan500) is up: new adjacency
11/9/2006 2:15:45 PM 5 Notice %DUAL-5-NBRCHANGE: IP-EIGRP 100: Neighbor 192.168.111.138 (Vlan500) is down: retry limit exceeded
Thanks for the help. If it is okay I will send my configs to your personal address. Thanks
EIGRP uses the IP addresses attached to the real physical interfaces, not the HSRP address. HSRP is primarily for workstations/servers, and for the default gateway of switches etc. for management, but not for the routing of traffic.
Neighbour up/neighbour down suggests a flapping link.
How are you connecting your edge switches? I have not used the 4948, but certainly with IOS on the 6500 you can create routed ports. Can you not connect the 4948 switches using routed ports, which takes spanning tree out of the equation? It's difficult to say without knowing the topology and how you are using your VLANs.
Also, what is the addressing for VLAN 500 and VLAN 501?
Jon is entirely correct that it is normal for EIGRP to use the physical interface address rather than the HSRP address as the source in its routing updates. The EIGRP updates come from a particular neighbor and it needs to be clear and unambiguous which neighbor sent the update. It is likely that each neighbor will have unique paths to certain destinations. Using the HSRP address would make it ambiguous which particular device had the route to certain destinations.
And I think that Jon is correct that the log messages (especially the ones with retry limit exceeded followed by neighbor up) indicate that some traffic is being blocked.
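To put a number on the flapping, here is a rough Python sketch that reads the five syslog lines posted above and measures how long each adjacency stayed up before it was torn down. The function name `up_durations` is just an illustration, and the parsing assumes the exact timestamp layout shown in the post. Each adjacency lasts 80 seconds, which would be consistent with EIGRP giving up after its usual 16 retransmissions of a reliable packet that is never acknowledged. That is an assumption about the cause, not something the logs prove.

```python
from datetime import datetime

# Syslog lines copied from the post above (newest first, as posted).
LOG = """\
11/9/2006 2:18:29 PM 5 Notice %DUAL-5-NBRCHANGE: IP-EIGRP 100: Neighbor 192.168.111.138 (Vlan500) is down: retry limit exceeded
11/9/2006 2:17:09 PM 5 Notice %DUAL-5-NBRCHANGE: IP-EIGRP 100: Neighbor 192.168.111.138 (Vlan500) is up: new adjacency
11/9/2006 2:17:07 PM 5 Notice %DUAL-5-NBRCHANGE: IP-EIGRP 100: Neighbor 192.168.111.138 (Vlan500) is down: retry limit exceeded
11/9/2006 2:15:47 PM 5 Notice %DUAL-5-NBRCHANGE: IP-EIGRP 100: Neighbor 192.168.111.138 (Vlan500) is up: new adjacency
11/9/2006 2:15:45 PM 5 Notice %DUAL-5-NBRCHANGE: IP-EIGRP 100: Neighbor 192.168.111.138 (Vlan500) is down: retry limit exceeded
"""


def up_durations(log_text):
    """Return the seconds each adjacency stayed up before the next 'down'."""
    events = []
    for line in log_text.splitlines():
        if "%DUAL-5-NBRCHANGE" not in line:
            continue
        parts = line.split()
        # First three tokens are date, time and AM/PM, e.g. "11/9/2006 2:18:29 PM".
        stamp = datetime.strptime(" ".join(parts[:3]), "%m/%d/%Y %I:%M:%S %p")
        state = "up" if " is up:" in line else "down"
        events.append((stamp, state))

    events.sort()  # oldest first

    durations = []
    last_up = None
    for stamp, state in events:
        if state == "up":
            last_up = stamp
        elif last_up is not None:
            durations.append(int((stamp - last_up).total_seconds()))
            last_up = None
    return durations


print(up_durations(LOG))
```

A steady interval like this points away from a physically bouncing link (which would flap at irregular times) and toward packets being dropped in one direction somewhere between the 4948 and the core.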
I do believe the problem is spanning tree when the cores fail over. The problem is that I cannot test the failover process whenever I feel like it. Attached is a diagram of the setup. I am interested in trying routed ports at the Edge and WAN, but I have a firewall services module that is based on VLANs. As long as I can have the port coming into my cores on a specific VLAN, I will still be able to pass the traffic through the firewall module using routed ports. I appreciate the help. Thanks
I apologize if this is a stupid question, but when using a routed port on my edge switch, how does the edge know which port to use? Currently the one that runs to Core B is blocked by spanning tree, so all traffic goes to Core A. A routed port takes spanning tree out of the equation, so how does that work? Do you control your routes using static routes? Thanks
If you have routed ports as uplinks, then each link has a separate IP subnet, i.e. in your example you would need 2 separate subnets (usually /30) for each 4948 switch: one for the link to Core A and one for the link to Core B.
If the links are the same speed etc., EIGRP will treat both links as equal cost and forward traffic over both. If one link fails, at most you will lose one packet, which is much faster than even RPVST+.
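As a rough illustration of the addressing that implies, here is a small Python sketch that carves one /30 per uplink out of a block. The 10.255.0.0/28 block, the link names and the `plan_uplinks` helper are all hypothetical and not taken from the poster's network.

```python
import ipaddress

# Hypothetical block set aside for point-to-point uplinks (not from the thread).
UPLINK_BLOCK = ipaddress.ip_network("10.255.0.0/28")

# One routed link per (4948 switch, core) pair.
LINKS = [("Edge", "CoreA"), ("Edge", "CoreB"), ("WAN", "CoreA"), ("WAN", "CoreB")]


def plan_uplinks(block, links):
    """Assign each link its own /30 and split the two host addresses."""
    plan = {}
    for link, subnet in zip(links, block.subnets(new_prefix=30)):
        core_ip, switch_ip = subnet.hosts()  # a /30 has exactly two usable hosts
        plan[link] = (subnet, core_ip, switch_ip)
    return plan


for (switch, core), (subnet, core_ip, switch_ip) in plan_uplinks(UPLINK_BLOCK, LINKS).items():
    print(f"{switch}-{core}: {subnet}  core={core_ip}  switch={switch_ip}")
```

With four distinct subnets, each 4948 forms a separate EIGRP adjacency with each core and learns two paths to the same destinations.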
However, in your scenario you can't do this, as you have FWSMs in the path. I'm assuming they must be in transparent mode, which I have not used before. This would mean a separate context per link from each 4948 switch, not per 4948 switch, and I'm not sure this would work properly (without testing) due to stateful issues on the FWSMs and routing.
Have you checked what is happening on the FWSM when you fail over the switch? I.e., if you are losing EIGRP routes, can you still ping the VLAN 500 interface on Core B from the 4948 switch?
What is your reasoning for saying I have an ARP or MAC issue? I believed the same thing because of different issues that were occurring a while back. I had users who would log in during the middle of the day, and it seemed that all of their traffic was going to Core B rather than A. Other users were not affected. When I rebooted B the issues went away.
I've seen similar issues in the past with HSRP.
The best way to test it would be to fail Core A, restore it, and then clear the ARP cache on both routers. If that doesn't clear the problem, clear the MAC table (CAM table if you're running CatOS) as well.
If either helps then possibly setting HSRP to use BIA may help solve the issue.
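When checking the ARP and MAC tables during that test, it helps to know which MAC address the standby IP should resolve to. A minimal Python sketch, assuming the well-known HSRP virtual MAC formats (0000.0c07.acXX for version 1, 0000.0c9f.fXXX for version 2); the group number is not stated in this thread, so group 1 below is only a placeholder, and `hsrp_virtual_mac` is an illustrative helper name.

```python
def hsrp_virtual_mac(group, version=1):
    """Return the standard HSRP virtual MAC in Cisco dotted notation."""
    if version == 1:
        if not 0 <= group <= 255:
            raise ValueError("HSRPv1 groups are 0-255")
        raw = f"00000c07ac{group:02x}"
    elif version == 2:
        if not 0 <= group <= 4095:
            raise ValueError("HSRPv2 groups are 0-4095")
        raw = f"00000c9ff{group:03x}"
    else:
        raise ValueError("version must be 1 or 2")
    # Split the 12 hex digits into three groups of four: xxxx.xxxx.xxxx
    return ".".join(raw[i:i + 4] for i in range(0, 12, 4))


# Placeholder group number; substitute the group configured on your VLAN interfaces.
print(hsrp_virtual_mac(1))
```

If the standby IP resolves to something other than this address and BIA is not configured, or the address is learned on an unexpected port after Core A returns, that would support the stale ARP/MAC theory. With BIA enabled, the standby IP maps to the active router's burned-in interface MAC instead.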