Failover routing design help needed

Hello.

We are looking to have a setup like this:

                       User PCs
                           |
                         3750x
                   (stack - IP Base)
                        /     \
                     /           \
Servers ------- 3750x ----------- 3750x ------- Servers
              (stack -          (stack -
            IP Services)      IP Services)
                  |                 |
               Router            Router
                  |                 |
                ISP1              ISP2

We would like routing (and VLANs) to be done on the switches, with internet failover from ISP1 to ISP2 if ISP1 fails, and fallback to ISP1 when it comes back up. Trunks between all switches. We would also like to have all devices on the same VLAN if possible.

What is the best approach to do this?

(Note that the left and right sides of the diagram are at separate sites, and that the user-end switches [top of the diagram] only have IP Base, which limits EIGRP functionality.)

We tried following this guide, but it doesn't fit our site exactly:

http://www.geekmungus.co.uk/cisco-and-networking/failoverinternetconnectionusingipslatrackingandeigrproutingforinter-sitelinks

(We also ran into an issue where the switch in the middle had two routes to the internet, so there may be an issue with route priority.)

Thanks in advance


A redesign would be fine. We have free range to change everything.

Let me answer your questions:

1) Is 10.10.20.0/24 the subnet used for clients and servers?

Yes. This is the subnet for clients and servers.

2) If it is, is there any reason why you want that one subnet across all sites? You can if you want; I just need to know the reasoning, e.g. there may be servers at each 3750_1/2 site that need L2 adjacency to work properly.

The reason is that there will be some VMware server replication across those two sites, so we need the servers on the same VLAN for the setup to work. (Other VLANs for management between those two sites will also need to be added later.)

3) How is that EtherChannel configured on 3750_3? If the physical links go to two separate switch stacks, then it cannot be an EtherChannel.

Not sure.

UserSite#sh etherchannel summary
Flags:  D - down        P - bundled in port-channel
        I - stand-alone s - suspended
        H - Hot-standby (LACP only)
        R - Layer3      S - Layer2
        U - in use      f - failed to allocate aggregator
        M - not in use, minimum links not met
        u - unsuitable for bundling
        w - waiting to be aggregated
        d - default port

Number of channel-groups in use: 1
Number of aggregators:           1

Group  Port-channel  Protocol    Ports
------+-------------+-----------+-----------------------------------------------
1      Po1(SD)         LACP      Fa1/0/47(I) Fa1/0/48(I)
UserSite#

4) What feature set is on 3750_3?

Not sure. All I can remember offhand is that it is IP Base. How can I get the info you need from the CLI?

5) If you try the command "sh ip route 0.0.0.0 0.0.0.0 ?", is there a "permanent" option keyword?

No. There is not.

6) What are the default gateways set to on the clients/servers? Is it the same, or does it differ per site? I suspect that if vlan 10 is the client/server VLAN, then each site uses a different default gateway.

You're right.

In site A, the default gateway points to 3750_1. In site B, the default gateway points to 3750_2. In the user site, the gateway points to 3750_3.

Message was edited by: support systemsGo

Updated answers to questions above.

I don't think that is acting as an EtherChannel, i.e. it is showing as down, because as far as I know you cannot run the same EtherChannel across two separate switch stacks. Also, at the IP Services end you have used only one port for the connection to 3750_3.

What is confusing is that one of those links should be blocking due to STP, which means all traffic from the user site to any of the servers always takes the same path, i.e. if the link from 3750_3 to 3750_2 is blocked, then all traffic to servers connected to 3750_2 goes via 3750_1. This is not very efficient.

The only way to check this is to run "sh spanning-tree vlan 10" on all switch stacks; then we would know for sure.

The main problem is that if you want the servers in the same VLAN across the two sites, you will not be able to get this working efficiently. The only solution would be something like a pair of 4500 switches running VSS, with one 4500 in each site, so that you could run an EtherChannel from 3750_3 to both switches. But you have two separate switch stacks, so this doesn't work.

You could make the links from 3750_3 L3 routed links, i.e. vlan 10 no longer extends to 3750_3, so the clients are in a different subnet. But you still would not get optimal paths, because 3750_3 would then see two equal-cost paths to vlan 10, and sometimes traffic going to a server connected to 3750_2 would go via 3750_1 and vice versa.

There is basically no real way to make this a clean design with the equipment you have and the server subnet extended between the two sites. You are going to have to accept that, no matter what we do with the default route, client-to-server traffic for one of the sites could go via the other site. The only way to fix this with your current equipment would be to use different VLANs/IP subnets for the servers in each site, and you say you cannot do this.

That said, we may be able to get the default route working properly. I suspect vlan 10 is blocking somewhere in your setup, and this could well be affecting the default route. I think this is what the previous engineer was coming up against. If vlan 10 is blocking on an uplink, then no amount of delay tuning would affect the path taken, because only the active link can be used, i.e. there is only one path from 3750_3 to the IP Services stack.

The solution would be to not extend vlan 10 to the 3750_3 stack and to use L3 routed links between 3750_3 and the IP Services stacks. 3750_3 would then see two equal-cost paths, and we could use an offset list on 3750_2 or a delay on 3750_3 to make the link to 3750_2 the less preferred one. The 3750_3 stack could then send traffic directly to the correct IP Services stack (depending on which ISP link is active) without having to go via the other stack.
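To show why a delay on one uplink breaks the tie, here is a rough Python sketch of the classic EIGRP composite metric with default K-values (K1 = K3 = 1, others 0): 256 x (10^7 / slowest bandwidth in kbps + total delay in tens of microseconds). The interface values are assumptions: both hops are treated as FastEthernet defaults (100,000 kbps, 100 usec) because the existing uplinks are Fa1/0/47 and Fa1/0/48, and the raised delay of 200 on the uplink to 3750_2 is a hypothetical figure. Check the real values with "show interface" before relying on any numbers.

```python
# Rough sketch of the classic EIGRP composite metric with default K-values
# (K1 = K3 = 1, K2 = K4 = K5 = 0). Bandwidth is in kbps; delay is in tens of
# microseconds, the same unit the interface "delay" command uses.

def eigrp_metric(bandwidths_kbps, delays_tens_usec):
    """Return the classic EIGRP metric for one path.

    bandwidths_kbps: bandwidth of every interface along the path
    delays_tens_usec: delay of every interface along the path
    """
    slowest = min(bandwidths_kbps)      # only the slowest link's bandwidth counts
    bw_term = 10**7 // slowest          # scaled inverse bandwidth
    delay_term = sum(delays_tens_usec)  # delay is cumulative along the path
    return 256 * (bw_term + delay_term)


# Assumed FastEthernet defaults: 100,000 kbps and 100 usec (10 tens of usec).
FAST_ETHERNET_BW = 100_000
FAST_ETHERNET_DELAY = 10

# Path from 3750_3 via 3750_1: two hops at default delay.
via_3750_1 = eigrp_metric([FAST_ETHERNET_BW] * 2, [FAST_ETHERNET_DELAY] * 2)

# Path via 3750_2: same hops, but the 3750_3 uplink delay is raised to a
# hypothetical 200 (2,000 usec), which makes this path less preferred.
via_3750_2 = eigrp_metric([FAST_ETHERNET_BW] * 2, [200, FAST_ETHERNET_DELAY])

preferred = "3750_1" if via_3750_1 < via_3750_2 else "3750_2"
print(via_3750_1, via_3750_2, preferred)
```

Delay is the usual knob for this because it is summed along the path, so raising it on one uplink reliably shifts that path's metric, whereas bandwidth only matters when it changes the slowest link.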

But there is still some guesswork here, so I'm afraid I need some more outputs.

Can you post the following from each switch:

1) "sh spanning-tree vlan 10"

2) "sh ip route"

3) "sh cdp neighbors"

4) "sh int trunk"

As I say, this will only fix the default route; the client-to-server traffic is not really fixable the way it is set up.

Apologies for all the output requests, but this is proving more complicated than it first appeared, and I need them to be sure that what I think is happening is actually happening, and also to suggest a solution that will actually work for you.

Edit: if the servers in site 2 are only there as backup, then as long as the STP-blocked link is the connection from 3750_3 to 3750_2, your current setup actually works okay. But it means that if we left vlan 10 extending to 3750_3, then even when ISP1 was down, traffic to the internet would go via 3750_1 to 3750_2, because the direct link to 3750_2 is blocked. This may be acceptable to you. If not, and we use L3 routed links for the uplinks from 3750_3, then we need to apply an offset list/delay not just for the default route but for vlan 10 as well, so that server traffic always goes to 3750_1 (assuming 3750_1 is the stack with the non-backup servers connected to it).

So it would help me if you could say whether the servers in one site are acting as backup for the servers in the other site. Reconfiguring the uplinks to be L3 etc. would require downtime. You may not want this if you are prepared to accept internet traffic not taking the most direct route from 3750_3. It is basically a tradeoff between optimal paths for the default route and leaving it as it is and accepting that everything, i.e. client-to-server and internet traffic, always goes via one IP Services stack whether or not that stack's ISP is actually up. Internet should still work; it just won't route optimally.

Edit 2: to see what feature set is running on 3750_3, run "sh ver".

Jon

Hello Jon.

I'd just like to again say thanks for helping us out with this.

I will get the output for the commands in a few hours, but we are OK with tearing down the whole setup and with reconfiguring the uplinks as Layer 3.

We can have as much downtime as needed.

No problem

Once you have the outputs, I'll have a good look to see exactly how it is all working.

Can you confirm:

1) whether the servers at both sites are used all the time, or whether the servers at one site are just backing up the others?

2) I know you need the same server subnet, but would you be happy to readdress the clients in the user site?

If so, do you use DHCP for the client addresses, and if you do, what is the IP address of your DHCP server(s)?

Edit: if possible, can you put all of the outputs in a document and add it as an attachment? It helps to keep the thread manageable.

To attach, click Reply; along the top there is a "Use advanced editor" option, which lets you add attachments.

No problem if you can't; just post the outputs into the thread.

Jon

Hello Jon,

Attached is the output for the commands for each site.

Below, I hope I've answered your questions correctly. Basically, we are prepared for a complete reconfiguration at this point.

1) whether the servers at both sites are used all the time, or whether the servers at one site are just backing up the others?

The servers in SiteB (3750_2) are just replicated copies of the servers in SiteA (3750_1). However, there always needs to be an active connection between SiteA and SiteB. The only time that the UserSite (3750_3) needs to access servers in SiteB is when SiteA is completely down.

However, if the servers in SiteA (3750_1) can still be reached via SiteB (3750_2) when the direct connection between the UserSite (3750_3) and SiteA (3750_1) is down, that route should also be available (if possible).

2) I know you need the same server subnet, but would you be happy to readdress the clients in the user site?

Yes. We have full permission to re-address clients and servers.

If so, do you use DHCP for the client addresses, and if you do, what is the IP address of your DHCP server(s)?

The current IP addresses of the DHCP servers are 10.10.20.30 and 10.10.20.35. We can change these as well.

Hi Jon,

This is excellent. Please see my answers below.

1) on 3750_3 you said in a previous post the clients use the vlan 10 interface IP on 3750_3 as their default gateway. But there are no default routes in the routing table on 3750_3 so how does internet work at the moment ?

We took the internet routers offline for the moment, so the default route wouldn't show in the routing table. Users are using mobile wireless devices for now.

2) the 192.168.50.32/29 and 192.168.50.40/29 networks. They don't seem to be doing anything ? I think they were meant to be for the uplinks from 3750_3 but they haven't been used for this.

I think that was the plan of the engineer who left, but it wasn't implemented.

3) the IP services stack connections to the routers. Do you know what IP subnet is used for this. It looks like it is vlan 10 again but can you confirm ie. what are the routers LAN interface IP addresses ?

It is vlan 10. The router address for ISP1 in SiteA is 10.10.20.254, and the router address for ISP2 in SiteB is 10.10.20.253.

1) pick a new subnet for the client vlan and create the scope on your DHCP servers. If you use both DHCP servers then split the scope in half between DHCP servers

Is 10.20.20.x ok?

2) pick a new vlan number for use for the clients

Is vlan 20 ok?


3) Assuming the routers' LAN interfaces are in vlan 10, I will need 4 x /30 subnets, one for each of the four uplinks, i.e. 2 from 3750_3 and 2 for the IP Services stack to router connections. You can just give me a class C if you want and I will break it down.

How about 192.168.10.x? I know that the router LAN interfaces are currently on the 10.10.20.x subnet, but we can change that easily.

4) we only want the new client vlan on 3750_3. If 3750_3 is in VTP client mode then it will not work once we change to L3 uplinks so we need to change the VTP mode to VTP transparent on the 3750_3. Once you have done this we can then create the new client subnet and the L3 vlan interface for the client subnet + ip helper-addresses and this still won't affect your current setup.

Will make the change tomorrow.
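Item 4 comes down to a handful of IOS commands on 3750_3. As a hedged sketch, the Python helper below just renders those lines from the values chosen in this thread (VLAN 20, 10.20.20.x, DHCP servers 10.10.20.30 and 10.10.20.35). The VLAN name "Clients" and the SVI address 10.20.20.1/24 are assumptions, and the rendered lines should be checked against the IOS release on the stack before they are pasted in.

```python
import ipaddress


def render_client_vlan(vlan_id, name, gateway_cidr, dhcp_servers):
    """Return IOS configuration lines for the new client VLAN on 3750_3.

    gateway_cidr is the SVI address in CIDR form, e.g. "10.20.20.1/24".
    """
    iface = ipaddress.ip_interface(gateway_cidr)
    lines = [
        # Transparent mode: the switch keeps its own VLAN database instead
        # of following VTP advertisements from the other stacks.
        "vtp mode transparent",
        f"vlan {vlan_id}",
        f" name {name}",
        f"interface Vlan{vlan_id}",
        f" ip address {iface.ip} {iface.netmask}",
    ]
    # One helper-address per DHCP server so client broadcasts reach both.
    lines += [f" ip helper-address {server}" for server in dhcp_servers]
    lines.append(" no shutdown")
    return lines


# Hypothetical values: VLAN name "Clients" and gateway 10.20.20.1/24.
config = render_client_vlan(20, "Clients", "10.20.20.1/24",
                            ["10.10.20.30", "10.10.20.35"])
print("\n".join(config))
```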

As for downtime, we have moved the clients off the current network (they are using mobile routers for internet), so this week and next week are completely open to tear down and redo the network.

If you can help guide us with the configs, that would be great. Please let me know.

I will send you a PM with our contact info when we start implementing your design.




Thanks for the quick response.

So I assume you want to go with the redesign rather than just trying to fix the current setup?

I will start work on the configs. I will post changes to make on the stacks first.

One thing I need is the router configs, just to double-check there isn't anything unusual going on.

I think I will go with the delay option for the default routes, so when you implement it we may need to adjust depending on what we see in the routing tables.

The vlan/IP subnets you posted are fine.
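Breaking the 192.168.10.x class C into the four /30 point-to-point subnets is simple arithmetic; a small sketch using Python's standard ipaddress module is below. Which /30 goes to which link is an arbitrary, hypothetical assignment for illustration, as are the link labels.

```python
import ipaddress

# The four routed uplinks that each need a /30 (labels are descriptive only).
UPLINKS = [
    "3750_3 <-> 3750_1",
    "3750_3 <-> 3750_2",
    "3750_1 <-> ISP1 router",
    "3750_2 <-> ISP2 router",
]


def allocate_p2p(block, links, prefix=30):
    """Carve consecutive point-to-point subnets out of block, one per link."""
    subnets = ipaddress.ip_network(block).subnets(new_prefix=prefix)
    plan = {}
    for link, subnet in zip(links, subnets):
        side_a, side_b = list(subnet.hosts())  # a /30 has exactly two usable hosts
        plan[link] = (subnet, side_a, side_b)
    return plan


plan = allocate_p2p("192.168.10.0/24", UPLINKS)
for link, (subnet, a, b) in plan.items():
    print(f"{link}: {subnet}  {a} / {b}")
```

Taking the /30s from the bottom of the range keeps the rest of 192.168.10.0/24 free for later links, such as the management VLANs mentioned earlier.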

Jon

Nothing on the routers.

Just a route out to the internet, NAT overload for user connections to the internet, and the internal IP address. No access lists, no static NATs, no VPN, and nothing else on them at this point.

Redesign would be best. Better to do it properly now, and get to see how it should be done.

Thanks

Are the routers running EIGRP ?

Jon

Sorry, forgot to ask. In order to readdress the router uplinks can you tell me on each IP Services stack which interface connects to the router ?

Jon

One final question.

The actual physical links between sites: I have been assuming these are your own fibers, so we can just make them L3 routed links. Sometimes, though, if they are provider links, they can only be set up as trunk links.

Do you know if that is the case? If it is a trunk link only, we can still make all this work; it just changes the configuration slightly, that's all.

I am most of the way through the configs, so I will assume they can work as L3 links unless you tell me otherwise.

Jon

Jon Marshall
Hall of Fame

The servers in SiteB (3750_2) are just replicated copies of the servers in SiteA (3750_1). However, there always needs to be an active connection between SiteA and SiteB. The only time that the UserSite (3750_3) needs to access servers in SiteB is when SiteA is completely down.

However, if the servers in SiteA (3750_1) can still be reached via SiteB (3750_2) when the direct connection between the UserSite (3750_3) and SiteA (3750_1) is down, that route should also be available (if possible).

The above is good news, because it means we can route optimally for both internet and client-to-server traffic. STP is blocking one of the links from 3750_3, the one to 3750_2.

A couple of questions from the configurations you posted:

1) on 3750_3 you said in a previous post the clients use the vlan 10 interface IP on 3750_3 as their default gateway. But there are no default routes in the routing table on 3750_3 so how does internet work at the moment ?

In fact, there are no default routes on any of the stacks. Is the internet not working at the moment? I can't see how it could be.

What is weird is that you have static default routes configured on each 3750 IP Services stack, but they are not even showing up in the routing table on that stack.

2) the 192.168.50.32/29 and 192.168.50.40/29 networks. They don't seem to be doing anything ? I think they were meant to be for the uplinks from 3750_3 but they haven't been used for this.

3) the IP services stack connections to the routers. Do you know what IP subnet is used for this. It looks like it is vlan 10 again but can you confirm ie. what are the routers LAN interface IP addresses ?

So if you can answer the above, we can probably start on the reconfiguration. There is going to be quite a lot. You can, however, do some preparatory work without affecting anything:

1) pick a new subnet for the client vlan and create the scope on your DHCP servers. If you use both DHCP servers then split the scope in half between DHCP servers

2) pick a new vlan number for use for the clients

3) Assuming the routers' LAN interfaces are in vlan 10, I will need 4 x /30 subnets, one for each of the four uplinks, i.e. 2 from 3750_3 and 2 for the IP Services stack to router connections. You can just give me a class C if you want and I will break it down.

4) we only want the new client vlan on 3750_3. If 3750_3 is in VTP client mode then it will not work once we change to L3 uplinks so we need to change the VTP mode to VTP transparent on the 3750_3. Once you have done this we can then create the new client subnet and the L3 vlan interface for the client subnet + ip helper-addresses and this still won't affect your current setup.

All of the above can be done with no downtime, although if you change to VTP transparent, do it out of hours just in case there is an impact. There shouldn't be, but it is worth being safe.
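For item 1, splitting the new client scope between the two DHCP servers can be worked out with Python's standard ipaddress module, as in the sketch below. It assumes the 10.20.20.0/24 client subnet chosen in this thread, a hypothetical gateway of 10.20.20.1 on the new client SVI, and a plain 50/50 split into two contiguous ranges, which is only one way of dividing a scope.

```python
import ipaddress


def split_scope(cidr, reserved):
    """Split the usable hosts of cidr into two contiguous, non-overlapping pools.

    reserved: addresses (e.g. the gateway) that must not be handed out.
    Returns ((first_start, first_end), (second_start, second_end)).
    """
    network = ipaddress.ip_network(cidr)
    excluded = {ipaddress.ip_address(a) for a in reserved}
    pool = [host for host in network.hosts() if host not in excluded]
    half = len(pool) // 2
    return (pool[0], pool[half - 1]), (pool[half], pool[-1])


# Hypothetical: 10.20.20.1 is kept for the client SVI on 3750_3.
scope_a, scope_b = split_scope("10.20.20.0/24", reserved=["10.20.20.1"])
print("DHCP server 10.10.20.30:", scope_a[0], "-", scope_a[1])
print("DHCP server 10.10.20.35:", scope_b[0], "-", scope_b[1])
```

Because the two ranges never overlap, either server can answer a client without handing out a duplicate address.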

Next are the changes that will require downtime. I appreciate it is three separate sites, but because you are changing the uplinks it needs to be done all at once. Basically we need to:

1) on all clients, release the current IP address, then shut them down

2) modify the EIGRP config for the new client vlan and remove all the other stuff, as it is not needed

3) reconfigure all uplinks on the switch stacks to be L3, then check the routing tables to make sure there is a route for vlan 10

4) allocate all client ports to the new client vlan

5) bring the clients back up; they should then get new IPs from the DHCP server

6) remove any unnecessary configuration from 3750_3

7) reconfigure the 3750_1 and 3750_2 connections to their routers

8) modify the EIGRP configuration on both IP Services stacks

9) modify the IP SLA configuration on 3750_1. Note there is no point in tracking on 3750_2, because you only want to use ISP2 when ISP1 fails, and if ISP2 fails while ISP1 is down there is nothing to fail back to.

10) modify the EIGRP config on the routers
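The IP SLA step is what provides the fail-over and fail-back asked for at the start of the thread: a probe on 3750_1 tests ISP1, a track object follows the probe, and the tracked default route is withdrawn while the track is down. The Python sketch below only simulates that decision logic; it is not IOS. The `up_after` parameter is a hypothetical stand-in for a hold-off before failing back, similar in spirit to the delay setting on an IOS track object, so that an unstable ISP1 does not cause the route to flap.

```python
def simulate_failover(isp1_probe_ok, up_after=1):
    """Return the ISP used in each probe interval.

    isp1_probe_ok: sequence of booleans, one per probe interval
                   (True = ISP1 reachable).
    up_after: consecutive successful probes required before failing back to ISP1.
    """
    track_up = True   # assume ISP1 starts healthy
    successes = 0
    chosen = []
    for ok in isp1_probe_ok:
        if ok:
            successes += 1
            if not track_up and successes >= up_after:
                track_up = True   # fail back to ISP1
        else:
            successes = 0
            track_up = False      # fail over to ISP2 straight away
        chosen.append("ISP1" if track_up else "ISP2")
    return chosen


# ISP1 drops for two intervals; fail-back waits for two good probes in a row.
print(simulate_failover([True, False, False, True, True, True], up_after=2))
```

Failing over immediately but failing back only after several good probes is a common asymmetric choice: losing internet access is worse than staying on ISP2 slightly longer than necessary.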

That's a brief outline of what needs doing, so you can see it is a fair bit and will need careful planning. As far as the default route goes, there are two options, and I'm still thinking it all through:

1) Have both routers generate a default route and apply a delay on the 3750_3 to 3750_2 uplink so that 3750_1 is preferred. The advantage of this is that 3750_3 has both routes, so if the uplink to 3750_1 fails it can immediately switch to 3750_2. The delay would also apply to vlan 10, so traffic would go to 3750_1, which is what you want.

2) Have only 3750_1 generate a default route, and only if ISP1 fails does ISP2 then generate a default route. The advantage of this is that you would not need to tweak the delay to get it right, but if ISP1 fails there will be a delay before 3750_2 realises the default route has gone and generates its own.

So it's a tradeoff. I'll have another look at the configs etc. and decide which is best.

It is a fairly large redesign, but at the moment the configurations are quite confusing because there is a lot of extra stuff which isn't doing anything as far as I can see. I am happy to provide configs and explanations of what you should see, and also happy to be around when you implement it, i.e. by e-mail, or on the phone if you can't e-mail (no internet access). You would just need to let me know when you need me and I'll make sure I am available (no charge, obviously).

If you could answer the above questions and let me know how you want to proceed then we can take it from there.

Jon

Jon Marshall

One thing i forgot to say.

Because STP is blocking on the uplink to 3750_2, you could, if you wanted, avoid the entire redesign and simply do this:

1) Have both switch stacks generate a default route. 3750_3 will only get one default route, i.e. the one from 3750_1, because the other link is blocking.

2) If ISP1 fails, then 3750_3 would get the default route from 3750_2, but it would come via 3750_1 because of the blocked link. This means all internet traffic to ISP2 would go via 3750_1, but it would still work.
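Why ISP2 traffic has to detour through 3750_1 can be shown with a toy model: treat the three stacks as a graph, drop the link STP is blocking (3750_3 to 3750_2), and search for a path. This Python sketch is a simplification for illustration only; it uses a plain breadth-first search and does not model how STP actually elects root ports or blocks them.

```python
from collections import deque


def find_path(links, blocked, start, goal):
    """Breadth-first search over switch stacks, ignoring blocked links."""
    neighbours = {}
    for a, b in links:
        if (a, b) in blocked or (b, a) in blocked:
            continue  # a blocking port forwards no data traffic
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)

    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for nxt in neighbours.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no usable path


LINKS = [("3750_3", "3750_1"), ("3750_3", "3750_2"), ("3750_1", "3750_2")]
BLOCKED = {("3750_3", "3750_2")}

print(find_path(LINKS, BLOCKED, "3750_3", "3750_2"))
```

With the blocked link removed from the model, the direct one-hop path to 3750_2 is found instead, which is what the L3-uplink redesign would give you.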

So you may decide that is enough for what you need. I am not trying to put you off the redesign; I am just pointing out that the above could be implemented with very few changes.

In fact, I'm not sure why it isn't working now, other than the fact that I can't see any default routes in any routing tables. Your initial problem was that it wasn't failing back to ISP1, but as far as I can see it should have, because it cannot get directly to 3750_2 anyway. So we may be redesigning to fix a problem that may simply be to do with IP SLA or route propagation. The redesign I have suggested is how I would do it, but we may be creating a lot of work when we don't need to.

So if you wanted to do this, that would be fine by me, and we could look at making it work as optimally as we could, but it would always have to go via 3750_1 because of STP.

We could still tidy up the configs if you wanted without having to redesign anything.

Let me know what you think.

Jon

Let's tidy up the configs.

Please see my reply to your questions above.
