Question On Using Stacked 3750's At Distribution With OSPF

spreed
Level 4

Hello and thanks for any comments you offer.

I'm testing stacked 3750s at the distribution layer of a typical three-layer network. I'm trying to leverage the redundancy of cross-stack EtherChannel while eliminating the need to rely on the spanning tree recovery process toward the (L2) access layer.

In my testing so far I have found that when I take a member switch of the two-member stack offline (reload it), connectivity recovers quickly (in about a second or so). However, when I take the master switch offline, the recovery time is about 15 to 20 seconds and my OSPF neighbour (Canadian, eh?) adjacencies reset.

Couple of questions:

1. Is anyone using this design in production anywhere?

2. If so have you seen any issues?

3. Any comments on the longer recovery time when I lose the stack master?

Thanks,

Simon

m-haddad
Level 5

Hello Simon,

I have used the 3750 a lot, but not in the design you are trying to build. The recovery time is normal, because OSPF has to re-establish its adjacencies and repopulate the routing table. The C3750 is not designed for NSF (Non-Stop Forwarding) with stateful switchover (SSO). Higher-end switches such as the C6500 can support NSF with SSO.

In your situation what you can do is break the stack into two distribution switches and rely either on HSRP or routing protocol to converge in case of failure of any of the stacks.

Hope this helps,

Regards,

Thanks Mohamad!

We are already using the 3750 switch in our campus design in the way you described (two separate distribution routers that are fully meshed with HSRP and OSPF.)

I am trying a variation of that with the stacked 3750/etherchannel combination to allow for spanning vlans across multiple access switches but still building a redundant loop-free topology.

Thanks for the comments on NSF and SSO. I hadn't given that any thought.

Simon

We did some similar testing and, like you, saw delays due to OSPF reconvergence. The 3750 does support NSF; however, it needs enabling (just type 'nsf' under the OSPF process), and its neighbors need to be NSF aware. You can verify whether the neighbors are NSF aware with the 'show ip protocols' command.

The other issue we saw was caused by the MAC addresses changing when the stack master changed. This can be eliminated with the global command 'stack-mac persistent timer 0'. By default, if the previous master hasn't returned, all the MAC addresses of the switch change after 4 minutes. Setting the persistent MAC timer to 0 forces the stack to keep using the original master's MAC addresses until either the configuration is changed or the whole stack is reloaded. Gratuitous ARPs are sent by the new master when the MAC changes, but we had various issues until we forced the MAC to 'persist'.
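
For anyone following along, the two settings described above might look roughly like this on the stack. This is only a sketch: the OSPF process ID 1 is an assumption (substitute your own), and availability of these commands depends on the IOS release and feature set.

```
! Sketch only. OSPF process ID 1 is assumed, not taken from this thread.
configure terminal
 router ospf 1
  ! Enable NSF for this OSPF process
  nsf
 exit
 ! Keep the stack MAC address after a stack master change
 stack-mac persistent timer 0
end
```

The 'show switch' command should then show the stack MAC address, so you can check that it stays the same across a master reload.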

HTH

Andy

Thanks Andy!

I'll test this out and then reply again with further comments/ratings.

Simon

Hello Andy,

I did some further testing and found that one of the problems I was hitting was bug CSCsj05589, which causes the VRF config line to be removed from a VLAN interface. This bug was behind some of the other symptoms I was seeing. (Fixed in 12.2(44).)

Thanks for the comments on NSF and MAC persistence. I read more about these features. However, even with them enabled on all the necessary devices, I still see about a 20 second disruption where packets are not forwarded when I reload the stack master. It doesn't seem to work as expected, where packets should continue to be forwarded while the routing table reconverges. Does anyone have any further comments on this?

This contradicts what we saw in our testing; however, without knowing the full topology and what the various routers are, it's a bit difficult to comment.

Have you verified that the OSPF neighbors are NSF aware? Are you doing any summarisation further up the network that would hide the OSPF reconvergence? I would probably perform the testing with a PC attached to the 3750 stack, with multiple pings running to the various OSPF neighbors up the tree (do a traceroute and record the neighbors), and see if the behaviour shows any patterns.

HTH

Andy

Thanks for the further comments Andy. I am still looking at this issue.

The 3750 stack is connected to our two core 6513s. For my testing I did enable NSF in the routing process for each device (both cores and 3750 stack) but it made no difference. I did not enable it on the 3750 pair (not stacked) where I was testing to. The only summarization done is on the distribution pair (one end) and the 3750 stack (other end.)

It does seem that I'm missing something in the config so I'll keep looking it over. Again thanks for taking the time to make the post with the suggestions.

Simon

Hello,

I found the link below interesting because it shows the requirements for OSPF with NSF:

http://www.cisco.com/en/US/docs/ios/12_2t/12_2t15/feature/guide/ftosnsfa.html

Hope this helps,

Thanks Mohamad,

Your post was quite helpful. The page explained how to confirm that NSF was actually working by looking at one line in the output of the "show ip ospf neighbor detail" command. The OOB-Resync time value confirms that the NSF process resynched the table at the time stated.
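
The check described above can be run roughly as follows. The '| include' filter is only a convenience added here, and the exact wording of the output varies by release.

```
! Per-neighbor detail, including the OOB-Resync line mentioned above
show ip ospf neighbor detail
! Optionally narrow the output to the resync line
show ip ospf neighbor detail | include OOB-Resync
```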

I was able to get my failover time down to subsecond by following Andy's earlier response.

Thanks again for your help,

Simon

Hello,

You're most welcome. I am glad that everything is working as you expected.

Regards,

Hello Andy,

I have done quite a bit more testing since my last post. First, I found that I still had HSRP configured on the user-facing interface (left over from before I stacked the two switches together); as soon as I removed it, the failover time shrank to about 4 seconds. Second, when I implemented the NSF config (entered in the OSPF process of the stack and the neighbour core boxes), failover was further reduced. Third, when I configured 'stack-mac persistent timer 0', failover was further reduced to sub-second.
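
The first step (removing the leftover HSRP configuration) might look something like this. VLAN 10 and HSRP group 1 are hypothetical values, not taken from this thread, and any other 'standby' lines on the interface would need removing in the same way.

```
! Sketch only. Vlan10 and group 1 are hypothetical values.
configure terminal
 interface Vlan10
  ! Remove the HSRP virtual IP for this group
  no standby 1 ip
end
```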

Thank you for sharing your research. It worked out just as you said. Much appreciated,

Simon

Glad you sorted it out. Sub-second was what we were seeing when we did similar testing.

Cheers

Andy
