Question On Using Stacked 3750s At Distribution With OSPF

Apr 17th, 2009

Hello and thanks for any comments you offer.


I'm testing stacked 3750s at the distribution layer of a typical three-layer network. I'm trying to leverage the redundancy of cross-stack EtherChannel while eliminating the need to rely on spanning-tree recovery toward the (L2) access layer.
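A cross-stack EtherChannel from the stack to one access switch might look roughly like the sketch below. It is a minimal illustration only: the interface names, channel-group number, and trunk settings are hypothetical, and 'mode active' (LACP) assumes an IOS release that supports LACP across stack members (PAgP is not supported on cross-stack EtherChannels, and early releases only allowed 'mode on').

```
! Hypothetical uplink to one access switch: one port on each stack member
interface range GigabitEthernet1/0/49 , GigabitEthernet2/0/49
 switchport trunk encapsulation dot1q
 switchport mode trunk
 ! LACP bundle; if one stack member fails, the other link keeps forwarding
 channel-group 10 mode active
!
interface Port-channel10
 switchport trunk encapsulation dot1q
 switchport mode trunk
```

Because both links terminate on what is logically a single switch, spanning tree sees one port-channel rather than two redundant paths, so nothing is left blocking toward the access layer.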


In my testing so far I have found that when I take a member switch of the two-member stack offline (reload it), connectivity recovers quickly (in about a second). However, when I take the master switch offline, recovery takes about 15-20 seconds and my OSPF neighbour (Canadian, eh?) adjacencies reset.


A couple of questions:

1. Is anyone using this design in production anywhere?

2. If so have you seen any issues?

3. Any comments on the longer recovery time when I lose the stack master?


Thanks,

Simon

m-haddad Fri, 04/17/2009 - 13:48

Hello Simon,


I have used the 3750 a lot, but not in the design you are trying to accomplish. The recovery time you are seeing is normal, because OSPF has to re-establish its adjacencies and repopulate the routing table. The C3750 is not designed for NSF (Non-Stop Forwarding) with stateful switchover (SSO); higher-end platforms such as the C6500 support NSF with SSO.


In your situation, you could break the stack into two separate distribution switches and rely on either HSRP or the routing protocol to converge if either switch fails.
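For comparison, the two-switch alternative typically runs an HSRP pair on each user VLAN. A minimal sketch, with a hypothetical VLAN, group number, and addresses; hosts would use the virtual address 10.1.10.1 as their default gateway:

```
! Distribution switch A (preferred active gateway for VLAN 10)
interface Vlan10
 ip address 10.1.10.2 255.255.255.0
 standby 10 ip 10.1.10.1
 standby 10 priority 110
 standby 10 preempt
!
! Distribution switch B (standby; default priority 100)
interface Vlan10
 ip address 10.1.10.3 255.255.255.0
 standby 10 ip 10.1.10.1
 standby 10 preempt
```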


Hope this helps,


Regards,



spreed Fri, 04/17/2009 - 14:13

Thanks Mohamad!


We are already using the 3750 switch in our campus design in the way you described (two separate distribution routers that are fully meshed with HSRP and OSPF.)


I am trying a variation of that, using the stacked 3750/EtherChannel combination, to allow VLANs to span multiple access switches while still building a redundant, loop-free topology.


Thanks for the comments on NSF and SSO. I hadn't given that any thought.


Simon

andrew.butterworth Fri, 04/17/2009 - 14:07

We did some similar testing and, like you, saw similar delays due to OSPF reconvergence. The 3750 does support NSF, but it has to be enabled, and its neighbors need to be NSF-aware (just type 'nsf' under the OSPF process). You can verify whether the neighbors are NSF-aware with the 'show ip protocols' command.
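A minimal sketch of that change, assuming OSPF process 1 (hypothetical) and an IOS image that supports OSPF NSF on the stack. The same 'nsf' line goes under the OSPF process on the stack and on each OSPF neighbour (for example, the core switches):

```
! On the 3750 stack and on each neighbouring core switch
router ospf 1
 ! Enable Cisco NSF for this OSPF process
 nsf
!
! Then, from exec mode, look for the NSF-related lines in the OSPF section
show ip protocols
```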

The other issue we saw was caused by the MAC addresses changing when the stack master changed. This can also be eliminated with the global command 'stack-mac persistent timer 0'. By default, if the previous master hasn't returned, all the MAC addresses of the switch change after 4 minutes. Setting the persistent MAC timer to 0 forces the stack to keep using the original master's MAC addresses until either the configuration is changed or the whole stack is reloaded. The new master does send gratuitous ARPs when the MAC changes, but we still had various issues until we forced the MAC to 'persist'.
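In configuration terms this is a single global command. A minimal sketch; 'show switch' is the usual place to confirm which stack MAC address is in use after a master failover:

```
configure terminal
 ! 0 = keep the original master's MAC address indefinitely
 stack-mac persistent timer 0
end
! Confirm the stack MAC address after the master changes
show switch
```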


HTH


Andy

spreed Fri, 04/17/2009 - 14:19

Thanks Andy!


I'll test this out and then reply again with further comments/ratings.


Simon

spreed Tue, 04/21/2009 - 13:43

Hello Andy,

I did some further testing and found that one of the problems I was hitting was bug CSCsj05589, which causes the VRF configuration line to be removed from a VLAN interface. This bug was causing some of the other symptoms I was seeing (fixed in 12.2(44)).


Thanks for the comments on NSF and MAC persistence; I read more about these features. However, even with them enabled on all the necessary devices, I still see a disruption of about 20 seconds, during which packets are not forwarded, when I reload the stack master. It doesn't work as expected, where packets should continue to be forwarded while the routing table reconverges. Does anyone have any further comments on this?


Correct Answer
andrew.butterworth Wed, 04/22/2009 - 06:54

This contradicts what we saw in our testing; however, without knowing the full topology and what the various routers are, it's a bit difficult to comment.

Have you verified that the OSPF neighbors are NSF-aware? Are you doing any summarisation further up the network that would hide the OSPF reconvergence? I would probably repeat the test with a PC attached to the 3750 stack, with multiple pings running to the various OSPF neighbors up the tree (do a traceroute and record the neighbors), and see whether the behaviour shows any pattern.


HTH


Andy

spreed Wed, 04/22/2009 - 13:08

Thanks for the further comments Andy. I am still looking at this issue.


The 3750 stack is connected to our two core 6513s. For my testing I enabled NSF in the routing process on each device (both cores and the 3750 stack), but it made no difference. I did not enable it on the (unstacked) 3750 pair at the far end of my test path. The only summarization is done on the distribution pair (one end) and on the 3750 stack (the other end).


It does seem that I'm missing something in the config so I'll keep looking it over. Again thanks for taking the time to make the post with the suggestions.


Simon

spreed Thu, 04/23/2009 - 13:18

Thanks Mohamad,


Your post was quite helpful. The page explained how to confirm that NSF is actually working by checking one line in the output of the "show ip ospf neighbor detail" command: the OOB-Resync time value confirms that NSF resynchronized the table at the time stated.
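That check is run from exec mode on the stack or on a core neighbour. A minimal sketch; the exact wording of the resync line may vary by release:

```
! Run after a master failover; in each neighbour entry, look for the
! OOB-Resync line, which shows when the database was last resynchronised
show ip ospf neighbor detail
```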


I was able to get my failover time down to sub-second by following Andy's earlier response.


Thanks again for your help,

Simon

m-haddad Thu, 04/23/2009 - 13:20

Hello,


You're most welcome. I am glad that everything is working as you expected.


Regards,




spreed Thu, 04/23/2009 - 13:12

Hello Andy,


I have done quite a bit more testing since my last post. First, I found that I still had HSRP configured on the user-facing interface (left over from before I stacked the two switches together); as soon as I removed it, the failover time shrank to about 4 seconds. Second, when I implemented the NSF config (entered in the OSPF process of the stack and the neighbouring core boxes), failover was reduced further. Third, when I configured 'stack-mac persistent timer 0', failover was reduced further still, to sub-second.
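For anyone hitting the same leftover-HSRP issue, removing it from a stacked SVI might look like the sketch below. The VLAN, group number, and addresses are hypothetical, and moving the old virtual IP onto the SVI assumes hosts still point at it as their default gateway:

```
interface Vlan10
 ! Remove the leftover HSRP group by negating each line that was configured
 no standby 10 preempt
 no standby 10 priority 110
 no standby 10 ip 10.1.10.1
 ! Hosts still use 10.1.10.1 as their gateway, so give that address to the SVI;
 ! the stack acts as a single router, so no first-hop redundancy is needed here
 ip address 10.1.10.1 255.255.255.0
```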


Thank you for sharing your research. It worked out just as you said. Much appreciated,

Simon

andrew.butterworth Thu, 04/23/2009 - 15:54

Glad you sorted it out. Sub-second was what we were seeing when we did similar testing.


Cheers


Andy
