Solved: Re: BGP convergence

cisco_lad2004 · ‎08-14-2006

Hi all

I have a setup today where I rely on L2 for convergence (3 s).

This setup needs to be phased out, and since I will only have L3 to fine-tune, I would like to ask if anyone has played with BGP fall-over and what kind of convergence you get.

TIA

Sam

globalnettech · ‎08-14-2006

Hello Sam,

Convergence with the fall-over feature enabled is immediate, rather than relying on the default 180-second hold timer. The peering session is terminated right away upon detection of a missing host route.

Be aware, though, that you do need to configure a host route for each peer on which you enable that feature.

Have a look at this doc as well:

BGP Support for Fast Peering Session Deactivation

http://www.cisco.com/univercd/cc/td/doc/product/software/ios120/120newft/120limit/120s/120s29/cs_bsfda.htm

Regards,

GNT

cisco_lad2004 · ‎08-14-2006

Dear GNT

Thanks for the prompt reply.

Sadly, I know of the 3 x 60 s wait. I was really after what convergence time we can realistically achieve.

Are we talking ms or seconds? :-)

The same applies to fast-external-fallover. I am still searching the Cisco site for some indication. I could test it of course, but my boss did not want to invest in a test lab... now he can wait for my answer.

Thanks a lot

Sam

ilya.varlashkin · ‎08-14-2006

When it works, it's supposedly immediate, but with a few images I tried, enabling this command just killed the router, so be careful enabling it on a production network.

cisco_lad2004 · ‎08-14-2006

Good stuff, and here I was thinking it would actually relieve the router by reducing the load from the BGP scanner process. I guess this is another argument to fight for a test lab.

Many thanks

Sam

mheusinger · ‎08-14-2006

Hi,

may I ask which scenario we are talking about here? Internet access? Private network? MPLS VPN? Could an IGP be used instead of BGP? Also, how many prefixes are we talking about? Which hardware platforms are involved?

Generally, BGP was not meant to perform like a Ferrari but rather like a truck, for scalability and stability reasons. IS-IS and OSPF, however, can be optimized for speed (around 1 sec).

I find it difficult to give a general answer about convergence without knowing the setup.

Regards, Martin

cisco_lad2004 · ‎08-14-2006

Hi Martin

The setup is as follows: a metro ring running RSTP. Each switch acts as access for VLANs, which are terminated on 7206VXR MPLS/VPN sub-interfaces.

Fiber breaks are taken care of by RSTP since it is a ring topology. However, if I move away from RSTP to pure L3, I can only rely on the truck... huh, sorry, BGP.

The comparison is actually a good one. I just wish Cisco were more specific about the gains you get when you fine-tune. Granted, it would depend on platforms and number of sessions, but an idea would have been a good guideline.

Many thanks

Sam

mheusinger · ‎08-14-2006

Hi Sam,

you want to replace RSTP with an L3 solution; OK, why not. But why do you conclude that only BGP can deliver this in your setup? Either it is solely Layer 2 to your customer/access, in which case BGP wouldn't help, as you need a peer, i.e. a router. Or there is another router, in which case connectivity "testing" is done by BGP via the TCP session (and interface up/down). But if there is a router, then OSPF should also be possible. Even with OSPF in the VRF it should be faster than BGP, because OSPF hellos will be "testing" the connectivity (can be tuned to a 1 sec hold time). There are no limitations on OSPF in a VRF besides CPU and memory on the participating routers. The same applies to BGP. I am not even sure which protocol would be more demanding IF you tune both to the same convergence time.

So where in your setup is the hidden killer argument for BGP? All I can see so far are arguments against BGP, given the fast convergence requirement.

Another question: which applications require a convergence time of 3 secs or less? Design should be top-down from an OSI model perspective, not the other way round. Your apps should tell you what is NOT possible, and then you take whatever is left and is cheapest. Don't forget your crystal ball for future requirements though ;-)

My 2 cents.

Regards, Martin

cisco_lad2004 · ‎08-14-2006

Hi Martin

I run Inter-AS VPN; the ASBRs are connected with a /30 between dot1Q-encapsulated interfaces. The VLANs are purely switched over a metro ring.

I am happy with RSTP and its less than 3 s, which keeps my voice customers happy.

So, I cannot fine-tune IGPs here at all. In fact, if the link between the ASBRs is down, I will have to wait the 180 s BGP offers. However, within my network a break will be almost invisible, which should guarantee my SLA.

My RSTP/MSTP has a limitation on the number of switches. If I want to scale, I need to ditch the switch-based metro ring and have half rings terminated on routers. Then breaks in my network will also need to wait 180 s.

Man, I broke my crystal ball some time ago. That is why I am in this predicament :-(

Cheers

Sam

mheusinger · ‎08-14-2006

Hi Sam,

ok I see. Well then have a look at this:

Adjusting BGP timers

http://www.cisco.com/en/US/products/ps6350/products_configuration_guide_chapter09186a00804435fc.html#wp1002274

The 60 sec keepalive is just the default. Lower it to 5 secs, with a 15 secs hold time, and you get faster convergence.

Also consider tuning (carefully!) the BGP scan timer, and use BGP fast-external-fallover.
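
A minimal sketch of what that tuning could look like on IOS. The AS numbers and neighbor address are illustrative (borrowed from the lab config posted elsewhere in this thread), and the values are starting points to test, not recommendations:

```
router bgp 65001
 neighbor 192.168.128.1 remote-as 65002
 ! keepalive 5 s, hold time 15 s for this neighbor (defaults are 60 s / 180 s)
 neighbor 192.168.128.1 timers 5 15
 ! BGP scanner interval in seconds (default 60); lower values cost more CPU
 bgp scan-time 15
 ! on by default: reset a directly connected eBGP session when its link goes down
 bgp fast-external-fallover
```

The hold time is negotiated down to the lower of the two peers' values when the session is established, so changed timers only take effect after the session is reset.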

BGP Support for Fast Peering Session Deactivation

http://www.cisco.com/en/US/products/ps6350/products_configuration_guide_chapter09186a008045561f.html

And also look into

BGP Support for Next-Hop Address Tracking

http://www.cisco.com/en/US/products/ps6350/products_configuration_guide_chapter09186a008045561e.html

In addition BGP is NSF aware, which might be remotely interesting for you.

Hope this helps! Please rate all posts.

Regards, Martin

cisco_lad2004 · ‎08-14-2006

Hi Martin

Many thanks for the feedback !

Have you actually tested and timed the fall-over feature? What kind of convergence are we talking about? This is what I cannot find anywhere on Cisco's website and unfortunately cannot test either.

If I could have it my way, I would just keep on running RSTP with its less than 3 sec guaranteed result instead of melting a router CPU with excessive BGP scanners.

Regards

Sam

ilya.varlashkin · ‎08-14-2006

Here is a sample scenario I've just tried in the lab: two routers connected via an intermediate switch, with fall-over configured on one of the routers, and then the switch port facing that router shut down. It took less than a second for the first route to change; the others followed immediately, but I've truncated the output.

CE8#sh ip bgp
BGP table version is 80, local router ID is 192.168.129.8
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
* i192.168.128.4/30 192.168.128.5            0    100      0 65002 ?
* i                 172.16.1.4               0    100      0 65002 ?
*>                  192.168.128.1                          0 65002 ?
CE8#
*Aug 14 21:50:25.650: %LINEPROTO-5-UPDOWN: Line protocol on Interface FastEthernet0/0, changed state to down
*Aug 14 21:50:25.650: is_up: 0 state: 4 sub state: 1 line: 0 has_route: True
*Aug 14 21:50:25.650: BGP: 192.168.128.1 resetting - interface FastEthernet0/0 down
*Aug 14 21:50:25.650: RT: interface FastEthernet0/0 removed from routing table
*Aug 14 21:50:25.650: RT: del 192.168.128.0/30 via 0.0.0.0, connected metric [0/0]
*Aug 14 21:50:25.650: RT: delete subnet route to 192.168.128.0/30
*Aug 14 21:50:25.650: RT: NET-RED 192.168.128.0/30
*Aug 14 21:50:25.650: RT: Try lookup less specific 192.168.128.0/30, default 1
*Aug 14 21:50:25.650: RT: Failed found subnet on less specific
*Aug 14 21:50:25.654: RT: return default 0.0.0.0/0
*Aug 14 21:50:25.654: BGPNSF state: 192.168.128.1 went from nsf_not_active to nsf_not_active
*Aug 14 21:50:25.654: BGP: 192.168.128.1 went from Established to Idle
*Aug 14 21:50:25.654: %BGP-5-ADJCHANGE: neighbor 192.168.128.1 Down Interface flap
*Aug 14 21:50:25.654: BGP: 192.168.128.1 closing
*Aug 14 21:50:25.654: RT: 192.168.128.4/30 gateway changed from 192.168.128.1 to 192.168.128.5
*Aug 14 21:50:25.658: RT: NET-RED 192.168.128.4/30

So it's really immediate.

cisco_lad2004 · ‎08-15-2006

Hi Ilya

This answers my question fully, and many thanks for taking the time to test this.

Best regards

Sam

jackyoung · ‎08-15-2006

Hi Ilya,

Would you mind sharing the config you tested? Is it enough to add "fall-over" under the neighbor command? Do both peers need to enable fall-over at the same time, i.e. can one side enable it while the remote side does not?

It does not seem to be in any RFC. Is it a Cisco-specific feature or a standard BGP feature, i.e. is it supported by non-Cisco routers? If my peer is not a Cisco router and does not support fall-over, will there be any issue?

Sorry for so many questions. I do not have a lab to test this myself, so I would like to learn from your experience and that of other NetPros. Many thanks.

ilya.varlashkin · ‎08-15-2006

Hi Jack,

here is all I had for this test:

router bgp 65001
 neighbor 192.168.128.1 remote-as 65002
 neighbor 192.168.128.1 fall-over
 neighbor 172.16.1.4 remote-as 65001
 neighbor 172.16.1.4 fall-over

Only this side has been configured with 'fall-over'. The result is that only this side drops the connection and therefore flushes all associated routes; the remote side still relies on keepalives. If the link comes back up before the other side times out, the two systems will discover that the previous connection was left dormant on the remote side (the TCP port on one of the sides will be invalid), and per RFC 4271 that connection will be dropped and a new one established.

Fall-over is, for the moment, a Cisco-specific feature, but it complies with RFC 4271 in that the BGP session is dropped as soon as the TCP connection is lost; fall-over simply forces the TCP connection to be dropped as soon as the route to its destination disappears from the routing table. Fall-over is also a system-local feature, i.e. support for it is not communicated between peers (somewhat like the 'weight' attribute, which is also Cisco-specific).
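
On the question of enabling it on both ends: a sketch of what the remote router might carry if it is also a Cisco box and you want it to react just as quickly. The neighbor address 192.168.128.2 is an assumption (the other host address in the 192.168.128.0/30 seen in the debug output); this side was not part of the test:

```
router bgp 65002
 neighbor 192.168.128.2 remote-as 65001
 ! optional: without this line the remote side waits for its hold timer to expire
 neighbor 192.168.128.2 fall-over
```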