BGP flap - OutQ

Jul 3rd, 2009

Dear all


I am dealing with a BGP flap issue and have looked at pretty much everything I can think of.


The OutQ is not able to drop to zero. This is on a route reflector which has 2 legs to a distribution switch. The IGP is IS-IS and the platform is a 7301.


Since it is a reflector, it has many BGP sessions. All sessions are behaving as expected, except the one to the primary reflector.
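
For readers following along, the symptom can be watched in the OutQ column of the BGP summary. The commands below are only a sketch; 192.0.2.2 is a placeholder for the primary reflector's loopback, not an address from this thread.

```
! OutQ is the number of messages queued locally and still waiting
! to be sent to that neighbor; on a healthy session it returns to 0
show ip bgp summary
! Per-neighbor detail: state, timers and message counters
show ip bgp neighbors 192.0.2.2
```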


I checked for L2 issues and MTU issues. I have no QoS, nor do I see a need for it: since it is a reflector, not much traffic is carried on the GE ports.


Extended pings between the loopbacks work fine, with no drops, and load balancing is per destination.


I had the fiber swapped and the connectors changed, and disabled one of the redundant legs to stop load balancing.


As you can see, I have been through pretty much everything I can think of, so any suggestions are appreciated!


TIA


Sam

Giuseppe Larosa Fri, 07/03/2009 - 11:07

Hello Sam,

BGP route reflectors with many BGP peers are required to perform a lot of process switching.


The problem could be at the buffering stage, on both sides, at two levels:

interface

system buffers


There are some old guidelines about them:


Increase the hold-queue to 1000, both in and out, on the physical interfaces.


Tune the system buffers.

There is an auto-tune option in modern IOS images.
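
As a rough sketch of the two guidelines above: the interface name is an assumption (substitute the physical interfaces facing the distribution switch), and whether `buffers tune automatic` exists depends on the IOS image.

```
interface GigabitEthernet0/1
 hold-queue 1000 in
 hold-queue 1000 out
!
! Global command, only present in images that support automatic buffer tuning
buffers tune automatic
```

`show buffers` can then be used to see whether the misses and failures counters keep increasing.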


There is an algorithm that handles the hold queue, called SPD (Selective Packet Discard).


This has to be tuned too with commands like:


spd headroom 750


ip spd mode aggressive

ip spd queue max-threshold 999

ip spd queue min-threshold 998


SPD provides preferential treatment to IP routing protocol messages, IS-IS and CDP.
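
To check whether the SPD settings took effect and whether packets are still being dropped on input, something like the following may help. This is a sketch only: the interface name is an assumption, `show ip spd` may not be available in every image, and the output differs between IOS releases.

```
! Current SPD mode, queue thresholds and headroom
show ip spd
! Input/output queue sizes and drop counters on the physical interface
show interfaces GigabitEthernet0/1 | include queue|drops
```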


These guidelines were provided by Cisco for a service provider customer.


Hope this helps

Giuseppe


cisco_lad2004 Fri, 07/03/2009 - 11:34

Thanks Giuseppe !


I had indeed already increased the hold queue to 4096 in/out, in vain.


Since the other clients are fine, I am starting to suspect recursive routing as a cause. The session is going up and down even after I increased the timers between both reflectors to let it settle.
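
One way to test the recursive routing suspicion is to check how the route to the other reflector's loopback is learned: it would be expected to come from IS-IS, not from BGP itself. The lines below are a sketch; 192.0.2.2 is a placeholder loopback, 64512 a placeholder AS number, and the timer values are purely illustrative, not the ones used in this thread.

```
! Exec mode: which protocol provides the route to the peer loopback?
show ip route 192.0.2.2
show ip cef 192.0.2.2 detail
!
! Config mode: raising keepalive and hold timers (seconds) for one neighbor
router bgp 64512
 neighbor 192.0.2.2 timers 120 360
```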


Sam

cisco_lad2004 Sat, 07/04/2009 - 00:04

I used a temporary workaround: peering between the reflectors' physical interfaces instead of their loopbacks. This works with no flaps, which suggests the issue is not related to QoS, timers, or router resources, but possibly to a routing / tag switching error.
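
A workaround of this kind might look roughly like the following on one reflector. Everything here is hypothetical: 64512 is a placeholder AS number, 192.0.2.2 stands for the other reflector's loopback, 203.0.113.1 for its physical interface address, and the interface name is assumed.

```
router bgp 64512
 ! Original loopback-to-loopback session, left configured but shut down
 neighbor 192.0.2.2 shutdown
 ! Workaround: peer with the physical interface address of the other reflector
 neighbor 203.0.113.1 remote-as 64512
 neighbor 203.0.113.1 update-source GigabitEthernet0/1
```

The trade-off is that the session is now tied to one physical interface, so the redundancy of the second leg is lost for this peering.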


RRF1-PE1-COR01==COR02-PE2-RRF2


Tracing the LSP between the loopbacks looks good, keeping in mind there is no tag switching between RRF and PE.

I can still ping between the RRFs' loopbacks (extended ping), using max MTU.
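
For reference, a full-size ping between loopbacks can be run in one line, as sketched below. The address is a placeholder, and 1500 assumes a standard Ethernet MTU end to end. Note that this exercises ICMP rather than the BGP TCP session itself, so comparing it with the segment size reported for the neighbor can be a useful cross-check.

```
! Full-size packets with the DF bit set, sourced from the loopback
ping 192.0.2.2 source Loopback0 size 1500 df-bit repeat 100
! Maximum TCP data segment size in use on the BGP session
show ip bgp neighbors 192.0.2.2 | include segment
```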


Any thoughts would be a great help!


TIA



Sam

cisco_lad2004 Wed, 07/08/2009 - 23:27

I ran a debug on the 7301 suspected of having a bug and could see that it sends a keepalive only once for a given neighbor, but regular ones to the other peers.


In short: only one is sent when the session establishes, then the next 2 are skipped until the session tears down.
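
A sketch of how this kind of keepalive behavior can be observed per neighbor is shown below; 192.0.2.2 is a placeholder address, and debug output on a busy reflector can be heavy, so treat it as something to adapt. With the IOS defaults of a 60-second keepalive and a 180-second hold time, a peer that hears one keepalive and then nothing would let its hold timer expire at about the point where the next two keepalives were missed, which is roughly consistent with the pattern described. The timers in this thread had been increased, so the actual values may differ.

```
debug ip bgp keepalives
! Negotiated timers and time since the last read/write on the session
show ip bgp neighbors 192.0.2.2 | include hold time|keepalive
! Turn debugging off again when done
undebug all
```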


Any thoughts ?


Sam

Eyal Hezi Wed, 07/28/2010 - 04:18

Hi All,


I am having the same problem.

Has anyone managed to solve this issue?


10x

Eyal
