
neighbor send-label - a possible bug in 12.4(24)T4 and newer

Peter Paluch
Cisco Employee

Dear friends,

I have stumbled across a different behavior of the neighbor send-label command in BGP in IOS versions 12.4(24)T4 up to 12.4(24)T6 inclusive, and I wanted to ascertain whether it is a bug or just a new behavior I am not yet aware of.

[Exhibit BGP1.png: topology with routers X (AS 2), Y and Z (AS 1)]

Consider the following scenario: routers X, Y and Z are peered in BGP according to the exhibit. Router X is in AS 2, and routers Y and Z are in AS 1. X/Y are peered using their physical interface addresses, and Y/Z are peered using their loopback addresses. Each peering is duly configured with neighbor send-label.

The BGP configuration on router Y is as follows:

Y# show run | sec router bgp
router bgp 1
 bgp log-neighbor-changes
 neighbor 10.1.255.1 remote-as 1
 neighbor 10.1.255.1 update-source Loopback0
 neighbor 192.168.1.2 remote-as 2
 !
 address-family ipv4
  redistribute ospf 1
  neighbor 10.1.255.1 activate
  neighbor 10.1.255.1 send-label
  neighbor 192.168.1.2 activate
  neighbor 192.168.1.2 send-label
  no auto-summary
  no synchronization
 exit-address-family

Router Y is receiving a set of routes from X, in particular:

Y# show ip bgp regexp _2
BGP table version is 22, local router ID is 10.1.255.5
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
*> 10.2.12.0/24     192.168.1.2              4             0 2 ?
*> 10.2.23.0/24     192.168.1.2              3             0 2 ?
*> 10.2.34.0/24     192.168.1.2              2             0 2 ?
*> 10.2.45.0/24     192.168.1.2              0             0 2 ?
*> 10.2.255.1/32    192.168.1.2              5             0 2 ?
*> 10.2.255.2/32    192.168.1.2              4             0 2 ?
*> 10.2.255.3/32    192.168.1.2              3             0 2 ?
*> 10.2.255.4/32    192.168.1.2              2             0 2 ?
*> 10.2.255.5/32    192.168.1.2              0             0 2 ?

The show ip bgp labels command on router Y, however, produces rather interesting results:

Y# show ip bgp labels
   Network          Next Hop      In label/Out label
   10.2.12.0/24     192.168.1.2     nolabel/16
   10.2.23.0/24     192.168.1.2     nolabel/17
   10.2.34.0/24     192.168.1.2     nolabel/18
   10.2.45.0/24     192.168.1.2     nolabel/imp-null
   10.2.255.1/32    192.168.1.2     nolabel/19
   10.2.255.2/32    192.168.1.2     nolabel/20
   10.2.255.3/32    192.168.1.2     nolabel/21
   10.2.255.4/32    192.168.1.2     nolabel/22
   10.2.255.5/32    192.168.1.2     nolabel/imp-null

Note that while the routes are being received with MPLS labels, router Y does not seem to allocate any local label bindings for them, although all these routes are being further advertised to router Z via iBGP.
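To spot this condition in a larger table, the output can be checked programmatically. The following is a minimal Python sketch that assumes the exact column layout shown above. The helper names (parse_bgp_labels, missing_local_bindings) are hypothetical and not part of any Cisco tool. It flags routes that have a remote (Out) label but no local (In) label.

```python
import re

# Assumed row format of "show ip bgp labels": network, next hop, "in/out" labels.
ROW = re.compile(r"^\s*(\d+\.\d+\.\d+\.\d+/\d+)\s+(\S+)\s+(\S+)/(\S+)\s*$")


def parse_bgp_labels(text):
    """Return {prefix: (next_hop, in_label, out_label)} for every table row."""
    routes = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:  # the header line and blank lines simply do not match
            prefix, next_hop, in_label, out_label = match.groups()
            routes[prefix] = (next_hop, in_label, out_label)
    return routes


def missing_local_bindings(routes):
    """Prefixes that have a remote (out) label but no local (in) label."""
    return sorted(
        prefix
        for prefix, (_, in_label, out_label) in routes.items()
        if in_label == "nolabel" and out_label != "nolabel"
    )


# Three rows copied from router Y's output above.
Y_OUTPUT = """
   Network          Next Hop      In label/Out label
   10.2.12.0/24     192.168.1.2     nolabel/16
   10.2.45.0/24     192.168.1.2     nolabel/imp-null
   10.2.255.5/32    192.168.1.2     nolabel/imp-null
"""

print(missing_local_bindings(parse_bgp_labels(Y_OUTPUT)))
```

The sort is plain string order. It keeps the sketch short but does not order prefixes numerically.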

On router Z, the results are also confusing. First of all, the networks received from router Y are still learned with the original next hop 192.168.1.2 instead of 10.1.255.5 (using send-label on router Y should imply next-hop-self):

Z# show ip route bgp
     10.0.0.0/8 is variably subnetted, 18 subnets, 2 masks
B       10.2.12.0/24 [200/4] via 192.168.1.2, 00:26:28
B       10.2.23.0/24 [200/3] via 192.168.1.2, 00:26:28
B       10.2.45.0/24 [200/0] via 192.168.1.2, 00:26:28
B       10.2.34.0/24 [200/2] via 192.168.1.2, 00:26:28
B       10.2.255.5/32 [200/0] via 192.168.1.2, 00:26:28
B       10.2.255.4/32 [200/2] via 192.168.1.2, 00:26:28
B       10.2.255.3/32 [200/3] via 192.168.1.2, 00:26:28
B       10.2.255.2/32 [200/4] via 192.168.1.2, 00:26:28
B       10.2.255.1/32 [200/5] via 192.168.1.2, 00:26:28

Checking show ip bgp labels on router Z reveals another interesting behavior: although Y claims it has not allocated any labels itself, it has in fact advertised the eBGP routes to Z with the original labels allocated by X (compare the Out label column in the previous and the current output):

Z# show ip bgp labels
   Network          Next Hop      In label/Out label
   10.2.12.0/24     192.168.1.2     nolabel/16
   10.2.23.0/24     192.168.1.2     nolabel/17
   10.2.34.0/24     192.168.1.2     nolabel/18
   10.2.45.0/24     192.168.1.2     nolabel/imp-null
   10.2.255.1/32    192.168.1.2     nolabel/19
   10.2.255.2/32    192.168.1.2     nolabel/20
   10.2.255.3/32    192.168.1.2     nolabel/21
   10.2.255.4/32    192.168.1.2     nolabel/22
   10.2.255.5/32    192.168.1.2     nolabel/imp-null

Ironically, on router Y, labels 16-22 have already been allocated by LDP for different internal networks. If router Z uses the labels as advertised by router Y, packets will be badly misrouted at router Y to completely different destinations:

Y# show mpls forwarding-table
Local  Outgoing      Prefix            Bytes Label   Outgoing   Next Hop
Label  Label or VC   or Tunnel Id      Switched      interface
16     Pop Label     192.168.1.2/32    0             Fa0/0      192.168.1.2
17     Pop Label     10.1.255.4/32     0             Fa0/1      10.1.45.4
18     20            10.1.255.3/32     0             Fa0/1      10.1.45.4
19     19            10.1.255.2/32     0             Fa0/1      10.1.45.4
20     18            10.1.255.1/32     0             Fa0/1      10.1.45.4
21     16            10.1.12.0/24      0             Fa0/1      10.1.45.4
22     17            10.1.23.0/24      0             Fa0/1      10.1.45.4
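The overlap can be listed mechanically by intersecting the label values Z received with the local labels already present in Y's LFIB. Below is a minimal Python sketch with both tables copied from the outputs above. It only reports which values coincide. It deliberately makes no claim about where a labeled packet is actually looked up, because that is the question the rest of this thread is about. The function name label_overlaps is hypothetical.

```python
# Out labels on Z for the AS 2 prefixes (from "show ip bgp labels" on Z).
Z_OUT_LABELS = {
    "10.2.12.0/24": "16",
    "10.2.23.0/24": "17",
    "10.2.34.0/24": "18",
    "10.2.45.0/24": "imp-null",
    "10.2.255.1/32": "19",
    "10.2.255.2/32": "20",
    "10.2.255.3/32": "21",
    "10.2.255.4/32": "22",
    "10.2.255.5/32": "imp-null",
}

# Local label -> prefix in Y's LFIB (from "show mpls forwarding-table" on Y).
Y_LFIB = {
    16: "192.168.1.2/32",
    17: "10.1.255.4/32",
    18: "10.1.255.3/32",
    19: "10.1.255.2/32",
    20: "10.1.255.1/32",
    21: "10.1.12.0/24",
    22: "10.1.23.0/24",
}


def label_overlaps(advertised, local_lfib):
    """Return {prefix: (label, lfib_prefix)} for each advertised numeric label
    that is also a local label bound to a different prefix in the given LFIB."""
    overlaps = {}
    for prefix, label in advertised.items():
        if not label.isdigit():
            continue  # implicit null is signalled but never appears in a packet's label stack
        value = int(label)
        if value in local_lfib and local_lfib[value] != prefix:
            overlaps[prefix] = (value, local_lfib[value])
    return overlaps


for prefix, (label, lfib_prefix) in sorted(label_overlaps(Z_OUT_LABELS, Y_LFIB).items()):
    print(f"{prefix:15} label {label:2} is bound to {lfib_prefix} in Y's LFIB")
```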

So, there are two suspicious facts about the behavior of router Y:

  1. It does not modify the next-hop attribute when advertising the eBGP routes along with MPLS labels via iBGP to another internal BGP neighbor.
  2. It does not allocate its own local MPLS label bindings. Instead, it simply re-advertises the labels allocated by router X, which results in label value conflicts and misinterpretations.

Interestingly, after the command neighbor 10.1.255.1 next-hop-self is added to router Y's configuration, the behavior becomes correct again:

Y(config)# router bgp 1
Y(config-router)# address-family ipv4
Y(config-router-af)# neighbor 10.1.255.1 next-hop-self
Y(config-router-af)# do show ip bgp label
   Network          Next Hop      In label/Out label
   10.2.12.0/24     192.168.1.2     24/16
   10.2.23.0/24     192.168.1.2     25/17
   10.2.34.0/24     192.168.1.2     31/18
   10.2.45.0/24     192.168.1.2     27/imp-null
   10.2.255.1/32    192.168.1.2     26/19
   10.2.255.2/32    192.168.1.2     28/20
   10.2.255.3/32    192.168.1.2     29/21
   10.2.255.4/32    192.168.1.2     30/22
   10.2.255.5/32    192.168.1.2     32/imp-null

On Z:

Z# show ip bgp labels
   Network          Next Hop      In label/Out label
   10.2.12.0/24     10.1.255.5      nolabel/24
   10.2.23.0/24     10.1.255.5      nolabel/25
   10.2.34.0/24     10.1.255.5      nolabel/31
   10.2.45.0/24     10.1.255.5      nolabel/27
   10.2.255.1/32    10.1.255.5      nolabel/26
   10.2.255.2/32    10.1.255.5      nolabel/28
   10.2.255.3/32    10.1.255.5      nolabel/29
   10.2.255.4/32    10.1.255.5      nolabel/30
   10.2.255.5/32    10.1.255.5      nolabel/32
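The corrected state can be verified in the same way. With next-hop-self in place, every Out label on Z should equal the In label that Y allocated, and the next hop should be Y's loopback, 10.1.255.5. Below is a minimal Python sketch built from the two tables above. The helper name check_relay is hypothetical.

```python
Y_LOOPBACK = "10.1.255.5"

# Router Y after next-hop-self: prefix -> (in label, out label).
Y_BINDINGS = {
    "10.2.12.0/24": ("24", "16"),
    "10.2.23.0/24": ("25", "17"),
    "10.2.34.0/24": ("31", "18"),
    "10.2.45.0/24": ("27", "imp-null"),
    "10.2.255.1/32": ("26", "19"),
    "10.2.255.2/32": ("28", "20"),
    "10.2.255.3/32": ("29", "21"),
    "10.2.255.4/32": ("30", "22"),
    "10.2.255.5/32": ("32", "imp-null"),
}

# Router Z after next-hop-self on Y: prefix -> (next hop, out label).
Z_BINDINGS = {
    "10.2.12.0/24": ("10.1.255.5", "24"),
    "10.2.23.0/24": ("10.1.255.5", "25"),
    "10.2.34.0/24": ("10.1.255.5", "31"),
    "10.2.45.0/24": ("10.1.255.5", "27"),
    "10.2.255.1/32": ("10.1.255.5", "26"),
    "10.2.255.2/32": ("10.1.255.5", "28"),
    "10.2.255.3/32": ("10.1.255.5", "29"),
    "10.2.255.4/32": ("10.1.255.5", "30"),
    "10.2.255.5/32": ("10.1.255.5", "32"),
}


def check_relay(y_bindings, z_bindings, y_address):
    """List mismatches between Y's local labels and the labels Z uses."""
    problems = []
    for prefix, (y_in, _y_out) in sorted(y_bindings.items()):
        if prefix not in z_bindings:
            problems.append(f"{prefix}: not learned by Z")
            continue
        z_next_hop, z_out = z_bindings[prefix]
        if z_next_hop != y_address:
            problems.append(f"{prefix}: next hop {z_next_hop}, expected {y_address}")
        if z_out != y_in:
            problems.append(f"{prefix}: Z uses label {z_out}, Y allocated {y_in}")
    return problems


print(check_relay(Y_BINDINGS, Z_BINDINGS, Y_LOOPBACK) or "consistent")
```

If the rows from the outputs taken before next-hop-self are fed in instead, both checks fail for every labeled prefix.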

Router Y is a 2811 currently running 2800 Software (C2800NM-ADVIPSERVICESK9-M), Version 12.4(24)T6. I originally came across this behavior with 12.4(24)T4. I have confirmed that this behavior is not present with ADVENTERPRISEK9-M 12.4(22)T, so if this is a bug, it must have been "added" in some intermediate version.

I currently have no way of testing newer IOS releases from the 15.x series, because the router does not have the inordinate 512MB of RAM those versions require. I apologize for not testing this behavior on the most recent releases.

Has anyone experienced similar behavior? Is this really a bug? Will it still be corrected in the 12.4T train? Thank you for all suggestions!

Best regards,

Peter


rsimoni
Cisco Employee

Hi Peter,

What a fantastic problem description you have given us!!

I wish I had received similar ones when I was in the TAC!!!

Anyway, I have the impression that you have run into one of those 'gray areas', where a given behavior is consistent (and apparently correct) across various releases until it turns out not to be the expected one. I think this is your case as well.

From my research I found out that:

1. The assumption that 'using send-label on router Y should imply next-hop-self' (hence on iBGP sessions) does not reflect the intended behavior for IOS, even though it is the expected behavior on IOS-XR. I could not confirm this myself; I found this statement in an email exchange between BGP developers.

2. Apparently, IOS had this behavior for a long time across different releases, even though it was not documented anywhere (by the way, did you find it documented somewhere?).

3. Various internal bugs addressed the issue from different perspectives. All of them were eventually marked as duplicates of an external one whose release notes are quite incomplete and misleading (I have to admit that...), because they only mention CSC scenarios, whereas the issue affects various BGP implementations.

Among the internal bugs, the following already contain part of the solution in their titles:

CSCsi18597    IPv4+labels: router always does next-hop-self when send-label is enabled

CSCsq49865    missing labels in mpls forwarding table

CSCsu33177    TEA bgp next hop will be broken after neighor send-label config

I have to mention them since, as I wrote, the external one which fixed the issue is quite misleading.

Here it is anyway:

CSCek55668    bgp next hop will be broken after neighor send-label config

In conclusion, I think that 12.4(24)T4 and 12.4(24)T6 simply have the new and correct behavior, which is not to have next-hop-self implicitly enabled on iBGP sessions. If you want it, you have to configure it explicitly (as is apparently happening in your case).

In previous releases (many trains were affected), next-hop-self was somehow implicit, as you noticed. From what I see in the bug notes, the latest rebuilds of 15.0 and 15.1 have the 'new' and correct behavior.

Let's see if some other BGP expert has anything to add to this or let me know if you have comments on this.

Riccardo


Hello Riccardo,

Thank you for your informative reply, I appreciate that immensely!

As to whether send-label should imply next-hop-self: I do not remember seeing it stipulated in any official Cisco documentation, but the books about MPLS I have read take it for granted. In any case, the behavior seen in older IOS versions (send-label implying next-hop-self) is generally desirable. It makes sure that the LSP towards the appropriate BGP ASBR is chosen, and it prevents a possible premature PHP of the topmost transport label. Things can work both with and without the implied next-hop-self, although implying it increases the chances that MPLS labeling works properly. Either way, I wish the documentation clearly stated that the behavior is being changed, so that all IOS versions behave identically.

What is more grave, however, is the part with the label mappings. In my example, router Y did not create any local label bindings for the received eBGP routes. When Y subsequently advertised the networks to router Z, it merely reused, i.e. copied, the label values received from router X, in effect confusing outgoing label values with incoming label values. This is outright incorrect behavior: the same incoming label values in router Y's LFIB already correspond to different destinations, so traffic is misrouted and blackholed. Why configuring next-hop-self on router Y corrected these label bindings is beyond my comprehension - a modification of the next-hop attribute should have no influence on the local assignment of labels!

Sadly, I do not have a TAC contract, so I cannot submit this as a bug for investigation.

Thank you once more, Riccardo, and to anyone willing to share his/her views on this issue!

Best regards,

Peter

Hi Peter,

on Monday I will ask a BGP guru to have a look at this.

Riccardo

Riccardo,

Thank you so much! I will be eagerly watching this thread for any new information. Thanks again, your help is very, very much appreciated!

Best regards,

Peter

Hello Peter,

>> Why configuring the next-hop-self on router Y corrected these label bindings is beyond my comprehension - a particular modification of a next-hop attribute should have no influence on local assignment of labels!

Because it corresponds to an MPLS LSP segment that starts on the ASBR, router RY in your case.

With labeled BGP, a label swap cannot happen on its own: it is triggered (emulated) by the change of the next hop.

I have looked back at my tests on BGP with labels. I was using next-hop-self towards iBGP neighbors, and my devices were C7200 and C7500.

It is interesting to see that some OS corrections may break our habits, as explained by Simone.

Hope to help

Giuseppe

Hello Giuseppe,

Thank you very much for your answer. I am not sure I understand it correctly, so please let me re-explain my main point and ask for your kind advice.

Issue 1:

All routers are configured with send-label, and none of them is configured with next-hop-self. Router Y receives labeled BGP routes from router X, and show ip bgp labels displays the following table:

Y# show ip bgp labels
   Network          Next Hop      In label/Out label
   10.2.12.0/24     192.168.1.2     nolabel/16
   10.2.23.0/24     192.168.1.2     nolabel/17
   10.2.34.0/24     192.168.1.2     nolabel/18
   10.2.45.0/24     192.168.1.2     nolabel/imp-null
   10.2.255.1/32    192.168.1.2     nolabel/19
   10.2.255.2/32    192.168.1.2     nolabel/20
   10.2.255.3/32    192.168.1.2     nolabel/21
   10.2.255.4/32    192.168.1.2     nolabel/22
   10.2.255.5/32    192.168.1.2     nolabel/imp-null

Note that while router Y knows remote bindings for these networks (the "Out label" column), it has not created any local label bindings for them (the "In label" column says nolabel for all networks). I assume this is done to avoid assigning local labels to BGP routes that may eventually be routed through a different ASBR and possibly be misinterpreted en route. In other words, a local label binding has local significance only. If there is no guarantee that packets will go through Y (without next-hop-self), local label bindings on Y should be neither created nor advertised. Am I correct in this line of reasoning?

Issue 2:

With the same configuration, router Y has advertised the BGP routes to router Z, but it has retained the label bindings it learned itself. That is, Y has not created any local bindings; it just "forgot" to remove the learned label bindings when advertising the routes to router Z:

Z# show ip bgp labels
   Network          Next Hop      In label/Out label
   10.2.12.0/24     192.168.1.2     nolabel/16
   10.2.23.0/24     192.168.1.2     nolabel/17
   10.2.34.0/24     192.168.1.2     nolabel/18
   10.2.45.0/24     192.168.1.2     nolabel/imp-null
   10.2.255.1/32    192.168.1.2     nolabel/19
   10.2.255.2/32    192.168.1.2     nolabel/20
   10.2.255.3/32    192.168.1.2     nolabel/21
   10.2.255.4/32    192.168.1.2     nolabel/22
   10.2.255.5/32    192.168.1.2     nolabel/imp-null

Note that the outgoing labels on Z are exactly the same as on router Y. In my opinion, this is a bug. Take, for example, the route towards 10.2.255.2. The bottom label will be 20, and the top label will be a label towards 192.168.1.2. In my particular topology, PHP will correctly pop this transport label before router Y, and Y will receive a packet labeled with label 20. However, on Y, label 20 is not a mapping assigned to 10.2.255.2, because BGP has not created any local bindings itself. Instead, label 20 corresponds to a totally different network somewhere inside the cloud between routers Y and Z, as evidenced by the following output on Y:

Y# show mpls forwarding-table
Local  Outgoing      Prefix            Bytes Label   Outgoing   Next Hop
Label  Label or VC   or Tunnel Id      Switched      interface
16     Pop Label     192.168.1.2/32    0             Fa0/0      192.168.1.2
17     Pop Label     10.1.255.4/32     0             Fa0/1      10.1.45.4
18     20            10.1.255.3/32     0             Fa0/1      10.1.45.4
19     19            10.1.255.2/32     0             Fa0/1      10.1.45.4
20     18            10.1.255.1/32     0             Fa0/1      10.1.45.4
21     16            10.1.12.0/24      0             Fa0/1      10.1.45.4
22     17            10.1.23.0/24      0             Fa0/1      10.1.45.4
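The disagreement comes down to which router's label space label 20 is interpreted in. A toy Python model follows. Label stacks are lists with the top at the end, and the transport label is a hypothetical placeholder because its value is not shown in the thread. Under the assumption made in this post, the BGP label is looked up in Y's LFIB. Under RFC 3107, as discussed later in the thread, the label was assigned by the router named in NEXT_HOP (X at 192.168.1.2), which advertised 20 for 10.2.255.2/32. The sketch only shows that the same value means different things in the two label spaces. It does not model where the transport LSP actually terminates.

```python
# Label bindings copied from the thread (only the entries needed here).
Y_LFIB = {20: "10.1.255.1/32"}        # Y's local label 20, allocated by LDP
X_BGP_LABELS = {20: "10.2.255.2/32"}  # label 20 as advertised by X for 10.2.255.2/32

TRANSPORT = "transport label toward 192.168.1.2"  # hypothetical placeholder


def impose(bgp_label):
    """Z's imposition: BGP label at the bottom, transport label on top."""
    return [bgp_label, TRANSPORT]  # last element = top of stack


def pop_transport(stack):
    """Remove the transport label (e.g. by PHP), exposing the BGP label."""
    if stack[-1] != TRANSPORT:
        raise ValueError("top of stack is not the transport label")
    return stack[:-1]


def interpret(stack, label_space):
    """Look up the top label in one router's label space."""
    return label_space.get(stack[-1], "no binding")


stack = pop_transport(impose(20))
print("in Y's label space:", interpret(stack, Y_LFIB))
print("in X's label space:", interpret(stack, X_BGP_LABELS))
```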

So the mere fact that BGP on Y did not create local bindings is understandable. However, the fact that it retained the remote label bindings learned from X and advertised them unchanged to Z is, in my opinion, a grave bug. What is your opinion on this?

Thank you very much!

Best regards,

Peter

Hello Peter,

thanks for your kind remarks.

BGP with labels should be used for inter-AS scenarios or for scalable Carrier Supporting Carrier scenarios.

I explored the former in more depth, both for study and as a possible migration solution for merging two networks.

RFC 3107 is the first RFC about labeled BGP, and it explains that:

  • the label is an integral part of a new type of NLRI in MP-BGP with SAFI=4;
  • multiple labels can be carried, each taking a 3-octet field in the labeled NLRI.
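To make the encoding concrete, here is a minimal Python sketch of just the labeled NLRI field described in RFC 3107, Section 3. It consists of a one-octet length in bits that covers the labels plus the prefix, then 3 octets per label, then the prefix truncated to whole octets. In each label entry, the 20-bit label value sits in the high-order bits and the bottom-of-stack flag in the lowest bit. The three bits in between are simply set to zero here. The surrounding MP_REACH_NLRI attribute (AFI 1, SAFI 4) is not modelled. Implicit null is typically signalled with the reserved label value 3 (RFC 3032). The function names are hypothetical.

```python
import ipaddress


def encode_label(value, bottom_of_stack):
    """Encode one 3-octet label entry: 20-bit label value in the high-order
    bits, bottom-of-stack flag in the lowest bit, the 3 bits in between zero."""
    if not 0 <= value < 2 ** 20:
        raise ValueError(f"{value} is not a 20-bit MPLS label")
    return ((value << 4) | int(bottom_of_stack)).to_bytes(3, "big")


def encode_labeled_nlri(labels, prefix):
    """Encode a labeled IPv4 NLRI: length in bits, label stack (top first), prefix."""
    network = ipaddress.IPv4Network(prefix)
    stack = b"".join(
        encode_label(value, index == len(labels) - 1)  # only the last label is bottom
        for index, value in enumerate(labels)
    )
    prefix_octets = network.network_address.packed[: (network.prefixlen + 7) // 8]
    length_bits = 24 * len(labels) + network.prefixlen
    return bytes([length_bits]) + stack + prefix_octets


# X's label 16 for 10.2.12.0/24, and implicit null (label 3) for 10.2.255.5/32.
print(encode_labeled_nlri([16], "10.2.12.0/24").hex())  # 300001010a020c
print(encode_labeled_nlri([3], "10.2.255.5/32").hex())  # 380000310a02ff05
```

Passing two labels shows the deeper stack Giuseppe describes next: only the second entry carries the bottom-of-stack bit.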

As I wrote in my previous post, the job of BGP is to join the LSP segments that are created in each AS. This may require more labels (a deeper label stack) so that the PE loopbacks of provider A are seen in provider B's network via the ASBR of provider B. So all LSPs destined to PE nodes of provider A are pushed into the LSP destined to the ASBR of provider B (hence the increased label stack depth), which can be built by LDP or RSVP-TE, for example.

ASBR nodes are required to perform non-trivial label swap operations that can also change the label stack depth. For example, they may need to change two labels at once.

Since the label is an integral part of the NLRI, it can be modified only when the BGP next-hop attribute is changed. The implementation has been designed this way because this is what is needed.

As you have noted, each node has its own label space, and propagating RX's label choices to RZ is indeed not a good idea, as RY's label choices are clearly different.

I agree with Riccardo that the behaviour is now correct in IOS.

Proposal:

A warning message could be added when configuring neighbor send-label to remind users of the need for next-hop-self, just as we are reminded that the IP address will be removed when we put an interface under a VRF.
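Until such a warning exists in IOS, a rough offline approximation is possible. The Python sketch below is hypothetical and is not an IOS feature. It scans the output of show run | sec router bgp and reports iBGP neighbors that have send-label but no next-hop-self. It is heavily simplified: it assumes a single router bgp section and neighbors configured by address, and it ignores peer-groups, templates and address-family scoping.

```python
def send_label_without_nhs(config):
    """Return iBGP neighbors that have send-label but no next-hop-self.

    Simplifying assumptions: one "router bgp" section, neighbors referenced
    by address (no peer-groups or templates), address families not separated.
    """
    local_as = None
    remote_as, send_label, next_hop_self = {}, set(), set()
    for line in config.splitlines():
        words = line.split()
        if words[:2] == ["router", "bgp"] and len(words) >= 3:
            local_as = words[2]
        elif len(words) >= 3 and words[0] == "neighbor":
            peer, keyword = words[1], words[2]
            if keyword == "remote-as" and len(words) >= 4:
                remote_as[peer] = words[3]
            elif keyword == "send-label":
                send_label.add(peer)
            elif keyword == "next-hop-self":
                next_hop_self.add(peer)
    # iBGP peer = same AS number as the local "router bgp" process.
    return sorted(
        peer for peer in send_label
        if remote_as.get(peer) == local_as and peer not in next_hop_self
    )


# Router Y's configuration from the first post (abbreviated).
Y_CONFIG = """
router bgp 1
 neighbor 10.1.255.1 remote-as 1
 neighbor 10.1.255.1 update-source Loopback0
 neighbor 192.168.1.2 remote-as 2
 address-family ipv4
  neighbor 10.1.255.1 activate
  neighbor 10.1.255.1 send-label
  neighbor 192.168.1.2 activate
  neighbor 192.168.1.2 send-label
 exit-address-family
"""

for peer in send_label_without_nhs(Y_CONFIG):
    print(f"Warning: iBGP neighbor {peer} has send-label but no next-hop-self")
```

The eBGP peer 192.168.1.2 is not reported, because the warning only makes sense for iBGP sessions.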

Hope to help

Giuseppe

Hello Giuseppe,

I apologize for replying late. I have read RFC 3107 and found some indications that corroborate your point of view. Namely, Section 3 mandates:

   The label(s) specified for a particular route (and associated with
   its address prefix) must be assigned by the LSR which is identified
   by the value of the Next Hop attribute of the route.

   When a BGP speaker redistributes a route, the label(s) assigned to
   that route must not be changed (except by omission), unless the
   speaker changes the value of the Next Hop attribute of the route.

These paragraphs say that the labels are valid with respect to the LSR identified by the Next Hop attribute, and the labels may not be changed unless

  • Either the next hop changes as well,
  • Or the labels are removed altogether, and the routes are redistributed as pure IPv4 routes (note the allowed option of unlabeling the labeled routes!)
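The two quoted rules can be captured in a few lines. Below is a minimal Python sketch of the decision a BGP speaker such as router Y faces when passing a labeled route on. It models RFC 3107, Section 3 as quoted above, not IOS internals. The names LabeledRoute, readvertise and allocate_label are hypothetical. The local label 28 in the usage example is the value Y allocated for 10.2.255.2/32 in the earlier output taken after next-hop-self was configured.

```python
from dataclasses import dataclass, replace
from typing import Callable, Tuple


@dataclass(frozen=True)
class LabeledRoute:
    prefix: str
    next_hop: str
    labels: Tuple[int, ...]  # empty tuple = plain (unlabeled) IPv4 route


def readvertise(route: LabeledRoute, my_address: str, next_hop_self: bool,
                keep_labels: bool, allocate_label: Callable[[str], int]) -> LabeledRoute:
    """Apply the RFC 3107, Section 3 rules when passing a route on."""
    next_hop = my_address if next_hop_self else route.next_hop
    if not keep_labels:
        labels = ()  # "except by omission": dropping the labels is always allowed
    elif next_hop_self:
        labels = (allocate_label(route.prefix),)  # the new next hop assigns the label
    else:
        labels = route.labels  # next hop unchanged, so the labels must not change
    return replace(route, next_hop=next_hop, labels=labels)


from_x = LabeledRoute("10.2.255.2/32", "192.168.1.2", (20,))
local_labels = {"10.2.255.2/32": 28}  # Y's allocation after next-hop-self

print(readvertise(from_x, "10.1.255.5", False, True, local_labels.__getitem__))
print(readvertise(from_x, "10.1.255.5", True, True, local_labels.__getitem__))
```

Without next-hop-self, the route reaches Z with X's label 20 and X's address as next hop, which is what the first post observed. With next-hop-self, Y substitutes its own address and its own label.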

>> I agree with Riccardo the behaviour is now correct in IOS.

I must honestly say that I do not think the behavior is now correct, because in general it cannot work properly. You see, what I object to is router Y simply keeping the labels received from router X when advertising routes to Z. In fact, this would work only if routers X and Y were peered in eBGP using their loopback addresses. That arrangement further complicates the inter-AS peering: static routes would be needed between X and Y so they can reach each other's loopbacks, and these routes would have to be redistributed into the IGPs of the corresponding ASes.

Imagine that you activated a BGP peer in address-family vpnv4 but, contrary to the actual IOS behavior, the command neighbor send-community extended was not added automatically. The exchange of VPNv4 prefixes would then be impossible, because without extended communities there would be no way to carry the set of RTs with each VPNv4 route. I believe such behavior would be highly objectionable. In the same way, in my opinion, allowing a neighbor to receive labeled routes without implying next-hop-self is strongly objectionable, because apart from specific scenarios this configuration will behave incorrectly.

Best regards,

Peter

Hi Peter,

I just got the confirmation that the behavior you see now is the correct one and of course RFC3107 confirms it.

The rationale is that when an LSR assigns a label, it starts 'attracting' traffic towards the prefixes it assigned the labels for, because it is advertising that it is in the path. That is why you do not configure next-hop-self on RRs: otherwise they would attract all the traffic in the network (bringing it to its knees), whereas they should not be in the traffic path.

So the implicit next-hop-self behavior you previously saw is indeed buggy.

By the way, in IOS-XR the default behaviour has now changed as well, and we need to configure next-hop-self explicitly if we want the LSR to assign local labels to prefixes.

The internal bug that introduced this change is "CSCtk53821 BGP IAS functionality now requires explicit next-hop-self config"

regards,

Riccardo

Hello Riccardo,

I had to think things over at least twice. Huge thanks to you, Luc and Giuseppe for not being swayed by my (usually) persuasive arguments. I think I finally understand the idea behind all this, and I appreciate the logic in what you and Giuseppe told me. Thank you very much!

I now see the flaw in my logic: I incorrectly assumed that just because a BGP router advertises labeled routes, the labels must have been assigned by the advertising router and relate to it. Wrong! The labels relate to the router identified by the NEXT_HOP attribute, because it is this router that originated the label mappings in the first place; other BGP routers may simply be relaying these labeled routes. Unless a BGP speaker changes the NEXT_HOP to itself, it is not allowed to modify the label mappings. Deciding when to remove the labels and advertise plain IPv4 networks is too difficult to do reliably, so it is no surprise that my router Y simply relayed the labeled routes to router Z without removing the labels.

Once again, huge, huge thanks to you, Riccardo and Giuseppe!

Best regards,

Peter

Hi all,

I had a big issue with next-hop and send-label together from the ASBR to the route reflector on the iBGP IPv4 peering: somehow send-label was not forwarding the labels, and I had to look for a different solution. This was nearly 4 years ago.

regards,

Skanda
