Solved: MPLS-TE Tunnel Down Problem

davidhuynh5 · ‎10-30-2010

Please see the attachment. Tunnel5542 on ms-3722-man-r2 is up and working. Tunnel5542 on sa42-man-r1 is down. Both tunnels use a dynamic path computed from the IGP.

The path is valid, but the tunnel will not come up. Maybe it's an RSVP signalling issue? Any suggestions on how to fix it would be greatly appreciated.

My interfaces on both ends are configured with mpls traffic-eng tunnels and ip rsvp bandwidth.

Thanks.

Peter Paluch · ‎10-30-2010

Hi David,

The output from sa42-man-r1 suggests that the RSVP signalling necessary to establish the tunnel is in progress but has not completed. As this signalling should take a few seconds at most, there is indeed some problem related to (though not necessarily caused by) the RSVP signalling.

According to the output, the explicit path for this tunnel follows the interfaces 10.0.4.254 and 10.4.1.81, in that order. Please verify that these interfaces and their neighbors are properly configured for MPLS TE, i.e. that mpls traffic-eng is enabled both globally on the routers and on the particular interfaces that carry the tunnel. In addition, run show ip rsvp reservation on all nodes the tunnel passes through to see whether the requested bandwidth has already been allocated for this tunnel.

It would probably also be worth verifying that the neighboring routers see each other as RSVP neighbors, using the show ip rsvp neighbor command.

Best regards,

Peter

davidhuynh5 · ‎10-30-2010

Hi Peter, please see the attachment. Everything looks fine, but the tunnel still does not come up. The other side comes up fine. Thanks for your expert knowledge.

Peter Paluch · ‎10-30-2010

Hi David,

I'll have a look at the attached file. I will probably respond within the next 12 hours, as it's nearly midnight in Slovakia.

A short comment, though: MPLS TE tunnels are unidirectional. The fact that one tunnel is up does not guarantee, or even suggest, that the tunnel in the opposite direction will be up. The two may, for example, follow different routes. So knowing that one tunnel is up is valuable information, but nothing definitive can be inferred from it right now.

Best regards,

Peter

Peter Paluch · ‎10-31-2010

Hello David,

So far, I have not been able to find the cause of your problem. Can you please at least verify that on the path

10.0.4.254 10.4.1.81

there are absolutely no ACLs that could block RSVP communication? RSVP is its own IP protocol, number 46, so an ACL that permits only TCP and UDP will drop it.

Also, this output puzzles me:

sa42-man-r1#show ip rsvp reservation
To            From          Pro DPort Sport Next Hop      I/F      Fi Serv BPS
10.0.1.4      10.4.1.81     0   5542 203   10.0.1.4               SE LOAD 33M
10.0.1.4      10.4.1.81     0   5543 35    10.0.1.4               SE LOAD 45M
10.4.1.81     10.0.1.4      0   5543 1154 10.0.252.18   Se3/0    SE LOAD 45M

Notice that there is already an established reservation from 10.0.1.4 to 10.4.1.81 for 45 Mbps out of interface Se3/0. Is that an MPLS TE tunnel? According to the show mpls traf tun output, the Tunnel5542 you are trying to establish from 10.0.1.4 to 10.4.1.81 is directed via S1/0 (probably because of the available bandwidth on that path), not via Se3/0:

RSVP Signalling Info:
       Src 10.0.1.4, Dst 10.4.1.81, Tun_Id 5542, Tun_Instance 73
Shortest Unconstrained Path Info:
    Path Weight: 2 (TE)
    Explicit Route: 10.0.4.254 10.4.1.81

The address 10.0.4.254 is directly reachable via S1/0. Please check very carefully whether an ACL or anything similar is blocking RSVP on the 10.0.4.253/10.0.4.254 link.
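
The same check can be done mechanically. Below is a minimal Python sketch that parses the show ip rsvp reservation output quoted above and looks for a reservation belonging to a given tunnel. It assumes that DPort carries the tunnel ID, as it normally does for TE sessions, and that the I/F column is the only one that can be blank; the function names are made up for this illustration.

```python
# Output copied from the post above. The parsing rules are assumptions
# made for this sketch, not a documented output format.
OUTPUT = """\
To            From          Pro DPort Sport Next Hop      I/F      Fi Serv BPS
10.0.1.4      10.4.1.81     0   5542 203   10.0.1.4               SE LOAD 33M
10.0.1.4      10.4.1.81     0   5543 35    10.0.1.4               SE LOAD 45M
10.4.1.81     10.0.1.4      0   5543 1154 10.0.252.18   Se3/0    SE LOAD 45M
"""


def parse_reservations(text):
    """Return one dict per reservation row, skipping the header line."""
    rows = []
    for line in text.splitlines()[1:]:
        fields = line.split()
        if len(fields) < 9:
            continue  # not a reservation row
        rows.append({
            "to": fields[0],
            "from": fields[1],
            "tunnel_id": int(fields[3]),  # assumption: DPort is the tunnel ID
            "next_hop": fields[5],
            # The I/F column is blank in some rows, so it is present
            # only when the row has ten whitespace-separated fields.
            "interface": fields[6] if len(fields) == 10 else None,
            "bps": fields[-1],
        })
    return rows


def find_reservation(rows, src, dst, tunnel_id):
    """Return the rows reserved for one tunnel in one direction."""
    return [r for r in rows
            if r["from"] == src and r["to"] == dst
            and r["tunnel_id"] == tunnel_id]


rows = parse_reservations(OUTPUT)
# Tunnel5542 from 10.0.1.4 to 10.4.1.81 has no reservation yet.
print(find_reservation(rows, "10.0.1.4", "10.4.1.81", 5542))  # []
# ID 5543 in the same direction is already reserved out of Se3/0.
print(find_reservation(rows, "10.0.1.4", "10.4.1.81", 5543))
```

Run against the output above, the first lookup comes back empty, which matches what the tunnel state shows: the reservation for 5542 in this direction was never installed.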

Best regards,

Peter

davidhuynh5 · ‎11-01-2010

Hi Peter,

Your suggestion solved my problem. I adjusted the "tunnel mpls traffic-eng bandwidth" statement and the tunnel came up.
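
The thread does not say which value was changed or why that helped. One common way a bandwidth statement keeps a tunnel down is admission control: a hop rejects the reservation when the requested bandwidth exceeds what is still reservable on its link. The Python sketch below illustrates only that rule; the hop names and numbers are hypothetical and are not taken from the attachments.

```python
def first_rejecting_hop(requested_kbps, hops):
    """Return the name of the first hop that cannot admit the request.

    hops is a list of (name, reservable_kbps, already_reserved_kbps).
    Returns None when every hop has enough unreserved bandwidth.
    """
    for name, reservable, reserved in hops:
        if requested_kbps > reservable - reserved:
            return name
    return None


# Hypothetical path: hop-1 already carries a 45 Mbps reservation.
path = [("hop-1", 100_000, 45_000), ("hop-2", 100_000, 0)]

print(first_rejecting_hop(33_000, path))  # None, the request fits on both hops
print(first_rejecting_hop(60_000, path))  # hop-1, only 55 Mbps is left there
```

Comparing the tunnel's configured bandwidth with the unreserved bandwidth on each hop is a quick first check whenever signalling stalls.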

One last question: can you explain what fast reroute is in MPLS and why I would use it? Thanks again for your expert knowledge.

Mahesh Gohil · ‎11-01-2010

Hi David,

Let me take the chance to explain FRR.

> When you rely on the IGP, the drawbacks are:

-- In a large network, your IGP can take quite a few seconds to converge; until the entire network has converged, there is packet loss. It's not uncommon to see 5 to 10 seconds of packet loss when a core link flaps in a large network.

-- Configuring the IGP to converge quickly can make it overly sensitive to minor packet loss, causing false positives (spurious failure detections) and IGP reconvergence for no reason.

> When you use a TE tunnel without FRR:

Headend rerouting means calculating a new path for an LSP after its existing path goes down. However, during the time required to perform this basic reroute, there can be significant traffic loss. The packet loss is potentially worse than with regular IP routing if you are autorouting over the TE tunnel, because you first need to signal a new TE LSP through RSVP and then run SPF for the destinations that are routed over the tunnel.

This kind of behaviour creates problems for time-sensitive traffic (voice, video, etc.).

Generally, the goal of the FRR mechanisms is to achieve as little packet loss as possible, ideally under 50 ms. Now the question is how that is achieved.

Everything is kept ready: the backup LSP is already signalled, so whenever the primary tunnel fails, the traffic is shifted to the backup in less than 50 ms.

It is a very vast topic if you go into detail. I always recommend this book:

Traffic Engineering with MPLS by Eric Osborne and Ajay Simha

Regards

Mahesh

Peter Paluch · ‎11-01-2010

Hello Mahesh,

Thank you for filling in for me.

Let me add some remarks to your description. Indeed, the point of FRR tunnels is to have the tunnel pre-provisioned and prepared beforehand, just waiting for the node or the link that is protected by it to fail. If the protected link or node is working fine, the FRR tunnel is unused. If the protected object fails then - because everything is already set up and in place - the router needs only to detect the neighbor or link failure and start using the backup FRR tunnel.

And this is also the main problem with FRR: how can a neighbor or link failure be detected as soon as possible, and how can the forwarding/switching hardware be reconfigured quickly enough to achieve a blazingly fast reconvergence?

The 50 ms value comes from SDH/SONET protection, where the system guarantees recovery within roughly 50 ms of a link outage. The first FRR support on IOS was therefore available only on Packet-over-SONET (POS) interfaces, which have the hardware ability to report link failures quickly. Detecting a link failure on a different type of interface (Ethernet, Serial, etc.) can be more problematic, and without some additional mechanism it may be impossible to do quickly enough. Additional mechanisms like BFD have been developed to make up for this inherent lack of quick failure detection.

Another problem is that even if the outage is detected quickly enough, it is necessary to compile a new FIB/ADJ database and download it to the hardware switching matrix. I cannot find the document right now but I remember reading a PDF file on Cisco website saying that this can actually take a lot of time, again possibly endangering the 50ms limit.
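
To put rough numbers on these two contributions, here is a small Python sketch of the repair-time budget. It relies on the fact that BFD declares a neighbor down after roughly the negotiated interval multiplied by the detect multiplier. The 10 ms switchover figure is a placeholder assumption for the FIB/adjacency update; the real figure depends on the platform.

```python
BUDGET_MS = 50  # the SDH/SONET-style repair target discussed above


def bfd_detection_ms(interval_ms, multiplier):
    """Approximate BFD detection time: interval times detect multiplier."""
    return interval_ms * multiplier


def repair_time_ms(detection_ms, switchover_ms):
    """Total repair time: failure detection plus forwarding switchover."""
    return detection_ms + switchover_ms


# (interval, multiplier) pairs chosen for illustration only.
for interval, mult in [(50, 3), (10, 3)]:
    total = repair_time_ms(bfd_detection_ms(interval, mult), switchover_ms=10)
    print(interval, mult, total, total <= BUDGET_MS)
# 50 ms x 3 gives 160 ms in total, which misses the 50 ms target.
# 10 ms x 3 gives 40 ms in total, which fits under these assumptions.
```

So even with a pre-signalled backup tunnel, detection alone can use up the whole budget unless the timers are aggressive or the interface reports failures in hardware.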

So the FRR idea itself is very nice. However, these tunnels are not provided automatically; they must be pre-provisioned to route around the protected nodes or links, and whether the switchover can really be accomplished in under 50 ms depends very much on the hardware used. Nevertheless, I am not bashing FRR tunnels - most certainly, they provide a reconvergence time that is orders of magnitude shorter than waiting for routing protocols to converge. It's just that FRR is not a complete solution on its own.

Best regards,

Peter

davidhuynh5 · ‎11-02-2010

Peter and Mahesh,

I have two MPLS-TE tunnels from rtrA to rtrB and two tunnels from rtrB back to rtrA. Tunnel#1 goes through five different next-address hops before getting to rtrB. Tunnel#2 has a direct connection to rtrB. Traffic is load-shared 50/50 between the two tunnels. How can I be certain that a packet does not go down tunnel#1 and return through tunnel#2, which causes asymmetrical routing problems? What is the natural behavior of the packet flow down these tunnels? Thanks again for all your help.
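
As a rough illustration of the behavior being asked about, the Python sketch below assumes per-flow load sharing keyed on the source/destination address pair, with each router choosing among its own tunnels independently. The CRC32 hash, the addresses, and the tunnel labels are stand-ins invented for this example; the hash a real router uses is different.

```python
import zlib


def pick_tunnel(src_ip, dst_ip, tunnels):
    """Pick one tunnel for a flow with a deterministic hash of (src, dst).

    CRC32 is only a stand-in for whatever hash the router really uses.
    """
    key = f"{src_ip}->{dst_ip}".encode()
    return tunnels[zlib.crc32(key) % len(tunnels)]


# Each head end chooses only among its own outgoing tunnels.
a_to_b = ["A->B tunnel#1", "A->B tunnel#2"]
b_to_a = ["B->A tunnel#1", "B->A tunnel#2"]

# Documentation-range addresses, used purely as sample endpoints.
forward = pick_tunnel("192.0.2.10", "198.51.100.20", a_to_b)
reverse = pick_tunnel("198.51.100.20", "192.0.2.10", b_to_a)
print(forward, "|", reverse)
```

Under these assumptions, every packet of a given flow stays on the same tunnel in a given direction, but nothing ties the return direction to the matching tunnel, because rtrB makes its own choice for the reverse flow. This agrees with the earlier point that TE tunnels are unidirectional.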

davidhuynh5 · ‎11-02-2010

Peter,

I just had a chance to re-read your post and that is a very interesting perspective on FRR. I may not deploy it and just keep using IGP as the fallback. FRR seems complicated.

Peter Paluch · ‎11-03-2010

David,

Well, I did not want to discourage you from using FRR tunnels, and I would be disappointed if I did. I just wanted to highlight that FRR tunnels by themselves are very good - there is nothing wrong with them - but fast reconvergence involves other issues as well. The 50 ms path repair time that Mahesh explained very nicely is hard to achieve and requires other mechanisms besides FRR. SDH has similar mechanisms built in; other technologies must resort to BFD, for example.

Still, even on Ethernet-type interfaces, FRR will be significantly faster than just waiting for the IGP to reconverge. The only issue is that it won't happen within 50 ms.

I would say - give the FRR a try, and if you're not satisfied, you can remove it at any time.

Best regards,

Peter