ASR9000/XR Using MST-AG (MST Access Gateway), MST and VPLS

xthuijs · ‎03-31-2011

Introduction

This document explains the concept of the MST (Multiple Spanning Tree) access gateway and how it differs from regular MST. It also walks through some design scenarios to show when to choose which implementation.

Spanning Tree

It is generally well understood that loops in a layer 2 network are disastrous. Spanning Tree Protocol (STP) detects loops and breaks them, preventing broadcast and unknown unicast packets from circling forever and eventually bringing down the network.

STP operates by electing a root bridge, which has all its ports in forwarding, while the other switches determine their path cost towards the root switch.

The root switch is elected by priority; when all priorities are equal, the switch with the lowest MAC address becomes the root. Default settings do prevent loops, but for good network design it is important to set the priorities deliberately, so that you define where the root switch (and its potential backup) will be located.

Root switch election

The Root Bridge is usually determined by an administratively assigned Priority number
If all switches have the same Priority, the switch with the lowest MAC address becomes the Root Bridge
All switch ports begin in the Blocking state to prevent loops
The Root Bridge, once elected, is the only bridge with all ports active (Forwarding state)
Root Ports on other switches are placed in Forwarding and provide the lowest cost path back to the Root Bridge
A port stays in a Blocking state if STP determines that there is another path to the Root Bridge with a lower (better) cost
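
The election rule in the list above can be sketched in a few lines of Python. This is only an illustration: the switch names, priorities and MAC addresses are made-up values, and the `Bridge` type and `elect_root` helper are hypothetical names, not part of any Cisco tooling.

```python
from typing import NamedTuple


class Bridge(NamedTuple):
    name: str
    priority: int   # administratively assigned, lower wins
    mac: str        # hypothetical example addresses


def bridge_id(bridge: Bridge) -> tuple[int, int]:
    """Comparable bridge ID: priority first, MAC address as tie-breaker."""
    return (bridge.priority, int(bridge.mac.replace(":", ""), 16))


def elect_root(bridges: list[Bridge]) -> Bridge:
    """The bridge with the numerically lowest bridge ID becomes the root."""
    return min(bridges, key=bridge_id)


# Assumed example values: S1 gets a deliberately low priority.
bridges = [
    Bridge("S1", 4096, "00:00:00:00:00:0a"),
    Bridge("S2", 32768, "00:00:00:00:00:01"),
    Bridge("S3", 32768, "00:00:00:00:00:02"),
]

print(elect_root(bridges).name)
```

With every priority left at the same value, the same function falls back to the lowest MAC address, which is why leaving priorities at their defaults hands the root role to an arbitrary switch.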

The states that a port can be in are determined as follows:

Blocking – Starts in this mode, and stays in Blocking if STA determines that there is a better path to the root bridge. Port only listens for BPDUs from other bridges (Max age 20 seconds)
Listening – Enters this mode after the Root Bridge election, when BPDU updates are being used to find the lowest cost path to the Root. Attempts to learn other paths to root bridge, to ensure that a loop won’t be created if it begins Forwarding (15 second transition)
Learning – Enters this mode after Listening. Port adds learned addresses to its table, still not allowed to send data (15 second transition)
Forwarding – Enters this mode after Learning. Port is now able to send and receive data – Normal operation
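
To make the timers concrete, here is a small Python sketch that adds up the values quoted above (max age 20 seconds, two 15 second transitions). It assumes the worst case in which a blocked port first has to wait out max age before it starts moving; the constant and function names are illustrative only.

```python
MAX_AGE = 20        # seconds a blocked port waits before its stored BPDU expires
FORWARD_DELAY = 15  # seconds spent in Listening, and again in Learning

# (state, seconds spent in that state before moving on)
STATES = [
    ("Blocking", MAX_AGE),
    ("Listening", FORWARD_DELAY),
    ("Learning", FORWARD_DELAY),
    ("Forwarding", 0),
]


def timeline() -> list[tuple[int, str]]:
    """Return (time_entered, state) pairs for a port walking through all states."""
    elapsed = 0
    result = []
    for state, dwell in STATES:
        result.append((elapsed, state))
        elapsed += dwell
    return result


for entered, state in timeline():
    print(f"t={entered:>2}s  {state}")
```

Under these assumptions the port only starts forwarding 50 seconds after the failure, which is the classic reason plain STP is considered slow to converge.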

As a summary, consider the following topology:

In this example we consider this picture after all BPDU's have been exchanged. Based on priority and mac address S1 is elected as root bridge.

S1 will move both of its ports from blocking to forwarding.

Switch S2 will move its root-facing port into forwarding mode; this port becomes the RP (root port). The port towards S3 will also be forwarding, because although S3 has a direct link to the root switch S1, that path cost is higher (100) than going via S2->S1 (38). For this reason the port on S3 towards S1 will be blocking.

S4's path to the root bridge is either via S2 or via S3. The path via S2 has a higher cost than the path via S3, hence S4->S2 is blocking, the path via S3 is chosen, and that port on S4 becomes the RP.
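
The root port selection described for this topology boils down to a lowest-cost path search towards the root. The Python sketch below shows the idea. Only two numbers come from the text (100 for the direct S3-S1 link and 38 for S3 via S2); every individual link cost in `LINKS` is an assumption chosen to reproduce those totals, and `root_paths` is a hypothetical helper name.

```python
import heapq

# Assumed per-link costs. Only the totals 100 (S3-S1 direct) and 38 (S3 via S2)
# are taken from the text; the rest is illustrative.
LINKS = {
    ("S1", "S2"): 19,
    ("S2", "S3"): 19,
    ("S1", "S3"): 100,
    ("S3", "S4"): 19,
    ("S2", "S4"): 100,
}


def root_paths(links: dict, root: str) -> dict:
    """Return {switch: (root_path_cost, upstream_neighbor)} using Dijkstra."""
    graph: dict = {}
    for (a, b), cost in links.items():
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))

    best = {root: (0, None)}
    queue = [(0, root)]
    while queue:
        cost, node = heapq.heappop(queue)
        if cost > best[node][0]:
            continue  # stale queue entry, a cheaper path was already found
        for neighbor, link_cost in graph[node]:
            new_cost = cost + link_cost
            if neighbor not in best or new_cost < best[neighbor][0]:
                best[neighbor] = (new_cost, node)
                heapq.heappush(queue, (new_cost, neighbor))
    return best


paths = root_paths(LINKS, "S1")
for switch in ("S2", "S3", "S4"):
    cost, via = paths[switch]
    print(f"{switch}: root path cost {cost}, root port faces {via}")
```

Real STP reaches the same result by comparing the root path cost advertised in received BPDU's rather than by running a graph search, so treat this purely as a way to check the arithmetic.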

This picture assumes plain Ethernet without VLANs, and the diagram shows a single local area network.

What if we want to use multiple VLANs, or interconnect this network with a remote LAN via a carrier Ethernet network?

Multiple VLANs

In the scenario where you have multiple vlans, regular STP will block the link for all vlans. While this prevents the loop, it is not very efficient, as one node/path is completely in standby mode. It might be nice to forward a few vlans over to switch 1 and the others to switch 2.

Effectively that means that Sw1 is then root for one vlan set and Sw2 for another vlan set.

Regular STP cannot do this. The logical evolution is MSTP (Multiple Spanning Tree Protocol), which is the standardized version, and PVST(+), which is a Cisco proprietary solution.

PVST and MSTP effectively achieve the same thing. One of the key differences is that MSTP sends its BPDU's out untagged on the port, whereas PVST sends its BPDU's inline with the vlan, so they are vlan tagged.

Connecting Layer 2 domains via a carrier ethernet network

The most common way to connect 2 separate l2 segments or networks together is via VPLS.

With VPLS, the provider's edge nodes aggregate the customer's L2 traffic and participate in the L2 spanning tree loop prevention. They also use Pseudo Wires over an MPLS core towards the remote PE's (Provider Edge) to bring the traffic from one segment to the remote site.

How VPLS works, how it interacts with MST or MSTAG, and what the differences are is discussed below.

MST

MST allows us to run one STP instance per configurable set of vlans.

A sample configuration on the ASR9000 PE node looks as follows. Each set of vlans is defined under an instance.

We can adjust the priority per instance if needed, and multiple vlans per instance are allowed.

Sample MST configuration on the ASR9000

spanning-tree mst MYSTP_DOMAIN
 ! The name of the MST region is very important: it must be the same on all switches in this region.
 ! The definitions of your MST instances also need to be the same on all nodes.
 name testme
 revision 1
 instance 0
  priority 4096
 !
 instance 1
  vlan-ids 100
  priority 4096
 !
 instance 2
  vlan-ids 101
  priority 4096
 !
 ! Interfaces that are enabled for MST. Note that these are the main interfaces.
 interface TenGigE0/3/0/6
 !
 interface TenGigE0/3/0/7
 !
 interface Bundle-Ether100
 !
 interface GigabitEthernet0/0/0/27
 !
!
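
The comments in the configuration stress that the region name and the instance definitions must match on all switches. In standard MSTP, two bridges are in the same region only if name, revision and the full vlan-to-instance mapping are identical. The following Python sketch models that comparison; `MstConfig`, `make_config` and `same_region` are hypothetical names used for illustration, and real switches compare a digest of the mapping rather than the mapping itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MstConfig:
    name: str
    revision: int
    vlan_to_instance: tuple  # sorted (vlan, instance) pairs


def make_config(name: str, revision: int, instances: dict) -> MstConfig:
    """Build a comparable config from {instance: [vlan, ...]}."""
    mapping = tuple(sorted(
        (vlan, instance)
        for instance, vlans in instances.items()
        for vlan in vlans
    ))
    return MstConfig(name, revision, mapping)


def same_region(a: MstConfig, b: MstConfig) -> bool:
    """Bridges share a region only if name, revision and mapping all match."""
    return a == b


# Values mirror the sample configuration: instance 1 -> vlan 100, instance 2 -> vlan 101.
pe = make_config("testme", 1, {1: [100], 2: [101]})
access_ok = make_config("testme", 1, {1: [100], 2: [101]})
access_bad = make_config("testme", 1, {1: [100, 101]})

print(same_region(pe, access_ok))   # matching definitions
print(same_region(pe, access_bad))  # vlan 101 mapped to a different instance
```

Note that the per-instance priority is deliberately not part of the comparison: it influences root election inside the region, not region membership.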

A more graphical example of how MST can be used with 9k's is shown here:

In this case we have a clear STP loop between the 2 9K PE devices, which are interconnected via a bundle ethernet.

The associated configuration for this example would be as follows:

STP portion

spanning-tree mst Example
 name testme
 revision 1
 instance 0
  priority 4096
 !
 instance 1
  vlan-ids 100
  priority 4096
 !
 interface Bundle-Ether100
 !
 interface GigabitEthernet0/0/0/27
 !

With this config we enable the 9k PE to send BPDU's out to the other 9k and to the access switch.

Also with this config one link will be blocked: if we elect either 9k to be the root switch (say the one on the left), the link marked RED will be blocked.

This config, however, doesn't provide any data forwarding. For that we need to establish a separate bridge-domain in which we pull in the right EFP's for forwarding the data traffic:

Data Plane portion

interface Bundle-Ether100.100 l2transport
 encapsulation dot1q 100
 ! The rewrite is optional, depending on whether all EFP's are in the same vlan.
 rewrite ingress tag pop 1 symmetric
!
interface GigabitEthernet0/0/0/27.100 l2transport
 encapsulation dot1q 100
 ! The rewrite is optional, depending on whether all EFP's are in the same vlan.
 rewrite ingress tag pop 1 symmetric
!
l2vpn
 bridge group EXAMPLE
  bridge-domain FWD_1
   interface GigabitEthernet0/0/0/27.100
   !
   interface Bundle-Ether100.100
   !

You need to repeat this configuration for every vlan you want to forward. There is another article detailing more about vlan rewrites and the EFP concept in case you're interested. See the related documentation section for a reference.

VPLS

VPLS is the concept of connecting multiple layer 2 domains over an MPLS network for instance. On the ASR9000, a bridge-group is used to pull in the attachment circuits (physical interfaces towards the lan segment) and Pseudo Wires (PW) to the remote PE's.

The sample VPLS configuration below provides the configuration for the data plane and assumes there are no loops in your L2 topology, either at the customer access site or within your VPLS domain.

To prevent loops in the access network we need to leverage either MSTP, MSTAG or PVSTAG. We'll discuss MSTAG in the next section below.

l2vpn
 bridge group VPLS
  ! The bridge group vs. bridge-domain split is just a configuration hierarchy; it doesn't serve any special functionality.
  bridge-domain vpls_1
   interface GigabitEthernet0/0/0/0.100
   ! Physical interfaces towards a subscriber switch
   neighbor 1.1.2.3 pw-id 123
   ! for H-VPLS we can use PW's also as an attachment circuit
   vfi vpls_1_vfi_1
    neighbor 5.5.5.5 pw-id 333
    ! definition of a pseudo wire underneath a Virtual Forwarding Instance
    neighbor 6.6.6.6 pw-id 444
    !
   !
!
!
!
end

The IP addresses provided in the "neighbor" statements are the MPLS router ID's of the remote PE's. The PW-ID is an arbitrary but unique number that identifies the pseudo wire (VC); the same value must be configured on both PE's.

Whether you put the PW's in the VFI or outside the VFI or across VFI's depends on your needs and whether you need SPLIT HORIZON (see below).

VPLS and L2 Loops

The following picture explains what might happen when we don't use any STP in a VPLS scenario.

In this case there is a loop, but the access switch doesn't know about it, because from a spanning-tree point of view neither the 9k PE's nor the switches form a closed ring.

Even if we were running MSTP in this scenario, no loop would be detected, since by default BPDU's are not forwarded over the pseudo wires.

A potential solution might be to run an L2 link between the 2 southern PE's, so that there is a loop on the SOUTH segment and one uplink from the access switch to one of the 2 PE's is indeed blocked as per regular (M)STP.

The problem here, however, is that a (broadcast/unknown unicast) packet arriving on the left south PE's pseudo wire is now sent to the south access switch AND over the interchassis link (not drawn in this picture) to the south PE on the right. There will be a loop again.

A proper solution for this model is the use of MST Access gateway which will be highlighted below.

Split Horizon

Normally in a bridge domain, broadcast and unknown unicast from Attachment circuits are replicated to all bridge ports.

Obviously packets are never sent to the AC or PW that the traffic was actually received on.

By default AC's can always forward packets to each other and to (all) Pseudo Wires.

So traffic from the AC "west" will be replicated over all PW's and the South-West AC.

When traffic arrives on a PW, by default the packets are never sent out of the PW they were received on, nor to the other Pseudo Wires in the same VFI. All PW's in the same VFI share the same split horizon group, and traffic is not replicated within the same split horizon group.

You can also move Attachment Circuits into a split horizon group to prevent them from talking to each other, by means of the "split-horizon group" command underneath the interface that is configured in the l2vpn bridge-domain.

l2vpn
 bridge group TEST
  bridge-domain SAMPLE
   interface GigabitEthernet0/1/0/0
    split-horizon group
   !
   interface GigabitEthernet0/1/0/10
    split-horizon group
   !
   interface TenGigE0/2/0/3
   !
   vfi VFI_TEST
    neighbor 2.2.2.2 pw-id 100

In this case traffic cannot flow between 0/1/0/0 and 0/1/0/10.
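
The flooding rule described in this section can be modeled in a few lines of Python. This is a simplified sketch, not the actual forwarding implementation: the group labels `"ac-group"` and `"vfi"` and the `flood_targets` helper are hypothetical, and the port names simply mirror the sample configuration.

```python
from typing import Optional

# Bridge ports and the split horizon group they belong to (None = no group).
PORTS: dict[str, Optional[str]] = {
    "g0/1/0/0": "ac-group",    # AC configured with a split horizon group
    "g0/1/0/10": "ac-group",   # AC configured with a split horizon group
    "te0/2/0/3": None,         # plain AC, no group
    "pw 2.2.2.2": "vfi",       # PW's in a VFI share one group
}


def flood_targets(ingress: str, ports: dict[str, Optional[str]]) -> list[str]:
    """Ports a broadcast/unknown unicast frame is replicated to."""
    ingress_group = ports[ingress]
    targets = []
    for port, group in ports.items():
        if port == ingress:
            continue  # never send back out of the receiving port
        if ingress_group is not None and group == ingress_group:
            continue  # never replicate within the same split horizon group
        targets.append(port)
    return targets


print(flood_targets("g0/1/0/0", PORTS))
print(flood_targets("te0/2/0/3", PORTS))
```

A frame received on g0/1/0/0 is flooded to te0/2/0/3 and the PW but not to g0/1/0/10, while a frame received on te0/2/0/3 reaches every other port.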

MST and Split Horizon

For illustrative purposes, consider this sample VPLS design.

In order for MST to work, we need an inter PE attachment circuit to exchange the BPDU's between the two PE nodes drawn in blue.

In this given example the PE_left (PE-1) is considered ROOT.

Imagine a broadcast coming in on PW-1. Because of split horizon, the traffic is not replicated to PW 2, 3, 4 and 5. But the traffic will go down AC-2 and will also be sent over AC-1.

When the traffic enters PE_right (PE-2), it will not go down AC-3, because that port is blocking since PE-1 is root. It will, however, enter the VFI and get replicated to the PW's in the VFI there, so PW 3, 4 and 5. This poses a big problem: a replication loop back to PE-3 and also to PE-1.

Omitting PW-5 solves part of the issue, because traffic is then no longer replicated back to PE-1, but it might slow down convergence when AC-2 goes offline and PE-3/PE-4 have not yet updated their mac tables. Traffic will still get back to PE-3 and PE-4.

AC-1 is required for the BPDU's, so you might consider creating an EFP only for the untagged traffic (the BPDU's). But then you might have forwarding issues when PW 1 and 2 are down and you want to send traffic over AC-1 to PE_right, so that it can forward the traffic to PE-3 and PE-4 for us.

Wouldn't it be nice if we could live without AC-1 altogether, still run spanning tree to the CE and have optimum convergence?

Yes. Enter MSTAG.

MSTAG

In MSTAG we statically define the BPDU's on the PE nodes, so that the PE's present themselves to the CE as 1 virtual bridge.

One link will be blocked, but there is no need for an inter chassis link anymore in this case.

What’s the main function of the MST access gateway?

Send pre-canned BPDU's into the access network at every hello timer interval
Snoop TCN's from the access network, flush the local MAC address table and trigger VPLS MAC withdrawal accordingly

Major Advantage – scale and local significance
Light MST implementation: for example, it doesn’t keep an STP state machine and it doesn’t need to handle received BPDU's (except TCN)

The MST is per port scope

Other Advantages
Doesn’t require a special inter-PE PW, no single point of failure, no temporary L2 loop

Much more robust than the “MST over special PW” solution
Standards-based solution, interoperable with third-party vendors, works with any network topology
Self-protecting: even with user misconfiguration, it won’t cause L2 loops

Disadvantages
MST convergence depends on the number of VLANs in the access ring and on the MST implementation of the access switches. In any case, don't expect 50 msec convergence time
With a Cisco 3400 as access switch, baseline convergence is sub-second for link failure, sub-100 msec for link recovery, and 2-3 seconds for node failure

Note that in this configuration we use the interface with suffix .1 in the MSTAG configuration.

This means we need to define an EFP (Ethernet Flow Point) to capture the BPDU's and TCN packets. In fact, we're not even using the BPDU's we receive, as we consider ourselves to be root on the 9k and send out these pre-canned BPDU's.

We will consume the TCN (topology change notification) and send it into the VPLS network as MAC withdrawal messages.

interface GigabitEthernet0/0/0/10.1 l2transport
 encapsulation untagged

Aside from the MST configuration we still need to configure our bridge domains with the EFP's for the data forwarding and our Pseudo Wires to our remote PE's as described above.
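
The gateway behaviour described in this section (send pre-canned BPDU's, ignore received configuration BPDU's, act only on TCN) can be summarized in a short Python model. It is a conceptual sketch only: the `AccessGatewayPort` class, its field names and the dictionary-based BPDU are invented for illustration and do not reflect how IOS XR implements MSTAG.

```python
from dataclasses import dataclass, field


@dataclass
class AccessGatewayPort:
    precanned_bpdu: dict                       # statically configured BPDU contents
    mac_table: dict = field(default_factory=dict)
    mac_withdrawals_sent: int = 0

    def on_hello_timer(self) -> dict:
        """Send the same pre-canned BPDU every hello interval; no state machine."""
        return dict(self.precanned_bpdu)

    def on_bpdu(self, bpdu: dict) -> None:
        """Received BPDU's are ignored unless they signal a topology change."""
        if not bpdu.get("topology_change", False):
            return
        self.mac_table.clear()          # flush the local MAC addresses
        self.mac_withdrawals_sent += 1  # stands in for a VPLS MAC withdrawal


port = AccessGatewayPort(precanned_bpdu={"root_priority": 4096, "bridge_priority": 4096})
port.mac_table["00:00:00:00:00:99"] = "learned"

port.on_bpdu({"topology_change": False})  # ignored, table untouched
port.on_bpdu({"topology_change": True})   # flush plus MAC withdrawal
print(len(port.mac_table), port.mac_withdrawals_sent)
```

Because the port never evaluates the contents of received configuration BPDU's, the design only works when the 9k really is the root of the access topology.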

MSTP/MSTAG scale for the ASR9000

1) MSTP: There is a single protocol instance which can have the standard 64 MST Instances (MSTIs) within it. These 64 MSTIs create 64 logical spanning tree topologies within one MSTP region/domain.

2) MSTAG: You can create a separate protocol instance per physical interface. Each protocol instance can be in a separate MSTP region by itself, and each one can in turn support 64 MST Instances (MSTIs) within it.

In general MSTAG is more scalable (multiple regions with 64 MSTIs each) but can only be used if the ASR9K is in the root (or backup root) position for every MSTI. MSTP is the normal Cat 6K-like version, but you can use all 64 MSTIs without any issues. Both of these can interoperate with any IEEE standard MSTP implementation, so they should work with the N7K VDCs.

Related Information

For more details on regular MST and some IOS interoperability considerations, check this reference:

ASR9000 MST interop with IOS/7600: VLAN pruning

Learn more about vlan rewrites and the concept of EFP's

ASR9000/XR Flexible VLAN matching, EVC, VLAN-Tag rewriting, IRB/BVI and defining L2 services

Xander Thuijs, CCIE #6775

Sr Tech Lead ASR9000

Carlos T · ‎08-31-2011

Hi Xander,

Very interesting article.

Could you please post an article about REP-AG on the ASR9K? Mainly, what is the difference between REP-AG and MSTAG, and why choose one over the other?

Thanks,

Carlos Trujillo.

xthuijs · ‎08-31-2011

Hi Carlos, thanks for your comment!

Great suggestion, I'll make a write up on that also.

To explain briefly:

As you can see from this article, MSTAG is not really MST; we are just using the BPDU TCN capability to trigger VPLS MAC withdrawal.

At the same time we inject pre-canned BPDU's to inform our access switches that a loop does exist and that something needs to be blocked. We don't really listen to the BPDU's; we merely inform the access network that the 9k (or both, if dual homed) is effectively behaving like the same switch/bridge-id. The 9k in this mode does not run full MST, saving a lot of processing cycles; however, this design assumes the 9k to be the ROOT.

As for REP-AG:

The 9k does not run full REP; it can't be part of the ring, it can only sit at the endpoints. REP doesn't use BPDU's as MSTP does, but REP can send a TCN when something in the ring changes. With the "rep edge no-neighbor" configuration, the REP edge devices connecting to the 9k will send a TCN notification that the 9k will trigger on when it is configured for REP-AG, causing it to do a MAC withdrawal.

Why choose one over the other? It really depends on your access network and the convergence you expect. REP is supposedly a little faster in detection. Or the access switches may support only one of the two protocols.

Functionally REP-AG and MSTAG are the same: they trigger on a TCN and do a VPLS MAC withdrawal.

xander

CHARLES HEUPEL · ‎10-11-2011

Alexander,

Great article. One quick question for you. Would MSTAG still work if the access devices didn't participate in spanning-tree at all. They simply pass the BPDUs through to the other side? So, in this case if ASR1 sent a BPDU out towards the access device, it would simply pass the BPDU out it's interface connecting to ASR2. What do you suggest in this scenario?

Thanks,

Shane

xthuijs · ‎10-11-2011

Appreciate that Shane!

Yup, technically no problem; however, in that case there is also no real need to run STP on the 9k's either.

MSTAG means we are sending pre canned BPDU's with the bpdu config options as per MSTAG configuration,

we are not interpreting the config BPDU's at all and only trigger on TCN bpdu's and those will result in VPLS mac withdrawal.

So if the access switches are transparent, and the BPDU's are received by the 9k peer it just ignores them, but as mentioned this is a bit of a flaky design obviously.

cheers

xander

CHARLES HEUPEL · ‎10-11-2011

Thanks... so no real issues with both links being up and actively forwarding traffic for the same vlans. In our scenario we're aggregating DSL customers who might have multiple vlans (voice/video/internet) depending on the services they buy. Most access gear supports some form of STP, but in this particular case this vendor does not. In the past we've used 7600s with flex link but don't have that option with the ASR9K. Just trying to come up with the best solution.

Thanks again for the quick reply.

Shane

xthuijs · ‎10-11-2011

Ah yeah 9K doesn't have flexlink, there are alternatives:

1) MSTAG as you noted, but that requires the access switch to participate in MSTP. In MSTAG all ports of the 9k are in forwarding (because they are the root), and the access switch needs to do the failover from one link to the other.

2) If the Access Switch can't do MSTP, and you are single homed, you could consider PW redundancy.

3) If that is no option you can consider MC-LAG (multichassis LAG). The access switch (DHD ~ dual homed device) "thinks" it has a bundle to a single POA (point of attachment), where it is actually connected to 2 separate devices acting as one (talking to each other via ICCP). One (set of) members to one of the 9k's is then in standby.

4) Finally, if that can't be done because the access switch doesn't do LACP for instance, then there is a final option of using EEM (Embedded Event Manager). You can create an "emulated flexlink" by writing a simple EEM script to control a pair of active-standby interfaces. It won't be a very fast switchover, however (it may take anywhere between 0.5 seconds and 3 seconds).

Hopefully one of these will work for you

xander

CHARLES HEUPEL · ‎10-11-2011

Gottcha.....unfortunately, we are dual-homed and the switch doesn't support LACP, I had already headed down that road before finding out they didn't support it.

So with EEM, that would only work in a single homed scenario correct? I've been reading about EEM on the ASR9K but haven't actually set it up.

CHARLES HEUPEL · ‎10-18-2011

Alexander,

How would MST or MSTAG work in a bridge-domain configured between ASR9Ks and 7600s? We want to deliver metro-e services to a customer that has sites in our neighboring telco, and they are running 7600s. We have everything set up on the 9Ks and 7600s but seem to be running into a loop somewhere. I've configured MST on the ASR9K, but they are saying that they don't have a spanning tree instance for the S-VLAN (2044) on their side. Does the 7600 not run STP on VLANs configured under service instances?

Our Side:

ASR-A

int gi0/0/0/18.2044
 ! To Neighbor telco
 encapsulation dot1q 2044
 rewrite ingress tag pop 1 sym
!
int bundle-eth 100.2044
 ! To metro-e cust
 encapsulation dot1q 2044
 rewrite ingress tag pop 1 sym
!
l2vpn
 bridge group metro-e
  bridge-domain cust_a
   interface gi0/0/0/18.2044
   interface bundle-eth 100.2044
   !
   vfi Citizens_Bank
    neighbor 10.8.0.1 pw-id 2044
     pw-class mpls
!

ASR-B

Same configuration as ASR-A except neighbor is .2

7600-A

l2 vfi cust_a manual
 vpn id 2044
 neighbor 10.0.0.4 encapsulation mpls
 neighbor 10.0.0.2 encapsulation mpls
!
interface gi7/18
 ! To ASR-A
 service instance 2044
  encapsulation dot1q 2044
  rewrite ingress tag pop 1 sym
  bridge-domain 2044
!
int vlan 2044
 xconnect vfi cust_a

7600-B

same as 7600-A except neighbor is 10.0.0.1 on vfi

xthuijs · ‎10-19-2011

hi Charles,

I have a hard time visualizing the design you are following in this setup, but just to give you a few answers:

By default the 9k doesn't run any STP at all.

So based on the ports that you want to participate in the spanning-tree with, you need to configure the

spanning-tree mst NAME

interface x

interface y

like configuration. This will make us send BPDU's out of the X and Y interfaces and process the received ones.

In order for such a design to work properly, there needs to be a link between the 9k's that is also running spanning tree so that the loop is properly detected. A PW between the 9k's won't run spanning tree and might constitute the loop you're having.

MSTAG is very useful in the design/scenario whereby you have a (ring of) access devices that are dual homed to 2 9k's. MSTAG assumes the 9k's are the root of that spanning-tree topology, and the access devices will block one of their uplinks. In this scenario the inter chassis link is NOT needed, because both 9k's already pretend to be the same root bridge by means of these "configured" or pre-canned BPDU's as per the MSTAG configuration.

good luck!

xander

CHARLES HEUPEL · ‎10-19-2011

Alexander,

Thanks for the reply. I'm attaching a diagram to give you a better visual. Basically we have a ring of ASRs running MPLS and our neighbor has a ring of 7600s running MPLS. We have 2 dark fibers between us for delivering metro-e services to a couple of customers. The 7600s have ES+ cards and terminate our connections with service instances. When we do a show spanning-tree vlan for the vlan we're sending to them, there is no instance of spanning tree running for that vlan. So my worry is that even if we configure the ASR9K as MSTAG, the 7600s won't block. I didn't realize there is no spanning-tree instance for service instances on the 7600. Am I missing something? I would prefer to run MPLS between us, but due to some overlapping IPs we cannot do this.

xthuijs · ‎10-20-2011

Aha, that visualizes it quite nicely, Charles.

This is a tricky design indeed. I don't see any options other than either using H-VPLS and making one ring a "client" of the other, or running MSTAG between the 2 9k PE's and the 7600 PE's.

You need trunk interfaces from the 7600 to the 9k PE's, and then the 7600 will (or should) block one of its uplinks for that instance (set of vlans) of MSTP. That looks like your best bet.

Make sure you take note of the other article on vlan pruning in IOS and its effect on XR. It will prevent some headaches.

xander

laviel · ‎11-24-2011

Hello Alexander,

I could not find clear documentation for configuring MST-AG on 7600 (which we've prepared in our lab for that purpose). Any chance you could direct us to a link for such a guide?

Thanks,

Lior

xthuijs · ‎11-25-2011

Hi Lior,

I am sorry, the MSTAG functionality is something that is "unique" to the ASR9000. The 7600 doesn't have that capability.

It would use the "special pseudowire" option between the PE's.

regards

xander

laviel · ‎11-26-2011

Hi Alexander,

Thank you very much for your answer. I guess we'll try it on the ASR's then.

Lior

Maximiliano Gustavo Menendez · ‎07-03-2012

Hi Alex, great doc!

I'd like to know if MST AG is supported with only one Access Gateway. I mean the two legs of the L2 access ring connected to the same ASR9K, on different ports and different cards. All the documentation about MST AG mentions at least two Access Gateway devices, but I want to use this feature with only one. What do you think? Have you tested this?

Thanks in advance,

Max