Solved: Bridge-domain traffic paths

Marks Maslovs · ‎11-21-2013

Hi guys,

I can't quite work out the logic of bridge-domain and HSRP coexistence. How will traffic be flooded?

Imagine the following topology:

A bridge-domain and HSRP are running between ASR1 and ASR2.

Host C has two network adapters. Both are in UP state, but only one of them is forwarding traffic.

I am curious what path traffic will take from host A to host C, and from B to C, when:

1) net.adapter #1 is active

2) net.adapter #2 is active

P.S. The active HSRP router remains the same.

We captured traffic on the devices, and it was a bit confusing to me that the standby HSRP router was forwarding traffic from host B out of both g0/0/0/0 and PW 3.

I would appreciate any help...

xthuijs · ‎11-26-2013

This is a dangerous setup, Marks: a loop is possible with the PW in the same split-horizon group as the attachment circuits.

To answer your question: the bridge forwards based on the MAC learning table. If it knows that a MAC (of the server) is found via one link, it will continue to forward that way until that MAC is flushed and we start to flood, or until the MAC is not refreshed and ages out, after which it will also start flooding.
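The forwarding rule described above can be sketched in a few lines of Python. This is only a toy model, not how the ASR 9000 implements it: the class name, the port labels (`ac`, `bvi`, `pw`) and the host names are made up for illustration, and split-horizon groups are not modelled. The 300-second inactivity timer is just a default chosen for the sketch.

```python
class LearningBridge:
    """Toy MAC-learning bridge: learn on source, forward on destination."""

    def __init__(self, ports, aging_time=300):
        self.ports = set(ports)
        self.aging_time = aging_time  # seconds of inactivity before an entry expires
        self.table = {}               # mac -> (port, last_seen)

    def receive(self, src, dst, in_port, now):
        """Return the sorted list of ports the frame is sent out of."""
        # Learning: the source MAC is reachable via the ingress port.
        self.table[src] = (in_port, now)
        entry = self.table.get(dst)
        # Aging: an entry that was not refreshed in time is dropped.
        if entry is not None and now - entry[1] > self.aging_time:
            del self.table[dst]
            entry = None
        if entry is None:
            # Unknown unicast: flood out of every port except the ingress one.
            return sorted(self.ports - {in_port})
        port = entry[0]
        return [] if port == in_port else [port]

    def flush_port(self, port):
        """Forget every MAC learned on a port, e.g. when that port goes down."""
        self.table = {m: e for m, e in self.table.items() if e[0] != port}


bridge = LearningBridge(["ac", "bvi", "pw"])
print(bridge.receive("host-a", "host-c", "bvi", now=0))  # host-c unknown: flooded
print(bridge.receive("host-c", "host-a", "ac", now=1))   # host-a known: one port
print(bridge.receive("host-a", "host-c", "bvi", now=2))  # host-c now known: one port
```

After `flush_port("ac")`, or once the entry has been idle for longer than `aging_time`, the next frame towards `host-c` is flooded again until `host-c` sends something and is relearned.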

When the server brings down the link, it is important that the 9k/router sees that link down also, so it can converge.

If it doesn't, then you need to do some IPSLA or something like that to "test" the link and remote host, and trigger on that with a message to HSRP to fail over, or bring down the interface so we flush the MAC.

regards

xander

Marks Maslovs · ‎11-28-2013

Thank You Alex for your reply!

I think I've got it.

I don't know why, but I had a strong belief that both 9k's knew the MAC address of the Host C device. Probably that was because I was referring to the wrong table, "sh arp vrf rnc bvi 3 location 0/0/CPU0".

I had to check this one -

"sh l2vpn forwarding bridge-domain mac-address interface gigabitEthernet 0/0/0/0 location 0/0/CPU0". It showed that one of the 9k's knew the remote MAC, but the other didn't. That's why traffic was flooded out of the PW and g0/0/0/0.

But what would you recommend to avoid a loop in such a setup? Do I have to enable "split-horizon group" under the PW, or will this command not separate the PW from the AC?

xthuijs · ‎11-28-2013

Hi Marks, the ARP table is merely the L3 adjacency info; the MAC table defines the L2 forwarding, i.e. where the packet(s) go from a switching standpoint.

So while we know the MAC and complete the L3 packet (as instructed by ARP), the L2 switching apparently doesn't know where it should go (e.g. the bottom 9k hasn't learnt the MAC properly, knowing that it can be found via the PW only).

A MAC is learnt from the source address, and if the packet is flooded out it means we didn't learn the MAC properly. That makes sense, because on the top router the frames received on g0/0/0/0 were likely consumed by the BVI, so those packets never made it (flooded over) to the bottom 9k for it to learn that MAC over the PW.

So host-C might need to send some gratuitous ARP or something else that is broadcast, which means it is received by the top 9k and sent out the PW, so the bottom one learns it. And every time there is a switchover, it needs to do that gratuitous ARP.

Part of the problem is also that host-C bringing down the port must signal to the attached 9k that the AC is down; otherwise a sort-of loop always exists. This becomes particularly annoying when both links from host-C are sharing/using the same MAC.

This looks like some RNC design, is that correct?

I think part of the solution is the ability of host-C to send periodic broadcasts to basically "teach" the BDs.

You can test that theory by pinging from host-C to a broadcast destination and seeing whether the MAC is then seen on the other 9k's BD and found via the PW.
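Why a broadcast "teaches" the remote bridge-domain while routed unicast does not can be illustrated with a small Python model. It is a sketch under stated assumptions: two bridge-domains joined by one PW, each with an `ac`, a `pw` and a `bvi` port, and all MAC addresses below are placeholders invented for the example.

```python
BROADCAST = "ffff.ffff.ffff"


class BridgeDomain:
    """Toy bridge-domain with three ports: 'ac', 'pw' and 'bvi'."""

    def __init__(self, name, bvi_mac):
        self.name = name
        self.mac_table = {bvi_mac: "bvi"}  # the BVI MAC is known statically
        self.peer = None                   # bridge-domain at the far end of the PW

    def receive(self, src, dst, in_port):
        """Learn the source, then return the set of egress ports."""
        self.mac_table[src] = in_port
        if dst != BROADCAST and dst in self.mac_table:
            out_ports = {self.mac_table[dst]} - {in_port}
        else:
            # Broadcast or unknown unicast: flood everywhere but the ingress port.
            out_ports = {"ac", "pw", "bvi"} - {in_port}
        if "pw" in out_ports and self.peer is not None:
            # The far end sees the frame arrive on its own PW port.
            self.peer.receive(src, dst, "pw")
        return out_ports


# Hypothetical addresses, for illustration only.
TOP_BVI, BOTTOM_BVI, HOST_C = "aaaa.aaaa.0001", "aaaa.aaaa.0002", "cccc.cccc.0001"

top = BridgeDomain("top", TOP_BVI)
bottom = BridgeDomain("bottom", BOTTOM_BVI)
top.peer, bottom.peer = bottom, top

# Unicast from host-C to the top BVI is consumed locally: bottom learns nothing.
top.receive(HOST_C, TOP_BVI, "ac")
print(HOST_C in bottom.mac_table)

# A broadcast from host-C crosses the PW, so bottom learns host-C via 'pw'.
top.receive(HOST_C, BROADCAST, "ac")
print(bottom.mac_table.get(HOST_C))
```

In this model the bottom bridge-domain only ever learns host-C from frames that actually cross the PW, which is why a gratuitous ARP or any other broadcast after each switchover helps.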

Also make sure that the link on the 9k is truly down when host-C has it down.

regards

xander

Marks Maslovs · ‎12-02-2013

Okay, that really makes sense. Thank You very much for the explanation!

Yes, You are right, that's RNC.

Theoretically the MAC address should be flushed from memory when the network card switchover happens, because the connection goes down for a few seconds.

Could You please take a look at the following output:

As I understand it, both ASRs know where 0040.4384.8260 (this is the RNC NPGEP MAC address) is. So basically there should not be any flooding.

RP/0/RSP1/CPU0:ASR9k-1#sh l2vpn forwarding bridge-domain RNC:RNC3_TEST mac-address detail location 0/0/CPU0
Mon Dec 2 21:05:25.639 EET

Bridge-domain name: RNC:RNC3_TEST, id: 20, state: up
MAC learning: enabled
MAC port down flush: enabled
Flooding:
Broadcast & Multicast: enabled
Unknown unicast: enabled
MAC aging time: 300 s, Type: inactivity
MAC limit: 4000, Action: none, Notification: syslog
MAC limit reached: no
MAC Secure: disabled, Logging: disabled
DHCPv4 snooping: profile not known on this node
Dynamic ARP Inspection: disabled, Logging: disabled
IP Source Guard: disabled, Logging: disabled
IGMP snooping: disabled, flooding: enabled
Routed interface: BVI3, Xconnect id: 0x8000001f, state: up
IRB platform data: {0x14000a, 0x1, 0x0, 0x80000000}, len: 16
Bridge MTU: 1500 bytes
Number of bridge ports: 2
Number of MAC addresses: 2
Multi-spanning tree instance: 0

Mac Address: 0000.0c07.ac03, LC learned: N/A
Resync Age: N/A, Flag: static, BVI

Mac Address: 6c9c.ed0a.2e3d, LC learned: N/A
Resync Age: N/A, Flag: static, BVI

GigabitEthernet0/0/0/0, state: oper up
    Number of MAC: 1
    Statistics:
      packets: received 48765801690, sent 309298266072
      bytes: received 33416543382293, sent 54307173696538
    Storm control drop counters:
      packets: broadcast 0, multicast 0, unknown unicast 0
      bytes: broadcast 0, multicast 0, unknown unicast 0
    Dynamic arp inspection drop counters:
      packets: 0, bytes: 0
    IP source guard drop counters:
      packets: 0, bytes: 0

Mac Address: 0040.4384.8260, LC learned: 0/0/CPU0
Resync Age: 0d 0h 0m 0s, Flag: local

Nbor 10.9.9.253 pw-id 3
    Number of MAC: 1
    Statistics:
      packets: received 19771488146, sent 198111062527
      bytes: received 10977874479587, sent 50825792902418
    Storm control drop counters:
      packets: broadcast 0, multicast 0, unknown unicast 0
      bytes: broadcast 0, multicast 0, unknown unicast 0
    Dynamic arp inspection drop counters:
      packets: 0, bytes: 0
    IP source guard drop counters:
      packets: 0, bytes: 0

Mac Address: 6c9c.ed0a.9ced, LC learned: 0/0/CPU0
Resync Age: 0d 0h 0m 0s, Flag: global
L3 encapsulation Vlan: 2558

RP/0/RSP1/CPU0:ASR9k-2#sh l2vpn forwarding bridge-domain RNC:RNC3_TEST mac-address detail location 0/0/CPU0
Mon Dec 2 21:05:49.504 EET

Bridge-domain name: RNC:RNC3_TEST, id: 15, state: up
MAC learning: enabled
MAC port down flush: enabled
Flooding:
   Broadcast & Multicast: enabled
   Unknown unicast: enabled
MAC aging time: 300 s, Type: inactivity
MAC limit: 4000, Action: none, Notification: syslog
MAC limit reached: no
MAC Secure: disabled, Logging: disabled
DHCPv4 snooping: profile not known on this node
Dynamic ARP Inspection: disabled, Logging: disabled
IP Source Guard: disabled, Logging: disabled
IGMP snooping: disabled, flooding: enabled
Routed interface: BVI3, Xconnect id: 0x8000001a, state: up
IRB platform data: {0xf000a, 0x1, 0x0, 0x80000000}, len: 16
Bridge MTU: 1500 bytes
Number of bridge ports: 2
Number of MAC addresses: 3
Multi-spanning tree instance: 0
To Resynchronize MAC table from the Network Processors, use the command...
    l2vpn resynchronize forwarding mac-address-table location

GigabitEthernet0/0/0/0, state: oper up
    Number of MAC: 0
    Statistics:
      packets: received 782133119087, sent 620642426712
      bytes: received 514958352902308, sent 107302134940298
    Storm control drop counters:
      packets: broadcast 0, multicast 0, unknown unicast 0
      bytes: broadcast 0, multicast 0, unknown unicast 0
    Dynamic arp inspection drop counters:
      packets: 0, bytes: 0
    IP source guard drop counters:
      packets: 0, bytes: 0

Nbor 10.9.9.254 pw-id 3
    Number of MAC: 3
    Statistics:
      packets: received 297905813562, sent 17722149746
      bytes: received 68165206300571, sent 10642920750826
    Storm control drop counters:
      packets: broadcast 0, multicast 0, unknown unicast 0
      bytes: broadcast 0, multicast 0, unknown unicast 0
    Dynamic arp inspection drop counters:
      packets: 0, bytes: 0
    IP source guard drop counters:
      packets: 0, bytes: 0

Mac Address: 0000.0c07.ac03, LC learned: 0/0/CPU0
Resync Age: 0d 0h 0m 0s, Flag: global
L3 encapsulation Vlan: 510

Mac Address: 0040.4384.8260, LC learned: 0/0/CPU0
Resync Age: 0d 0h 0m 0s, Flag: global
L3 encapsulation Vlan: 510

Mac Address: 6c9c.ed0a.2e3d, LC learned: 0/0/CPU0
Resync Age: 0d 0h 0m 0s, Flag: global
L3 encapsulation Vlan: 3582

xthuijs · ‎12-02-2013

hey marks,

Either one of the 2 things can happen;

(1)

If the AC goes down, then the MAC will get flushed right away, resulting in flooding until we relearn it.

I would reckon that if Host-C brings down the interface or switches over, the 9k side sees that AC go down also.

This is also necessary to force the HSRP switchover (object tracking).

(2)

The interface remains up, but the same MAC is now seen on a different port. This will result in an internal MAC NOTIFY that basically switches the MAC from port X to port Y. This can make MAC security checks fail if so configured (MAC move).

The output you have below shows that the MAC is learnt via PW-id 3. That is fine; in either case it means that we know where to switch the MAC to and should not see flooding.

How do you determine that flooding is still occurring?

regards

xander

Marks Maslovs · ‎12-04-2013

Hi Alex,

Well, HSRP switchover is another concern... The problem with HSRP is that the RNC (host C) decides which interface will be active, and the ASR has no way to make the RNC switch cards other than by bringing the interface down.

So basically, I can enable tracking on the physical interface, so the priority would decrease in case of a host C interface failure. HSRP would switch to the alternate ASR until the 1st interface comes back to life; then the priority would be restored and HSRP would switch back to the 1st ASR. But from the perspective of the RNC, the active connection would remain unchanged.
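The tracking behaviour described above can be sketched in Python as follows. The priorities (110 and 100) and the decrement of 20 are made-up example values, the class and function names are hypothetical, and the model assumes preemption is enabled on both routers and ignores HSRP tie-breaking and timers.

```python
from dataclasses import dataclass


@dataclass
class HsrpRouter:
    name: str
    priority: int            # configured HSRP priority
    decrement: int           # amount subtracted while the tracked interface is down
    tracked_up: bool = True  # state of the tracked physical interface

    @property
    def effective_priority(self):
        return self.priority - (0 if self.tracked_up else self.decrement)


def elect_active(routers, current=None, preempt=True):
    """Pick the active router; without preempt the current active one stays."""
    if current is not None and not preempt:
        return current
    best = max(routers, key=lambda r: r.effective_priority)
    # On equal priority the current active router keeps its role.
    if current is not None and current.effective_priority == best.effective_priority:
        return current
    return best


asr1 = HsrpRouter("ASR1", priority=110, decrement=20)
asr2 = HsrpRouter("ASR2", priority=100, decrement=20)

active = elect_active([asr1, asr2])
asr1.tracked_up = False                      # host C link on ASR1 fails
active = elect_active([asr1, asr2], active)  # ASR2 takes over
asr1.tracked_up = True                       # link recovers
active = elect_active([asr1, asr2], active)  # ASR1 preempts and is active again
print(active.name)
```

The sketch only covers the router side: nothing in it changes which RNC card is active, which is exactly the mismatch raised above.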

Regarding flooding..

I am creating a SPAN session for g0/0/0/0 on both ASRs, and I can see that the same packets are sent out of both interfaces.

xthuijs · ‎12-05-2013

aha thanks for that extra rationale Marks, here are some things to consider;

So this design really doesn't provide that much redundancy; there is little that we are protecting ourselves from, except for a true cable failure between host-C and the connecting Agg node.

If both sides don't agree on going down, then blackholing is possible, unless we put some keepalive mechanism in place that can detect this issue and adjust routing/switching accordingly.

So I am thinking of a few options here. We created DAGR for this particular design, but the customer we did this for realized in the end that the complexity of implementation did not weigh up against the "benefits" (mind the quotes) of such a design.

My proposal is this: if possible, can the RNC/Host-C create a virtual interface? From the virtual interface we route via ECMP down the 2 uplinks. If the RNC declares one link down, that path is no longer used. OK, that solves upstream.

Downstream we can use IPSLA to measure the performance and availability of the path/link.

We can then use load balancing and routing to use both links when available, or, when one dies, immediately reroute to the other link.

This will be as fast as e.g. routing protocol hello/dead detection, or BFD can even be used, depending on RNC capabilities.
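As a rough illustration of the ECMP idea, the Python sketch below hashes each flow onto one of the links that are currently up. It is not how the RNC or the ASR selects paths; the function name, link names and the example 5-tuple are assumptions, and CRC32 merely stands in for whatever hash a real platform uses.

```python
import zlib


def pick_link(flow, links_up):
    """Map a flow (any tuple of fields) onto one of the links that are up."""
    links = sorted(links_up)
    if not links:
        return None  # no path left
    key = "|".join(str(field) for field in flow).encode()
    # Same flow + same set of links -> same link, so packets stay in order.
    return links[zlib.crc32(key) % len(links)]


# Hypothetical flow: (src IP, dst IP, protocol, src port, dst port).
flow = ("10.0.0.1", "10.0.0.2", 17, 5000, 5001)

print(pick_link(flow, ["uplink-1", "uplink-2"]))  # one of the two uplinks
print(pick_link(flow, ["uplink-2"]))              # only survivor carries the flow
```

When one link is declared down it simply drops out of `links_up`, and every flow lands on the remaining link. How quickly that happens depends entirely on the detection mechanism (hello/dead timers, BFD or a probe).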

What do you think about that?

regards

xander

Marks Maslovs · ‎12-09-2013

Sorry, I couldn't contact You earlier.

And thank You very much for your responsiveness, I really appreciate that.

Yes, You are completely right, the only thing that is protected is a pure physical cable failure. And I should add that not even all physical failures are protected, for example unidirectional failures.

Unfortunately we witnessed such a failure. After that it was decided that some sort of protection must be implemented.

And that's where the tricky part starts. The RNC knows OSPF, and it is also capable of creating BFD sessions (both single-hop and multihop). But as we have 1st-generation line cards on the ASR, BFD over BVI is not supported. We could use OSPF, but in that case the whole logic of the connection between ASR and RNC would change: both links from the ASRs to the RNC would have to be active. From the perspective of the RNC engineers, this solution is not welcome. First of all because 2nd-generation cards will be purchased (next year / in 6 months). Secondly, changing the RNC-ASR connection would mean changing the radio network (base station transport, O&M and control plane), basically the whole 3G mobile network.

So, regarding Your suggestions. I am very interested in DAGR solution. It sounds like very suitable for us.

I have talked with the RNC engineer. He says that unfortunately it is impossible to create a virtual interface, though it is possible to make both cards active, working in load-balancing mode. But as I said, 2 active cards are not welcome.

I have thought about running IPSLA from the ASR to the RNC, but I don't see any sense in that, because in case of a unidirectional failure IPSLA couldn't switch the RNC cards. I mean, IPSLA is not capable of shutting down an interface... (or am I wrong?)

Actually, from my perspective, it would be great to get rid of the bridge-domain and BVI interfaces and try to find a solution which would provide protection, fast switchover and simplicity. Fast switchover is needed because of voice traffic running over the ASR-RNC links. And switching from BVI interfaces to physical interfaces or bundles would give us the opportunity to implement BFD right away.

xthuijs · ‎12-09-2013

hey marks, hey not a problem! this is an interesting situation you have going on

so on the DAGR: check this out:

http://www.cisco.com/en/US/docs/routers/asr9000/software/asr9k_r4.2/addr_serv/configuration/guide/b_ipaddr_cg42a9k_chapter_010.html#task_1134480

If you would like more info on that, I may be able to dig some up from somewhere. Note that we did this back in the 37x days and it doesn't see a lot of interest, so to be honest you may be only the 2nd person actually using it.

As for IPSLA: yeah, the detection by itself is nice, but you can then have an EEM script use that IPSLA event as a trigger to do something, e.g. to force a failover. So in that regard it becomes quite nice.
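The probe-plus-trigger pattern can be sketched in Python like this. It is a generic illustration, not IPSLA or EEM syntax: the class name, the callbacks and the threshold of three consecutive failures are assumptions chosen for the example.

```python
class ProbeMonitor:
    """Fire a callback after N consecutive probe failures, and again on recovery."""

    def __init__(self, threshold, on_down, on_up):
        self.threshold = threshold
        self.on_down = on_down  # e.g. force a failover or shut the interface
        self.on_up = on_up      # e.g. restore the original state
        self.failures = 0
        self.is_down = False

    def record(self, success):
        """Feed in the result of one probe."""
        if success:
            self.failures = 0
            if self.is_down:
                self.is_down = False
                self.on_up()
        else:
            self.failures += 1
            if not self.is_down and self.failures >= self.threshold:
                self.is_down = True
                self.on_down()


def detection_delay(interval_s, threshold):
    """Rough detection time; ignores probe timeout and action run time."""
    return interval_s * threshold


events = []
monitor = ProbeMonitor(3, lambda: events.append("down"), lambda: events.append("up"))
for result in [True, False, False, False, False, True]:
    monitor.record(result)
print(events)
```

Requiring several consecutive failures avoids reacting to a single lost probe, at the cost of reaction time: with a probe every second and a threshold of three, detection takes on the order of three seconds before the triggered action even starts.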

is it subsecond failover, no, but it is automated and reliable.

I think the ECMP/active-active mode is something you could pursue as well; as long as the total traffic doesn't exceed the single link speed, you should be fine.

BFD is definitely usable for fast detection, but it has some "restrictions" on the RNC side and, as you mention, with BFD on the BVI.

In any case, have a look at the "object tracking with ipsla" support forums article and the DAGR solution; maybe they accommodate your scenario.

cheers!

xander

Marks Maslovs · ‎12-10-2013

Thanks, Alex!

Hm, regarding DAGR... As I understand it, when using DAGR the existing configuration (bridge-domain & BVI interface) should be removed?

I just checked the EEM script. That's really cool stuff! It really gives some additional room in what can be configured. The only thing that concerns me is the reaction time of this script. I guess tests will be needed to verify it anyway.

P.S. Regarding BFD: actually I was saying that the ASR has restrictions using BFD over BVI :) (we have Trident-based line cards)