MPLS MP-IBGP Internet Issue

Unanswered Question
Aug 6th, 2008

We are connecting three sites using MPLS Layer 3 VPN technology, with a service provider network as the MPLS core. All three sites connect over MP-iBGP and forward Internet traffic to one of the PEs, which forwards it to a firewall; the firewall performs NAT for the CE routes.

We have a strange problem. All sites have stable connectivity for CE-to-CE (site-to-site) pings, but external connectivity is not stable. There is nothing wrong with the basic configuration, because all sites work fine for both local (internal) and Internet (external) traffic: the PE leaks routes to the firewall and the firewall performs NAT the way it should. But sometimes some of the sites get stuck at the Internet PE (the one leaking routes from the customer VRF to the firewall), and they start working again after 1 or 2 hours.

We don't have control over the service provider network. How do you troubleshoot such issues when you don't have access to the core, traffic between sites is fine, but external traffic is fluctuating? Here is the logical and physical topology:

    VRF-A               VRF-A
CE----PE---Core------PE----CE
            |
            |
       Internet PE
   VRF-A and their own
            |
  firewall NAT to Internet
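
Without access to the core, one thing that can still be done from the CE side is to log periodic reachability probes per site and work out exactly when each site loses Internet access and for how long. The sketch below is a minimal illustration of that bookkeeping only; the `Probe` and `Outage` types and the `outage_windows` function are hypothetical names, and how the probes are collected (ping, traceroute, an HTTP check) is left open.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass
class Probe:
    t: int      # seconds since logging started
    site: str   # label for the CE site that sent the probe (hypothetical)
    ok: bool    # True if the Internet target answered


@dataclass
class Outage:
    site: str
    start: int
    end: Optional[int]  # None if the site was still down at the last probe


def outage_windows(probes: Iterable[Probe]) -> List[Outage]:
    """Collapse a probe log into per-site outage windows."""
    open_since: Dict[str, int] = {}  # site -> time of first failed probe
    closed: List[Outage] = []
    for p in sorted(probes, key=lambda p: p.t):
        if not p.ok and p.site not in open_since:
            open_since[p.site] = p.t          # outage begins
        elif p.ok and p.site in open_since:
            closed.append(Outage(p.site, open_since.pop(p.site), p.t))
    # Anything still open never recovered within the log.
    closed.extend(Outage(s, t, None) for s, t in open_since.items())
    return closed
```

Comparing the resulting windows across sites shows whether outages overlap, whether they always last about the same time, and whether they line up with events the provider can check on the Internet PE or the firewall.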

Giuseppe Larosa Thu, 08/07/2008 - 08:45

Hello Masood,

Are you receiving an EBGP default route on your CE routers at the other sites?

Can you reach the Internet PE interface from your CE routers?

Can you ask your provider whether the FW is a pair of firewalls in failover?

I wonder if the 1 to 2 hours could be an ARP cache issue: when the firewalls switch over, the new active FW keeps the same IP address but has a different MAC address.

When the old entry times out, the Internet PE sends a new ARP request and learns the correct MAC address of the active FW.
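
As a rough illustration of why a stale ARP entry could produce an outage of this length: the default ARP cache timeout on Cisco IOS is 4 hours, so the time the PE keeps using the old MAC depends on how old the entry already was at the moment of failover. The sketch below assumes the scenario described above (same IP, different MAC, and nothing such as a gratuitous ARP refreshing the entry earlier); `stale_window_s` is a hypothetical helper, not a router command.

```python
IOS_DEFAULT_ARP_TIMEOUT_S = 4 * 60 * 60  # Cisco IOS default ARP timeout: 14400 s


def stale_window_s(entry_age_at_failover_s: int,
                   arp_timeout_s: int = IOS_DEFAULT_ARP_TIMEOUT_S) -> int:
    """Seconds the PE keeps forwarding to the old MAC after a failover.

    Assumes the entry is only corrected when it ages out, i.e. no
    gratuitous ARP or other refresh happens first.
    """
    if not 0 <= entry_age_at_failover_s <= arp_timeout_s:
        raise ValueError("entry age must lie between 0 and the ARP timeout")
    return arp_timeout_s - entry_age_at_failover_s


# An entry that was 2.5 hours old at failover stays stale for 1.5 hours.
print(stale_window_s(9000) / 3600)
```

Under these assumptions the outage can last anywhere from a few seconds up to the full timeout, so a 1 to 2 hour window is plausible but not a fixed figure.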

You say

some of the sites just got stuck at Internet PE

So the problem affects only some sites, not all of them?

If so, the explanation could be different from the one above.

Can you provide some more details?

Hope to help

Giuseppe

jahilnt10 Thu, 08/07/2008 - 11:51

Hi Giuseppe,

Are you receiving an EBGP default route on your CE routers at the other sites?

Yes. Traffic reaches Internet destinations all the way; the only issue is at the Internet PE.

Can you reach the Internet PE interface from your CE routers?

Yes, all the time.

Can you ask your provider whether the FW is a pair of firewalls in failover?

Yes, it is, but I don't think the issue is with the MAC address, because some of the sites work fine while others do not. If there were an ARP issue between the PE and the firewall, the PE would not forward anything to the firewall.

So the problem affects only some sites, not all of them?

All sites work fine most of the time, but sometimes some of the sites just stop working for an hour or so.

Actually, the problem is between the Internet PE and the firewall. The PE has 802.1Q subinterfaces for multiple VRFs, and traffic is forwarded over an 802.1Q trunk to a switch. The firewall receives the VLAN-tagged traffic from that switch over an 802.1Q trunk. The switch is the interconnecting medium between the firewall and the Internet PE.

Whenever there is a problem accessing the Internet, the trace stops at the Internet PE router.

We do have multipath BGP and load balancing from the MPLS core service provider.

Giuseppe Larosa Thu, 08/07/2008 - 12:05

Hello Masood,

An ARP issue is not the cause; otherwise all sites would be affected.

However, one or two hours is a very long time, which makes me think of timers, perhaps NAT timers.

I would suggest that some NAT resources could be exhausted on the FW; after some time they are freed and the troubled sites can reach the Internet again.
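
To make the exhaustion idea concrete, here is a toy model of a translation table with a fixed capacity and an idle timeout. It is only a sketch: `PatPool`, the capacity of 2, and the 3600-second timeout are hypothetical values chosen for illustration and say nothing about how the provider's firewall is actually configured.

```python
from typing import Dict


class PatPool:
    """Toy translation table: fixed capacity, entries expire when idle."""

    def __init__(self, capacity: int, idle_timeout_s: int) -> None:
        self.capacity = capacity
        self.idle_timeout_s = idle_timeout_s
        self._last_seen: Dict[str, int] = {}  # flow id -> time of last packet

    def _expire(self, now: int) -> None:
        # Drop every entry that has been idle for the full timeout.
        self._last_seen = {
            flow: t for flow, t in self._last_seen.items()
            if now - t < self.idle_timeout_s
        }

    def translate(self, flow: str, now: int) -> bool:
        """Return True if the flow has, or can get, a translation slot."""
        self._expire(now)
        if flow in self._last_seen or len(self._last_seen) < self.capacity:
            self._last_seen[flow] = now
            return True
        return False  # table full: new flows are dropped


pool = PatPool(capacity=2, idle_timeout_s=3600)
pool.translate("siteA-flow1", now=0)
pool.translate("siteA-flow2", now=10)
print(pool.translate("siteB-flow1", now=20))    # table is full
print(pool.translate("siteB-flow1", now=3600))  # oldest entry has expired
```

In this model, flows that already hold a slot keep working while new flows from another site are refused until idle entries age out, which matches the pattern of only some sites being affected and recovering on their own after a fixed interval.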

All traffic from all your sites exits the Internet PE with the same 802.1Q tag, that of your VRF, so there cannot be an issue with individual VLANs on the trunk link.

If your provider is using a multicontext FWSM, it could give your context more resources by modifying the context configuration.

Hope to help

Giuseppe
