
HEADS UP -- BGP Issue

Mohamed Sobair
Level 7

Hi,

Here is the scenario,

The ISP has 2 uplinks toward upstream providers. The first link is the Internet Gateway link (main Internet GW); the second link is the National link (to the National exchange).

The ISP has its own registered public AS number from RIPE. The upstream National provider uses a private AS for its peering with the ISP, and that private AS is removed when the network is announced to the National peers. The upstream National provider also has its own Internet link (used for their own purposes, not to connect the ISP to the outside world). The ISP uses the other upstream, the Internet GW, to announce its network and connect to the Internet.

The Problem:

The upstream National provider announced the ISP network to the Internet by mistake (over the link that is used for their own purposes). The public AS of the ISP was removed when the network was announced to the Internet.

Now the outside world has 2 possible paths to reach the ISP network; however, one of the paths is announcing the network without its registered public AS.

The Second link (Main Internet GW link) is announcing the ISP Network with its associated Public registered AS number.

What happened is that the ISP network was down for a while. Why? The Internet preferred the National link path to reach the ISP network because it is the shortest AS path, although the ISP's AS is not announced on it.

My question: since a border router (BGP router) uses AS attributes to reach a particular destination, how does it make sure that a particular network belongs to a particular registered AS number? Why is the second link not considered, although it is announcing their public AS?

Regards,

Mohamed

20 Replies

Giuseppe Larosa
Hall of Fame

Hello Mohamed,

>> How is it going to assure that a particular Network belongs to a particular registered AS number?

the BGP protocol by itself doesn't provide these consistency checks.

Some routing policies can be implemented in order for example to verify what a peer advertises in a peering relationship.

For example complex routing policies can be expressed and verified with RPSL.

see

http://ftp.iasi.roedu.net/mirrors/ftp.isc.org/pubs/pres/TorIX/2004/08/torix-rpsl.pdf

>> The Internet has preferred the National link path to reach the ISP Network because it's the shortest AS path, although their AS is not announced.

a public IP network should be advertised only with its legitimate AS number; BGP relies on this.

the error made the Internet think the prefix was associated with another AS number, that of the upstream provider.

the outage might have been caused by some routing to null0 on the upstream provider or by some more modern security mechanism.

Hope to help

Giuseppe

Mohamed,

Routers do not validate a prefix against its corresponding AS. All they need is a path to the destination. They consider the AS_PATH length in the BGP best path selection process, but do not cross-check with the registries. All that routers need is to find a next-hop claiming to be able to reach the destination network.
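
To make the point above concrete, here is a minimal Python sketch. It assumes a heavily simplified decision process in which only AS_PATH length matters (real routers compare local preference, origin type, MED and more). The names `Route` and `best_path` are made up for illustration, and the AS numbers and prefix are documentation values, not the ones from this incident.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Route:
    prefix: str
    as_path: Tuple[int, ...]  # leftmost = neighbor AS, rightmost = origin AS
    label: str                # free-text tag, for illustration only


def best_path(routes):
    """Pick the route with the shortest AS_PATH.

    Simplification: every other BGP tie-breaker is ignored, and nothing
    checks whether the origin AS is entitled to announce the prefix.
    """
    return min(routes, key=lambda r: len(r.as_path))


# Hypothetical view from some remote AS (all values are illustrative).
legit = Route("198.51.100.0/24", (64500, 64501, 64510), "via Internet GW, origin = ISP")
leaked = Route("198.51.100.0/24", (64502, 64505), "via national provider, origin = provider")

winner = best_path([legit, leaked])
print(winner.label)
```

In this sketch the leaked route wins simply because its path is shorter; the origin AS is never compared against any registry.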

Validation happens manually by network operators or with the aid of automated tools when the as-path and prefix filters are put in place in the routers for a particular BGP session. Still, most of this procedure relies on the human factor and the sense of responsibility ISP engineers have. You just can't stop the upstream national provider from making a mistake in this procedure (you do have better control over your downstream clients though). Actually, it is the upstream provider's job to check you.

Anyone can advertise a network assigned to somebody else, and one thing that stops them from creating chaos is the shame they will feel when they get caught (they will). Another discouraging factor is the potential break of their own connectivity. In your case the national provider provided transit to the ISP, which is more than the ISP paid for I suppose and maybe more traffic than the national provider could handle (or maybe could handle it, but it is still free).

Note, that this type of flexibility allows various policies and agreements to be made, as is the case of a customer that has been assigned a network but no AS by the registry. In this case, the ISP AS claims towards the Internet that the customer block has been assigned to him (advertises the customer network to the Internet as one of its own blocks).

An indication of possible bogus announcements can be found here:

http://www.cidr-report.org/as2.0/

Kind Regards,

M.

p.s. I suppose that the ISP was down for a while because the national link could not handle the collective incoming traffic (national + internet). Your outgoing traffic was leaving as normal, but your return traffic got congested in the national link. Or traffic got blackholed in the national provider network, I don't know.

Mohamed Sobair
Level 7

Hi Maria,

I know how routers choose their preferred path to a destination network. My question was how the ISP is validated against its registered AS number.

The second question: if this is the case, why was the ISP down? The outside world should prefer the shortest AS path, which is via the National provider. The National provider already has a link with the ISP, so the network of the ISP is visible on the National provider's router! (No sort of blackholing.)

Hope I have clarified my question

Mohamed

Mohamed,

The ISP has an AS number and some blocks of its own (assigned blocks is the correct phrase because the IPs are a public resource, but I might use the word owner from now on). It also has some customer ASs and blocks associated with it. All this information is kept in a registry such as RIPE together with a summary of the routing policy of each AS.

A new customer comes to the ISP and has an AS and a block assigned. The manual procedure is for an ISP operator to query the RIPE whois database. Is the AS assigned to the customer? Is the block assigned to the customer? If yes, the BGP filters for the client are set. The ISP router allows advertisements from the client to pass through it only if they meet the specifications. Now that the client side is ready, the ISP notifies its upstream that a new client is in place and passes through the ISP. The upstream updates its filters, and the routes pass through the upstream.
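
The manual check described above could be sketched roughly as follows. This is only an illustration: `REGISTRY` is a hypothetical in-memory stand-in for what an operator would look up in the RIPE whois database, `build_customer_filter` is an invented helper, and the AS number and prefixes are documentation values.

```python
import ipaddress

# Hypothetical stand-in for registry data; a real check would query the
# RIPE whois database instead of this dictionary.
REGISTRY = {
    64511: {"holder": "customer-a", "prefixes": ["203.0.113.0/24"]},
}


def build_customer_filter(holder, asn, prefixes, registry=REGISTRY):
    """Return a filter description if the registry confirms both the AS
    and every requested block for this customer, otherwise None."""
    record = registry.get(asn)
    if record is None or record["holder"] != holder:
        return None  # the AS is not assigned to this customer
    registered = {ipaddress.ip_network(p) for p in record["prefixes"]}
    requested = [ipaddress.ip_network(p) for p in prefixes]
    for net in requested:
        if net not in registered:  # exact match only, to keep the sketch short
            return None  # the block is not assigned to this customer
    return {"origin_as": asn, "prefixes": [str(n) for n in requested]}


print(build_customer_filter("customer-a", 64511, ["203.0.113.0/24"]))
print(build_customer_filter("customer-a", 64511, ["198.51.100.0/24"]))
```

The first call yields a filter the session could be built from; the second yields `None` because the block is not registered to that customer, so no filter would be opened for it.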

Back to the national provider. The national provider has a connection with its own upstream. If the upstream of the national provider is using as-path filters only (no prefix filters for particular networks), any network advertised with the national provider as its originating AS will leak into the Internet, because the national provider advertises its own blocks and the upstream allows blocks whose originating AS is that of the national provider. This is probably what happened with the ISP routes. The private AS was removed before sending the routes towards the national provider's IGW, and the routes were seen by the upstream as originating from the national provider.

The network of the ISP would be visible through the national provider. Visibility does not guarantee reachability. Even if you did not see your link becoming congested, this does not mean that the national provider could handle the ISP collective traffic. The national provider was seen by the internet as the closest exit to the ISP AS. It also received traffic from the national peers. Some of the national provider's links or routers could have become overloaded with the traffic and could have caused big issues within the national provider's network. This is what I meant by blackholing. Blackholing doesn't mean sending traffic to null0 only, but anywhere it is unable to pass through. Also, the fact that the national provider made a mistake doesn't mean it didn't make more mistakes. In fact, it suggests it could have made another one and anything could be expected.

Kind Regards,

M.

p.s. this post has been edited to clarify some points

Also, don't forget that after the national provider made the mistake, they would be trying to resolve the issue, and during this process they could be trying things and cause some instability in the visibility of your routes. Any other side effects of overloaded routers/links during this time could also affect the visibility of your routes from the Internet without any human intervention.

Mohamed Sobair
Level 7

Hi Maria,

Looking at the following paragraph:

----((A new customer comes for ISP and has an AS and a block assigned. The manual procedure is for an ISP operator to query the RIPE whois database. Is the AS assigned to the customer? Is the block assigned to the customer? If yes, the BGP filters for the client are set. ISP router allows advertisements from client to pass through it only if they meet the specifications. Now that the client side is ready, the ISP notifies upstream that a new client is in place and passes through the ISP. Upstream updates its filters and routes pass through the upstream))----

I understood the query process, but isn't that also applicable to the downstream traffic? I mean, if the outside world queries the RIPE database, they should use the valid path, which is the IGW path, because the network is associated there with their own public AS.

Regarding the National provider, the traffic will certainly be dropped at some point, or the links overwhelmed by congestion. The actual case happened to most of the ISPs: a huge amount of traffic preferred their path and possibly brought the ISPs and their network down due to overload/performance, which therefore couldn't handle it.

HTH

Mohamed

The right procedure is for the upstream to check, but not everyone checks. This is a practical issue. There is a trust relationship at some point. Mistakes can happen and routes can be propagated because of this trust.

I know I said that the upstream is notified to update its filters. However, as you saw in the case of the national provider, the national provider did not actually intend to do this and I suppose did not inform its own upstream. No registry lookup occurred and no request was sent to the upstream. The routes just leaked.

This upstream was already trusting the networks sent to it with originating AS that of the national provider. The as-path filter of the upstream did not need an update for the routes to pass, because the national provider had arranged the filter to be there when it first announced its own networks to the upstream.

Had the upstream been using prefix-filters as well, the routes could not leak unintentionally, because they would stop at the upstream. Some big providers might not use prefix filtering because they consider it a burden, but this can lead to cases like the one you described.
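
The difference between the two filter types can be sketched in a few lines of Python. This is an illustration under assumptions, not how any particular router evaluates policy: the function names are invented, the as-path filter is reduced to an origin-AS check, the prefix filter does exact matching only, and all AS numbers and prefixes are documentation values.

```python
import ipaddress


def passes_as_path_filter(as_path, allowed_origins):
    """Accept the route if its origin (rightmost) AS is on the allowed list."""
    return as_path[-1] in allowed_origins


def passes_prefix_filter(prefix, allowed_prefixes):
    """Accept the route only if the prefix itself is on the allowed list."""
    net = ipaddress.ip_network(prefix)
    return net in {ipaddress.ip_network(p) for p in allowed_prefixes}


# Hypothetical session: upstream <- national provider (AS 64505).
ALLOWED_ORIGINS = {64505}
ALLOWED_PREFIXES = ["203.0.113.0/24"]  # the provider's own block

# Leaked ISP prefix after the private AS was stripped: the origin looks
# like the national provider itself.
leak_prefix, leak_path = "198.51.100.0/24", (64505,)

as_path_only = passes_as_path_filter(leak_path, ALLOWED_ORIGINS)
with_prefix_filter = as_path_only and passes_prefix_filter(leak_prefix, ALLOWED_PREFIXES)
print(as_path_only, with_prefix_filter)
```

With the as-path check alone the leaked route is accepted; once the prefix list is also consulted it is rejected, because the ISP block was never on the provider's list.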

Have a look at the following discussion to see more on the issue of prefix vs as-path filtering:

http://forum.cisco.com/eforum/servlet/NetProf?page=netprof&forum=Service%20Providers&topic=MPLS&topicID=.ee8558c&fromOutline=true&CommCmd=MB%3Fcmd%3Ddisplay_location%26location%3D.2cc27eb1/1

Maria, as usual, is spot on.

By the way, look at what happened last year:

http://www.potaroo.net/ispcol/2008-03/routehack.html

"Using a blackhole route within AS17557, Pakistan Telecom, is entirely a local matter, but allowing the route to leak in the inter-AS routing fabric is the first serious issue. Secondly, AS3491, PCCW, did not appear to have in place the necessary route filters on its links to AS17557 to prevent it from learning this unauthorized route. And lastly, once the route entered into the transit core of the Internet the general framework of mutual trust between transit network operators ensured that the false network route was efficiently propagated across the rest of the Internet." ... "We've tried route filters, Internet Route Registries, bogon filters and automated registry lookup systems to attempt to match some trusted external source of information about who has what rights of use of which addresses and AS numbers with the information contained in the routing system."

In this type of cases (and in others) you need more than one person (PCCW cooperated nicely as did the upstream of the national provider in your case) to accomplish the unintended task. That is the reason I don't like it when I hear that an AS "single-handedly" caused an issue (you remember what happened the other day with the long AS paths: http://www.renesys.com/blog/2009/02/the-flap-heard-around-the-worl.shtml).

Do you have a link describing your scenario or did it not get famous enough? I guess I like reading this type of stuff :-)

Paolo, thanks for "setting the record straight" ;-)

Maria, I was just recognizing your ability to look at things in perspective, rather than at technicalities only.

In my opinion, a technician knows "how", an engineer knows "why", an architect knows "better" :)

Now for one more of my rants... I first had access to peering administration in 1995. It didn't take long for me to realize that most of the supposed protections weren't in place at all, or were ineffective, or circumventable. It looks like things haven't changed much.

Mohamed Sobair
Level 7

Hello Maria,

You deserve a full rating. Thanks for your time and clarification.

Regards,

Mohamed

Mohamed,

I was thinking about this case again and I remembered something that seemed odd to me when I read your original post, but we got into analyzing the failure and I forgot to mention it.

You said that the ISP uses a private AS for its peering with the national provider. I do not understand why this is so, though people can do all kinds of hacks with BGP (perhaps the national provider wants to simplify its internal procedures or anything like that). Still, this is a weak point in the design. Had the ISP been using its own AS in the peering with the national provider, the leak would have been stopped even by the as-path filter of the national provider's upstream. I am not saying that mistakes cannot still happen with this design, but it's somewhat harder. A simple removal of the private AS towards the national provider's upstream deprives the routes of their origin AS and makes them look like they are coming from the national provider. Achieving the same effect takes more work when the ISP is using its own AS in the peering with the national provider. It is technically possible, but still I don't know the reason the private AS option for the peering was chosen in the first place.
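
A small Python sketch of why the private AS matters here. It is a simplification: `advertise` drops private AS numbers unconditionally when asked to, whereas real implementations apply extra conditions before removing them. The function names and the AS values (64505 for the national provider, 64510 for the ISP's public AS, 65001 for the private AS) are illustrative assumptions.

```python
PRIVATE_AS = range(64512, 65535)  # 16-bit private range, 64512..65534


def advertise(received_path, local_as, remove_private=False):
    """Return the AS_PATH sent to an eBGP neighbor.

    Simplification: private AS numbers are dropped unconditionally when
    remove_private is set; the local AS is then prepended.
    """
    path = [a for a in received_path if not (remove_private and a in PRIVATE_AS)]
    return [local_as] + path


def origin(as_path):
    """The origin AS is the rightmost entry of the AS_PATH."""
    return as_path[-1]


NATIONAL, ISP_PUBLIC, ISP_PRIVATE = 64505, 64510, 65001  # illustrative values

# Case 1: the ISP peers with a private AS, which is stripped on the way out.
path_private = advertise([ISP_PRIVATE], NATIONAL, remove_private=True)
# Case 2: the ISP peers with its own public AS, which survives the stripping.
path_public = advertise([ISP_PUBLIC], NATIONAL, remove_private=True)

print(origin(path_private), origin(path_public))
```

In case 1 the route leaves with the national provider as its apparent origin, so an origin-based as-path filter at the upstream lets it through. In case 2 the ISP's public AS stays at the end of the path, and the same filter would reject it.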

I only mention this in case you need to explore the possibility of a less error prone alternative to your existing design. Some people made mistakes for the failure to happen, but still I believe any possibility that helps avoid such issues in the future could be explored.

Kind Regards,

Maria

Mohamed Sobair
Level 7

Hi Maria,

Frankly, I have no clue about the network design or why the National provider is using a private AS.

I came to know of this problem from a friend of mine, and the actual case happened to most of the ISPs, since their national backbone is connected to the same National provider.

What I know is that the ISP uses a private AS for its peering with the National provider (the private AS represents the National provider's AS), and certainly the router which terminates all the ISPs has connectivity with their main Internet Gateway router (because the outage/failure implies this).

If this is the situation, then I totally agree with you that it's a poor design, because it shouldn't have any sort of connectivity to their main Internet Gateway anyway.

Regards,

Mohamed
