Here is the scenario:
The ISP has two uplinks toward the upstream provider. The first link is the Internet Gateway link (Main Internet GW); the second link is the National link (National exchange).
The ISP has its own registered public AS number from RIPE. The upstream National provider uses a private AS for its peering with the ISP, and that private AS is removed when the ISP's network is announced to the National peers. The upstream National provider also has its own Internet link (used for its own purposes, not to connect the ISP to the outside world). The ISP uses the Internet GW uplink to announce its network and connect to the Internet.
The upstream National provider announced the ISP network to the Internet by mistake (over the link it uses for its own purposes), and the public AS of the ISP had been removed when that network was announced to the Internet.
Now the outside world has two possible paths to reach the ISP network; however, one of them announces the network without its registered public AS.
The other path (via the Main Internet GW link) announces the ISP network with its associated registered public AS number.
What happened is that the ISP network was down for a while. Why? The Internet preferred the National link path to reach the ISP network because it is the shortest AS path, even though the ISP's AS is not in it.
My question: since a border router (BGP router) uses AS attributes to reach a particular destination, how does it ensure that a particular network belongs to a particular registered AS number? Why is the Main Internet GW path not considered, even though it announces the ISP's public AS?
>> How is it going to ensure that a particular network belongs to a particular registered AS number?
The BGP protocol by itself doesn't provide these consistency checks.
Routing policies can be implemented, for example, to verify what a peer advertises in a peering relationship.
Complex routing policies can, for example, be expressed and verified with RPSL.
>> The Internet has preferred the National link path to reach ISP Network because it's the shortest AS path, although their AS is not announced.
A public IP network should be advertised only with its legitimate AS number; BGP relies on this.
The error made the Internet think the prefix was associated with another AS number, that of the upstream provider.
The outage might have been caused by some routing to Null0 at the upstream provider or by some more modern security mechanism.
Hope this helps
Routers do not validate a prefix against its corresponding AS. All they need is a path to the destination: a next hop claiming to be able to reach the destination network. They consider the AS_PATH length in the BGP best-path selection process, but they do not cross-check it with the registries.
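To make this concrete, here is a minimal Python sketch of only the AS_PATH-length comparison, under stated assumptions: two hypothetical routes to the ISP prefix, one via the Main Internet GW with the ISP's AS (64500 here) as origin, and one via the national provider's upstream where the private AS was stripped, so the origin appears to be the national provider (64501 here). All AS numbers and prefixes are documentation values standing in for the real ones, and a real router evaluates weight, LOCAL_PREF and other attributes before AS_PATH length.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str
    as_path: tuple  # leftmost = neighbour AS, rightmost = origin AS
    next_hop: str


def best_path(candidates):
    """Keep only the AS_PATH-length step of best-path selection.

    Ties are broken by the lowest next-hop address purely for illustration.
    Note what is missing: nothing compares the origin AS with a registry.
    """
    return min(
        candidates,
        key=lambda r: (len(r.as_path), ipaddress.ip_address(r.next_hop)),
    )


# Hypothetical paths toward the ISP prefix, as seen by a distant router.
via_igw = Route("198.51.100.0/24", (64510, 64502, 64500), "192.0.2.1")
via_national = Route("198.51.100.0/24", (64503, 64501), "192.0.2.9")

chosen = best_path([via_igw, via_national])
print(chosen.as_path[-1])  # origin AS of the winning path: 64501
```

The shorter path wins even though its origin AS is not the one registered for the prefix.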
Validation happens manually by network operators, or with the aid of automated tools, when the AS-path and prefix filters are put in place on the routers for a particular BGP session. Still, most of this procedure relies on the human factor and the sense of responsibility of ISP engineers. You just can't stop the upstream national provider from making a mistake in this procedure (you do have better control over your downstream clients, though). Actually, it is the upstream provider's job to check you.
Anyone can advertise a network assigned to somebody else, and one thing that stops them from creating chaos is the shame they will feel when they get caught (and they will). Another discouraging factor is the potential breakage of their own connectivity. In your case the national provider provided transit to the ISP, which I suppose is more than the ISP paid for, and maybe more traffic than the national provider could handle (or maybe it could handle it, but it is still free).
Note that this type of flexibility allows various policies and agreements to be made, as in the case of a customer that has been assigned a network but no AS by the registry. In this case, the ISP's AS claims towards the Internet that the customer block has been assigned to it (it advertises the customer network to the Internet as one of its own blocks).
An indication of possible bogus announcements can be found here:
p.s. I suppose the ISP was down for a while because the national link could not handle the collective incoming traffic (national + Internet). Your outgoing traffic was leaving as normal, but your return traffic got congested on the national link. Or traffic got blackholed in the national provider's network; I don't know.
I know how routers choose their preferred path to a destination network. My question was how the ISP is validated against its registered AS number.
The second question: if this is the case, why was the ISP down? The outside world should prefer the shortest AS path, which is through the National provider. The National provider already has a link with the ISP, so the ISP's network is visible on the National provider's router! (No sort of blackholing.)
Hope I have clarified my question
The ISP has an AS number and some blocks of its own ("assigned" blocks is the correct phrase because IP addresses are a public resource, but I might use the word "owner" from now on). It also has some customer ASes and blocks associated with it. All this information is kept in a registry such as RIPE, together with a summary of the routing policy of each AS.
A new customer comes to the ISP with an AS and a block assigned. The manual procedure is for an ISP operator to query the RIPE whois database. Is the AS assigned to the customer? Is the block assigned to the customer? If yes, the BGP filters for the customer are set, and the ISP router lets advertisements from the customer pass only if they meet the specifications. Now that the customer side is ready, the ISP notifies its upstream that a new customer is in place behind the ISP. The upstream updates its filters, and the routes pass through the upstream.
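The per-customer filtering step above can be sketched in Python as follows. This is a minimal illustration, assuming a hypothetical hard-coded snapshot of registry route objects (prefix, origin AS) in place of a real whois query, and documentation AS numbers and prefixes; the filter checks that the AS path contains only the customer AS (prepending allowed) and that the prefix exactly matches a block registered to that AS.

```python
import ipaddress

# Hypothetical snapshot of registry route objects: (prefix, origin AS).
# In practice this would come from a whois/registry query, not a hard-coded set.
REGISTRY_ROUTES = {
    ("198.51.100.0/24", 64500),  # the customer's block
    ("203.0.113.0/24", 64501),   # somebody else's block
}


def build_prefix_filter(customer_as, registry):
    """Collect the prefixes registered with customer_as as their origin."""
    return {
        ipaddress.ip_network(prefix)
        for prefix, origin in registry
        if origin == customer_as
    }


def accept_from_customer(prefix, as_path, customer_as, allowed):
    """Apply both filters to one advertisement received on the customer session."""
    # AS-path filter: the path may contain only the customer AS (prepends allowed),
    # i.e. the customer originates routes but does not provide transit.
    path_ok = len(as_path) > 0 and all(asn == customer_as for asn in as_path)
    # Prefix filter: exact match against the registry-derived list.
    prefix_ok = ipaddress.ip_network(prefix) in allowed
    return path_ok and prefix_ok


allowed = build_prefix_filter(64500, REGISTRY_ROUTES)
print(accept_from_customer("198.51.100.0/24", (64500,), 64500, allowed))  # True
print(accept_from_customer("203.0.113.0/24", (64500,), 64500, allowed))   # False
```

Exact matching is a simplification; real prefix lists often also permit more-specifics up to some length.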
Back to the national provider. The national provider has a connection with its own upstream. If the upstream of the national provider uses AS-path filters only (no prefix filters for particular networks), any network advertised with the national provider as the originating AS will leak into the Internet, because the national provider advertises its own blocks and the upstream allows blocks whose originating AS is the national provider. This is probably what happened with the ISP routes: the private AS was removed before the routes were sent towards the national provider's IGW, and the upstream saw them as originating from the national provider.
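A minimal Python sketch of this leak mechanism, assuming a hypothetical national provider AS (64501), a hypothetical private AS on the ISP peering (65001), and a simplified export policy that strips all 16-bit private ASNs and then prepends the local AS (vendor private-AS removal features differ in exactly what they strip). The upstream's filter only checks that the path looks like it originates at the national provider.

```python
# 16-bit private AS numbers 64512-65534 (RFC 6996).
PRIVATE_AS = range(64512, 65535)


def export_to_upstream(as_path, local_as):
    """Hypothetical export policy: drop private ASNs, then prepend our own AS."""
    public_only = tuple(asn for asn in as_path if asn not in PRIVATE_AS)
    return (local_as,) + public_only


def upstream_as_path_filter(as_path, customer_as):
    """AS-path-only filter on the upstream: accept routes originated by customer_as.

    Roughly the intent of an ^64501$ style regex; no prefix list is consulted.
    """
    return as_path == (customer_as,)


NATIONAL_AS = 64501     # hypothetical national provider AS
ISP_PRIVATE_AS = 65001  # hypothetical private AS on the ISP peering

own_route = export_to_upstream((), NATIONAL_AS)                 # national's own block
isp_route = export_to_upstream((ISP_PRIVATE_AS,), NATIONAL_AS)  # leaked ISP block

print(own_route, isp_route)  # (64501,) (64501,)
print(upstream_as_path_filter(isp_route, NATIONAL_AS))  # True: the leak passes
```

After stripping, the ISP route and the national provider's own route are indistinguishable to an AS-path-only filter.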
The network of the ISP would be visible through the national provider, but visibility does not guarantee reachability. Even if you did not see your link becoming congested, this does not mean that the national provider could handle the ISP's collective traffic. The national provider was seen by the Internet as the closest exit to the ISP AS, and it also received traffic from the national peers. Some of the national provider's links or routers could have become overloaded with the traffic and could have caused big issues within the national provider's network. This is what I meant by blackholing: it doesn't only mean sending traffic to Null0, but anywhere it is unable to pass through. Also, the fact that the national provider made one mistake doesn't mean it didn't make more. In fact, it suggests it could have made another one, and anything could be expected.
p.s. this post has been edited to clarify some points
Also, don't forget that after the national provider made the mistake, they would be trying to resolve the issue, and during this process they could be trying things and causing some instability in the visibility of your routes. Any other side effects of overloaded routers/links during this time could also affect the visibility of your routes from the Internet without any human intervention.
Looking at the following paragraph:
>> A new customer comes to the ISP with an AS and a block assigned. The manual procedure is for an ISP operator to query the RIPE whois database. [...] Upstream updates its filters and routes pass through the upstream.
I understood the query process, but doesn't that also apply to downstream traffic? I mean, if the outside world queries the RIPE database, it should use the valid path, which is the IGW path, because the ISP network is associated with the ISP's own registered public AS.
Regarding the National provider, the traffic will certainly be dropped at some point or overwhelmed by congestion. In the actual case, which affected most of the ISPs, a huge amount of traffic preferred that path and possibly brought the ISPs and their networks down, because the National provider could not handle the load.
The right procedure is for the upstream to check, but not everyone checks. This is a practical issue. There is a trust relationship at some point. Mistakes can happen and routes can be propagated because of this trust.
I know I said that the upstream is notified to update its filters. However, as you saw, the national provider did not actually intend to do this and, I suppose, did not inform its own upstream. No registry lookup occurred and no request was sent to the upstream. The routes just leaked.
This upstream already trusted networks sent to it with the national provider as the originating AS. The upstream's AS-path filter did not need an update for the routes to pass, because the national provider had arranged for the filter to be in place when it first announced its own networks to the upstream.
Had the upstream been using prefix filters as well, the routes could not have leaked unintentionally, because they would have stopped at the upstream. Some big providers might not use prefix filtering because they consider it a burden, but this can lead to cases like the one you described.
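The difference between the two filtering policies can be sketched in Python under these assumptions: a hypothetical national provider AS (64501), a hypothetical prefix list holding only its registered block, and the leaked ISP route arriving with the national provider as apparent origin. Both functions are illustrations, not any vendor's filter syntax.

```python
import ipaddress

NATIONAL_AS = 64501  # hypothetical national provider AS
# Hypothetical prefix list the upstream built from the national provider's registry data.
NATIONAL_PREFIXES = {ipaddress.ip_network("203.0.113.0/24")}


def as_path_only(prefix, as_path):
    """Accept anything that appears to be originated by the national provider."""
    return as_path == (NATIONAL_AS,)


def as_path_and_prefix(prefix, as_path):
    """Same AS-path check, plus the prefix must be on the registered list."""
    return as_path_only(prefix, as_path) and ipaddress.ip_network(prefix) in NATIONAL_PREFIXES


own = ("203.0.113.0/24", (NATIONAL_AS,))
leaked = ("198.51.100.0/24", (NATIONAL_AS,))  # ISP block after the private AS was stripped

for name, route in (("own", own), ("leaked", leaked)):
    print(name, as_path_only(*route), as_path_and_prefix(*route))
# own True True
# leaked True False
```

The extra prefix check costs the upstream a list to maintain per customer, which is the administrative burden some providers try to avoid.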
Have a look at the following discussion to see more on the issue of prefix vs as-path filtering:
By the way, look at what happened last year:
"Using a blackhole route within AS17557, Pakistan Telecom, is entirely a local matter, but allowing the route to leak in the inter-AS routing fabric is the first serious issue. Secondly, AS3491, PCCW, did not appear to have in place the necessary route filters on its links to AS17557 to prevent it from learning this unauthorized route. And lastly, once the route entered into the transit core of the Internet the general framework of mutual trust between transit network operators ensured that the false network route was efficiently propagated across the rest of the Internet." ... "We've tried route filters, Internet Route Registries, bogon filters and automated registry lookup systems to attempt to match some trusted external source of information about who has what rights of use of which addresses and AS numbers with the information contained in the routing system."
In this type of case (and in others) you need more than one party (PCCW cooperated nicely, as did the upstream of the national provider in your case) to accomplish the unintended task. That is why I don't like hearing that an AS "single-handedly" caused an issue (remember what happened the other day with the long AS paths: http://www.renesys.com/blog/2009/02/the-flap-heard-around-the-worl.shtml).
Do you have a link describing your scenario or did it not get famous enough? I guess I like reading this type of stuff :-)
Maria, I was just recognizing your ability to look at things in perspective, rather than at technicalities only.
In my opinion, a technician knows "how", an engineer knows "why", an architect knows "better" :)
Now for one more of my rants... I first had access to peering administration in 1995. It didn't take long for me to realize that most of the supposed protections weren't in place at all, or were ineffective, or circumventable. It looks like things haven't changed much.
I was thinking about this case again and I remembered something that seemed odd to me when I read your original post, but we got into analyzing the failure and I forgot to mention it.
You said that the ISP uses a private AS for its peering with the national provider. I do not understand why this is so, though people can do all kinds of hacks with BGP (perhaps the national provider wants to simplify its internal procedures or something like that). Still, this is a weak point in the design. Had the ISP been using its own AS in the peering with the national provider, the leak would have been stopped even by the AS-path filter of the national provider's upstream. I am not saying that mistakes cannot still happen with this design, but they are somewhat harder to make. A simple removal of the private AS towards the national provider's upstream deprives the routes of their origin AS and makes them look as if they come from the national provider. More work is needed to achieve the same result when the ISP uses its own AS in the peering. This is technically possible, but I still don't know why the private AS option was chosen for the peering in the first place.
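The two peering designs can be compared with a short Python sketch. It assumes a hypothetical national provider AS (64501), a hypothetical private peering AS (65001) versus the ISP's public AS (64500, a documentation value), a simplified export that strips private ASNs and prepends the national AS, and an upstream AS-path filter that accepts only routes originated by the national provider.

```python
PRIVATE_AS = range(64512, 65535)  # 16-bit private ASNs (RFC 6996)
NATIONAL_AS = 64501               # hypothetical national provider AS


def export_to_upstream(as_path):
    """Hypothetical export: strip private ASNs, then prepend the national AS."""
    return (NATIONAL_AS,) + tuple(a for a in as_path if a not in PRIVATE_AS)


def upstream_accepts(as_path):
    """Upstream AS-path filter: only routes originated by the national provider."""
    return as_path == (NATIONAL_AS,)


designs = {
    "private AS 65001 on the ISP peering": (65001,),
    "ISP public AS 64500 on the peering": (64500,),
}
for label, path_from_isp in designs.items():
    exported = export_to_upstream(path_from_isp)
    print(label, exported, upstream_accepts(exported))
# private AS 65001 on the ISP peering (64501,) True
# ISP public AS 64500 on the peering (64501, 64500) False
```

With the public AS kept in the path, the same accidental export still carries the real origin, and the as-path filter alone rejects it.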
I only mention this in case you need to explore a less error-prone alternative to your existing design. Several people had to make mistakes for the failure to happen, but I still believe any possibility that helps avoid such issues in the future is worth exploring.
Frankly, I have no clue about the network design or why the National provider is using a private AS.
I came to know about this problem from a friend of mine, and the actual case affected most of the ISPs, since their national backbone is connected to the same National provider.
What I know is that the ISP uses a private AS for its peering with the National provider (the private AS represents the National provider's AS), and the router that terminates all ISPs certainly has connectivity with their Main Internet Gateway router (the outage implies this).
If this is the situation, then I totally agree with you that it's poor design, because it shouldn't have any sort of connectivity to their Main Internet Gateway anyway.
OK, then. I just was not sure which side you were standing on. Since it's other people's job to decide this, we cannot do much, I suppose. That was a nice case you posted. Thanks. I totally enjoyed it :-)
Edit: A typical case for a provider is this: Customer initially requires national connectivity only. At some point customer changes its mind and asks for international connectivity. ISP goes to router(s) connected to its IGW and allows customer routes to pass to upstream.
In your scenario, a case like this could cause the meltdown. ISP removes private AS towards IGW to accomplish the announcement for particular customer and all customers are affected. This is a wild guess, but certainly possible. This is not just poor design. It is also not flexible enough to handle various common customer scenarios.
The ISP has one physical link, but two separate PVCs: one for the National link and one for the IGW link. So the National link should be isolated from the IGW!
I think we have a misunderstanding. I used the word "ISP" in my previous post, while I was referring to the national provider. Sorry about that. I am referring to the design of the national provider.
Edit: Ignore that post. Here it is again, corrected.
A typical case for a provider is this: Customer initially requires national connectivity only. At some point customer changes its mind and asks for international connectivity. National provider goes to router(s) connected to its IGW and allows customer routes to pass to upstream.
In your scenario, a case like this could cause the meltdown. National provider removes private AS towards its IGW to accomplish the announcement for particular customer and all customers are affected. This is a wild guess, but certainly possible. This is not just poor design. It is also not flexible enough to handle various common customer scenarios without being error prone.
The only thing I couldn't understand is how the RIPE database is checked and who checks it, I mean which router on the Internet. I believe that for every single prefix you announce to the Internet you have to create a route object for it in the RIPE database (Webupdates) and tie it to your AS number, but what is the mechanism that checks that each prefix is announced from its corresponding AS, to avoid what happened with Pakistan Telecom and the YouTube website last year?
The problem is actually this:
The BGP protocol by itself performs some checks, but a router running BGP doesn't verify with RIPE or another RIR whether an advertisement is correct, or whether a specified prefix is advertised with its legitimate origin AS.
A BGP router by itself just checks that its own AS is not present in the AS path (loop avoidance) and that it can reach the BGP next hop.
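A minimal Python sketch of just these two built-in checks, assuming a hypothetical local AS (64510), a hypothetical routing table represented as a list of reachable networks, and documentation AS numbers. Note that the prefix itself is not even an input: nothing here can tell whether the origin AS is the registered one.

```python
import ipaddress


def bgp_basic_checks(as_path, next_hop, local_as, routing_table):
    """The two checks described above, and nothing else.

    routing_table is a hypothetical list of networks the router can already reach.
    """
    # Loop avoidance: reject the route if our own AS is already in the AS path.
    if local_as in as_path:
        return False
    # Next-hop reachability: the BGP next hop must fall inside a known route.
    nh = ipaddress.ip_address(next_hop)
    return any(nh in net for net in routing_table)


table = [ipaddress.ip_network("192.0.2.0/24")]

# A hypothetical announcement whose origin AS (64501) is NOT the registered
# origin of the prefix still passes: no registry is consulted.
print(bgp_basic_checks((64503, 64501), "192.0.2.9", 64510, table))     # True
print(bgp_basic_checks((64503, 64510), "192.0.2.9", 64510, table))     # False (loop)
print(bgp_basic_checks((64503, 64501), "198.51.100.1", 64510, table))  # False (next hop unreachable)
```

Anything beyond this, such as prefix lists or registry-derived filters, has to be configured by the operator on each session.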
Additional sanity checks should be performed:
It is easier to control this near the edge, near the leaf ASes: the ones that are just multihomed and don't provide transit service to any other AS.
So the provider of a customer can and should verify that it receives only the prefixes associated with that customer.
But then that ISP needs to interconnect with others.
Hope this helps
Giuseppe is right. The basic question here is why not everybody checks the RIR databases. Answers have been provided previously; I will try to sum up.
1. In the transit core of the Internet, where big providers interconnect, there is mutual trust. This is a practical issue. There are so many networks and ASes, and their policies and connectivity change often. A big provider cannot check everybody at every moment, even with automated systems, and adjust accordingly.
2. Now to the leaves. Typically we have a customer AS connecting to an ISP, with that ISP in turn connecting to an upstream (bigger) provider. The ISP and the upstream have to check, but not all of them use prefix filters everywhere, because they consider it an administrative burden. PCCW was not using prefix filters, so it could not catch the illegitimate prefix.
3. Suppose we have a perfect world where the ISP and the upstream use tight AS-path filters and prefix filters. Even in this case, there are customer scenarios (especially failover ones) that cannot be verified against the RIR database and are up to the leaf systems to decide. I will try to help you understand this third point with a couple of examples.
The first example would be the primary case in this thread you are reading. The ISP connects to national provider using private AS. The national provider advertises the ISP networks towards the national peers as being their own and they agreed with the ISP customer on this. They did not agree for those networks to go further to the internet, just to the national peers. If the national peers check the database, this scenario will not work. The national peers will say to the national provider: hey, those networks are assigned to the ISP, not you! I am not saying that the scenario cannot work in any other (even better) way, but sometimes various cost factors lead to weird setups.
And now another story that I was trying to forget, because it caused me humiliation at my country's national exchange (I actually remember my mistakes better than anything else :-). A customer comes and says they have a network for us to announce to the national peers only. I check the network and I see it's not theirs, but rather part of a block of one of our national peers. I start arguing with the TAC. They contact the customer and in the end the TAC says: the customer has arranged this with their primary provider (the national peer), and our announcement will be used in the national exchange only. Now, this is a weird setup, but since it won't go further to the Internet, we could just do it and tell the national peers that we have an exit point to somebody else's network, assuming that this somebody else agreed. The customer said so, but that was a case of sales not talking to their engineers. When I sent the e-mail to the national peers, the national peer that owned the block gave me a shock: This is one of our blocks! Don't do this!
Ah, the world is far from perfect. This weird setup could have worked. They just did not accept it, and they were right in general: we should not be doing things just because a customer says so. Still, it's sometimes hard to resist customer requests. This makes possible scenarios that cannot be checked against the databases (assuming that the databases are current, which is another story).
p.s. The humiliation story had to do with the fact that not every customer has its own block assigned. This has to do with block assignment policies of RIRs. Anyone interested in IPv6? ;-)