I would really appreciate it if anyone could shed some light on a BGP case that I have observed. We have a client that is multihomed to two upstreams via BGP: us and one other provider. From a certain looking-glass site, 'show ip bgp x.x.x.x' shows us as the BEST path, but an actual 'traceroute' goes through their other upstream's network.
Is this normal with BGP? If so, when and/or why does this happen? If not, can anyone suggest a way to correct this? We are only providing transit services to our client, and the only way that they can force traffic to pass through us is by shutting down their bgp session with their other upstream.
Need your help badly :-(
Which end was the trace done from? Routing is a two-way process. Just because a site on the outside sees a best path to the client through you doesn't mean the client sees you as the best path to that site.
If the trace was from the LG site what are they doing to see that you are the best path? Where does the trace turn the wrong way?
Thanks for your reply. If there were an issue with the IGP cost, then how come the looking-glass sees us as the best path? The looking-glass that I used was the oregon-ix route server, which you can telnet into and which lets you enter Cisco commands. The 'show ip bgp x.x.x.x' result says that we are indeed the best path (Origin IGP, metric 6, localpref 100, valid, external, best).
sh ip bgp does not show the route table that is being used, only the bgp info. Look at the route table.
Thanks for your reply. I think I got your point. I thought that the BGP info told us absolutely that the best path BGP sees is indeed the 'best route', as if the actual route itself had been considered when determining the best path. I will run several tests later to confirm this.
On the other hand, what would then be the significance of the bgp info stating the best path? Should it be considered a suggestion for us to make sure that the actual routes adhere to it, and make routing modifications as necessary?
The best BGP path may also be the path in the route table. I just wanted to point out the difference between looking at the data that "can" become routes and looking at the routes themselves. If there are multiple routing processes, they all build databases and select best paths, but where they overlap, the routing process with the best (lowest) administrative distance (AD) gets its route put in the route table.
In your case I'm a little confused about how far away (in hops) this site is and why you would say they are pointing to one of your ingress points. It seems like they would point to a next hop to reach your address space.
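To make the AD point concrete, here is a rough Python sketch of how one winner per prefix ends up in the route table. It is only an illustration: the AD values are the usual Cisco IOS defaults, while the `Candidate` type, the `install_routes` helper, the prefix, and the next hops are all made up for the example.

```python
from dataclasses import dataclass

# Usual Cisco IOS default administrative distances (lower is preferred).
ADMIN_DISTANCE = {
    "connected": 0,
    "static": 1,
    "ebgp": 20,
    "ospf": 110,
    "rip": 120,
    "ibgp": 200,
}


@dataclass(frozen=True)
class Candidate:
    """A best path offered by one routing process (hypothetical type)."""
    prefix: str
    source: str
    next_hop: str


def install_routes(candidates):
    """Keep one route per prefix: the one whose source has the lowest AD."""
    table = {}
    for cand in candidates:
        current = table.get(cand.prefix)
        if current is None or ADMIN_DISTANCE[cand.source] < ADMIN_DISTANCE[current.source]:
            table[cand.prefix] = cand
    return table


# Two processes each offer their own best path for the same (made-up) prefix.
candidates = [
    Candidate("192.0.2.0/24", "ibgp", "10.0.0.1"),
    Candidate("192.0.2.0/24", "ospf", "10.0.0.2"),
]
rib = install_routes(candidates)
print(rib["192.0.2.0/24"])
```

Here the iBGP path is still marked best inside BGP, but the OSPF route is the one that gets installed and used for forwarding.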
If the path from your network goes straight to the customer, and not out your upstream, and if the looking glass shows your path as the best path, there's not much more you can control. The one critical question you've not answered is where the traceroute was done from. The customer site? The looking glass site?
If it's from the looking glass site, there are any number of possible explanations. For instance, are you seeing more than one route in the looking glass results? Have you tried other looking glasses? Look at the entire AS path the looking glass shows; any of these ASes could be redirecting the traffic from the path towards you to the path towards the other ISP.
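One way to narrow down which AS is turning the traffic is to line up the AS path from the looking glass against the ASes the traceroute actually crosses. A minimal Python sketch follows, assuming you have already mapped each traceroute hop to an AS number by hand (for example with whois). The `first_divergence` helper and the AS numbers, which come from the documentation range, are made up for illustration.

```python
def collapse(path):
    """Drop consecutive repeats (AS prepending, or several hops inside one AS)."""
    out = []
    for asn in path:
        if not out or out[-1] != asn:
            out.append(asn)
    return out


def first_divergence(bgp_as_path, trace_as_path):
    """Return (index, expected_as, observed_as) at the first mismatch.

    Returns None if the paths agree for as long as both have entries.
    """
    expected = collapse(bgp_as_path)
    observed = collapse(trace_as_path)
    for i, (exp, obs) in enumerate(zip(expected, observed)):
        if exp != obs:
            return i, exp, obs
    return None


# AS path as shown by the looking glass: its neighbor, then us, then the customer.
bgp_path = [64500, 64501, 64510]
# ASes seen hop by hop in the traceroute (two hops inside 64500).
trace_path = [64500, 64500, 64502, 64510]

print(first_divergence(bgp_path, trace_path))
```

The AS just before the reported index (64500 in this made-up case) is the one whose forwarding differs from what the looking glass's BGP table suggests, so that is where to ask questions.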
By the way, is it possible that the other ISP is advertising a longer prefix than the one you're advertising? Are you aggregating this address space? Are you looking at the actual customer prefix at the looking glass, or at some aggregate?
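The reason the prefix length matters is that forwarding uses longest-prefix match: a more specific route wins no matter which path BGP marks as best for the shorter prefix. Here is a small Python sketch of that lookup using the standard `ipaddress` module. The prefixes, the labels, and the `lookup` helper are hypothetical and not taken from your network.

```python
import ipaddress


def lookup(routes, destination):
    """Return (prefix, next_hop) of the longest matching prefix, or None."""
    addr = ipaddress.ip_address(destination)
    matches = []
    for prefix, next_hop in routes.items():
        net = ipaddress.ip_network(prefix)
        if addr in net:
            matches.append((net, next_hop))
    if not matches:
        return None
    # The most specific (longest) prefix wins, regardless of BGP attributes.
    net, next_hop = max(matches, key=lambda m: m[0].prefixlen)
    return str(net), next_hop


# Hypothetical table: one ISP announces a /23, the other a /24 inside it.
routes = {
    "198.51.100.0/23": "us",
    "198.51.100.0/24": "other ISP",
}

print(lookup(routes, "198.51.100.10"))   # covered by both, the /24 wins
print(lookup(routes, "198.51.101.10"))   # only covered by the /23
```

So if the other ISP's announcement is longer than yours, traffic for those addresses follows the other ISP even where your path is the best one for the prefix you announce.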
It might be better if you just gave us the output of the local show ip route and show ip bgp for this customer's prefix (?). :-)
Thanks for the added info.
The traceroute originated from the looking-glass site, and we see only a single route at a time. That is what really confuses us, because the route is known via a BGP advertisement, which should mean it is derived from the designated best path.
We've tried other looking-glasses and the results were ok as expected. Our client reported that they have observed the issue from a few sites in the West Coast and Canada but have provided only the oregon-ix route server as a reference.
I understand that ingress traffic does not depend only on BGP-selected best paths, but depends largely on the actual routes in effect. The divergence may take place anywhere among the paths/ASes traversed, not necessarily in the AS from which the trace is sourced. However, I still hope to hear suggestions about how we can somehow avoid these 'inconsistencies', if that's the right term to use.
I have yet to verify with our client whether the other ISP advertises a longer prefix than what we advertise (we advertise a /24). Will this difference in prefix length have any effect in the routing table aside from being one of the BGP best-path selection criteria? I need your advice. Btw, we're looking at the actual customer prefix at the LG, not at the aggregate level.
I would want to provide the actual output of the bgp and routing info for this customer's prefix, but maybe not in a public forum. :-)
Okay, so it's only from this one route view server, and some west coast sites, that you are seeing the other ISP as the best path, which narrows the problem down somewhat. If you're looking at the actual prefix, and not seeing an aggregate in that route view's table, it's not an issue with aggregation. Otherwise, you'd see an aggregate instead of the actual customer route.
If you want, email me the information you have, and I'll poke around to see what I can figure out.