Strange switching/routing decisions SUP2

Sep 22nd, 2008

Hi

I have two Catalyst 6506s with SUP2 supervisors. There was a problem with maxing out the BGP route capacity on both switches, which was resolved by quite aggressively filtering incoming updates from our transit provider. Now I have come across a rather strange phenomenon: switch SW2 (diagram attached) has what looks like a valid IP routing table, but traffic for a number of networks gets MLS-switched to switch SW1, and I cannot do anything about it.

For example the 66.185.180.0/24 network:

Sw2#sh ip route 66.185.180.20

Routing entry for 66.185.180.0/24

Known via "bgp xxx", distance 20, metric 0

Tag 5400, type external

Last update from 166.49.129.9 2d20h ago

Routing Descriptor Blocks:

* 166.49.129.9, from 166.49.129.9, 2d20h ago

Route metric is 0, traffic share count is 1

AS Hops 5


Sw2#sh ip bgp 66.185.180.20

BGP routing table entry for 66.185.180.0/24, version 3701504

Paths: (1 available, best #1, table Default-IP-Routing-Table)

Advertised to non peer-group peers:

x.x.x.x x.x.x.x x.x.x.x x.x.x.x

5400 3561 12182 18730

166.49.129.9 from 166.49.129.9 (166.49.166.227)

Origin IGP, localpref 201, valid, external, best

But:


Sw2#sh mls cef exact-route x.x.x.x 66.185.180.20


Interface: Vl10, Next Hop: 80.68.34.197, Vlan: 10, Destination Mac: xxxxx


Any ideas?


PS Replacing the SUP card is not an issue.



Giuseppe Larosa Mon, 09/22/2008 - 12:10

Hello Victor,

you should provide more info to get better help.


First of all,

sh mls to see what RP(s) are seen on SW2


sh ip bgp sum to check the number of IP prefixes that are currently in the BGP table.


IOS version and


sh run | inc mls


if MLS does not consider SW1's MSFC as a valid RP this would lead to a problem.


A basic question: have you checked whether any form of PBR is changing the next hop based on the source x.x.x.x? That could be an explanation for what you see.


do sh run int vlan X

see if there is any ip policy command


Hope to help

Giuseppe


m-haddad Mon, 09/22/2008 - 13:29

Hello Victor,


I don't see where the problem is. SW2 is preferring SW1 as the next hop to reach the 66.185.180.0/24 subnet. What you have to check is why the route is preferred through SW1. Issue "show ip bgp" on SW2 and check what the available BGP routes for the 66.185.180.0/24 subnet are. Do the same on SW1 and then you can determine why SW2 is preferring SW1.


Hope this helps,


Giuseppe Larosa Mon, 09/22/2008 - 13:48

Hello Mohammad,

in his initial post Victor already provided the output of sh ip bgp for 66.185.180.0/24, and the best path is the eBGP path with a different BGP next hop:


Sw2#sh ip bgp 66.185.180.20

BGP routing table entry for 66.185.180.0/24, version 3701504

Paths: (1 available, best #1, table Default-IP-Routing-Table)

Advertised to non peer-group peers:

x.x.x.x x.x.x.x x.x.x.x x.x.x.x

5400 3561 12182 18730

166.49.129.9 from 166.49.129.9 (166.49.166.227)

>>Origin IGP, localpref 201, valid, external, best


So it looks like BGP is not the process that installed the MLS entry.



Hope to help

Giuseppe


m-haddad Mon, 09/22/2008 - 14:00

Hello,


Yep, you're right Giuseppe; I wanted to see the route from SW1's perspective as well. Victor, can you issue the command "show ip route 166.49.129.9" on each switch? This will help you detect where the routing loop is.


Regards,


VictorAKur Wed, 09/24/2008 - 05:19

Well... 166.49.129.9 is a BGP peer of SW 2 and is directly connected to it (well, as directly as a BGP peer could be).


SW1, however, does not know anything about this network range and will send any traffic destined for 166.49.129.9 to the default gateway, which is a transit provider peering with SW1.


But that is not the point anyway, as I have problems with a completely different network range.

VictorAKur Mon, 09/22/2008 - 13:50

Hm... no, it doesn't help. In my original post there is show ip route and sh ip bgp output from SW2. Both point to 166.49.129.9 as the best route for the 66.185.180.0/24 network in my example. However, as mls cef seems to point to SW1 for the same destination, this is where all traffic is sent at the moment. SW1 is a BGP peer of SW2 and receives the route to 66.185.180.0/24 from SW2. So as a result, SW2 mls-switches traffic to SW1, but SW1 knows that the best path to that network is via SW2. We have a perfect loop...
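The loop described here can be pictured with a toy model (illustrative only, not taken from the devices): SW2's hardware FIB points at SW1, while SW1's best BGP path points back at SW2, so a packet bounces between them until its TTL runs out. A minimal Python sketch:

```python
# Toy model of the forwarding loop: each entry maps a switch to the device
# it forwards the packet to. The names and the stale-entry scenario are
# illustrative, not actual device state.
def forward(start, fib, ttl=8):
    """Follow next hops until delivery (no entry) or TTL exhaustion."""
    hop, path = start, [start]
    while ttl > 0:
        nxt = fib.get(hop)
        if nxt is None:
            return path            # no entry: handed off downstream
        path.append(nxt)
        hop, ttl = nxt, ttl - 1
    path.append("TTL expired")
    return path

fib = {
    "SW2": "SW1",   # stale mls cef adjacency on SW2
    "SW1": "SW2",   # SW1's best BGP path, learned from SW2
}
print(forward("SW2", fib))  # ping-pongs SW2 -> SW1 -> SW2 ... until TTL expires
```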

I will collect some more info and post it in tomorrow.


PS :) Ah I see - I am a bit late with the reply :)

VictorAKur Tue, 09/23/2008 - 00:35

So more info then...


SW2#sh mls rp

ip multilayer switching is globally disabled

ipx multilayer switching is globally disabled

ipx mls inbound acl override is globally disabled

mls id is 000f.f723.5fc0

mls ip address 80.68.34.198

mls ip flow mask is unknown

mls ipx flow mask is unknown

number of domains configured for mls 0


***80.68.34.198 - is the IP of SW2


SW2#sh ip bgp sum

BGP router identifier x.x.x.x, local AS number xxxx

BGP table version is 4362003, main routing table version 4362003

198164 network entries and 215084 paths using 26964932 bytes of memory

83299 BGP path attribute entries using 5000280 bytes of memory

37583 BGP AS-PATH entries using 979964 bytes of memory

362 BGP community entries using 14610 bytes of memory

121171 BGP route-map cache entries using 1938736 bytes of memory

0 BGP filter-list cache entries using 0 bytes of memory

Dampening enabled. 430 history paths, 3 dampened paths

BGP activity 648405/5343834 prefixes, 2916396/2701312 paths, scan interval 60 secs


IOS Version 12.1(20)E3,

file c6sup22-psv-mz.121-20.E3.bin



VictorAKur Tue, 09/23/2008 - 02:13

More information still.


!!!!!!the show ip cef command on MSFC:


sw2#sh ip cef 66.185.180.20

66.185.180.0/24, version 3309783, epoch 2, cached adjacency 166.49.129.9

0 packets, 0 bytes

Flow: AS 0, mask 24

via 166.49.129.9, 0 dependencies, recursive

next hop 166.49.129.9, FastEthernet4/5 via 166.49.129.9/32

valid cached adjacency


!!!!!!Same command on the PFC:


sw2-sp#sh ip cef 66.185.180.20

66.185.180.0/24, version 3309557, epoch 2, cached adjacency 166.49.129.9

0 packets, 0 bytes

via 166.49.129.9, 0 dependencies, recursive

next hop 166.49.129.9, FastEthernet4/5 via 166.49.129.9/32

valid cached adjacency


!!!!!!So far so good - same next hop, everything is as it should be.


!!!!!!Now show mls cef command on PFC:


sw2-sp#sh mls cef 66.185.180.0

Index Prefix Mask Adjacency

24134 66.185.180.0 255.255.255.0 xxxx.xxxx.xxxx

140648 66.185.180.0 255.255.254.0 xxxx.xxxx.xxxx

182916 66.185.180.0 255.255.252.0 yyyy.yyyy.yyyy



!!!!!As you can see, the first two entries, for the /24 and /23 networks, have their adjacencies set to the xxxx.xxxx.xxxx MAC address, and the third entry, for the /22, is set to the yyyy.yyyy.yyyy MAC address.


yyyy.yyyy.yyyy - is the MAC of the next hop in the show ip cef command - 166.49.129.9

xxxx.xxxx.xxxx - is the MAC of the SW1 on the diagram.


As this table is read from top to bottom (/32 to /0) and the first matching entry wins, the entry pointing to SW1 will always be used, even though the routing and ip cef tables point in a different direction.
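That first-match behaviour can be sketched as follows: a simplified model of a TCAM-style lookup, scanned top to bottom, where a stale more-specific entry shadows the correct less-specific one. The entries and MAC strings are placeholders from the masked output above, not real hardware state:

```python
# Simplified model of a most-specific-first FIB lookup, as in 'show mls cef'.
import ipaddress

# Entries ordered most-specific first; adjacency MACs are placeholders.
fib = [
    (ipaddress.ip_network("66.185.180.0/24"), "xxxx.xxxx.xxxx"),  # stale -> SW1
    (ipaddress.ip_network("66.185.180.0/23"), "xxxx.xxxx.xxxx"),  # stale -> SW1
    (ipaddress.ip_network("66.185.180.0/22"), "yyyy.yyyy.yyyy"),  # correct -> 166.49.129.9
]

def lookup(dst):
    """Return the adjacency of the first matching entry, top to bottom."""
    addr = ipaddress.ip_address(dst)
    for net, adjacency in fib:
        if addr in net:
            return adjacency
    raise LookupError("no route")

print(lookup("66.185.180.20"))  # the stale /24 shadows the correct /22 entry
```

Only a destination outside the /24 and /23 (for example 66.185.182.1) ever reaches the correct /22 adjacency.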


Corrr... Do I need help or what? :) Help! :)

Giuseppe Larosa Tue, 09/23/2008 - 02:26

Hello Victor,

this is strange: the BGP prefix is 66.185.180.0/24 and not a /21.

Also, how are the /24 and /23 entries with next hop SW1 generated?


You said that you have filtered prefixes to reduce the BGP table size, however from the sh ip bgp sum I still see a big number of IP prefixes:

198164 network entries


I wonder if they are still too many, making the CEF process work badly.

I'm not sure where I read it but I think that 128000 prefixes can be the size of the CEF table with SUP2.


I can say that we had some problems with CEF entries on our DMZ switches that have SUP720 3BXL over it and we had to perform an IOS upgrade.


sh mls cef maximum-routes, if supported, can tell you how many IPv4 entries can be handled by CEF.


see

http://www.cisco.com/en/US/docs/switches/lan/catalyst6500/ios/12.2ZY/command/reference/show3.html#wp2322601


I don't think it is supported on your release.


the MSFC can have enough memory to host 192,000 routes, but the CEF table might not be able to hold that many.


Hope to help

Giuseppe


VictorAKur Tue, 09/23/2008 - 02:42

The limit on the SUP2 is 256000 IPV4 routes.


As the Internet routing table is something like 261000 at the moment, you can see that 198164 is an improvement. :)


The CEF table however might well be 192000 - I think I have seen it somewhere, but cannot find where. So I need someone to confirm that.


I think the BGP prefix is /24 because I took the deny ge /21 filter off in the process of trying to find out what is going on.


I have no idea how the /24 and /23 entries with next hop SW1 are generated... :( If I did I would have got rid of them by now.

VictorAKur Tue, 09/23/2008 - 14:48

No, sh mls cef maximum-routes isn't supported on these SUPs.


MSFC has enough memory for 256000 ip routes.


I can only assume that the CEF table is coping, because there used to be an error in the syslog about entries "being software switched now", but it disappeared after the number of routes was reduced.

VictorAKur Thu, 09/25/2008 - 06:06

I had to shut BGP down between SW1 and SW2 (on the diagram). Now SW1 cannot see the routes to the affected networks advertised from SW2 and sends traffic down its default route. I have broken the routing loop, but haven't found the cause of the problem... :(

m-haddad Thu, 09/25/2008 - 08:10

Hello Victor,


Can you please send me the output of the following:


show ip route 66.185.180.0 (From SW1 and SW2)

Show ip route 166.49.129.0 (From SW1)


Thanks,


VictorAKur Fri, 09/26/2008 - 02:01

!!!!!!With BGP between the switches disabled:


sw1#sh ip route 66.185.180.0

Routing entry for 66.185.180.0/24

Known via "bgp xxxxx", distance 20, metric 4

Tag xxxxx, type external

Last update from 209.249.254.108 2d20h ago

Routing Descriptor Blocks:

* 209.249.254.108, from 209.249.254.108, 2d20h ago

Route metric is 4, traffic share count is 1

AS Hops 5


sw1#Show ip route 166.49.129.0

Routing entry for 166.49.128.0/17

Known via "bgp xxxxx", distance 20, metric 4

Tag xxxxx, type external

Last update from 209.249.254.108 05:44:33 ago

Routing Descriptor Blocks:

* 209.249.254.108, from 209.249.254.108, 05:44:33 ago

Route metric is 4, traffic share count is 1

AS Hops 3


sw2#sh ip route 66.185.180.0

Routing entry for 66.185.180.0/24

Known via "bgp xxxxx", distance 20, metric 0

Tag 5400, type external

Last update from 166.49.129.9 3d00h ago

Routing Descriptor Blocks:

* 166.49.129.9, from 166.49.129.9, 3d00h ago

Route metric is 0, traffic share count is 1

AS Hops 4


!!!!!!With BGP between the switches enabled:



sw1#sh ip route 66.185.180.0

Routing entry for 66.185.180.0/24

Known via "bgp xxxxx", distance 200, metric 0

Tag 5400, type internal

Last update from 80.68.34.198 00:00:10 ago

Routing Descriptor Blocks:

* 80.68.34.198, from 80.68.34.198, 00:00:10 ago

Route metric is 0, traffic share count is 1

AS Hops 4


sw1#Show ip route 166.49.129.0

Routing entry for 166.49.128.0/17

Known via "bgp 20799", distance 20, metric 4

Tag xxxxx, type external

Last update from 209.249.254.108 05:46:54 ago

Routing Descriptor Blocks:

* 209.249.254.108, from 209.249.254.108, 05:46:54 ago

Route metric is 4, traffic share count is 1

AS Hops 3



sw2#sh ip route 66.185.180.0

Routing entry for 66.185.180.0/24

Known via "bgp xxxxx", distance 20, metric 0

Tag 5400, type external

Last update from 166.49.129.9 3d00h ago

Routing Descriptor Blocks:

* 166.49.129.9, from 166.49.129.9, 3d00h ago

Route metric is 0, traffic share count is 1

AS Hops 4



m-haddad Fri, 09/26/2008 - 08:23

As you can see, SW1 prefers to reach 66.185.180.0 via SW2 because the AS hop count is 4, instead of 5 when BGP is not enabled between the iBGP peers.
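Assuming the earlier best-path steps (weight, local preference) are equal, the comparison described here reduces to shortest AS path. A trivial sketch with the hop counts from the outputs above:

```python
# AS-path-length tie-break: shorter AS path wins, so SW1 prefers the
# 4-hop path learned over iBGP from SW2 over its own 5-hop transit path.
# Labels are illustrative; only the hop counts come from the outputs above.
paths = {
    "via SW2 (iBGP, learned from SW2)": 4,   # AS hops with the iBGP session up
    "via transit (eBGP default)": 5,         # AS hops on SW1's own external path
}
best = min(paths, key=paths.get)
print(best)
```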


I don't see any issues with that, so where is your problem?


Thanks,

Regards,


m-haddad Fri, 09/26/2008 - 08:26

Sorry, I re-read the first post: as you described, SW2 is switching it via SW1 instead of going directly to the eBGP peer.


m-haddad Fri, 09/26/2008 - 08:30

Hello Victor,


Is 166.49.129.9 a directly connected peer? If not, can you issue the command show ip route 166.49.129.9 on SW2 while iBGP is running between the 6500s?


Thanks,


m-haddad Fri, 09/26/2008 - 09:12

Victor,


Let me explain what I am trying to achieve here. Route recursion happens for both the destination and the next hop: the router first finds the longest match for the destination subnet, and once found, it does another route recursion on the next hop it got from the first step.


What I am afraid of is that SW2 is doing the first route recursion correctly, and then when it recurses on the next hop it finds that the next hop is reached via SW1. That is why SW2 switches the traffic to SW1, where SW1 believes the route is still via SW2, and there we get the loop. This is in case 166.49.129.9 is not a directly connected peer.
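The two-step recursion described here can be sketched roughly like this (the RIB contents are illustrative; as it later turns out in the thread, the next hop resolves to a connected subnet, so the recursion stops after the second lookup):

```python
# Rough sketch of route recursion: a longest match on the destination,
# then further lookups on the resulting next hop until it is connected.
import ipaddress

rib = {
    ipaddress.ip_network("66.185.180.0/24"): "166.49.129.9",  # eBGP route
    ipaddress.ip_network("166.49.129.8/30"): "connected",     # peering subnet
}

def longest_match(dst):
    addr = ipaddress.ip_address(dst)
    nets = [n for n in rib if addr in n]
    return max(nets, key=lambda n: n.prefixlen) if nets else None

def resolve(dst):
    """Return the chain of (prefix, next hop) lookups for dst."""
    chain = []
    while True:
        net = longest_match(dst)
        if net is None:
            raise LookupError("no route to %s" % dst)
        nh = rib[net]
        chain.append((str(net), nh))
        if nh == "connected":
            return chain
        dst = nh  # recurse on the next hop

# The next hop is directly connected, so the recursion stops after two
# lookups and never detours via SW1.
print(resolve("66.185.180.20"))
```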


Hope this clarifies my line of thinking,


Regards,


VictorAKur Mon, 09/29/2008 - 02:46

Hi


unfortunately it is indeed a directly connected next-hop address:


sw2#sh ip route 166.49.129.9

Routing entry for 166.49.129.8/30

Known via "connected", distance 0, metric 0 (connected, via interface)

Redistributing via ospf xxxxx

Advertised by ospf xxxxx subnets

Routing Descriptor Blocks:

* directly connected, via FastEthernet4/5

Route metric is 0, traffic share count is 1


m-haddad Mon, 09/29/2008 - 08:57

Hello Victor,


Thanks for the feedback. If you remove the filtering you applied in the beginning, would the issue get resolved? I would also open a case with Cisco to troubleshoot further.


Regards,


VictorAKur Mon, 09/29/2008 - 14:09

I don't know if it would resolve the issue - if I remove the filtering, the number of routes will go over the limit the SUP2 card can support and the SUPs on both switches will start alerting again. I did remove the filters for this particular network (66.x.x.x) for all prefixes, and for about 12 hours it did work, and then it stopped working again. So I assume the filtering may have something to do with it after all. Will it help if I post the actual filters I have installed? I have looked at them many times, but if someone else has a look they may spot the problem that I have missed.

m-haddad Mon, 09/29/2008 - 14:19

Hello Victor,


As I understood it, you removed the filters and things worked until you applied the filters again. Is that right?


It doesn't hurt if you can paste your filters to double check.


Thanks,


VictorAKur Tue, 09/30/2008 - 08:23

I removed only the prefixes for the 66.x.x.x network. It started working after that and worked for about 12 hours, after which it stopped working again with the filters still off.


Here is the prefix list:


ip prefix-list ISP-Ingress-In-Strict seq 4004 deny 116.0.0.0/6 ge 22

ip prefix-list ISP-Ingress-In-Strict seq 4008 deny 120.0.0.0/6 ge 22

ip prefix-list ISP-Ingress-In-Strict seq 4011 deny 124.0.0.0/7 ge 21

ip prefix-list ISP-Ingress-In-Strict seq 4013 deny 126.0.0.0/8 ge 21

ip prefix-list ISP-Ingress-In-Strict seq 4014 deny 202.0.0.0/7 ge 25

ip prefix-list ISP-Ingress-In-Strict seq 4016 deny 210.0.0.0/7 ge 21

ip prefix-list ISP-Ingress-In-Strict seq 4018 permit 218.100.0.0/16 ge 17 le 24

ip prefix-list ISP-Ingress-In-Strict seq 4019 deny 218.0.0.0/7 ge 21

ip prefix-list ISP-Ingress-In-Strict seq 4021 deny 220.0.0.0/7 ge 21

ip prefix-list ISP-Ingress-In-Strict seq 4023 deny 222.0.0.0/8 ge 21

ip prefix-list ISP-Ingress-In-Strict seq 5000 deny 24.0.0.0/8 ge 21

ip prefix-list ISP-Ingress-In-Strict seq 5010 deny 72.0.0.0/6 ge 21

ip prefix-list ISP-Ingress-In-Strict seq 5014 deny 76.0.0.0/8 ge 21

ip prefix-list ISP-Ingress-In-Strict seq 5015 deny 96.0.0.0/6 ge 21

ip prefix-list ISP-Ingress-In-Strict seq 5020 deny 198.0.0.0/7 ge 25

ip prefix-list ISP-Ingress-In-Strict seq 5022 deny 204.0.0.0/7 ge 25

ip prefix-list ISP-Ingress-In-Strict seq 5023 deny 206.0.0.0/7 ge 25

ip prefix-list ISP-Ingress-In-Strict seq 5032 deny 208.0.0.0/8 ge 23

ip prefix-list ISP-Ingress-In-Strict seq 5033 deny 209.0.0.0/8 ge 21

ip prefix-list ISP-Ingress-In-Strict seq 5034 deny 216.0.0.0/8 ge 21

ip prefix-list ISP-Ingress-In-Strict seq 6001 deny 77.0.0.0/8 ge 22

ip prefix-list ISP-Ingress-In-Strict seq 6002 deny 78.0.0.0/7 ge 22

ip prefix-list ISP-Ingress-In-Strict seq 6004 deny 80.0.0.0/7 ge 21

ip prefix-list ISP-Ingress-In-Strict seq 6006 deny 82.0.0.0/8 ge 21

ip prefix-list ISP-Ingress-In-Strict seq 6007 deny 83.0.0.0/8 ge 22

ip prefix-list ISP-Ingress-In-Strict seq 6008 deny 84.0.0.0/6 ge 22

ip prefix-list ISP-Ingress-In-Strict seq 6012 deny 88.0.0.0/7 ge 22

ip prefix-list ISP-Ingress-In-Strict seq 6014 deny 90.0.0.0/8 ge 22

ip prefix-list ISP-Ingress-In-Strict seq 6015 deny 91.0.0.0/8 ge 25

ip prefix-list ISP-Ingress-In-Strict seq 6016 deny 92.0.0.0/6 ge 22

ip prefix-list ISP-Ingress-In-Strict seq 6020 deny 193.0.0.0/8 ge 25

ip prefix-list ISP-Ingress-In-Strict seq 6021 deny 194.0.0.0/7 ge 25

ip prefix-list ISP-Ingress-In-Strict seq 6023 deny 212.0.0.0/7 ge 20

ip prefix-list ISP-Ingress-In-Strict seq 6025 deny 217.0.0.0/8 ge 21

ip prefix-list ISP-Ingress-In-Strict seq 7000 deny 189.0.0.0/8 ge 21

ip prefix-list ISP-Ingress-In-Strict seq 7001 deny 190.0.0.0/8 ge 21

ip prefix-list ISP-Ingress-In-Strict seq 7002 deny 200.0.0.0/8 ge 25

ip prefix-list ISP-Ingress-In-Strict seq 7003 deny 201.0.0.0/8 ge 21

ip prefix-list ISP-Ingress-In-Strict seq 8001 deny 196.0.0.0/8 ge 23

ip prefix-list ISP-Ingress-In-Strict seq 10200 permit 0.0.0.0/0 le 24


I found it in a blog of a very nice gentleman.
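For reference, here is a simplified model of how IOS evaluates such a prefix list: entries are checked in sequence order, the route's address must fall inside the entry's block, and its prefix length must satisfy ge/le (ge alone implies le 32); an unmatched route hits the implicit deny. The two entries in the sketch are a subset of the list above; everything else is illustrative:

```python
# Simplified prefix-list evaluation: first matching entry in sequence
# order decides; implicit deny at the end. Not a full IOS implementation.
import ipaddress

entries = [  # (action, block, ge, le) -- a subset of the list above
    ("deny",   "209.0.0.0/8", 21, 32),   # seq 5033 (ge 21 implies le 32)
    ("permit", "0.0.0.0/0",    0, 24),   # seq 10200 catch-all
]

def evaluate(route):
    net = ipaddress.ip_network(route)
    for action, block, ge, le in entries:
        blk = ipaddress.ip_network(block)
        if net.network_address in blk and ge <= net.prefixlen <= le:
            return action
    return "deny"  # implicit deny at the end of every prefix list

print(evaluate("209.1.2.0/24"))       # caught by the deny for 209/8 ge 21
print(evaluate("66.185.180.0/24"))    # permitted by the catch-all
print(evaluate("66.185.180.128/25"))  # matches nothing: implicit deny
```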

m-haddad Fri, 10/03/2008 - 14:16

Hello Victor,


Sorry for the delayed reply; it was a very busy week. I suggest opening a case with Cisco, especially since removing the filtering resolved the issue temporarily.


Regards,


Giuseppe Larosa Sat, 10/04/2008 - 10:34

Hello Victor,


I see that you have added details to the case.

So by removing the route filter you run into the problem of exceeding the maximum number of prefixes in the CEF tables.

With the prefix-list applied you see the strange problem you described in your first posts.

Then you made a modified version of the prefix-list that doesn't filter the less specific 66/n prefixes, and for about 12 hours everything was well, but then the problem came back.

Let me ask: do you apply the same route filter on both switches SW1 and SW2, or do you use different versions?

This is just a basic check; the prefix-list can be fine by itself, but applying two slightly different prefix-lists to the two border routers could lead to unexpected results.


The number of IP prefixes was high even with the prefix-list applied.

Instead of permitting all prefixes with prefix length le 24, you could get a further reduction by allowing only /22 or less specific prefixes to be accepted by your router, so I would change the last line to:


ip prefix-list ISP-Ingress-In-Strict seq 10200 permit 0.0.0.0/0 le 22


I would try to run the two switches with far fewer prefixes, say 150000, and see what happens.

You could even use the maximum-prefix command to test with different numbers of received prefixes:


neighbor maximum-prefix


Hope to help

Giuseppe


VictorAKur Fri, 10/10/2008 - 00:36

Well...

The box finally collapsed. It crashed and came back in ROMMON. After reload it remained in ROMMON and complained that it had lost its boot sequence. I got it back to life and reset the boot variable to what it was before.

As a result of the crash the switch seems to have got rid of all the problems with CEF and now routes traffic as it should.

However we ended up taking the box off the front line as it has become a liability.


Thank you everyone for help.

Yet another problem with no real answer.
