loop

Apr 13th, 2010

Hi,


The attachment shows the topology and the situation described below (with the output of show commands).


By accident, a static route that was supposed to be on PE1 was configured on PE2, and this created a routing loop.

(The next hop for network 11.11.11.0/24 is 10.10.10.10, which is directly connected to PE1.)


My question is: why is only PE2's CPU at 100% utilisation? It is a Cisco 7600, so forwarding should be done in hardware.

PE1, P1, P2 have 1% CPU utilisation.

Maybe it started to forward packets for network 11.11.11.0/24 by process switching instead of CEF, but then why would this traffic not be CEF-switched?


Does anyone have an idea why this is happening?


(There are a lot of VPNs, static routes and traffic (in this and other VPNs) in my network and everything works fine, so the MPLS network is properly configured.)


Thanks in advance,

A.

Giuseppe Larosa Tue, 04/13/2010 - 14:45

Hello Antonio,


From the picture you attached we can see that PE1 is sending traffic for 11.11.11.0 to PE2: the label stack has 585 as the inner label, which is the same as the local label on PE2.


PE2 uses recursion to resolve the next hop and finds that it has to send the traffic to PE1, which advertises net 10.10.10.0/24 in MP-BGP.


So PE2 is the only router advertising net 11.11.11.0/24 in MP-BGP (it is advertised by BGP 65001 according to sh ip route vrf VPN_1 11.11.11.0), but with a next hop that is itself learned through MP-BGP, actually iBGP, from PE1 (sh ip route vrf VPN_1 10.10.10.10 shows AD 200 with global IP next hop 192.168.100.1).


Probably CEF detects the inconsistency and sends all packets to the RP, causing high CPU, because the stats are at 0.

The CEF entry exists.


I agree that I would expect symmetric behaviour, but PE2 is the one pointing to a remote next hop.


Did you have a chance to capture sh proc cpu sorted on PE2 before fixing the configuration error?


What IOS image is running on PE2 and PE1? Is it the same or different?


Are both devices configured the same way regarding MPLS TTL propagation?


Traffic should go from PE2 to PE1 and back to PE2 until the TTL expires, and some node then has to send an ICMP time exceeded message to the source of the original packet.


We can guess that the behaviour is deterministic: given an initial TTL, all packets will expire after N loops on the same node.


Maybe PE2 is the node that ends up sending the ICMP TTL-expired message for each packet.
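
To illustrate the point about determinism, here is a minimal Python sketch. It is only a toy model, not a description of how the 7600 actually handles TTL: it assumes every node in the loop decrements the TTL by exactly one, that the MPLS core hops are invisible to the TTL, and that the packet first arrives at the node at `start_index`. The function name `expiring_node` and its parameters are made up for this example.

```python
def expiring_node(initial_ttl, loop_nodes, start_index=0):
    """Return (node, hops): the node on which the TTL reaches zero and
    the number of hops the packet survived inside the loop.

    Toy model: each node decrements the TTL by one when it receives the
    packet, then either drops it (TTL == 0) or forwards it to the next
    node in loop_nodes, wrapping around at the end of the list.
    """
    if initial_ttl < 1:
        raise ValueError("initial_ttl must be at least 1")
    if not loop_nodes:
        raise ValueError("loop_nodes must not be empty")

    ttl = initial_ttl
    idx = start_index % len(loop_nodes)
    hops = 0
    while True:
        ttl -= 1          # the receiving node decrements the TTL
        hops += 1
        if ttl == 0:      # this node drops the packet and owes the ICMP message
            return loop_nodes[idx], hops
        idx = (idx + 1) % len(loop_nodes)  # forward to the next node in the loop


if __name__ == "__main__":
    loop = ["PE2", "PE1"]
    for ttl in (64, 128, 255):
        node, hops = expiring_node(ttl, loop)
        print(f"arriving TTL {ttl}: expires on {node} after {hops} hops")
```

In this model the expiring node depends only on the parity of the arriving TTL and on which node sees the packet first, so a stream of packets with the same arriving TTL would always expire on the same router.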


Or you may have hit a SW bug.


Hope to help

Giuseppe

Antonio_1_2 Thu, 04/15/2010 - 05:38

Hello Giuseppe,


"Probably CEF detects the inconsistency and sends all packets to the RP, causing high CPU, because the stats are at 0. The CEF entry exists."

I ran the following on PE2:

PE2#sh ip cef inconsistency records detail
Consistency checker master control: enabled
Table consistency checker state:
lc-detect: enabled
  0/0/0/0 queries sent/ignored/checked/iterated
scan-lc: enabled [83 prefixes checked every 60s]
  0/0/0/0 queries sent/ignored/checked/iterated
scan-rp: enabled [83 prefixes checked every 60s]
  583808/0/0/0 queries sent/ignored/checked/iterated
scan-rib: enabled [1000 prefixes checked every 60s]
  1652900/0/1652900/0 queries sent/ignored/checked/iterated
scan-hw-sw: disabled
  0/0/0/0 queries sent/ignored/checked/iterated
scan-sw-hw: disabled
  0/0/0/0 queries sent/ignored/checked/iterated
full-scan-rib: enabled
  0/0/0/0 queries sent/ignored/checked/iterated
full-scan-rp: enabled
  0/0/0/0 queries sent/ignored/checked/iterated
full-scan-lc: enabled
  0/0/0/0 queries sent/ignored/checked/iterated
full-scan-hw-sw: disabled
  0/0/0/0 queries sent/ignored/checked/iterated
full-scan-sw-hw: disabled
  0/0/0/0 queries sent/ignored/checked/iterated
Inconsistency error messages are disabled
Inconsistency auto-repair is disabled
Inconsistency auto-repair runs: 0
Inconsistency statistics: 0 confirmed, 0/16 recorded
Table test modes:
Insert mode: normal


Shouldn't it be shown here if there were any inconsistencies? And if there is a CEF inconsistency, are packets then forwarded via process switching?


"Have you had a chance to get a sh proc cpu sorted on PE2 before fixing the configuration error?"

Yes, I have. This was taken before fixing the configuration error:


sh proc cpu sorted                            
CPU utilization for five seconds: 48%/48%; one minute: 46%; five minutes: 40%
PID Runtime(ms)   Invoked      uSecs   5Sec   1Min   5Min TTY Process
  26        3920     96958         40  0.07%  0.02%  0.00%   0 HC Counter Timer
   1           0        32          0  0.00%  0.00%  0.00%   0 Chunk Manager   
   2         112     78569          1  0.00%  0.00%  0.00%   0 Load Meter      
   3        4336    812792          5  0.00%  0.00%  0.00%   0 OSPF Router 1   
   4           4       615          6  0.00%  0.00%  0.00%   0 TACACS+         
   5     1089080     72851      14949  0.00%  0.33%  0.28%   0 Check heaps     
   6           0         1          0  0.00%  0.00%  0.00%   0 Pool Manager    
   7           0         2          0  0.00%  0.00%  0.00%   0 Timers          
   8        9896     30906        320  0.00%  0.00%  0.00%   0 ARP Input       
   9           0         1          0  0.00%  0.00%  0.00%   0 AAA_SERVER_DEADT
  10           0         2          0  0.00%  0.00%  0.00%   0 AAA high-capacit
  11          36        68        529  0.00%  0.00%  0.00%   0 Entity MIB API  
  12           0         1          0  0.00%  0.00%  0.00%   0 IFS Agent Manage
  13           8      6549          1  0.00%  0.00%  0.00%   0 IPC Dynamic Cach
  14           4        49         81  0.00%  0.00%  0.00%   0 PF_Split Sync Pr
  15          76    392585          0  0.00%  0.00%  0.00%   0 IPC Periodic Tim
  16         104    392583          0  0.00%  0.00%  0.00%   0 IPC Deferred Por
  17      140212     13383      10476  0.00%  0.01%  0.00%   0 IPC Seat Manager
  18           0         1          0  0.00%  0.00%  0.00%   0 IPC Stdby Update
  19           0         2          0  0.00%  0.00%  0.00%   0 DDR Timers      
  20           0         2          0  0.00%  0.00%  0.00%   0 Dialer event


After fixing the error it goes down to 1%/0%.
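
A note on reading the header line of that output: on IOS the first figure in "48%/48%" is total CPU and the second is the share spent at interrupt level, so here essentially all of the load is interrupt-level work rather than a scheduled process, which fits the fact that no process in the list shows significant CPU. A minimal Python sketch for splitting the two figures follows; the function name `parse_cpu_header` is invented for illustration and the regular expression only assumes the header format shown above.

```python
import re


def parse_cpu_header(line):
    """Split the 'five seconds: X%/Y%' header of 'show processes cpu'.

    X is total CPU and Y is CPU spent at interrupt level, so the
    process-level share is X - Y.
    """
    match = re.search(r"five seconds:\s*(\d+)%/(\d+)%", line)
    if match is None:
        raise ValueError("not a 'show processes cpu' header line")
    total = int(match.group(1))
    interrupt = int(match.group(2))
    return {"total": total, "interrupt": interrupt, "process": total - interrupt}


if __name__ == "__main__":
    header = ("CPU utilization for five seconds: 48%/48%; "
              "one minute: 46%; five minutes: 40%")
    print(parse_cpu_header(header))
    # prints {'total': 48, 'interrupt': 48, 'process': 0}
```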


"What IOS image is running on PE2 and PE1? Is it the same or different?"

They are different: PE1 runs 12.2(18)SXF7 and PE2 runs 12.2(18)SXF.


"Are both devices configured the same way regarding MPLS TTL propagation?" Yes, both routers are configured with "no mpls ip propagate-ttl".


But I did some further testing with the inverse configuration. This time I configured the static route on PE1 with a next hop that is connected to PE2, and now PE1's CPU goes to 100%.

I think I also reproduced this in the lab: with "mpls ip propagate-ttl" in the configuration the router's CPU stays normal, but with "no mpls ip propagate-ttl" the CPU utilisation goes up to 100%.


Thank you, Giuseppe,

A.
