03-27-2015 02:12 AM - edited 03-07-2019 11:16 PM
Hi All,
We are using two 6500s as user distribution switches, connected to each other with a port channel; the WLC is connected to Dist-2. These two 6500s connect to a backend core 7609, with OSPF running in between. The spanning-tree root and the HSRP active role are on Dist-1.
Suddenly we are seeing high CPU utilization on the switches, and we can neither ping them nor log in. However, I could observe the following through the console.
Switch 1 is stable and all of its VLANs are active, but on the standby switch all the VLANs keep changing state. I suspect it is not receiving hellos in time, which makes it flap between active and standby.
%HSRP-5-STATECHANGE: Vlan208 Grp 208 state Standby -> Active
%HSRP-5-STATECHANGE: Vlan160 Grp 160 state Speak -> Standby
%HSRP-5-STATECHANGE: Vlan115 Grp 115 state Standby -> Active
%HSRP-5-STATECHANGE: Vlan156 Grp 156 state Speak -> Standby
%HSRP-5-STATECHANGE: Vlan208 Grp 208 state Speak -> Standby
%HSRP-5-STATECHANGE: Vlan209 Grp 209 state Speak -> Standby
%HSRP-5-STATECHANGE: Vlan196 Grp 196 state Speak -> Standby
%HSRP-5-STATECHANGE: Vlan112 Grp 112 state Speak -> Standby
%HSRP-5-STATECHANGE: Vlan196 Grp 196 state Standby -> Active
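To see which groups flap most often, console messages like the ones above can be tallied per interface and new state. This is only a minimal sketch: the function name `count_transitions` is made up for illustration, and the regular expression assumes the message layout shown in this post.

```python
import re
from collections import Counter

# Matches console lines such as:
# %HSRP-5-STATECHANGE: Vlan208 Grp 208 state Standby -> Active
HSRP_RE = re.compile(
    r"%HSRP-5-STATECHANGE: (?P<intf>\S+) Grp (?P<grp>\d+) "
    r"state (?P<old>\w+) -> (?P<new>\w+)"
)

def count_transitions(log_text):
    """Return a Counter keyed by (interface, new_state)."""
    counts = Counter()
    for m in HSRP_RE.finditer(log_text):
        counts[(m.group("intf"), m.group("new"))] += 1
    return counts

# A few of the messages from the console output above.
sample = """\
%HSRP-5-STATECHANGE: Vlan208 Grp 208 state Standby -> Active
%HSRP-5-STATECHANGE: Vlan160 Grp 160 state Speak -> Standby
%HSRP-5-STATECHANGE: Vlan208 Grp 208 state Speak -> Standby
%HSRP-5-STATECHANGE: Vlan196 Grp 196 state Speak -> Standby
%HSRP-5-STATECHANGE: Vlan196 Grp 196 state Standby -> Active
"""

for (intf, state), n in sorted(count_transitions(sample).items()):
    print(intf, "->", state, n)
```

A group that repeatedly goes to Active on the standby switch and then falls back is the pattern described above: the switch stops seeing hellos, takes over, then hears the real active router again.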
Configuration of the wireless VLAN on both switches:
interface Vlan198
ip address 10.X.X.X
ip helper-address
standby 198 ip 10.X.X.X
standby 198 timers 250 msec 750
standby 198 priority 90
standby 198 preempt
end
All the VLANs have the same timers and preempt configured on both switches.
We are not tracking any interface. Can I remove preempt on the secondary switch?
I have captured packets through netdr
Dist-2
interface Vl198, routine mistral_process_rx_packet_inlin, timestamp 23:53:02.823
dbus info: src_vlan 0xC6(198), src_indx 0x2(2), len 0x40(64)
bpdu 0, index_dir 0, flood 1, dont_lrn 0, dest_indx 0x40C6(16582)
F8020400 00C60000 00020000 40080168
S000 E0000560 8E0FFFF8 00000008 40C60000
mistral hdr: req_token 0x0(0), src_index 0x2(2), rx_offset 0x76(118)
requeue 0, obl_pkt 0, vlan 0xC6(198)
destmac FF.FF.FF.FF.FF.FF, srcmac 00.00.0C.07.AC.C6, protocol 0806
layer 3 data: 00010800 06040002 00000C07 ACC60A19 C601FFFF FFFFFFFF
0A19C601 00000000 00000000 00000000 00000000 0000C601
8300A369 00000000 0000FFFF
protocol ip: version 0x04, hlen 0x05, tos 0x00, totlen 48, identifier 0
df 0, mf 0, fo 0, ttl 1, src 10.X.X.X, dst 224.0.0.2
udp src 1985, dst 1985 len 28 checksum 0x53BE
Dist-1
interface Vl198, routine mistral_process_rx_packet_inlin, timestamp 23:48:26.830
dbus info: src_vlan 0xC6(198), src_indx 0x341(833), len 0x42(66)
bpdu 0, index_dir 0, flood 1, dont_lrn 0, dest_indx 0x40C6(16582)
60020401 00C60400 03410400 42080000 00110448 0E087C7C 00000008 40C60000
mistral hdr: req_token 0x0(0), src_index 0x341(833), rx_offset 0x76(118)
requeue 0, obl_pkt 0, vlan 0xC6(198)
destmac 01.00.5E.00.00.02, srcmac 00.00.0C.07.AC.C6, protocol 0800
protocol ip: version 0x04, hlen 0x05, tos 0x00, totlen 48, identifier 0
df 0, mf 0, fo 0, ttl 1, src 10.X.X.X, dst 224.0.0.2
udp src 1985, dst 1985 len 28 checksum 0x53BF
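The source MAC in both captures, 00.00.0C.07.AC.C6, has the HSRP version 1 virtual MAC format, where the last byte carries the group number in hex (0xC6 = 198). A small sketch for decoding it when scanning a longer netdr capture; the helper name `hsrp_v1_group` is hypothetical.

```python
HSRP_V1_PREFIX = "00000c07ac"  # well-known HSRP version 1 virtual MAC prefix

def hsrp_v1_group(mac):
    """Return the HSRP v1 group number encoded in a virtual MAC, or None."""
    # Keep only the hex digits so dotted, colon and dash notations all work.
    digits = "".join(ch for ch in mac.lower() if ch in "0123456789abcdef")
    if len(digits) != 12 or not digits.startswith(HSRP_V1_PREFIX):
        return None
    return int(digits[10:], 16)

print(hsrp_v1_group("00.00.0C.07.AC.C6"))  # 198
print(hsrp_v1_group("00.23.EA.7A.FC.00"))  # None (not an HSRP v1 virtual MAC)
```

A frame whose source MAC decodes to a group number was sent on behalf of the virtual gateway; any other source MAC belongs to a physical interface or host.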
This issue is not continuous; it triggers intermittently, sometimes within a couple of hours and sometimes only after days.
03-27-2015 06:20 AM
Hi Anup,
The HSRP flaps could simply be a victim of the high CPU.
The netdr capture above seems to show a normal HSRP packet (destination 224.0.0.2, source MAC 00.00.0C.07.AC.C6), and it has the flood bit set to 1.
I do not think we can conclude from that what is causing the high CPU. Do you have the complete netdr output from during the issue? You can also check basic outputs such as
sh proces cpu sorted | ex 0.00%
Also, to confirm L2 stability, you could turn on MAC move notification to see whether there is a loop.
(config)# mac address-table notification mac-move
Kindly share the outputs.
Hope this helps.
Thanks,
Madhu.
03-27-2015 08:28 AM
Attached is the netdr output from both distribution switches.
srcmac 00.23.EA.7A.FC.00 is the MAC address of the VLAN 198 interface on Dist-2.
These two switches are connected to 20 floor switches, and each floor is a stack. The WLC is connected to Distribution 2, and multiple APs are connected to the floor switches.
03-27-2015 09:53 AM
Hi,
The netdr capture does not provide much information. Most of the packets are HSRP packets. The timers are also quite aggressive, so I think a large number of packets is expected.
Do you see input queue drops on the interfaces between these two switches, or on any other interfaces?
Also, have you enabled MAC move notification? It is not a service-impacting command.
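To put numbers on how aggressive the timers are, here is a rough sketch of the arithmetic. The 250 ms hello and 750 ms hold values come from the posted configuration; the group count of 30 is a made-up figure, since the thread does not say how many HSRP groups exist, and the function name is hypothetical.

```python
def hsrp_timing(hello_ms, hold_ms, groups):
    """Return (hellos one router sends per second, hellos that fit in one hold time)."""
    hellos_per_sec = groups * 1000 / hello_ms
    hellos_per_hold = hold_ms / hello_ms
    return hellos_per_sec, hellos_per_hold

# Timers as posted (250 ms hello, 750 ms hold); 30 groups is an assumption.
print(hsrp_timing(250, 750, 30))  # (120.0, 3.0)

# HSRP default timers (3 s hello, 10 s hold) for comparison, same assumed group count.
print(hsrp_timing(3000, 10000, 30))
```

With a 750 ms hold time, three consecutive missed hellos are enough for the standby switch to declare the active router down, so a CPU stall of less than a second on either switch can produce exactly the state changes shown in the logs.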
Thanks,
Madhu.
03-27-2015 09:57 AM
Also share
sh proc cpu sorted | ex 0.00%
03-27-2015 10:27 AM
MAC move notification has been enabled. I can see drops on the interface connected to the WLC.
As I said, the issue is not continuous; if it occurs again I will share the sorted CPU output.
Do I need to enable MAC move notification on the floor switches as well? Also, should I harden the trunk interfaces of the floor switches?
03-27-2015 02:01 PM
After some digging, I was able to find the following:
VLAN0195 is executing the rstp compatible Spanning Tree protocol
Number of topology changes 165 last change occurred 00:31:14 ago
from GigabitEthernet6/14
VLAN0196 is executing the rstp compatible Spanning Tree protocol
Number of topology changes 122 last change occurred 00:31:14 ago
from GigabitEthernet6/14
VLAN0197 is executing the rstp compatible Spanning Tree protocol
Number of topology changes 291 last change occurred 00:31:14 ago
from GigabitEthernet6/14
VLAN0198 is executing the rstp compatible Spanning Tree protocol
Number of topology changes 961 last change occurred 00:31:14 ago
from GigabitEthernet6/14
VLAN0199 is executing the rstp compatible Spanning Tree protocol
Number of topology changes 1165 last change occurred 00:31:25 ago
from GigabitEthernet6/14
VLAN0200 is executing the rstp compatible Spanning Tree protocol
Number of topology changes 888 last change occurred 00:31:25 ago
from GigabitEthernet6/14
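With many VLANs, it helps to rank this output by topology change count and by the port the last change came from. The following is a minimal sketch that assumes the three-line layout pasted above (VLAN line, change count line, "from" line); the function name `topology_changes` is made up for illustration.

```python
import re

VLAN_RE = re.compile(r"^\s*(VLAN\d+) is executing")
TC_RE = re.compile(r"Number of topology changes (\d+)")
FROM_RE = re.compile(r"^\s*from (\S+)")

def topology_changes(text):
    """Return [(vlan, changes, last_port)] sorted by change count, highest first."""
    rows, vlan, changes = [], None, None
    for line in text.splitlines():
        if (m := VLAN_RE.search(line)):
            vlan, changes = m.group(1), None
        elif (m := TC_RE.search(line)):
            changes = int(m.group(1))
        elif (m := FROM_RE.search(line)) and vlan is not None and changes is not None:
            rows.append((vlan, changes, m.group(1)))
            vlan, changes = None, None
    return sorted(rows, key=lambda r: r[1], reverse=True)

# Three of the VLANs from the output above.
sample = """\
VLAN0195 is executing the rstp compatible Spanning Tree protocol
Number of topology changes 165 last change occurred 00:31:14 ago
from GigabitEthernet6/14
VLAN0198 is executing the rstp compatible Spanning Tree protocol
Number of topology changes 961 last change occurred 00:31:14 ago
from GigabitEthernet6/14
VLAN0199 is executing the rstp compatible Spanning Tree protocol
Number of topology changes 1165 last change occurred 00:31:25 ago
from GigabitEthernet6/14
"""

for vlan, changes, port in topology_changes(sample):
    print(vlan, changes, port)
```

If one port keeps showing up as the source of the last change across many VLANs, that port and whatever is attached to it are the natural place to look next.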
Port Gi6/14 configuration:
interface GigabitEthernet6/14
switchport
switchport trunk encapsulation dot1q
switchport trunk native vlan 200
switchport trunk allowed vlan 195-200
switchport mode trunk
This port connects Dist-2 to the WLC.
195 Wireless-Voice active
196 GTC-VIDEO active
197 IP_Camera active
198 Wireless-client active
199 Cisco-AP-MGMT active
200 NW-MGMT active
For the rest of the VLANs:
VLAN0110 is executing the rstp compatible Spanning Tree protocol
Number of topology changes 824 last change occurred 00:51:49 ago
from TenGigabitEthernet1/3
VLAN0111 is executing the rstp compatible Spanning Tree protocol
Number of topology changes 680 last change occurred 00:51:49 ago
from TenGigabitEthernet1/3
Also, I can see some ports in the root-inconsistent state on the secondary switch, which is configured as root secondary.
TenGigabitEthernet1/3 is connected to a floor switch.
VLAN0110 is executing the rstp compatible Spanning Tree protocol
Number of topology changes 297 last change occurred 00:37:47 ago
from StackPort1
VLAN0111 is executing the rstp compatible Spanning Tree protocol
Number of topology changes 164 last change occurred 00:37:47 ago
from StackPort1
03-30-2015 09:27 AM
So do you mean that you saw the high CPU during these spanning-tree changes?
Keep the MAC move notification turned on. It will help identify any flapping ports.
You can also set up an EEM script to capture show process cpu sorted | ex 0.00%, show spanning-tree, netdr, etc. Please let me know if you need help with EEM.
Thanks,
Madhu.