
Nexus 7000: unexpected shutdown of vPC ports during reload of the primary vPC switch

Dear Community,

We experienced unexpected behavior from two Nexus 7000 switches in a vPC domain.

As shown in the attached sketch, we have four N7Ks across two data centers, with two N7Ks forming a vPC domain in each data center.

The two data centers are interconnected via a multilayer vPC.

We had to reload one of these switches, and I expected the other N7K in the same vPC domain to keep forwarding traffic over its vPC member ports.

Instead, all vPC ports on the secondary switch were suspended until the reload of the first N7K (vPC role: primary) had finished.

Log messages on Switch B:

20:11:51 <Switch B> %VPC-2-VPC_SUSP_ALL_VPC: Peer-link going down, suspending all vPCs on secondary

20:12:01 <Switch B> %VPC-2-PEER_KEEP_ALIVE_RECV_FAIL: In domain 1, VPC peer keep-alive receive has failed

I would expect this behavior after a peer-link failure while the other switch is still reachable over the peer-keepalive link (which runs over the mgmt port). However, since we reloaded the whole switch, the vPCs on the secondary should have kept forwarding.
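The expectation in the paragraph above boils down to a two-input decision for the secondary switch. The following Python sketch encodes it. The names `SecondaryAction` and `expected_secondary_action` are hypothetical, and the function models the poster's reasoning only. It is not the actual NX-OS vPC state machine.

```python
from enum import Enum


class SecondaryAction(Enum):
    """Reactions of the vPC secondary in this simplified (hypothetical) model."""

    FORWARD = "keep vPC member ports forwarding"
    SUSPEND = "suspend all vPC member ports"


def expected_secondary_action(peer_link_up: bool, keepalive_received: bool) -> SecondaryAction:
    """Encode the expectation from the post; a simplification, not NX-OS code.

    peer_link_up:       is the vPC peer-link still up?
    keepalive_received: are peer-keepalive messages still arriving from the peer?
    """
    if peer_link_up:
        # Normal operation: both peers stay synchronized over the peer-link.
        return SecondaryAction.FORWARD
    if keepalive_received:
        # Peer-link lost but the peer is still alive: suspend to avoid a split brain.
        return SecondaryAction.SUSPEND
    # Peer-link and keepalive both lost: the peer is treated as dead.
    return SecondaryAction.FORWARD


if __name__ == "__main__":
    # Peer-link failure only, primary still alive (the logged behavior).
    print(expected_secondary_action(peer_link_up=False, keepalive_received=True))
    # Whole primary reloaded (what the poster expected).
    print(expected_secondary_action(peer_link_up=False, keepalive_received=False))
```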

Could this be a bug, or are there timers that need tuning?

All N7K switches are running NX-OS 6.2(8).

Switch A:

vpc domain 1
  peer-switch
  role priority 2048
  system-priority 1024
  peer-keepalive destination <Mgmt-IP-Switch-B>
  delay restore 360
  peer-gateway
  auto-recovery reload-delay 360
  ip arp synchronize

interface port-channel1
  switchport mode trunk
  switchport trunk allowed vlan <x-y>
  spanning-tree port type network
  vpc peer-link

Switch B:

vpc domain 1
  peer-switch
  role priority 1024
  system-priority 1024
  peer-keepalive destination <Mgmt-IP-Switch-A>
  delay restore 360
  peer-gateway
  auto-recovery reload-delay 360
  ip arp synchronize

interface port-channel1
  switchport mode trunk
  switchport trunk allowed vlan <x-y>
  spanning-tree port type network
  vpc peer-link

Best regards

Problem solved:

During the reload of the Nexus 7K, the linecards were powered off slightly earlier than the mgmt interface. As a result, the secondary N7K received at least one vPC peer-keepalive message after its peer-link had already gone down. To avoid a split-brain scenario, it suspended its vPC member ports.
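The race in this explanation can be illustrated with a small timeline model. It assumes that keepalives are sent every `interval_s` seconds (1 s here, an assumed value) and arrive instantly while their path is still powered, and that any keepalive arriving after the peer-link loss makes the secondary treat the primary as alive. The function name and all timings are hypothetical. The post does not say how large the gap between linecard and mgmt power-off actually was.

```python
def keepalive_seen_after_peer_link_loss(
    peer_link_down_s: float,
    keepalive_path_down_s: float,
    interval_s: float = 1.0,
    first_send_s: float = 0.0,
) -> bool:
    """Return True if a keepalive arrives at or after the moment the peer-link went down.

    Simplified model (assumption): keepalives are sent every `interval_s` seconds
    starting at `first_send_s` and arrive instantly while their path is still up.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    k = 0
    while True:
        # Multiply instead of accumulating to avoid floating-point drift.
        send_time = first_send_s + k * interval_s
        if send_time >= keepalive_path_down_s:
            # The keepalive path is down; no further messages get through.
            return False
        if send_time >= peer_link_down_s:
            # Peer-link is already down, yet this keepalive still arrives.
            return True
        k += 1


if __name__ == "__main__":
    # Hypothetical timings in seconds after the reload was started.
    # Keepalive over mgmt0: linecards (peer-link) off at 10.0 s, mgmt0 off at 12.0 s.
    print(keepalive_seen_after_peer_link_loss(10.0, 12.0))
    # Keepalive over a linecard port: both paths lose power at 10.0 s.
    print(keepalive_seen_after_peer_link_loss(10.0, 10.0))
```

In the same model, a gap shorter than one send interval is only safe by luck. For example, `keepalive_seen_after_peer_link_loss(10.2, 10.5)` returns False only because no send happens to fall inside the 0.3 s window. Putting the keepalive on linecard ports removes the window entirely, because both paths lose power together.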

We now use dedicated linecard interfaces for the vPC peer-keepalive link. Because the keepalive link now loses power together with the peer-link, reloading one N7K no longer causes a total network outage.
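For reference, here is a sketch of what a linecard-based peer-keepalive configuration can look like. It is rendered by a small Python helper so that both peers get mirrored values. The VRF name `VPC-KEEPALIVE`, the interface `Ethernet1/48`, the 10.255.255.0/30 addressing, and the helper name `keepalive_config` are all hypothetical, because the post does not say which ports or addresses were used. Check the exact commands against the NX-OS 6.2 vPC configuration guide before applying them.

```python
def keepalive_config(
    vrf: str,
    interface: str,
    local_ip: str,
    peer_ip: str,
    prefix_len: int = 30,
    domain_id: int = 1,
) -> str:
    """Render an NX-OS peer-keepalive snippet for one vPC peer.

    All argument values passed in the example below are hypothetical placeholders.
    """
    lines = [
        f"vrf context {vrf}",
        f"interface {interface}",
        "  no switchport",
        # Assign the VRF before the IP address: changing VRF membership clears L3 config.
        f"  vrf member {vrf}",
        f"  ip address {local_ip}/{prefix_len}",
        "  no shutdown",
        f"vpc domain {domain_id}",
        f"  peer-keepalive destination {peer_ip} source {local_ip} vrf {vrf}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical /30 point-to-point link between Switch A and Switch B.
    print(keepalive_config("VPC-KEEPALIVE", "Ethernet1/48", "10.255.255.1", "10.255.255.2"))
    print()
    print(keepalive_config("VPC-KEEPALIVE", "Ethernet1/48", "10.255.255.2", "10.255.255.1"))
```

Keeping the keepalive in its own VRF separates it from production routing. Calling the helper twice with swapped addresses keeps the two peers' source and destination consistent.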
