Cisco Support Community

How to configure high availability and resolve redundant failover issues on the Catalyst 6500/6000 Series Switch

Core issue

The symptoms of an unsuccessful failover can include an unintended reload, the disappearance of software features if the primary software configuration is not supported on the redundant Supervisor Engine, and modules that do not come online if the software on the redundant Supervisor Engine does not support them.

Resolution

The Catalyst 6500 and 6000 Series Switches offer several forms of redundancy, depending on your hardware type, software operating system, and software version. The available forms are:

  • Internal MultiLayer Switch Feature Card (MSFC)—Each MSFC functions independently, with one designated as the primary, or designated router (DR). Hot Standby Router Protocol (HSRP) is used between them. Both MSFCs run the same routing protocols and have the same routing table, so when one MSFC fails, the other does not need to wait for the routing protocols to converge before it forwards packets. Combined with high availability for Layer 2 (L2) failover, the switch recovers within a few seconds if one Supervisor Engine or MSFC fails. 

  • Single Router Mode (SRM)—When SRM is enabled, the non-DR MSFC is online but has all of its interfaces down, so it holds no routing table information. If the DR fails, there is some delay before the non-DR that comes online has a complete routing table. To help account for this, the Layer 3 (L3) forwarding information that the Supervisor Engine used before the failure is kept and updated with any new information from the new DR. 

  • Enhanced-High-System Availability (EHSA)—The Supervisor Engine that boots first, either in slot 1 or 2, becomes the EHSA active Supervisor Engine. The MSFC and Policy Feature Card (PFC) become fully operational. The MSFC and PFC on the redundant Supervisor Engine come out of reset but are not operational. The EHSA feature is not Supervisor Engine mirroring or load balancing. Network services are disrupted until the redundant Supervisor Engine takes over and the switch recovers. 

  • Route Processor Redundancy (RPR, RPR+)—RPR supports a switchover time of two to four minutes and RPR+ supports a switchover time of 30 to 60 seconds.

  • Non Stop Forwarding (NSF) with Stateful Switch Over (SSO)—Catalyst 6500 Series Switches support fault resistance by allowing a redundant Supervisor Engine to take over if the primary Supervisor Engine fails. Cisco NSF works with SSO in order to minimize the time the network is unavailable to its users after a switchover, while the switch continues to forward IP packets.
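Of the modes listed above, RPR, RPR+, and SSO are selected under the IOS `redundancy` configuration, and NSF is enabled per routing protocol. The Python sketch below assembles those configuration lines for a chosen mode and can optionally enable NSF under an OSPF process. It assumes a Catalyst 6500 running native Cisco IOS with a Supervisor Engine and release that support the chosen mode. The helper name `redundancy_config` and the OSPF process number are hypothetical. Check the command keywords against the configuration guide for your release.

```python
from typing import List, Optional

# Assumed native-IOS keywords for "redundancy" -> "mode <keyword>";
# verify against the configuration guide for your software release.
MODE_KEYWORDS = {"RPR": "rpr", "RPR+": "rpr-plus", "SSO": "sso"}


def redundancy_config(mode: str, ospf_process: Optional[int] = None) -> List[str]:
    """Hypothetical helper: return IOS config lines for the requested mode.

    When mode is SSO and an OSPF process number is given, NSF is also enabled
    under that process (OSPF is only an example routing protocol).
    """
    if mode not in MODE_KEYWORDS:
        raise ValueError(f"unsupported redundancy mode: {mode!r}")
    lines = ["redundancy", f" mode {MODE_KEYWORDS[mode]}"]
    if mode == "SSO" and ospf_process is not None:
        # The text above pairs NSF with SSO, so NSF is only added for SSO here.
        lines += [f"router ospf {ospf_process}", " nsf"]
    return lines


if __name__ == "__main__":
    print("\n".join(redundancy_config("SSO", ospf_process=100)))
```

Generating the lines rather than hard-coding them keeps the mode-to-keyword mapping in one place. The output is meant to be reviewed and pasted into a configuration session, not pushed to the switch automatically.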

When the switch is powered on, RPR runs between the two Supervisor Engines. The Supervisor Engine that boots first, in either slot 1 or slot 2, becomes the RPR active Supervisor Engine. Its MSFC (or MultiLayer Switch Feature Card II, MSFC2) and PFC (or Policy Feature Card 2, PFC2) become fully operational. The MSFC and PFC on the redundant Supervisor Engine come out of reset but are not operational.
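The Python sketch below models the election rule just described: whichever Supervisor Engine boots first becomes active, regardless of slot. It treats "boots first" as the earlier boot-completion time. The `Supervisor` type, its `boot_done_s` field, and the tie-break by lower slot number are assumptions made for illustration, not documented switch behavior.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Supervisor:
    slot: int           # chassis slot, 1 or 2
    boot_done_s: float  # hypothetical: seconds after power-on when boot completed


def elect_active(sups: Sequence[Supervisor]) -> Tuple[Supervisor, Supervisor]:
    """Return (active, redundant) for a two-Supervisor chassis.

    The engine that finishes booting first becomes active. Breaking an exact
    tie by the lower slot number is an assumption made only for this sketch.
    """
    if len(sups) != 2:
        raise ValueError("expected exactly two Supervisor Engines")
    first, second = sorted(sups, key=lambda s: (s.boot_done_s, s.slot))
    return first, second


if __name__ == "__main__":
    active, redundant = elect_active([Supervisor(1, 48.0), Supervisor(2, 45.5)])
    print(f"active: slot {active.slot}, redundant: slot {redundant.slot}")
```

In the example, the engine in slot 2 finishes booting first and becomes active, which is why a fixed "slot 1 is always active" assumption can mislead during troubleshooting.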

In RPR+ mode, the redundant Supervisor Engine is fully initialized and configured, which shortens the switchover time. When the redundant Supervisor Engine comes online, the active Supervisor Engine checks its image version. If the image on the redundant Supervisor Engine does not match the image on the active Supervisor Engine, RPR mode is used instead.
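The image check above reduces to a simple rule, sketched below in Python. The function name `effective_mode` and the image strings are placeholders, not real Cisco IOS filenames. The sketch models only the fallback from RPR+ to RPR described in the text.

```python
def effective_mode(configured: str, active_image: str, standby_image: str) -> str:
    """Redundancy mode the Supervisor pair actually runs in.

    Models only the rule stated above: RPR+ requires the redundant Supervisor
    Engine to run the same image as the active one; otherwise RPR is used.
    """
    if configured == "RPR+" and active_image != standby_image:
        return "RPR"
    return configured


if __name__ == "__main__":
    # Image names are placeholders, not real Cisco IOS filenames.
    print(effective_mode("RPR+", "image-A", "image-A"))  # RPR+
    print(effective_mode("RPR+", "image-A", "image-B"))  # RPR
```

In practice, this means that a switchover that takes minutes rather than seconds is worth checking for mismatched images on the two Supervisor Engines.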

The benefits of RPR+ include reduced switchover time, no reload of installed modules, and support for Online Insertion and Removal (OIR) of the redundant Supervisor Engine.

Note: The first reload after High Availability (HA) is configured takes longer than later reloads, because both Supervisor Engines must first synchronize in order for HA to work properly.

In some scenarios, the switchover from the Sup1a module in slot 1 (active) to the Sup1a module in slot 2 (standby) takes about 10 seconds, but switching back so that the Sup1a module in slot 1 becomes active again takes about 40 seconds.

A possible reason for this behavior is that the switchover time from the active Supervisor Engine to the standby Supervisor Engine does not include spanning-tree convergence time.
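The Python sketch below is a rough arithmetic check of that explanation. It assumes classic IEEE 802.1D timer defaults (15-second forward delay) and assumes that the affected port only has to pass through the listening and learning states. Under those assumptions, a 10-second Supervisor switchover plus 2 × 15 seconds of spanning-tree convergence gives about 40 seconds. That matches the slower switchback, but it only shows the numbers are consistent with the explanation, not that spanning tree is the cause. If a bridge must also wait for max age (20 seconds by default), the total would be longer. The function names are hypothetical.

```python
# Assumes classic IEEE 802.1D timer defaults; other STP modes converge differently.
FORWARD_DELAY_S = 15  # 802.1D default forward delay, in seconds


def stp_listen_learn_s(forward_delay_s: float = FORWARD_DELAY_S) -> float:
    # A port moving from blocking to forwarding spends one forward-delay
    # interval in listening and another in learning (max age not modeled).
    return 2 * forward_delay_s


def observed_outage_s(switchover_s: float, stp_s: float) -> float:
    # Outage seen by hosts = Supervisor switchover + spanning-tree convergence.
    return switchover_s + stp_s


if __name__ == "__main__":
    print(observed_outage_s(10, stp_listen_learn_s()))  # 40
```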

Further enhancements to HA continue. Refer to these documents for more information:

Error messages

%STANDBY-3-DUPADDR: Duplicate address

%STANDBY-6-STATECHANGE
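Both messages above come from the STANDBY (HSRP) facility and follow the Cisco IOS `%FACILITY-SEVERITY-MNEMONIC: text` layout. After a failover, a small filter like the Python sketch below can pull the HSRP events out of a long log. The regular expression is an assumption about the message layout, including an optional processor tag such as `-SP` that appears on some platforms. It may need adjusting for your timestamps and logging format. The function names are hypothetical.

```python
import re
from typing import Iterable, Iterator, Optional

# %FACILITY[-SUB]-SEVERITY-MNEMONIC[: text]; the optional SUB part covers
# processor tags that appear on some platforms (assumption for this sketch).
MSG_RE = re.compile(
    r"%(?P<facility>[A-Z0-9_]+)(?:-(?P<sub>[A-Z0-9_]+))?"
    r"-(?P<severity>[0-7])-(?P<mnemonic>[A-Z0-9_]+)(?::\s*(?P<text>.*))?"
)


def parse(line: str) -> Optional[dict]:
    """Split one log line into its message fields, or return None."""
    m = MSG_RE.search(line)
    if m is None:
        return None
    return {
        "facility": m["facility"],
        "sub": m["sub"],
        "severity": int(m["severity"]),  # 0 = emergencies ... 7 = debugging
        "mnemonic": m["mnemonic"],
        "text": (m["text"] or "").strip(),
    }


def hsrp_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield only STANDBY (HSRP) messages; other facilities are skipped."""
    for line in lines:
        msg = parse(line)
        if msg is not None and msg["facility"] == "STANDBY":
            yield msg


if __name__ == "__main__":
    log = ["%STANDBY-3-DUPADDR: Duplicate address", "%STANDBY-6-STATECHANGE"]
    for event in hsrp_events(log):
        print(event["severity"], event["mnemonic"], event["text"])
```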
