Cisco 6509 Supervisor failover - How long?

jonathanaxford
Level 3

Hi All,

Just a quick question...

How long would you expect a failover from the active Supervisor card to the standby Supervisor card to take?

We have a Cisco 6509 with the following:

5 2 Supervisor Engine 720 (Active) WS-SUP720-3B

6 2 Supervisor Engine 720 (Hot) WS-SUP720-3B

When the Active card was 'pulled' by an engineer, the switch rebooted and took a good 5 minutes or so to come back online.

I would have expected a much quicker failover myself...

Is there anything we need to check, or anything we are missing?

Many thanks

Jonathan


paul.matthews
Level 5

What redundancy mode have you set? Five minutes sounds like the old failover behaviour, where all the line cards get rebooted. SSO should be in the region of a second.

ankbhasi
Cisco Employee

Hi Jonathan,

I believe you are already running SSO mode, because your standby sup shows as HOT, which is only possible when you run SSO.

If you are seeing 5 minutes of downtime, something is really going wrong, because with SSO the switchover should take seconds.

Can you paste the output of "sh redundancy", "sh module" and "sh version" from your box with both sups in the chassis?

Regards,

Ankur

Good points - it may also be worth describing how you measure when the failover is complete, as there are a number of points at which that may be deemed to have happened.

Are you saying complete is when routing protocols have fully converged?

Hi Paul,

I suppose complete would be when everything has converged. For us, that shouldn't take too long, as it's a single OSPF area with 4 switches in it, all directly connected to the 6509 in question.

What I expected to see was a slight 'blip' in the network while the standby SUP took over, but what we actually had was 5 or so minutes where everything lost connection...

Our Cisco reseller/support company is looking into this for us too, but I always like to get a heads-up if I can!

Cheers

Hi Again,

The show redundancy output is:

Redundant System Information :
------------------------------
Available system uptime = 26 weeks, 2 days, 18 hours, 23 minutes
Switchovers system experienced = 2
Standby failures = 0
Last switchover reason = active unit removed
Hardware Mode = Duplex
Configured Redundancy Mode = sso
Operating Redundancy Mode = sso
Maintenance Mode = Disabled
Communications = Up

Current Processor Information :
-------------------------------
Active Location = slot 5
Current Software state = ACTIVE
Uptime in current state = 4 days, 3 hours, 46 minutes
Image Version = Cisco Internetwork Operating System Software
IOS (tm) s72033_rp Software (s72033_rp-ADVIPSERVICESK9_WAN-M), Version 12.2(18)SXF8, RELEASE SOFTWARE (fc2)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2007 by cisco Systems, Inc.
Compiled Sat 03-Mar-07 00:07 by tinhuang
BOOT = sup-bootdisk:s72033-advipservicesk9_wan-mz.122-18.SXF8.bin,1;sup-bootdisk:s72033-entservicesk9_wan-mz.122-18.SXF7.bin,1;
CONFIG_FILE =
BOOTLDR =
Configuration register = 0x2102

Peer Processor Information :
----------------------------
Standby Location = slot 6
Current Software state = STANDBY HOT
Uptime in current state = 4 days, 3 hours, 40 minutes
Image Version = Cisco Internetwork Operating System Software
IOS (tm) s72033_rp Software (s72033_rp-ADVIPSERVICESK9_WAN-M), Version 12.2(18)SXF8, RELEASE SOFTWARE (fc2)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2007 by cisco Systems, Inc.
Compiled Sat 03-Mar-07 00:07 by tinhuang
BOOT = sup-bootdisk:s72033-advipservicesk9_wan-mz.122-18.SXF8.bin,1;sup-bootdisk:s72033-entservicesk9_wan-mz.122-18.SXF7.bin,1;
CONFIG_FILE =
BOOTLDR =
Configuration register = 0x2102

I will post the show module and show version output in separate messages (ran out of space...).

Cheers!
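The first thing Paul and Ankur check is whether the box is really in SSO with the standby HOT. That check can be scripted against captured `show redundancy` text. This is a minimal sketch, assuming the output is saved as plain text in the layout pasted above; `parse_redundancy` and `sso_problems` are hypothetical helper names, not part of any Cisco tool.

```python
def parse_redundancy(text: str) -> dict[str, dict[str, str]]:
    """Group 'Key = value' lines of captured 'show redundancy' text by section."""
    sections: dict[str, dict[str, str]] = {}
    current: dict[str, str] = {}  # lines before the first section are discarded
    for raw in text.splitlines():
        line = raw.strip()
        if line.endswith("Information :"):
            # e.g. "Peer Processor Information :" starts a new section
            current = sections.setdefault(line.rstrip(" :"), {})
            continue
        key, sep, value = line.partition(" = ")
        if sep:
            current.setdefault(key.strip(), value.strip())
    return sections


def sso_problems(sections: dict[str, dict[str, str]]) -> list[str]:
    """Return reasons why the output does not look like a healthy SSO pair."""
    system = sections.get("Redundant System Information", {})
    peer = sections.get("Peer Processor Information", {})
    problems = []
    if system.get("Operating Redundancy Mode") != "sso":
        problems.append("operating redundancy mode is not sso")
    if peer.get("Current Software state") != "STANDBY HOT":
        problems.append("peer supervisor is not STANDBY HOT")
    return problems


# Trimmed excerpt of the output posted above.
SAMPLE = """\
Redundant System Information :
------------------------------
Configured Redundancy Mode = sso
Operating Redundancy Mode = sso
Current Processor Information :
-------------------------------
Active Location = slot 5
Current Software state = ACTIVE
Peer Processor Information :
----------------------------
Standby Location = slot 6
Current Software state = STANDBY HOT
"""

print(sso_problems(parse_redundancy(SAMPLE)))  # []
```

An empty list only confirms the configuration side; as the thread goes on to show, it says nothing about how long forwarding or routing takes to recover.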

Show Module:

Mod Ports Card Type Model Serial No.
--- ----- -------------------------------------- ------------------ -----------
1 48 CEF720 48 port 1000mb SFP WS-X6748-SFP SAL1105FZ3J
2 48 CEF720 48 port 1000mb SFP WS-X6748-SFP SAL1105FZ59
3 48 CEF720 48 port 10/100/1000mb Ethernet WS-X6748-GE-TX SAL1109J81F
4 48 CEF720 48 port 10/100/1000mb Ethernet WS-X6748-GE-TX SAL1104F66S
5 2 Supervisor Engine 720 (Active) WS-SUP720-3B SAL1105FDV3
6 2 Supervisor Engine 720 (Hot) WS-SUP720-3B SAL1104F35D

Mod MAC addresses Hw Fw Sw Status
--- ---------------------------------- ------ ------------ ------------ -------
1 001a.6dbb.9710 to 001a.6dbb.973f 1.8 12.2(14r)S5 12.2(18)SXF8 Ok
2 001a.6dbb.8d38 to 001a.6dbb.8d67 1.8 12.2(14r)S5 12.2(18)SXF8 Ok
3 001b.2ab4.7180 to 001b.2ab4.71af 2.5 12.2(14r)S5 12.2(18)SXF8 Ok
4 001a.e2d4.8e34 to 001a.e2d4.8e63 2.5 12.2(14r)S5 12.2(18)SXF8 Ok
5 0016.9df6.d630 to 0016.9df6.d633 5.3 8.4(2) 12.2(18)SXF8 Ok
6 0016.4708.1100 to 0016.4708.1103 5.3 8.4(2) 12.2(18)SXF8 Ok

Mod Sub-Module Model Serial Hw Status
---- --------------------------- ------------------ ----------- ------- -------
1 Centralized Forwarding Card WS-F6700-CFC SAL1052CFAP 2.1 Ok
2 Centralized Forwarding Card WS-F6700-CFC SAL1052CHFD 2.1 Ok
3 Centralized Forwarding Card WS-F6700-CFC SAL1106G6J6 2.1 Ok
4 Centralized Forwarding Card WS-F6700-CFC SAL1107H3LA 2.1 Ok
5 Policy Feature Card 3 WS-F6K-PFC3B SAL1104FBAS 2.3 Ok
5 MSFC3 Daughterboard WS-SUP720 SAL1104FB1E 2.6 Ok
6 Policy Feature Card 3 WS-F6K-PFC3B SAL1104FBBQ 2.3 Ok
6 MSFC3 Daughterboard WS-SUP720 SAL1105FDN7 2.6 Ok

Mod Online Diag Status
---- -------------------
1 Pass
2 Pass
3 Pass
4 Pass
5 Pass
6 Pass

Show version:

Cisco Internetwork Operating System Software
IOS (tm) s72033_rp Software (s72033_rp-ADVIPSERVICESK9_WAN-M), Version 12.2(18)SXF8, RELEASE SOFTWARE (fc2)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2007 by cisco Systems, Inc.
Compiled Sat 03-Mar-07 00:07 by tinhuang
Image text-base: 0x40101040, data-base: 0x42D98000
ROM: System Bootstrap, Version 12.2(17r)S4, RELEASE SOFTWARE (fc1)
BOOTLDR: s72033_rp Software (s72033_rp-ADVIPSERVICESK9_WAN-M), Version 12.2(18)SXF8, RELEASE SOFTWARE (fc2)
NPLYG28-C-1 uptime is 4 days, 4 hours, 17 minutes
Time since NPLYG28-C-1 switched to active is 4 days, 3 hours, 55 minutes
System returned to ROM by unknown reload cause - suspect boot_data[BOOT_COUNT] 0x0, BOOT_COUNT 0, BOOTDATA 19 (SP by power on)
System restarted at 06:25:43 GMT Thu Oct 11 2007
System image file is "sup-bootdisk:s72033-advipservicesk9_wan-mz.122-18.SXF8.bin"

This product contains cryptographic features and is subject to United
States and local country laws governing import, export, transfer and
use. Delivery of Cisco cryptographic products does not imply
third-party authority to import, export, distribute or use encryption.
Importers, exporters, distributors and users are responsible for
compliance with U.S. and local country laws. By using this product you
agree to comply with applicable laws and regulations. If you are unable
to comply with U.S. and local laws, return this product immediately.
A summary of U.S. laws governing Cisco cryptographic products may be found at:
http://www.cisco.com/wwl/export/crypto/tool/stqrg.html
If you require further assistance please contact us by sending email to
export@cisco.com.

cisco WS-C6509-E (R7000) processor (revision 1.3) with 458720K/65536K bytes of memory.
Processor board ID SMC11030009
SR71000 CPU at 600Mhz, Implementation 0x504, Rev 1.2, 512KB L2 Cache
Last reset from s/w reset
SuperLAT software (copyright 1990 by Meridian Technology Corp).
X.25 software, Version 3.0.0.
Bridging software.
TN3270 Emulation software.
17 Virtual Ethernet/IEEE 802.3 interfaces
196 Gigabit Ethernet/IEEE 802.3 interfaces
1917K bytes of non-volatile configuration memory.
8192K bytes of packet buffer memory.
65536K bytes of Flash internal SIMM (Sector size 512K).
Configuration register is 0x2102

Many thanks for looking into this for me...

You are running SSO, so that should be nice and quick. There are two main areas where there could be an issue: the failover itself, and the routing convergence afterwards.

So we need to figure out what your issue is (and how many issues there are!). Is this in a lab environment where you can test it, or is it live?

The first test I would try is to attach three PCs to the switch: PC A and PC B in the same VLAN, and PC C in a different VLAN. No HSRP or anything, with this switch as the default gateway for all of them. Set a ping running indefinitely to PC A from both B and C, and trigger a failover. See how many pings get dropped; that tells us whether basic SSO is working properly. I would expect no more than two responses to be lost on either PC.
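Paul's ping test can be scored mechanically once each PC's ping log is reduced to one entry per echo request (reply or timeout). This is a minimal sketch, assuming that reduction has already been done and that requests go out roughly once per second; `loss_summary`, `sso_ping_test_passed` and the sample log are hypothetical. If the ping tool waits out a timeout before sending the next request, each lost ping covers more than one interval, so treat the outage estimate as a rough lower bound.

```python
def loss_summary(replies: list[bool], interval_s: float = 1.0) -> tuple[int, int, float]:
    """Summarise a ping log (True = reply, False = timeout) as
    (total lost, longest run of consecutive losses, longest run * interval_s)."""
    longest = current = 0
    for ok in replies:
        current = 0 if ok else current + 1
        longest = max(longest, current)
    return replies.count(False), longest, longest * interval_s


def sso_ping_test_passed(replies: list[bool], max_lost: int = 2) -> bool:
    """Paul's rule of thumb: no more than two responses lost per PC."""
    return replies.count(False) <= max_lost


# Hypothetical log from PC B: 10 replies, 2 timeouts during the switchover, 5 replies.
log_b = [True] * 10 + [False] * 2 + [True] * 5
print(loss_summary(log_b))          # (2, 2, 2.0)
print(sso_ping_test_passed(log_b))  # True
```

Comparing PC B (same VLAN, Layer 2 path) with PC C (different VLAN, routed by the 6509) is what separates a Layer 2 problem from a Layer 3 one.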

If that is OK, we know SSO is working and we need to look at routing convergence. Have you configured either Cisco NSF or Graceful Restart at all?

http://www.cisco.com/en/US/products/sw/iosswrel/ps1829/products_feature_guide09186a00805e8fbd.html

Hi Paul,

Thanks for the response. Unfortunately, this is in the production network. I am going to schedule in another failover test as it will be the only way to get to the bottom of it.

I want to actually be there myself so I can see what happens. I like your suggestion about the PCs; I will easily be able to set that up for the test.

I have not set up NSF or Graceful Restart. I have begun reading about it; thanks for the link.

One thing I thought, though: if the majority of the VLANs are directly connected to the 6509, would there still be a big delay in Layer 3 convergence?

Thanks for all your help...

Jonathan:

have you considered STP convergence? Perhaps STP is adding a considerable amount of convergence time.

Are you running Rapid PVST+?

If not, do you have UplinkFast and BackboneFast enabled?

Check some of that stuff out, too.

HTH

Victor

Hi Victor, Thanks for the response.

I was unaware that a stateful failover of the SUP cards would trigger STP reconvergence. We are running rapid-PVST. It is a good point though as we have had issues with STP convergence in the past....

Layer 2 will carry on; Layer 3 will effectively be a complete reinitialisation. Local routes in the cache will continue to be forwarded, but bear in mind that the OSPF neighbours will have lost their adjacency with the 6500 and will therefore drop all the routes via it. Five minutes does seem an awfully long time for OSPF to converge, though.

Paul:

If his sup supports NSF and SSO, then he shouldn't lose the neighbor relationships, right? Isn't that the main selling point of NSF: that it keeps the interfaces up and the neighbor relationships established during the stateful switchover to the redundant sup module?

Jonathan:

If your sup isn't supporting NSF and SSO, then your STP probably did reconverge. Although NSF is a high-availability technology for L3 forwarding, it depends on the links and interfaces remaining in the "up/up" state, so L2 connectivity does play a role in NSF and SSO.

NSF is a bit of a funny one: to even have a chance, it has to be explicitly configured. The FIB is shared between the active and standby, but neighbour relationships are not. My understanding of the NSF-aware features of routing protocols is that not only the router itself needs to be configured, but also its neighbours, and the main (oversimplified) effect is that the neighbours become more tolerant of a non-responsive neighbour, keeping its routes in the table rather than immediately dropping them.

Cached routes are in the FIB, so forwarding will continue.

P.
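Paul's "oversimplified effect" of NSF awareness on the neighbours can be illustrated with a deliberately simplified model of one OSPF neighbour's view of the 6500. This is a sketch under stated assumptions, not how IOS implements NSF: the real restart signalling (Cisco NSF or IETF graceful restart) is reduced to a boolean, and `PeerView`, its methods and the 120 s grace period are hypothetical. It shows only the contrast in the post: an NSF-aware neighbour keeps the routes while the restarting router recovers, while a non-aware one drops them at once.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PeerView:
    """How one OSPF neighbour sees the 6500 (a deliberately simplified model)."""
    routes: set[str]
    nsf_aware: bool                 # neighbour configured to help a restarting peer
    grace_period_s: float = 120.0   # hypothetical value, for illustration only
    stale_since: Optional[float] = None

    def adjacency_lost(self, now: float, restart_signalled: bool) -> None:
        if self.nsf_aware and restart_signalled:
            # Helper behaviour: keep the routes (marked stale) so traffic keeps flowing.
            self.stale_since = now
        else:
            # Normal behaviour: everything learned via this neighbour is dropped.
            self.routes.clear()

    def resynced(self, fresh_routes: set[str]) -> None:
        # The restarted peer has re-sent its routes; the stale marking is cleared.
        self.routes = set(fresh_routes)
        self.stale_since = None

    def tick(self, now: float) -> None:
        # If the peer never resynchronises, the stale routes are eventually flushed.
        if self.stale_since is not None and now - self.stale_since > self.grace_period_s:
            self.routes.clear()
            self.stale_since = None


helper = PeerView(routes={"10.1.0.0/16"}, nsf_aware=True)
plain = PeerView(routes={"10.1.0.0/16"}, nsf_aware=False)
for peer in (helper, plain):
    peer.adjacency_lost(now=0.0, restart_signalled=True)
print(helper.routes, plain.routes)  # {'10.1.0.0/16'} set()
```

The `plain` neighbour corresponds to the situation in this thread, where NSF and Graceful Restart had not been configured.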
