ACE module failed; secondary ACE did not take over

krunal_shah · 01-05-2009

Show Tech from the 6509 switch the ACE module lives in

==========================================================================

ACE#show module

Mod Ports Card Type                              Model              Serial No.
--- ----- -------------------------------------- ------------------ -----------
  1     3 Network Analysis Module                WS-SVC-NAM-1       SAD115204BS
  2     1 Application Control Engine Module      ACE20-MOD-K9       SAD1204075T
  3    48 CEF720 48 port 10/100/1000mb Ethernet  WS-X6748-GE-TX     SAL1204E77M
  4    48 CEF720 48 port 10/100/1000mb Ethernet  WS-X6748-GE-TX     SAL1205EJSV
  5     5 Supervisor Engine 720 10GE (Active)    VS-S720-10G        SAL1205EKKW

Mod MAC addresses                      Hw     Fw           Sw           Status
--- ---------------------------------- ------ ------------ ------------ -------
  1 001d.a223.0886 to 001d.a223.088d   4.1    7.2(1)       3.6(1a)      Ok
  2 001e.f7a1.4b48 to 001e.f7a1.4b4f   2.3    8.7(0.22)ACE A2(1.2)      MajFail
  3 001e.f7c5.92f0 to 001e.f7c5.931f   2.7    12.2(14r)S5  12.2(33)SXH1 Ok
  4 001e.f760.ea10 to 001e.f760.ea3f   2.7    12.2(14r)S5  12.2(33)SXH1 Ok
  5 0019.e8bb.0314 to 0019.e8bb.031b   2.0    8.5(2)       12.2(33)SXH1 Ok

Mod  Sub-Module                  Model              Serial      Hw      Status
---- --------------------------- ------------------ ----------- ------- -------
   3 Centralized Forwarding Card WS-F6700-CFC       SAL1204DVCR 4.0     Ok
   4 Centralized Forwarding Card WS-F6700-CFC       SAL1204E4US 4.0     Ok
   5 Policy Feature Card 3       VS-F6K-PFC3C       SAD120404S9 1.0     Ok
   5 MSFC3 Daughterboard         VS-F6K-MSFC3       SAD120407J6 1.0     Ok

Mod  Online Diag Status
---- -------------------
   1 Pass
   2 Major Error
   3 Pass
   4 Pass
   5 Pass

The standby ACE cannot detect the active unit, yet it has failed to go HOT and carry traffic.

ACE2/Admin# show ft group sta

FT Group : 1

Configured Status : in-service

Maintenance mode : MAINT_MODE_OFF

My State : FSM_FT_STATE_STANDBY_COLD

Peer State : FSM_FT_STATE_UNKNOWN

Peer Id : 1

No. of Contexts : 1

FT Group : 116

Configured Status : in-service

Maintenance mode : MAINT_MODE_OFF

My State : FSM_FT_STATE_STANDBY_COLD

Peer State : FSM_FT_STATE_UNKNOWN

Peer Id : 1

No. of Contexts : 1

FT Group : 180

Configured Status : in-service

Maintenance mode : MAINT_MODE_OFF

My State : FSM_FT_STATE_STANDBY_COLD

Peer State : FSM_FT_STATE_UNKNOWN

Peer Id : 1

No. of Contexts : 1

FT Group : 181

Configured Status : in-service

Maintenance mode : MAINT_MODE_OFF

My State : FSM_FT_STATE_STANDBY_COLD

Peer State : FSM_FT_STATE_UNKNOWN

Peer Id : 1

No. of Contexts : 1

FT Group : 182

Configured Status : in-service

Maintenance mode : MAINT_MODE_OFF

My State : FSM_FT_STATE_STANDBY_COLD

Peer State : FSM_FT_STATE_UNKNOWN

Peer Id : 1

No. of Contexts : 1

ciscocsoc · 01-05-2009

Hi,

In a normal FT situation I'd expect to see an active and a hot standby. If one of the contexts is in state FSM_FT_STATE_STANDBY_COLD then this implies that synchronisation between the ACE blades has been broken.

Synchronisation can be broken by differences in the files in a context. Files include scripted probes, SSL certificates and SSL certificate keys. Until both contexts have identical files, synchronisation will not happen.

To resolve this problem, ensure that the files are the same. On both contexts check the configs line-by-line for differences and check the directories for files.
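As a rough aid for that comparison, here is a minimal Python sketch. It assumes you have already saved the running-config and the directory listing of each context as text on a workstation; the function names are hypothetical and nothing here talks to the ACE itself. Some lines, such as priority and peer priority, are expected to differ between peers, so treat the output as a list of candidates to review rather than a list of errors.

```python
import difflib

def config_diff(active_cfg, standby_cfg):
    """Return unified-diff lines between two saved configs (blank lines ignored)."""
    a = [ln.rstrip() for ln in active_cfg.splitlines() if ln.strip()]
    b = [ln.rstrip() for ln in standby_cfg.splitlines() if ln.strip()]
    return list(difflib.unified_diff(a, b, fromfile="active",
                                     tofile="standby", lineterm=""))

def missing_files(active_files, standby_files):
    """Return the file names that exist on only one of the two peers."""
    a, b = set(active_files), set(standby_files)
    return {"only_on_active": sorted(a - b), "only_on_standby": sorted(b - a)}

if __name__ == "__main__":
    # Illustrative data only, not taken from a real device.
    active = "probe scripted P1\ninterface vlan 10\n"
    standby = "interface vlan 10\n"
    print("\n".join(config_diff(active, standby)))
    print(missing_files(["probe.tcl", "cert.pem"], ["cert.pem"]))
```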

If replication does not restart automatically then you will need to restart it from each of the ACTIVE contexts.

conf t

no ft auto-sync running-config

no ft auto-sync startup-config

ft auto-sync running-config

ft auto-sync startup-config

HTH

Cathy

krunal_shah · 01-05-2009

Thank you Cathy for your reply,

FYI, the customer set the peer priority on the standby unit to a lower value; however, it had no effect on the HA state. See below for more details.

ft group 116

peer 1

priority 95

peer priority 85

associate-context <>

inservice

ft group 180

peer 1

priority 95

peer priority 85

associate-context SHARED

inservice

ft group 181

peer 1

priority 95

peer priority 85

associate-context <>

inservice

ft group 182

peer 1

priority 95

peer priority 85

associate-context <>

inservice

ft group 183

peer 1

priority 95

peer priority 85

associate-context <>

inservice

ft group 184

peer 1

priority 95

peer priority 85

associate-context <>

inservice

ft group 185

peer 1

priority 95

peer priority 85

associate-context <>

inservice

ft group 186

peer 1

priority 95

peer priority 85

associate-context <>

inservice

ft group 187

peer 1

priority 95

peer priority 85

associate-context <>

inservice

ft group 188

peer 1

priority 95

peer priority 85

associate-context <>

inservice

We are now taking it out of the FT group to make the secondary unit standalone. This seems to have allowed traffic to resume normal operation.

ft group 116

peer 1

priority 95

peer priority 85

associate-context <>

no inservice

ft group 180

peer 1

priority 95

peer priority 85

associate-context SHARED

no inservice

ft group 181

peer 1

priority 95

peer priority 85

associate-context <>

no inservice

ft group 182

peer 1

priority 95

peer priority 85

associate-context <>

no inservice

ft group 183

peer 1

priority 95

peer priority 85

associate-context <>

no inservice

ft group 184

peer 1

priority 95

peer priority 85

associate-context <>

no inservice

ft group 185

peer 1

priority 95

peer priority 85

associate-context <>

no inservice

ft group 186

peer 1

priority 95

peer priority 85

associate-context <>

no inservice

ft group 187

peer 1

priority 95

peer priority 85

associate-context <>

no inservice

ft group 188

peer 1

priority 95

peer priority 85

associate-context <>

no inservice

All virtual contexts are now at INIT

see the attachment part2.txt

ciscocsoc · 01-06-2009

Do you have an ft interface defined in your admin contexts and is it active and linked to peer group 1? If you have an active ft interface is the vlan plumbed through between the ACEs?

E.g. in my admin context:

ft interface vlan 294

ip address 192.168.54.73 255.255.255.252

peer ip address 192.168.54.74 255.255.255.252

no shutdown

ft peer 1

heartbeat interval 300

heartbeat count 10

ft-interface vlan 294

Kind Regards

Cathy

sachinga.hcl · 05-12-2009

Hi,

Once you've configured redundancy on the ACEs, there is an active one and a standby one. That part is simple. However, configuration sync sometimes fails, and here is what I observed.

Once redundancy is configured this way :

ACTIVE ACE

ft group 1

peer 1

priority 200

peer priority 101

associate-context CONTEXT1

inservice

STANDBY ACE

ft group 1

peer 1

priority 101

peer priority 200

associate-context CONTEXT1

inservice

So my question to you: have you set the peer priority higher and the priority lower on the standby ACE, or have you copied the same config to both the active and the standby ACE? Please confirm.

As usual, the active one has the highest priority. Now I want this redundancy to be HOT, i.e. sessions remain up during a switchover because the peers keep them in sync.

Typing show ft group det on the master ACE, you could (as I did) see one of two peer states:

Peer State : FSM_FT_STATE_STANDBY_HOT

or

Peer State : FSM_FT_STATE_STANDBY_COLD

Cold standby state means that sessions will be dropped during a switchover. It also means that configuration sync failed for some reason, so the configurations on the two peers are not identical, and further changes on the master will not be sent to the slave.
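If you have many FT groups, scanning the output by eye gets tedious. The following is a minimal Python sketch, assuming you have pasted the text of show ft group into a string; it only relies on the "FT Group", "My State" and "Peer State" lines shown in this thread. The function names are hypothetical.

```python
import re

# States treated as healthy for an FT pair, taken from the outputs in this thread.
HEALTHY = {"FSM_FT_STATE_ACTIVE", "FSM_FT_STATE_STANDBY_HOT"}

def parse_ft_groups(output):
    """Map each FT group id to its 'My State' and 'Peer State' values."""
    groups = {}
    current = None
    for line in output.splitlines():
        m = re.match(r"\s*FT Group\s*:\s*(\d+)", line)
        if m:
            current = int(m.group(1))
            groups[current] = {}
            continue
        m = re.match(r"\s*(My State|Peer State)\s*:\s*(\S+)", line)
        if m and current is not None:
            groups[current][m.group(1)] = m.group(2)
    return groups

def groups_needing_attention(groups):
    """Return ids of groups where either side is not ACTIVE or STANDBY_HOT."""
    return sorted(
        gid for gid, st in groups.items()
        if st.get("My State") not in HEALTHY or st.get("Peer State") not in HEALTHY
    )

if __name__ == "__main__":
    sample = (
        "FT Group : 1\n"
        "My State : FSM_FT_STATE_STANDBY_COLD\n"
        "Peer State : FSM_FT_STATE_UNKNOWN\n"
    )
    print(groups_needing_attention(parse_ft_groups(sample)))
```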

Typical reasons for configuration sync to fail are:

A scripted probe needs its script file on the ACE's disk0:, and the standby ACE may not have this file on its disk0:.

Interfaces are not configured the same way (a missing interface vlan, for example, as Cathy mentioned above).

The svclc VLAN groups on the Catalysts hosting the ACEs may not pass the same VLANs to the two peers.
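For the VLAN check, a small Python sketch like the one below can help. It assumes you have copied the VLAN list that each Catalyst passes to its ACE into a string in the usual range notation (for example 10,20,30-35); the function names and the sample values are hypothetical.

```python
def expand_vlans(spec):
    """Expand a VLAN list such as '10,20,30-35' into a set of integers."""
    vlans = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-", 1)
            vlans.update(range(int(low), int(high) + 1))
        else:
            vlans.add(int(part))
    return vlans

def vlan_mismatch(spec_a, spec_b):
    """Return (VLANs only on chassis A, VLANs only on chassis B)."""
    a, b = expand_vlans(spec_a), expand_vlans(spec_b)
    return sorted(a - b), sorted(b - a)

if __name__ == "__main__":
    # Illustrative values only.
    print(vlan_mismatch("10,20,30-32", "10,20,30,31"))
```

Two empty lists mean both chassis pass the same VLANs to their ACE.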

However, if you made one of these mistakes, as I did, your standby ACE is now in COLD standby state. What to do now?

Even if you manually copy the configuration to the second ACE, it will never switch to HOT standby state by itself.

The solution is quite easy :

Solve all the issues that caused the configuration sync to fail (see above).

On the standby ACE, switch the FT group of the context off and then on again (rapidly):

ACE-02/Admin#conf t

ACE-02/Admin(config)#ft group 1

ACE-02/Admin(config-ft-group)#no inservice

ACE-02/Admin(config-ft-group)#inservice

Now you will see the standby ACE erase all of its configuration and then start copying it again from the master ACE. At the end, you should see on the master:

FT Group : 1

Configured Status : in-service

Maintenance mode : MAINT_MODE_OFF

My State : FSM_FT_STATE_ACTIVE

My Config Priority : 200

My Net Priority : 200

My Preempt : Enabled

Peer State : FSM_FT_STATE_STANDBY_HOT

Peer Config Priority : 101

Peer Net Priority : 101

Peer Preempt : Enabled

Peer Id : 1

Last State Change time : Fri Aug 3 06:22:17 2007

Running cfg sync enabled : Enabled

Running cfg sync status : Running configuration sync has completed

Startup cfg sync enabled : Enabled

Startup cfg sync status : Startup configuration sync has completed

No. of Contexts : 1

Context Name : CONTEXT1

Context Id : 2

Note : During this process configuration is inhibited even on the master ACE.
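With many FT groups, such as the ten groups in the configuration posted earlier in this thread, typing the bounce sequence by hand is error-prone. This Python sketch only builds the command lines to paste into the standby ACE's Admin context; it does not connect to any device. The function name is hypothetical, and the exit line after each group is my assumption for returning to config mode before the next group.

```python
def bounce_commands(group_ids):
    """Build the CLI lines that take each FT group out of service and back in."""
    cmds = ["conf t"]
    for gid in group_ids:
        cmds += [
            f"ft group {gid}",
            "no inservice",
            "inservice",
            "exit",  # assumption: leave config-ft-group mode before the next group
        ]
    return cmds

if __name__ == "__main__":
    # Group ids as listed in the original poster's configuration.
    groups = [116] + list(range(180, 189))
    print("\n".join(bounce_commands(groups)))
```

Run it only after the causes of the sync failure have been fixed, and only against the standby unit.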

Hope it will solve your problem.

If you still face a problem, please get back to me.

Please rate this post if you find it useful.

Sachin Garg