FWSM Failover takes more than 15 seconds !

Aug 21st, 2008

Hi,

I know this question was already posted last year, but it never received a clear answer.


I have two FWSMs (version 3.2(2)) installed in two different switches. The FWSMs are configured with two contexts and run in active/standby mode.


I use two different VLANs for failover and state synchronization. Manual switchover (with the no failover active command) works fine: TCP sessions are taken over without any problem.


However, if the primary unit fails (power off), the secondary unit takes up to 15 seconds to take over. I tuned the unit poll frequency to 500 msec with a holdtime of 3 seconds.


What could be wrong?


Here is the config on the primary unit :



SYSTEM# sh run failover


failover

failover lan unit primary

failover preempt 60

failover lan interface FAILOV Vlan10

failover polltime unit msec 500 holdtime 3

failover replication http

failover link STSYNC Vlan11

failover interface ip FAILOV 192.168.10.1 255.255.255.0 standby 192.168.10.2



And the replicated version on the secondary unit :


failover

failover preempt 60

failover lan interface FAILOV Vlan10

failover polltime unit msec 500 holdtime 3

failover replication http

failover link STSYNC Vlan11

failover interface ip FAILOV 192.168.10.1 255.255.255.0 standby 192.168.10.2

failover interface ip STSYNC 192.168.11.1 255.255.255.0 standby 192.168.11.2


Thank you for any help


Yves Haemmerli


Yes - sorry, checking the wrong device config!


Below is my FWSM failover config; it fails over in 3 seconds:


failover

failover lan unit primary

failover preempt 300

failover lan interface failover Vlanxxx

failover polltime unit 1 holdtime 3

failover polltime interface 3

failover interface-policy 1

failover replication http

failover link failover Vlanxxx

failover interface ip failover x.x.x.x 255.255.255.252 standby 1x.x.x.x


yves.haemmerli Thu, 08/21/2008 - 07:38

Andrew,


I added the interface-policy 1 statement and the failover polltime interface 3 statement to my config, but the result is the same...


So my config is now :


SYSTEM# sh run failover

failover

failover lan unit primary

failover preempt 60

failover lan interface FAILOV Vlan10

failover polltime unit msec 500 holdtime 3

failover polltime interface 3

failover interface-policy 1

failover replication http

failover link STSYNC Vlan11

failover interface ip FAILOV 192.168.10.1 255.255.255.0 standby 192.168.10.2

failover interface ip STSYNC 192.168.11.1 255.255.255.0 standby 192.168.11.2



The only difference is that you may be using the same VLAN for both failover and sync, but that shouldn't have any impact anyway.


Yves





yves.haemmerli Thu, 08/21/2008 - 07:59

Andrew,


I made the change, but unfortunately the behaviour is unchanged. It is really strange.


Do you have the "firewall autostate" statement in your 6500 switch config?


Yves

yves.haemmerli Thu, 08/21/2008 - 08:12

Andrew,


Here is the show failover output from the primary and secondary units:


On the primary :



SYSTEM# sh failover

Failover On

Failover unit Primary

Failover LAN Interface: FAILOV Vlan 10 (up)

Unit Poll frequency 1 seconds, holdtime 3 seconds

Interface Poll frequency 3 seconds

Interface Policy 1

Monitored Interfaces 0 of 250 maximum

failover replication http

Config sync: active

Version: Ours 3.2(2), Mate 3.2(2)

Last Failover at: 15:53:01 UTC Aug 21 2008

This host: Primary - Active

Active time: 545 (sec)

ADMIN Interface FWMGT (10.56.2.12): Normal (Not-Monitored)

CH01FW01 Interface V-300-P (10.56.5.1): Normal (Not-Monitored)

CH01FW01 Interface V-400-P (10.56.9.1): Normal (Not-Monitored)

Other host: Secondary - Standby Ready

Active time: 596 (sec)

ADMIN Interface FWMGT (10.56.2.13): Normal (Not-Monitored)

CH01FW01 Interface V-300-P (10.56.5.2): Normal (Not-Monitored)

CH01FW01 Interface V-400-P (10.56.9.2): Normal (Not-Monitored)


Stateful Failover Logical Update Statistics

Link : STSYNC Vlan 11 (up)

Stateful Obj    xmit       xerr       rcv        rerr
General         228        0          87         0
sys cmd         88         0          88         0
up time         0          0          0          0
RPC services    0          0          0          0
TCP conn        0          0          0          0
UDP conn        0          0          0          0
ARP tbl         141        0          0          0
Xlate_Timeout   0          0          0          0
AAA tbl         0          0          0          0
DACL            0          0          0          0


Logical Update Queue Information

          Cur     Max     Total
Recv Q:   0       1       612
Xmit Q:   0       0       229



And here on the secondary:


SYSTEM# sh failover

Failover On

Failover unit Secondary

Failover LAN Interface: FAILOV Vlan 10 (up)

Unit Poll frequency 1 seconds, holdtime 3 seconds

Interface Poll frequency 3 seconds

Interface Policy 1

Monitored Interfaces 0 of 250 maximum

failover replication http

Config sync: active

Version: Ours 3.2(2), Mate 3.2(2)

Last Failover at: 15:53:39 UTC Aug 21 2008

This host: Secondary - Standby Ready

Active time: 596 (sec)

ADMIN Interface FWMGT (10.56.2.13): Normal (Not-Monitored)

CH01FW01 Interface V-300-P (10.56.5.2): Normal (Not-Monitored)

CH01FW01 Interface V-400-P (10.56.9.2): Normal (Not-Monitored)

Other host: Primary - Active

Active time: 663 (sec)

ADMIN Interface FWMGT (10.56.2.12): Normal (Not-Monitored)

CH01FW01 Interface V-300-P (10.56.5.1): Normal (Not-Monitored)

CH01FW01 Interface V-400-P (10.56.9.1): Normal (Not-Monitored)


Stateful Failover Logical Update Statistics

Link : STSYNC Vlan 11 (up)

Stateful Obj    xmit       xerr       rcv        rerr
General         2520       0          7981       0
sys cmd         2124       0          2117       0
up time         0          0          0          0
RPC services    0          0          0          0
TCP conn        24         0          20         0
UDP conn        0          0          4          0
ARP tbl         373        0          5841       0
Xlate_Timeout   0          0          0          0
AAA tbl         0          0          0          0
DACL            0          0          0          0


Logical Update Queue Information

          Cur     Max     Total
Recv Q:   0       2       25275


yves.haemmerli Fri, 08/22/2008 - 01:21

Hi Andrew,

I really appreciate your help, as I am a bit stuck with this issue in an important data center...


In the troubleshooting document, I read: "Note: Do not configure an IP address for the failover link or for the state link (if you are going to use Stateful Failover)."


But in the sample configuration, they do configure an IP address. I also configured an IP address on the failover VLAN and on the sync VLAN (remember, I use two different VLANs, as recommended by Cisco).


For your information, here is my SYSTEM configuration :


SYSTEM# sh run

: Saved

:

FWSM Version 3.2(2)

!

resource acl-partition 12

hostname SYSTEM

enable password xxx

!

interface Vlan10

description LAN Failover Interface

!

interface Vlan11

description STATE Failover Interface

!

interface Vlan91

description *** Network Management VLAN ***

!

interface Vlan300

description *** Server Farms VLAN ***

!

interface Vlan400

description *** Critical Servers VLAN ***

!

passwd /qNhaw.ZtG3q1e1B encrypted

class default

limit-resource IPSec 5

limit-resource Mac-addresses 65535

limit-resource ASDM 5

limit-resource SSH 5

limit-resource Telnet 5

limit-resource All 0

!


ftp mode passive

pager lines 24

failover

failover lan unit primary

failover preempt 60

failover lan interface FAILOV Vlan10

failover polltime unit 1 holdtime 3

failover polltime interface 3

failover interface-policy 1

failover replication http

failover link STSYNC Vlan11

failover interface ip FAILOV 192.168.10.1 255.255.255.0 standby 192.168.10.2

failover interface ip STSYNC 192.168.11.1 255.255.255.0 standby 192.168.11.2

no asdm history enable

arp timeout 14400

console timeout 0


admin-context ADMIN

context ADMIN

description *** This is the administration context. It has a unique interface on VLAN 91 ***

allocate-interface Vlan91

config-url disk:/ADMIN.cfg

!


context CH01FW01

description *** This is the Firewall instance between security zones 3 and 4 ***

allocate-interface Vlan300

allocate-interface Vlan400

config-url disk:/CH01FW01.cfg

!


prompt hostname context

Cryptochecksum:xxx

: end





In your environment, do you monitor any interfaces in addition to unit monitoring? If yes, which interfaces do you monitor? I basically have two contexts in my FWSM: the ADMIN context, with a single interface on my management VLAN, and another context for production, with two interfaces (inside and outside).


Yves



yves.haemmerli Fri, 08/22/2008 - 02:13

Andrew,


Do you understand what Cisco means by:


"Note: Do not configure an IP address for the failover link or for the state link (if you are going to use Stateful Failover)."?


Actually, we both have configured IP addresses on the failover VLAN :


failover interface ip FAILOV 192.168.10.1 255.255.255.0 standby 192.168.10.2

failover interface ip STSYNC 192.168.11.1 255.255.255.0 standby 192.168.11.2



Yves

Yes - it's because you do not want the state link IP to be transferred in the event of failover?


Below that note, there is this one:


Note: You do not need to identify the standby address subnet mask. The failover link IP address and MAC address do not change at failover. The active IP address for the failover link always stays with the primary unit, while the standby IP address stays with the secondary unit.

yves.haemmerli Fri, 08/22/2008 - 04:49

Andrew,


I think I am getting closer to the problem. With the proper debug messages enabled, I can see that FW02 (the standby) makes several ARP checks on the other firewall's interfaces before declaring FW01 dead...


Here are the messages with comments :



SYSTEM# debug fover ifc

SYSTEM# debug fover cable

SYSTEM# debug fover fmsg

SYSTEM# debug fover sync

SYSTEM# debug timestamp 255



1636.170000000: fover_health_monitoring_thread: fover_lan_check() Failover Interface OK

1638.170000000: fover_health_monitoring_thread: fover_lan_check() Failover LAN Check

1638.170000000: fover_health_monitoring_thread: fover_lan_check() Failover Interface OK


FW01 Fails now !


1638.170000000: fover_health_monitoring_thread: fover_luifc_check: ifc rcv fail cnt = 1

1639.170000000: fover_health_monitoring_thread: fover_lan_check() Failover LAN Check

1639.170000000: fover_health_monitoring_thread: fover_lan_check() Failover Interface TEST started

1639.170000000: fover_health_monitoring_thread: send_mate_arp(1) - 192.168.10.1

1639.170000000: fover_health_monitoring_thread: send_mate_arps(50001) - 10.56.2.12

1639.170000000: fover_health_monitoring_thread: send_mate_arps(60001) - 10.56.5.1

1639.170000000: fover_health_monitoring_thread: send_mate_arps(60002) - 10.56.9.1

1639.170000000: fover_health_monitoring_thread: fover_luifc_check: ifc rcv fail cnt = 2

1640.170000000: fover_health_monitoring_thread: fover_lan_check() Failover LAN Check

1640.170000000: fover_health_monitoring_thread: send_mate_arp(1) - 192.168.10.1

1640.170000000: fover_health_monitoring_thread: send_mate_arps(50001) - 10.56.2.12

1640.170000000: fover_health_monitoring_thread: send_mate_arps(60001) - 10.56.5.1

1640.170000000: fover_health_monitoring_thread: send_mate_arps(60002) - 10.56.9.1

1640.170000000: fover_health_monitoring_thread: fover_luifc_check: ifc rcv fail cnt = 3


... 15 seconds


1655.170000000: fover_tx: putit(cmd = b) seqNum 0x8fd, no response, retry=10, busyack=0

1670.170000000: fover_tx: putit(cmd = 9) seqNum 0x8fe, no response, retry=10, busyack=0

1670.170000000: fover_tx: comm_alarm(0, reason 3)

1670.170000000: fover_FSM_thread: Vlan status(DOWN) update Time: 0

1670.170000000: fover_FSM_thread: MAC update Time: 0

1670.180000000: fover_FSM_thread: IP update Time: 0

1670.190000000: fover_FSM_thread: Vlan status(UP) update Time: 0

1671.170000000: fover_health_monitoring_thread: fover_lan_check() Failover LAN Check

1671.170000000: fover_health_monitoring_thread: send_mate_arp(1) - 192.168.10.1

1671.170000000: fover_health_monitoring_thread: send_mate_arps(50001) - 10.56.2.13

1671.170000000: fover_health_monitoring_thread: send_mate_arps(60001) - 10.56.5.2

1671.170000000: fover_health_monitoring_thread: send_mate_arps(60002) - 10.56.9.2

1671.170000000: fover_health_monitoring_thread: fover_lan_check() Failover LAN Check - Timeout

1671.170000000: fover_health_monitoring_thread: fover_luifc_check: skip lu ifc monitoring

Farrukh Haroon Mon, 08/25/2008 - 18:33

You have to monitor the interfaces to make your failover work properly. Use the 'monitor-interface' command for this purpose. The failover link 'does not' need to be put in the monitor-interface command.
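
As a minimal sketch of what that could look like for the setup in this thread: the context name CH01FW01 and the interface names V-300-P and V-400-P are taken from the show failover output posted above, and the sketch assumes that, in multiple context mode, monitor-interface is entered inside each context rather than in the system execution space. Treat it as an illustration to adapt, not a verified fix for the 15-second delay.

```
! Sketch only: names come from the show failover output earlier in the thread
changeto context CH01FW01
configure terminal
 monitor-interface V-300-P
 monitor-interface V-400-P
 end
! Back in the system execution space, check the monitored interface count
changeto system
show failover
```

If this takes effect, the "Monitored Interfaces 0 of 250 maximum" line should no longer read 0, and the two interfaces should no longer be flagged "(Not-Monitored)".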



Secondly, what that Cisco quote means is that IF you are 'sharing' the failover and stateful links, you don't need to configure an IP address for the stateful link. However, if you don't share them, then the STATEFUL link also needs its own IP address.


With the default configuration, the FWSM fails over in about 2-3 seconds.


Regards


Farrukh

yves.haemmerli Tue, 09/02/2008 - 03:34

Hi all,


Just to let you know that my FWSM failover problem has been solved after upgrading the code from 3.2(2) to 3.2(6) !


Now, the switchover is executed in 3 seconds, as expected.


Thank you all for your help and contribution


Yves Haemmerli

Farrukh Haroon Tue, 09/02/2008 - 07:53

Thank you for the update; it will really help all those who look up this problem in the future.


Regards


Farrukh
