PIX 6.3.3 Failover problem - been working for months

Unanswered Question
Jan 27th, 2009

I have a pair of PIX 525s running 6.3.3 (old, I know; downtime constraints are preventing an upgrade or hardware swap-out) that have just started throwing failover errors.


I saw this in the logs at the time of the failure:

Jan 23 18:26:34 elm-pix-2 Jan 23 2009 18:26:34: %PIX-1-105005: (Secondary) Lost Failover communications with mate on interface 1

Jan 23 18:26:34 elm-pix-2 Jan 23 2009 18:26:34: %PIX-1-105008: (Secondary) Testing Interface 1

Jan 23 18:26:45 elm-pix-1 Jan 23 2009 18:26:45: %PIX-1-103005: (Primary) Other firewall reporting failure.


Then after getting to the unit and unplugging and reconnecting the failover cable, I saw this:

Jan 27 09:20:47 elm-pix-2 Jan 27 2009 09:20:47: %PIX-1-101004: (Secondary) Failover cable not connected (other unit)

Jan 27 09:20:51 elm-pix-1 Jan 27 2009 09:20:51: %PIX-1-101003: (Secondary) Failover cable not connected (this unit)

Jan 27 09:21:17 elm-pix-2 Jan 27 2009 09:21:17: %PIX-1-101001: (Secondary) Failover cable OK.

Jan 27 09:21:21 elm-pix-1 Jan 27 2009 09:21:21: %PIX-1-101001: (Primary) Failover cable OK.

Jan 27 09:21:37 elm-pix-1 Jan 27 2009 09:21:37: %PIX-1-709003: (Primary) Beginning configuration replication: Send to mate.

Jan 27 09:21:51 elm-pix-2 Jan 27 2009 09:21:51: %PIX-1-709006: (Secondary) End Configuration Replication (STB)

Jan 27 09:21:51 elm-pix-1 Jan 27 2009 09:21:51: %PIX-1-709004: (Primary) End Configuration Replication (ACT)

Jan 27 09:23:37 elm-pix-1 Jan 27 2009 09:23:37: %PIX-1-709003: (Primary) Beginning configuration replication: Send to mate.

Jan 27 09:23:51 elm-pix-2 Jan 27 2009 09:23:51: %PIX-1-709006: (Secondary) End Configuration Replication (STB)

Jan 27 09:23:51 elm-pix-1 Jan 27 2009 09:23:51: %PIX-1-709004: (Primary) End Configuration Replication (ACT)
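
For anyone comparing the two units' messages side by side, here is a minimal Python sketch that splits log lines of the shape quoted above into host, timestamp, severity, message ID, role, and text. The regular expression is inferred only from the lines in this post, and the names `parse_pix_line` and `summarize` are hypothetical helpers, not part of any Cisco tool.

```python
import re

# Pattern inferred from the log lines quoted in this post (an assumption,
# not an official format definition):
#   <syslog time> <host> <device time>: %PIX-<severity>-<id>: (<role>) <text>
LINE_RE = re.compile(
    r"^\w{3}\s+\d+\s+[\d:]+\s+(?P<host>\S+)\s+"
    r"(?P<stamp>\w{3}\s+\d+\s+\d{4}\s+[\d:]+):\s+"
    r"%PIX-(?P<severity>\d)-(?P<msg_id>\d+):\s+"
    r"\((?P<role>\w+)\)\s+(?P<text>.*)$"
)


def parse_pix_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    return match.groupdict() if match else None


def summarize(lines):
    """Count messages per (host, message ID), skipping lines that do not parse."""
    counts = {}
    for line in lines:
        record = parse_pix_line(line)
        if record is None:
            continue
        key = (record["host"], record["msg_id"])
        counts[key] = counts.get(key, 0) + 1
    return counts
```

Fed the lines above, `summarize` would make it easy to see that the primary logged the "Beginning configuration replication" message (709003) twice, two minutes apart.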


I can then run 'wr standby' on the primary, but I do NOT see the 'starting to sync' message, and 'sh failover' shows the following (the failover config is below as well):

ELM-PIX525-1(config)# sh fail

Failover On

Cable status: Normal

Reconnect timeout 0:00:00

Poll frequency 15 seconds

failover replication http

Last Failover at: 09:14:49 CST Fri Mar 28 2008

This host: Primary - Active

Active time: 26707815 (sec)

Interface outside (65.166.254.2): Normal

Interface inside (10.200.1.249): Normal

Interface EDMZ1 (172.30.1.1): Normal

Interface EDMZ2 (0.0.0.0): Link Down (Shutdown)

Interface MGT (10.200.1.125): Link Down (Waiting)

Interface intf5 (172.27.0.1): Normal

Other host: Secondary - Standby (Failed)

Active time: 0 (sec)

Interface outside (65.166.254.3): Normal

Interface inside (10.200.1.250): Normal

Interface EDMZ1 (172.30.1.3): Normal

Interface EDMZ2 (172.31.1.3): Link Down (Shutdown)

Interface MGT (10.200.1.126): Link Down (Waiting)

Interface intf5 (172.27.0.2): Normal
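
To make the per-interface status easier to scan, the following Python sketch parses 'sh failover' output of the shape pasted above and lists every interface whose status is not "Normal". The line patterns are inferred only from this one sample, and `parse_show_failover` and `not_normal` are hypothetical helper names.

```python
import re

# Patterns inferred from the 'sh failover' output in this post (assumptions).
HOST_RE = re.compile(r"^(?P<which>This|Other) host:\s+(?P<state>.+)$")
IFACE_RE = re.compile(
    r"^Interface\s+(?P<name>\S+)\s+\((?P<ip>[^)]+)\):\s+(?P<status>.+)$"
)


def parse_show_failover(text):
    """Return {'This'|'Other': {'state': str, 'interfaces': {name: (ip, status)}}}."""
    units = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        host = HOST_RE.match(line)
        if host:
            current = host.group("which")
            units[current] = {"state": host.group("state"), "interfaces": {}}
            continue
        iface = IFACE_RE.match(line)
        if iface and current is not None:
            units[current]["interfaces"][iface.group("name")] = (
                iface.group("ip"),
                iface.group("status"),
            )
    return units


def not_normal(units):
    """List (unit, interface, status) for every interface not reported as Normal."""
    return [
        (which, name, status)
        for which, unit in units.items()
        for name, (_ip, status) in unit["interfaces"].items()
        if status != "Normal"
    ]
```

On the output above this flags only EDMZ2 and MGT, on both units, while the secondary is still reported as "Standby (Failed)".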


failover

failover timeout 0:00:00

failover poll 15

failover replication http

failover ip address outside xx.xx.254.3

failover ip address inside 10.200.1.250

failover ip address EDMZ1 172.30.1.3

failover ip address EDMZ2 172.31.1.3

failover ip address MGT 10.200.1.126

failover ip address intf5 172.27.0.2

failover link intf5





Thanks for any help....


Chris Serafin


eddie.mitchell@... Thu, 01/29/2009 - 09:57

If the configuration hasn't changed at all, it might be a hardware issue with one of the interfaces on the secondary unit. If you do a 'sh int' on the secondary unit, do you see a large number of errors on any of the interfaces?


Is the state interface (intf5) directly connected on both units or is it going through a switch?


Hope this helps.
