I have set up failover on firewalls umpteen times, but yesterday it went wrong for some reason.
I had replaced the failed primary unit with a fresh one and expected it to detect the secondary unit as active and begin config replication from it, but instead it wiped the secondary unit's config. I don't think I got the sequence wrong, but let me share what I did:
1. Entered the four or five lines of failover configuration (everything except the failover command) and did a no shut on the failover interface (management0/0)
2. Ran the failover command
Instead of pulling the config from the active unit, the new unit started pushing its own config to the other unit. To recover, I had to reload the active unit so that it came back with its saved config. After that I reloaded the fresh unit, and this time failover came up as expected.
I think I should have forced a reload of the new unit before trying to establish failover.
Has anyone done this in a fail-proof way during production hours? If yes, can you please share the steps?
I did not ask for downtime because I was confident, but I ended up bringing down the ASA for 5 minutes because of the unexpected failover action.
Thanks a lot
So, you haven't given me everything that went on. The sequence of events is very critical.
What you saw is also expected. I just verified that as well.
7. I then said "no failover" to disable failover on the new primary unit.
8. I then went to the active secondary unit and said "failover", as failover was disabled
9. I then went back to the primary unit and said "failover" ---> you did this too quickly
10. This is where blank config replication started!!
The above is also expected. You issued "fail" on the primary unit during the negotiation process. When both units come up during the negotiation process, the primary unit becomes active, and the active unit is the one that replicates its config to its mate, which is why the blank config was pushed. You should have waited until the secondary unit gave you "No response from Mate", verified its "sh fail" output, and only then enabled "fail" on the primary. Things would have worked fine. I have verified this as well.
The key here is to issue "sh fail" on the secondary unit and make sure it shows this unit as "Active" and not "Negotiation".
Also, there is no need to reload the secondary when it is by itself. Understand that when the secondary unit becomes active, it assumes the primary unit's MAC address and IP address and would proxy ARP for it. Now, if you reload it, it would then use its own MAC as the active MAC. Then you will introduce a brand new primary with an all-new MAC address.
The takeaway from this is: whenever you replace a unit, whether it is primary or secondary, make sure the sitting unit's "sh fail" shows what you expect, which is "this unit active" and "other unit failed", before you introduce the other unit and enable failover on it.
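If you want to turn that pre-check into something scriptable, here is a minimal Python sketch of the idea. It only parses text you have already captured from "sh fail" on the sitting unit; it does not talk to the ASA. The function names and the two sample outputs are hypothetical and hand-written for illustration, and the sketch assumes the state lines look like "This host: Secondary - Active" and "Other host: Primary - Failed". Check the exact wording against your own software version before relying on it.

```python
import re

# Assumed line shape: "This host: Secondary - Active" / "Other host: Primary - Failed".
STATE_LINE = re.compile(r"\s*(This|Other) host:\s*(Primary|Secondary)\s*-\s*(.+?)\s*$")


def parse_failover_states(show_failover_output):
    """Return (this_host_state, other_host_state) found in the captured text."""
    states = {}
    for line in show_failover_output.splitlines():
        match = STATE_LINE.match(line)
        if match:
            states[match.group(1)] = match.group(3)
    return states.get("This"), states.get("Other")


def safe_to_enable_peer(show_failover_output):
    """True only when the sitting unit is Active and its mate is Failed."""
    this_state, other_state = parse_failover_states(show_failover_output)
    return this_state == "Active" and other_state == "Failed"


# Illustrative samples only, not captured from a real device.
SETTLED = """\
Failover On
Failover unit Secondary
        This host: Secondary - Active
        Other host: Primary - Failed
"""

NEGOTIATING = """\
Failover On
Failover unit Secondary
        This host: Secondary - Negotiation
        Other host: Primary - Not Detected
"""

if __name__ == "__main__":
    for name, text in (("settled", SETTLED), ("negotiating", NEGOTIATING)):
        verdict = "go ahead" if safe_to_enable_peer(text) else "wait"
        print(f"{name}: {verdict}")
```

The check is deliberately strict: anything other than Active on this unit and Failed on the mate, including "Negotiation", is treated as "wait", which matches the advice above to hold off until the sitting unit has settled.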
Sorry you had to go through this, but it worked as designed and as expected.