best way to bring long failed over unit back in production - asa active/failover
Hello all, we have a couple of ASA 5520s that are configured in active/standby failover mode.
A while ago we had an issue with the primary unit so we took it out of production and replaced it. At that time the secondary unit became active.
Needless to say, in over a month many changes have been made on the now-active unit, but we are now ready to re-introduce the replaced primary unit into the cluster, and I'm wondering what the best way to do this is.
All the documentation that I'm reading says:
•If a unit boots and detects a peer already running as active, it becomes the standby unit.
so it should get its configuration from the active unit and we should be fine. However, our previous attempt at bringing the resuscitated primary unit into the cluster went wrong: it appears that the primary/standby unit started overwriting the current active unit's configuration. I don't know the details; I wasn't present.
So, here are my options as I see them to bring this primary unit back in production, if someone with a good heart would please look them over and let me know what they think the best way to do this is, I would appreciate it immensely...
1) Bring resuscitated primary unit back up (offline); copy the current configuration from secondary/active unit into primary unit. Bring primary unit online;
2) Reconfigure secondary unit as primary, and primary unit as secondary, bring secondary unit back online;
3) Completely break failover; wipe resuscitated unit and reconfigure failover on both;
4) best practice??...
Thanks in advance.
Here's some failover related output (sanitized) from our secondary/active unit:
asa# sh failover
Failover unit Secondary
Failover LAN Interface: cccfailover GigabitEthernet0/3 (Failed - No Switchover)
Unit Poll frequency 1 seconds, holdtime 15 seconds
Interface Poll frequency 5 seconds, holdtime 25 seconds
Interface Policy 1
Monitored Interfaces 7 of 160 maximum
Version: Ours 8.2(5), Mate Unknown
Last Failover at: 17:58:13 PST Dec 17 2013
This host: Secondary - Active
Active time: 3088437 (sec)
slot 0: ASA5520 hw/sw rev (1.1/8.2(5)) status (Up Sys)
Interface outside : Normal (Waiting)
Interface inside : Normal (Waiting)
Interface 1 : Normal (Waiting)
Interface 2 : Normal (Waiting)
Interface 3 : Normal (Waiting)
Interface 4 : Normal (Waiting)
Interface 5 Normal (Waiting)
Interface management : No Link (Not-Monitored)
slot 1: ASA-SSM-20 hw/sw rev (1.0/7.0(2)E3) status (Up/Up)
IPS, 7.0(2)E3, Up
Other host: Primary - Failed
Active time: 0 (sec)
slot 0: empty
Interface outside : Unknown (Waiting)
Interface inside : Unknown (Waiting)
Interface 1 : Unknown (Waiting)
Interface 2 : Unknown (Waiting)
Interface 3 : Unknown (Waiting)
Interface 4 : Unknown (Waiting)
Interface 5 : Unknown (Waiting)
Interface management : Unknown (Not-Monitored)
slot 1: empty
Stateful Failover Logical Update Statistics
Link : cccfailover GigabitEthernet0/3 (Failed)
Stateful Obj xmit xerr rcv rerr
General 0 0 0 0
sys cmd 0 0 0 0
up time 0 0 0 0
RPC services 0 0 0 0
TCP conn 0 0 0 0
UDP conn 0 0 0 0
ARP tbl 0 0 0 0
Xlate_Timeout 0 0 0 0
IPv6 ND tbl 0 0 0 0
VPN IKE upd 0 0 0 0
VPN IPSEC upd 0 0 0 0
VPN CTCP upd 0 0 0 0
VPN SDI upd 0 0 0 0
VPN DHCP upd 0 0 0 0
SIP Session 0 0 0 0
Logical Update Queue Information
Cur Max Total
Recv Q: 0 0 0
Xmit Q: 0 0 0
asa# sh run failover
failover lan unit secondary
failover lan interface cccfailover GigabitEthernet0/3
failover link cccfailover GigabitEthernet0/3
failover interface ip cccfailover 10.10.10.1 255.255.255.0 standby 10.10.10.2
best way to bring long failed over unit back in production - asa
I have not had to do this that many times since, frankly, I have not had that many ASAs break. Though I have to say that we had both of our VPN ASAs break (not at the same time, though) and had to replace both. Neither unit recovered after a boot, which makes it seem like they suffered from some kind of manufacturing defect. (They were acquired at the same time and seem to have been manufactured at pretty much the same time.)
But to move into the actual subject,
I have done these replacements about 3 times that I can remember. Each time I have done them the same way.
Completely clear the configuration on the replacement ASA unit
Configure the "failover" configurations
Reboot replacement firewall while connected to the network
I am not sure what happened in your situation, but I have had no problems like yours or with wiping the configuration. The only issue I had was the config sync stopping, which required another reboot.
To my understanding, if there is already an Active ASA in an Active/Standby failover pair, then the Active device should not change, since the units are pretty much equal with regard to each other unless they are booted at around the same time.
I don't, however, know how your installation went wrong. I have gaps in my knowledge about the different behaviours of a failover pair, since ours seem to work just fine and I have not had to go in depth to troubleshoot any of our setups.
I am wondering, for example, what the result would be if the replacement unit were to go Active before it has had time to detect the other unit. Which one will go to Standby when both presume to be Active?
So I am not sure what went wrong with your situation but it has worked for me so far.
Being as paranoid as I am with even the most certain/safe changes on networks, I tend to take backups and come up with some plan in case something bad should happen.
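To check after the reboot that the replacement really came up as standby and picked up the configuration, something like the following on the active unit should do. This is only a sketch: these are standard ASA commands, but the exact output depends on the software version, and `write standby` is offered here as a possible alternative to the extra reboot mentioned above, not something confirmed in this thread.

```
! Run on the currently active (secondary) unit.
! Lines starting with "!" are annotations, not commands to paste.

! Confirm this unit is still Active and the peer shows up as Standby Ready
show failover
show failover state

! Review the recent state transitions of the pair
show failover history

! If the standby did not receive the full configuration,
! push the running configuration from the active unit again
write standby
```

If `show failover` still reports the other host as Failed, or the failover link as Failed as in the output in the original post, it is probably safer to sort out the failover link cabling and interface first before touching anything else.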
best way to bring long failed over unit back in production - asa
I personally did the following
Clear the configuration of the unit that is about to be installed into the network (in case it has some configuration on it; I have received devices from Cisco that still had non-default configurations on them, and the config register values had been changed as well)
Enable physical interfaces with "no shutdown"
Configure the "failover" configurations that it had before (essentially identical to the other unit except for the "primary" / "secondary" setting)
Boot the device while connected to network and the other ASA
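A rough sketch of those steps on the replacement unit, assuming it is to be the primary again and reusing the failover interface name and addresses from the `sh run failover` output in the original post. Adjust these to the real values; if the active unit uses a failover key (none is visible in the sanitized output), it has to match as well.

```
! On the replacement (primary) unit.
! Lines starting with "!" are annotations, not commands to paste.

! Wipe whatever configuration the unit arrived with
clear configure all

! Bring up the physical interface used for the failover link
! (other physical interfaces are enabled the same way)
interface GigabitEthernet0/3
 no shutdown
exit

! Same failover settings as the active unit, except for the unit role
failover lan unit primary
failover lan interface cccfailover GigabitEthernet0/3
failover link cccfailover GigabitEthernet0/3
failover interface ip cccfailover 10.10.10.1 255.255.255.0 standby 10.10.10.2
failover

! Save, then boot the unit while it is cabled to the network and the active ASA
write memory
```

The point of booting it while connected is that the unit should detect the already-active peer during startup, come up as standby, and receive the configuration from the active unit rather than the other way around.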