I have two PIX 515s (running 5.2(3)) set up for stateful redundancy (serial cable + ethernet4 crossover). On occasion I've had the following problem: the active status switches between the two boxes, but communications through the firewall stop working.
I've verified that the configs are identical (both via a forced "write standby" and manual checks on the console port of the standby unit). The two boxes do see each other (I can ping the failover IP from the active unit). But when the active status gets transferred to the second unit, all communications cease. We've also waited long enough for STP to converge (the two units are on two switches trunked together). Any ideas on what to test to solve this?
Another thing that may or may not be related: should I be able to SSH/telnet to the standby unit? I can ping the failover IP from our management network and there's "ssh whole_ip_subnet" in the config, but accessing the standby unit is a no-go. Is this intended behaviour?
The PIX does not synchronize RSA keys. To ssh to the secondary PIX, you'll need to log in to it, generate a key, and use [ca save all]. Of course, the secondary will complain when you enter config mode, but that's okay. Simply exit out and use [write standby] again from the primary.
When failover occurs and traffic isn't passing, what does [show fail] give you?
The code you're running is over 2 years old. Failover has been one of the most problematic PIX features. At the very least, upgrade to the GD version in your train, 5.2(9). Personally, I don't like to touch anything below 6.x.
I have seen this problem before where the firewalls are on two different switches. Remember that the MAC address of the forwarding PIX always stays the same regardless of which unit is active. Switches learn and cache this MAC. If the firewalls fail over but the port of the former active firewall is still up, the switch has no reason to purge and relearn the MAC until the aging timer expires. This varies across platforms but is 300 seconds for the Catalyst. Try logging into your switches and clearing the CAM when the problem occurs. CatOS would use "clear cam dynamic" and IOS would use "clear mac-address-table". This would explain why traffic stops passing, but not why the units fail back and forth.
Do the ports connecting the switches and the ports for the firewalls show as error-free? All should be locked down on both ends to full-duplex. This is a frequent source of errors. Also, enable portfast on the switch ports connected to the PIXes.
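To make the MAC-caching explanation concrete, here is a minimal sketch in Python of a switch's MAC table with an aging timer. It is a toy model, not how any particular Catalyst works: the class ToySwitch, the port names, and the MAC address are made up for illustration, the 300-second value is the one quoted above, and it assumes the newly active unit sends no frame that would let the switch relearn the MAC on its own.

```python
from dataclasses import dataclass, field

AGING_SECONDS = 300  # aging time quoted in the reply; an assumption for your platform


@dataclass
class ToySwitch:
    """Hypothetical, simplified model of a switch MAC (CAM) table."""
    aging: int = AGING_SECONDS
    table: dict = field(default_factory=dict)  # mac -> (port, time_learned)

    def learn(self, mac, port, now):
        # A frame with this source MAC was seen arriving on this port.
        self.table[mac] = (port, now)

    def lookup(self, mac, now):
        # Return the egress port, or None (flood) if unknown or aged out.
        entry = self.table.get(mac)
        if entry is None:
            return None
        port, learned = entry
        if now - learned >= self.aging:
            del self.table[mac]
            return None
        return port

    def clear_dynamic(self):
        # Stand-in for manually clearing the dynamic CAM entries.
        self.table.clear()


FW_MAC = "00:00:00:aa:bb:cc"  # made-up MAC used by whichever unit is active

sw = ToySwitch()
sw.learn(FW_MAC, "port-to-unit-A", now=0)   # unit A is active and has sent traffic
# Failover happens at t=10: unit B is active, but the switch has seen no new frame.
stale = sw.lookup(FW_MAC, now=60)           # still points at unit A's port
aged = sw.lookup(FW_MAC, now=300)           # entry has expired, so the frame floods
sw.learn(FW_MAC, "port-to-unit-A", now=301)
sw.clear_dynamic()                          # manual clear removes the entry at once
cleared = sw.lookup(FW_MAC, now=302)
print(stale, aged, cleared)
```

In this model, frames for the firewall keep going to the former active unit's port until the entry ages out or is cleared by hand, which is the window in which traffic would appear dead.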
Indeed a key was never generated on the standby node. I had one of those 'duh' moments reading your reply ;)
My two main problems in troubleshooting this were that I was unable to access the box (I suspect the MAC problem is to blame, and even if that had been fixed beforehand, ssh wouldn't have been available) and the urgency of bringing everything back up - the cluster protects critical shared environments. By the time I could drive to the datacenter and log on to the console, the logs had long since scrolled by.
The surrounding switches have been reconfigured to forward all frames for the active MAC to both ports at all times. If the problem happens again I will make sure to log in to the switches too and inspect the MAC caches.
Following the problems, the surrounding ports have been verified and everything looks good - forced 100/full and no error counters.
We are planning an upgrade to 6.x pretty soon. Would totally disconnecting the failover unit, upgrading it offline, then swapping the primary unit with the failover unit work? Basically I want to know if the failover unit can run by itself even with a failover license. This would enable us to minimize downtime and have a quick rollback solution to make everyone more comfortable with the process...
The failover PIX will run by itself if you leave the cable connected. If you remove the cable and the firewall is rebooted, the firewall will "randomly" reboot itself to prevent customers from trying to use the secondary as a standalone firewall. This warning message can be seen at boot time if you're on the console of the secondary. Try it and see....