Solved: ASA HA Pair

sbertsch · ‎10-20-2009

I've been looking at doing HA on ASA for the first time, and have found a few confusing bits in the documentation.

One thing I've come across a few times is the statement that the failover port between firewalls should be cabled to a switch rather than connected with a simple cross-over cable between the firewalls. One document I read didn't back this up with any technical reason. Another claims that this is so each firewall in the pair doesn't assume that its own interface has failed. Fine, but how does this actually affect the failover logic? Does it count as a failed interface the same as the other monitored interfaces, or is there some other behavior? I can certainly cable a switch in between, but what happens when the switch fails? If I'm buying redundant components all around (HSRP hand-off on the outside, dual switches on the inside, etc.), it would seem a little silly to have this one link in the middle that did something undesirable.

The next confusing bit is the suggestion that the stateful failover link should have enough capacity to carry the traffic of all other interfaces combined. At first glance this seemed like overkill: I figured they would simply add a hook into the state-table updater to send some proprietary state-update message across the link, which would be a trickle compared to the aggregate flowing through the ASA. But then I realized that to do TCP sequence randomization, for instance, it would have to keep track of TCP sessions and send an update for every TCP SYN it receives. In that case, is it just tagging the first x bytes of each frame with a header indicating the interface and shooting them out the stateful interface? My main concern is that I'll be using the 5510 with the Security Plus license. This gives me 2x 10/100/1000, 2x crippled 10/100/1000 that you can only set to 10/100, and the 10/100 for management. If I have gig on both inside and outside, then all I have left for the stateful failover link is 100 Mbps; in other words, the gig interfaces Security Plus gives you may as well not be there if you plan on doing HA.

Panos Kampanakis · ‎10-21-2009

For the first question: HELLOs are sent over the failover interface too. With a direct cable, if your interface goes down because of a fault on the directly connected mate rather than on your own unit, you can't tell the difference and you assume you have lost an interface. In other words, when the ASAs are directly connected over the failover link, it doesn't matter which one loses its interface: if one of them loses it, both will think they have lost it. With a switch in between, that doesn't happen. If the switch dies, then yes, both lose their interface, but that is no longer a misunderstanding; it is true, since the hardware in the middle is dead. Failover will still work in both scenarios, though, because even if the units lose the failover link they run an ARP test, and if they get a response from another interface on the mate they will not change failover roles. The reason for the recommendation is that, because of the timeouts and the sequence of events when a failover interface goes down, a failover with directly connected units takes longer than a failover with a switch in the middle. I hope it makes sense.
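For reference, the hello and hold timers behind those timeouts are configurable. A minimal sketch, assuming ASA 8.x syntax; the values are illustrative rather than recommendations, so check the command reference for your release before using them:

```
! Unit hellos over the failover link: poll every 1 s, declare the peer lost after 3 s
failover polltime unit 1 holdtime 3
! Hellos on monitored data interfaces: poll every 5 s, hold for 25 s
failover polltime interface 5 holdtime 25
```

`show failover` reports the state of each unit and of the failover link, which is a quick way to see what each ASA believes after a cable or switch is pulled.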

As for the state link capacity, it doesn't literally need to be the aggregate of the others. If it did, then on an ASA5580 we would need 100B interfaces for failover, which we don't. Your train of thought is correct, though. Connection state information is passed between the units. That, along with the VPN use information, the xlate table, etc., adds up to quite a lot of data. On the 5580 we do not recommend using the management interfaces for failover, because the amount of traffic can be so high that the state updates will not fit in the management interface pipe. The same goes for the other ASAs. The rule of thumb is to treat the state interface as a high-traffic interface, since it passes state and unit information that refers to the whole box and not just one of its interfaces.
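As a rough sketch of what a dedicated state link looks like in configuration (the interface, the link name STATELINK, and the addresses are placeholders, not values from this thread):

```
! Bring up the physical port that will carry state updates
interface Ethernet0/2
 no shutdown
! Dedicate it as the stateful failover link
failover link STATELINK Ethernet0/2
failover interface ip STATELINK 192.168.254.1 255.255.255.252 standby 192.168.254.2
! Optional: also replicate HTTP connections, which adds to state-link traffic
failover replication http
```

The `show failover` output includes stateful update statistics, which give a rough idea of how much state traffic the link is actually carrying in your environment.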

I hope it helps.

PK

sbertsch · ‎10-21-2009

Thanks, PK! That's basically what I was assuming, but just needed a sanity check.

I did find in testing that interface tracking details appear to be sent over the failover interface only.

I.e., if the failover interface has failed and a connected interface on the primary subsequently fails, the standby is not promoted to active.
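For context, interface monitoring is configured per named interface. A sketch, assuming interfaces named inside and outside (placeholders for whatever nameif values are in use):

```
! Monitor these data interfaces for failover purposes
monitor-interface inside
monitor-interface outside
! Fail over once at least one monitored interface has failed
failover interface-policy 1
```

`show monitor-interface` lists the monitored interfaces and the status each unit currently sees for them.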

I'm now looking at using a redundant interface with two 10/100 links to carry both the state and failover links.

E.g.

asa active e0/2 -> sw1 (vlan x) <- e0/2 asa standby
asa active e0/3 -> sw2 (vlan x) <- e0/3 asa standby
sw1 <-> sw2

int redundant 1
 member-interface eth0/2
 member-interface eth0/3
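A fuller sketch of how that might look on the primary unit, assuming the redundant interface carries both the failover and state links; the link name FOLINK and the addresses are placeholders, and the member ports must be enabled and left unnamed:

```
! Enable the two physical member ports
interface Ethernet0/2
 no shutdown
interface Ethernet0/3
 no shutdown
! Bundle them; only one member is active at a time
interface Redundant1
 member-interface Ethernet0/2
 member-interface Ethernet0/3
! Use the redundant interface for both the failover (LAN) link and the state link
failover lan unit primary
failover lan interface FOLINK Redundant1
failover link FOLINK Redundant1
failover interface ip FOLINK 192.168.255.1 255.255.255.252 standby 192.168.255.2
failover
```

The secondary unit would need the same redundant interface and failover lines, with `failover lan unit secondary` in place of the primary statement.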

Panos Kampanakis · ‎10-21-2009

OK, so you are thinking of using a redundant interface for the failover link. That is a good idea if you can afford it, as it keeps you safe from a "lost failover link, units can't communicate" scenario.

PK