We have an Active/Standby ASA5540 firewall set-up with the Primary Active unit at our head office site (Site A) and the Secondary Standby unit at our DR site (Site B).
Both sites had their "outside" interfaces directly connected to our ISP (we connect the ASA outside interface to the provider's NTE at each site). This all seemed to work reasonably well - our active traffic would go through Site A and, in the event of a failure of the Site A firewall or interface, comms would fail over to Site B.
We recently decided to upgrade the bandwidth of our outside links to the ISP. This meant getting completely new circuits installed and new NTEs but we requested that we keep the same IP Addressing for the new circuits (we have a number of VPN connections so didn't want to have to be changing configuration)
So, come time to move to the new circuits, we presumed it would just be a case of changing the interface speed on the ASA interface (from 10 to 100) and moving the cables across from old NTE to new NTE. Meanwhile the ISP would activate the "new" ports on their network switch and shut down the "old" ports. And this could be carried out relatively quickly to minimise any disruption.
However, this is not how it panned out. It seems that when the ISP activates the new ports, Site B takes over as Active firewall and the Site A firewall has its outside interface marked as "failed". The ISP had to shut down the Site B link in order to allow us to pass traffic through the Site A firewall and circuit again. And we are left with the situation where we effectively DON'T have our Active/Standby set-up with automatic failover any longer! We can either have Site A active and passing traffic with Site B marked as "failed" on its outside interface, or vice versa.
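For reference, the change we had in mind on the ASA side would look roughly like the sketch below. The interface name GigabitEthernet0/0 is an assumption (substitute whichever physical port carries "outside"), and whether the speed should be hard-set or left to auto-negotiate depends on what the ISP's NTE expects.

```
configure terminal
! Hypothetical interface name; substitute the port that carries "outside"
interface GigabitEthernet0/0
 speed 100
end
!
! Afterwards, check the resulting speed/duplex and the error counters
show interface outside
```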
I don't know too much about the ISP's set-up to be honest but, as far as I'm aware, the ISP connects both the circuits for Site A and Site B to the same network switch in their datacentre and to the same VLAN.
Can anyone suggest what the problem might be and how to resolve it? I'm assuming it has to be something at the ISP end since I don't really understand what else could be necessary from our point of view (i.e. what else would we need to do other than move the cables and configure the new interface speed)? It's as if there is some sort of conflict on the ISP's network switch - I don't know if it is something to do with the way the standby ASA takes over the active ASA's IP and MAC address and that somehow gets the ISP network switch into a state of confusion?
Does anyone have any ideas/suggestions? Naturally we are a bit disappointed since we hoped this would be a relatively straightforward task to migrate to our new circuits with increased bandwidth!
Where I've seen ASA interfaces, particularly outside ones, showing as "failed" is where they can't actually communicate with each other. I'm not sure if it's ICMP that is required between them, but I've certainly seen similar issues where the two ASA outside cards can't ping each other.
If you run a ping from ASA "B" to the outside address of ASA "A" does it work? I suspect not, and this is the root cause of your issue. If this is the case, then you'll need to get your ISP involved.
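A rough sketch of those checks from the ASA CLI is below. The address 203.0.113.1 is only a placeholder for the real outside address of ASA "A"; nothing here is taken from your actual config.

```
! On ASA "B": ping the outside address of ASA "A" (placeholder address)
ping outside 203.0.113.1
!
! On either unit: see how failover currently rates each monitored interface
show failover
show monitor-interface
```

If the ping fails and the outside interface is listed as failed on one unit while the other interfaces look normal, that would be consistent with the two outside interfaces not reaching each other across the ISP.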
And just as another thought... here's a left field guess.
I reckon your old circuits were layer 2 tails (in the same VLAN) that terminate in your ISP's data centre, again on the same VLAN. This means that all devices in the same VLAN can always communicate with each other.
I reckon your new circuits are layer 3 tails, and only one will be routed over at any given time (the current active circuit). This would explain why the "standby" ASA - whichever one it is - always shows its outside card as failed.
Would explain the exact problem you are seeing.
As I say, bit left field, but I reckon there is logic there...
Thanks Barry - some very helpful suggestions, and your 2nd one in particular definitely sounds like a strong possibility. Will try to find out more and will update if we get any closer to resolving the issue or not...
Well, a quick update. We still haven't got this working successfully.
The ISP have confirmed that the new circuits ARE layer 2, so it seems that Barry's earlier suggestion (good though it was!) can't be the cause.
The ISP tried some manipulation of their switch(es) spanning tree set-up but to no avail - we can still only have one circuit active while the other one is marked as failed. We also can't ping between the outside interfaces (I allowed ICMP first, so we should have got a response if all was in order!).
I can't see how the issue can be anything other than a switching issue in the ISP's network but, so far, they are at a loss to explain what the problem could be and we are left without automatic failover of our new circuits. The ISP are going to continue to investigate offline but, if anyone has any suggestions or has seen similar in the past then further advice would certainly be appreciated. Thanks.
OK, look, if your ISP made a change and it is not working, and you are sure of this, then why review the ASA? If the unit is in standby at this moment and the only interface affected is the outside one, then ISP ISP ISP.
Well, I'm fairly certain the issue is with the ISP but
a) there is no harm (in fact, some might consider it good practice) in ensuring all other bases are covered - just in case
b) it's entirely possible that someone out there in the vast Cisco networking world has come across the same sort of situations, particularly those who work for ISPs with similar customer set-up (or customers with this set-up who have had similar problems with their ISP!), and can give pointers as to how to resolve it - even if that is simply evidence to go back to beat up the ISP with. (Barry's suggestions above were very helpful indeed, for example, even if they may not ultimately have been the cause)
c) even if the problem is ultimately with the ISP, appreciating the dependencies etc can only help to gain a better understanding of the ASA devices themselves which is surely an aim of any technical forum?
I understand what you are saying and we are always happy to help but when the equipment that affects connectivity is not manageable that is where support forums or TAC case can't help. I would suggest calling the ISP and getting this escalated.
You say that you have no connectivity between the ASAs' "outside" interfaces? Does your ISP have HSRP doing the gateway redundancy on their side? Can they confirm it's OK?
A very easy way to confirm complete connectivity would be to ping the "standby" IP address from the Active unit and then issue "show arp | inc outside" (or replace "outside" with the actual name of your external interface).
If you can't see the "standby" IP address in the "show arp" output, that means even ARP isn't working between your sites. At this point it should be up to your ISP to check where the traffic stops.
If you can see the "standby" IP address in the Active unit's ARP table then I am not sure what the problem is.
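As a minimal sketch of that check, run on the Active unit (203.0.113.2 is just a placeholder for your outside "standby" IP address, and "outside" is assumed to be the interface name):

```
! Placeholder address: replace with the outside "standby" IP
ping outside 203.0.113.2
!
! Then look for that address in the ARP table
show arp | include outside
```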
I think the Failover operation has its own "debug" command which is "debug fover" in addition to multiple different parameters. I am not sure how much output it generates but I would use the additional options after the "debug fover" if I were to use debug to help.
You should probably even be able to configure "capture" on your ASA before you do any checking. You could capture traffic between the primary and standby IP address of the interface and see if anything is actually happening. I guess you can even go as far to capture the ARP messages and see if there is anything visible.
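Something along these lines might work as a starting point. The capture names and both addresses are placeholders I made up (203.0.113.1 for the active outside IP, 203.0.113.2 for the standby one), and the exact capture options can vary a little between ASA software versions, so check them against your release.

```
! IP traffic between the active and standby outside addresses (placeholders)
capture capfo interface outside match ip host 203.0.113.1 host 203.0.113.2
! Separate capture for ARP frames seen on the outside interface
capture caparp ethernet-type arp interface outside
!
! Run the ping test, then look at what was captured
show capture capfo
show capture caparp
!
! Remove the captures when finished
no capture capfo
no capture caparp
```

If echo requests leave the Active unit but nothing ever comes back, and the same capture on the other unit never shows them arriving, that would be useful evidence to hand to the ISP.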
Hi Jouni, thanks for the good advice and suggestions - very much appreciated.
I don't know whether the ISP have HSRP doing the gateway/redundancy on their side but I don't think so. I can try to confirm but getting information out of them on their set-up is often difficult, although we continue to pursue them on this.
I am unable to ping the standby IP address from the Active unit (I allowed ICMP so the firewalls themselves were definitely not blocking it). Indeed, I can't ping the Standby IP address from anywhere.
However, show arp | inc outside DOES show the "standby" IP address in the output so ARP seems to be working at least?
But if I can see the standby IP address in the Active unit's ARP table but can't seem to otherwise ping/communicate with the Standby unit over the outside interface then what could the problem be?
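One thing that might be worth ruling out is a stale ARP entry, since the ASA can keep cached entries for quite a while. A minimal sketch, assuming it is acceptable to flush the dynamic ARP cache on the Active unit for a moment (203.0.113.2 is a placeholder for the outside standby IP):

```
! Flush dynamic ARP entries on the Active unit
clear arp
!
! Re-test and see whether the standby entry is learned afresh
ping outside 203.0.113.2
show arp | include outside
```

If the entry does not reappear after the ping, the earlier one was probably cached from before the test, which would point back to a layer 2 problem between the two sites.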