ASA 5510 xlate problem

Unanswered Question
Jul 6th, 2011

I'm having an odd issue with my ASA 5510.  Three to five times a week, a number of users complain they can no longer access the internet; they still have local network access but are unable to get past the inside interface of the ASA 5510.  The only way to restore access is to clear xlate on the ASA, after which everyone's connection is immediately back online.  It doesn't impact all users, just a few at any given time, but it's different users each time.  My ASA is running 8.2.1.

thanks,  Nick

mvsheik123 Wed, 07/06/2011 - 08:55

If I remember correctly, there is an xlate timeout bug in 8.2 for NAT'd connections (it may not apply to PAT). Check the Bug Toolkit for this software version.

hth

MS

Jay Johnston Wed, 07/06/2011 - 09:28

Check the output of 'show xlate count' and 'show conn count' to ensure that you aren't hitting the maximum number of allowed connections on the ASA. Issuing a 'clear xlate' would clear any connections associated with a PAT xlate, and therefore mitigate the problem if you are encountering the issue. This problem is most often seen when administrators set the connection timeout to 0:0:0, which causes the total number of connections to slowly grow over time.
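As a sketch of what that check looks like (the output format is approximate and the numbers here are placeholders; the allowed maximum depends on your platform and license):

     ciscoasa# show conn count
     638 in use, 1357 most used
     ciscoasa# show xlate count
     57 in use, 72 most used

If the "in use" figure is anywhere near the licensed connection limit for the platform, you are looking at connection exhaustion rather than a port-specific problem.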

Another possibility centers around your ISP blocking certain port ranges of traffic. Do you use an internal DNS server? We've seen a problem that matches your symptoms several times, and it goes like this:

First, let's define the issue: every once in a while, DNS traffic 
will fail because the ISP blocks specific UDP destination ports on 
its network. In one case we discovered that RoadRunner blocks the 
following ports across its entire network:

     UDP Destination Port 1026
     UDP Destination Port 1027
     UDP Destination Port 1028
     UDP Destination Port 1029

Because of the way NAT overloading works (outlined below), there may come 
a time when UDP traffic leaves the customer's network with a source 
port of 1026-1029. Because of how IP communications work, the return 
traffic to the customer will then have a destination port of 
1026-1029, and those return packets are filtered out by the ISP.

To get a feel for how NAT overloading works on the firewall, one must 
first understand what NAT overloading allows you to do. When you 
'overload' an IP address on the Internet, it allows multiple 
computers/hosts on your network to access the Internet using one 
single IP (or a small range of IPs).

The firewall, when translating ports for NAT overload, splits the 
available ports into three pools:

Low: 0-511
Mid: 512-1023
High: 1024-65535

If a packet inside your network comes into the firewall destined for the 
Internet, and its source port falls into one of those pools, the firewall 
will translate it to another port in the same pool. When the firewall first 
starts translating addresses, it starts with the lowest port number in 
each pool. That means the first UDP packet sourced internally from a 
high port will be sent to the Internet with a new source port of 1024.

The next UDP high-port translation will go out with a source port of 
1025, and so on. This is where the ISP's blocking becomes an 
issue...
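As an illustration (the addresses here are hypothetical examples, roughly in the format of pre-8.3 'show xlate' output), the first few UDP high-port translations might look like this:

     PAT Global 203.0.113.5(1024) Local 10.1.1.10(53211)
     PAT Global 203.0.113.5(1025) Local 10.1.1.11(49802)
     PAT Global 203.0.113.5(1026) Local 10.1.1.12(50333)

If that third entry belongs to your internal DNS server, its queries are now leaving on a port the ISP blocks.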

Since DNS uses UDP, a DNS request will cause the firewall to create a UDP 
port translation. When your DNS server happens to get translated to port 
1026-1029, you get stuck in this failed position because the translation 
never times out.

The reason the translation does not time out (and gets into this 
'stuck' state) is twofold. When a UDP packet going from inside to 
outside passes through the firewall, the firewall checks whether there 
is already a translation for that source IP/port and destination IP/port. 
If there is, the firewall reuses that translation instead of 
using up another port, and the packet leaves with the source port 
already in use.

If there is no translation, the firewall builds a new 
translation and picks the next port in the pool. Since the DNS server 
keeps making requests within the UDP timeout window (2 minutes), the UDP 
connection never times out on the firewall. If the connection never 
times out, then the translation will not time out either. This factor is 
compounded by the behavior of the Windows DNS service. Because the DNS 
service, by design, sends every query from the same UDP source port to the same 
destination IP (the ISP's DNS server) and destination port (53), and does 
not change that source port until the service is restarted, the 
translation is reused over and over again and the UDP and 
translation timers never get a chance to expire.

To fix this issue, you can force the Windows DNS Server service to use a 
pre-specified source port. By forcing the source port to a low number 
(0-511), queries will leave the firewall translated within that low-port 
range and will never hit the case where you send out on 1026-1029 and 
the ISP blocks the return DNS answer.
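On Windows Server this has historically been done with the DNS service's SendPort registry value; the commands below are a sketch only (verify the key and Microsoft's current guidance before using it):

     reg add HKLM\SYSTEM\CurrentControlSet\Services\DNS\Parameters /v SendPort /t REG_DWORD /d 500
     net stop dns
     net start dns

Note that pinning DNS queries to a single predictable source port makes cache-poisoning attacks easier, so weigh this against fixing it on the firewall side instead.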

Why does this happen to some networks and not others? It hinges on 
whether the DNS server happens to get one of those UDP ports as its 
traffic leaves the firewall. When the firewall boots up, there may be other 
UDP traffic that takes up those ports before the server starts sending 
traffic over the Internet.

If you have very little UDP traffic, you might hit the issue because the 
server eventually lands on those ports. If you have a lot of UDP traffic, you 
might hit the issue because you use all the UDP ports (up to 65535) and 
wrap back around to 1026-1029.

To test whether this is the problem you're facing, issue 'show xlate debug' (or 'show xlate' in version 8.3+), look for the translations your internal DNS server is using, and note the global port the DNS connection is translated to by the firewall. You can then apply a capture to the outside interface of the firewall to watch traffic sourced from that port and verify whether responses are coming back (see https://supportforums.cisco.com/docs/DOC-1222 and https://supportforums.cisco.com/docs/DOC-12632).

You can then selectively clear that xlate on the ASA ('clear xlate gport x') and see if the problem is mitigated. That will confirm whether or not you are hitting this issue.
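A sketch of what that check might look like on 8.2, assuming the DNS server's queries were PAT'd to global port 1026 (interface names and the port are placeholders; capture 'match' catches both directions of the flow, and the selective clear syntax may vary slightly by version):

     ciscoasa# capture dnscap interface outside match udp any eq 1026 any
     ciscoasa# show capture dnscap
     ciscoasa# clear xlate gport 1026

If the capture shows queries leaving from port 1026 but no responses returning to it, that confirms the ISP filtering theory.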

If you are hitting this issue, then you can specify a bogus port translation in the configuration that will "use up" the port the ISP blocks, and therefore prevent the problem from occurring again. In the example below, UDP port 1026 is reserved and won't be used for dynamic PAT xlate allocation; we use a bogus inside IP of 10.123.123.123 to ensure that the ASA won't use this xlate for an actual connection built through the firewall:

static (outside,inside) udp interface 1026 10.123.123.123 12345 netmask 255.255.255.255
nat (inside) 1 0.0.0.0 0.0.0.0
global (outside) 1 interface
nhelms Wed, 07/06/2011 - 10:34

Hey Jay,

When I issue a 'sh xlate count' I get 57 current, 72 max.  When I issue 'show conn count' I get 638 current, 1357 maximum connections.  I checked and the timeouts for xlate and conn are at the defaults; should they be adjusted?  We typically have 10 remote VPN users and 35-40 internal network users connected during the day.

thanks, nick

Jay Johnston Thu, 07/07/2011 - 09:02

Nick,

     Don't adjust the timeout settings; the 'show conn count' output proves that the ASA isn't hitting the maximum supported connections, so that can't be your problem.

I suggest you run through the troubleshooting steps I suggested above if the problem occurs again.

Sincerely,

     Jay
