This is my first post so I hope I explain the problem suitably and can get someone interested. =)
I seem to be having an issue where certain packets are being dropped or lost by my office router. The situation is reproducible: when I attempt a DNS zone transfer from my Linux BIND DNS server (A.A.A.A) to any machine on my network behind NAT (Y.Y.Y.Y), the first packet (Seq 1) of the response is lost. The client making the query asks for the first packet (Seq 1) to be resent, and the DNS server resends it repeatedly, but those retransmissions are lost too.
Hosted DNS server (A.A.A.A) MTU 1500
Office router Cisco 1811 (Y.Y.Y.Y) MTU 1500
Any workstation querying the DNS
I have turned off access-lists on both FE0 and Vlan1.
I've turned off TCP Segmentation Offload on the DNS server so I can see the frames leaving it accurately in Wireshark. They are all 1514 bytes or less. I also have Wireshark on a workstation querying the DNS server, and there I can see that the first frame of the response and its retransmissions never appear. The first frame to appear is Seq 1449. Here's what it looks like on A.A.A.A (frame 6 and its retransmissions, frames 10 onwards, never reach the workstation):
No. Time Source Destination Protocol Info
1 0.000000 Y.Y.Y.Y A.A.A.A TCP 58533 > domain [SYN] Seq=0 Win=5840 Len=0 MSS=1460 SACK_PERM=1 TSV=1167384958 TSER=0 WS=4
2 0.000070 A.A.A.A Y.Y.Y.Y TCP domain > 58533 [SYN, ACK] Seq=0 Ack=1 Win=14480 Len=0 MSS=1460 SACK_PERM=1 TSV=2654576189 TSER=1167384958 WS=6
3 0.036195 Y.Y.Y.Y A.A.A.A TCP 58533 > domain [ACK] Seq=1 Ack=1 Win=5840 Len=0 TSV=1167384962 TSER=2654576189
4 0.036233 Y.Y.Y.Y A.A.A.A DNS Standard query AXFR domain.com
5 0.036248 A.A.A.A Y.Y.Y.Y TCP domain > 58533 [ACK] Seq=1 Ack=31 Win=14528 Len=0 TSV=2654576199 TSER=1167384962
6 0.037076 A.A.A.A Y.Y.Y.Y DNS Standard query response SOA ...
7 0.037089 A.A.A.A Y.Y.Y.Y TCP [Continuation to #22] domain > 58533 [ACK] Seq=1449 Ack=31 Win=14528 Len=1448 TSV=2654576199 TSER=1167384962
8 0.037098 A.A.A.A Y.Y.Y.Y TCP [Continuation to #22] domain > 58533 [PSH, ACK] Seq=2897 Ack=31 Win=14528 Len=384 TSV=2654576199 TSER=1167384962
9 0.078684 Y.Y.Y.Y A.A.A.A TCP [TCP Dup ACK 4#1] 58533 > domain [ACK] Seq=31 Ack=1 Win=5840 Len=0 TSV=1167384965 TSER=2654576199 SLE=1449 SRE=2897
10 0.572180 A.A.A.A Y.Y.Y.Y DNS [TCP Retransmission] Standard query response SOA ...
11 1.108164 A.A.A.A Y.Y.Y.Y DNS [TCP Retransmission] Standard query response SOA ...
12 1.644159 A.A.A.A Y.Y.Y.Y DNS [TCP Retransmission] Standard query response SOA ... (and so on)
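For anyone reading the trace, here is a minimal sketch of how a receiver's cumulative ACK and SACK blocks behave when the first segment goes missing. It is a simplified model rather than a real TCP stack, and the function name `receiver_ack` is made up for illustration. It uses the relative sequence numbers from the capture above and reproduces the Ack=1, SLE=1449, SRE=2897 values seen in frame 9.

```python
def receiver_ack(delivered, first_seq=1):
    """Return (cumulative_ack, sack_blocks) for the segments that arrived.

    delivered: list of (seq, length) tuples, relative sequence numbers.
    Simplified model: no window, no timers, no reordering limits.
    """
    # Turn each segment into a half-open byte range [start, end).
    ranges = sorted((seq, seq + length) for seq, length in delivered)

    # Merge ranges that touch or overlap.
    merged = []
    for start, end in ranges:
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))

    # The cumulative ACK only advances over data contiguous from first_seq;
    # anything after a gap is reported as a SACK block instead.
    ack = first_seq
    sack_blocks = []
    for start, end in merged:
        if start <= ack:
            ack = max(ack, end)
        else:
            sack_blocks.append((start, end))
    return ack, sack_blocks


# Frame 6 (Seq 1, Len 1448) is lost; frame 7 (Seq 1449, Len 1448) arrives.
print(receiver_ack([(1449, 1448)]))  # (1, [(1449, 2897)])

# If nothing were lost, all three segments would be acknowledged.
print(receiver_ack([(1, 1448), (1449, 1448), (2897, 384)]))  # (3281, [])
```

Because the ACK stays at 1, the server has no choice but to keep retransmitting the segment starting at Seq 1, which is what frames 10 onwards show.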
All I know is it's disappearing somewhere between the DNS server and the workstation. It works fine to my home though, so I suspect either an issue in my Cisco configuration or the router itself. The ISP in the office is Virgin Media in the UK, which is the same provider I have at home (so I don't suspect a problem there, yet can't rule it out).
I turned on packet debugging in the router for all packets to/from A.A.A.A TCP port 53. I don't know how to make the best use of Cisco debugging yet, but sure enough the packet seems to be missing. But there's also this debug message that looks very odd:
003603: Dec 16 16:22:31.914: pak 862C1BB0 consumed in input feature , packet consumed, NAT Outside(53), rtype 0, forus FALSE, sendself FALSE, mtu 0
Does anyone have any suggestions? It's an odd one! I don't know what the debug message means.
Thank you for your interest and fast response.
There is no zone-based firewall set up, and I have removed the access-list on FE0. No ip inspect rules are defined.
I'm happy to post my configuration if it would be useful.
A restart with no configuration change has fixed it. Anyone know what the "packet consumed" message meant?
This is happening again with exactly the same symptoms. The problem showed itself again after a few days without any configuration changes taking place.
Please, does anyone have any suggestions?
Alternatively, how does one go about getting support from Cisco? The router is second hand and certainly outside any support contract that may have existed when originally purchased.
Try increasing the timeout and see if that helps.
ip nat translation tcp-timeout 600
ip nat translation udp-timeout 600
Please rate the helpful posts.
When you are experiencing this issue, are you able to validate that it's even making it to your router from the internet? Also, looking at the interface facing the internet, are you seeing any incrementing errors on the interface? Another thought would be to try a different DNS source and see if you're able to replicate the issue; if you can, it's most likely the Cisco 1811.
I can perform transfers from other name servers fine at the moment. However, the problem only appeared for the name server in question after a few days; if I restart the router I am sure it would work again immediately.
I therefore think there is no question that the packets are all making it to the router. I have no hub between the router and my cable modem with which to prove this at the moment, but the Cisco debug information does show the packets arriving.
Interestingly, if I temporarily change the MTU on the nameserver to 1400, I am able to do transfers again; all packets are successfully delivered through the router. If I increase the MTU to 1500 again, it stops working again.
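To put numbers on the MTU change, here is a rough sketch of the arithmetic. It assumes IPv4 with no IP options, a 14-byte Ethernet header as Wireshark shows it (no FCS), and TCP timestamps enabled as in the first capture (the TSV/TSER fields). The function name `largest_frame` is just for illustration.

```python
ETH_HEADER = 14            # Ethernet header as captured, without FCS
IP_HEADER = 20             # IPv4 header, assuming no IP options
TCP_HEADER = 20            # TCP header without options
TCP_TIMESTAMP_OPTION = 12  # 10-byte timestamp option plus 2 bytes of padding


def largest_frame(mtu, timestamps=True):
    """Return (mss, payload, frame_bytes) for a full-size TCP segment."""
    # MSS as advertised in the SYN: MTU minus the fixed IP and TCP headers.
    mss = mtu - IP_HEADER - TCP_HEADER
    # TCP options eat into the MSS, so the data per segment is smaller.
    payload = mss - (TCP_TIMESTAMP_OPTION if timestamps else 0)
    # A full-size segment fills the MTU; the Ethernet header sits on top.
    frame_bytes = ETH_HEADER + mtu
    return mss, payload, frame_bytes


print(largest_frame(1500))  # (1460, 1448, 1514)
print(largest_frame(1400))  # (1360, 1348, 1414)
```

The MTU 1500 figures match the capture (MSS=1460, Len=1448, 1514-byte frames). With the MTU at 1400 the largest frame leaving the nameserver would be about 1414 bytes, so whatever is failing appears to affect only frames larger than that. This narrows the symptom but does not by itself identify the cause.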
I put a switch between my cable modem and Cisco 1811 router, and forwarded the Tx on both ports to a protocol analyser.
The packets are definitely going to the router. On this occasion, it is the second packet that the 1811 is losing/ignoring.
1. Nameserver sends packet with Seq 1, Len 1460 (1514 bytes total).
2. Nameserver sends packet with Seq 1461, Len 1460 (1514 bytes total).
3. Nameserver sends packet with Seq 2921, Len 382 (426 bytes total).
4. Client sends Ack 1461.
5. Nameserver resends packet 2 repeatedly until timing out.
I think that's proof that it's either something bad in my configuration, or the Cisco router is at fault (a bug in the version of IOS, or something more sinister like failing memory or components).
I would love it to be a problem with my configuration, but I can't think what would cause such odd behaviour. Can anyone imagine?
If I conclude that it's likely a router problem, can anyone please tell me the best way to start a support conversation with Cisco?