I've been beating my head against this problem for the last 4 days and can't make heads or tails of it.
We have a remote host that is trying to send LPR print jobs to an LPD queue. The queue is running on Windows 2003 Server with all the latest patches installed, and we are using the built-in Windows Print Services for Unix for the queue.
A client connected on the same /24 subnet as the 2003 queue server can send queries to the server to check the status of the queue without any issues. We just run "lpq -S192.168.22.95 -PTESTQUEUE" and it returns the proper queue information. We can run this command over and over and over again and always receive the same results.
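For anyone who wants to reproduce the query without the Windows lpq client, here is a minimal sketch of what I understand the request to look like on the wire. It assumes the RFC 1179 "send queue state (short)" command (byte 0x03, then the queue name, then a line feed) and that the server closes the connection after replying. The function names are mine, and this version lets the OS pick the source port, unlike a stock lpq client.

```python
import socket


def build_lpq_request(queue: str, verbose: bool = False) -> bytes:
    """Build an RFC 1179 queue-state request (0x03 = short, 0x04 = long)."""
    code = b"\x04" if verbose else b"\x03"
    return code + queue.encode("ascii") + b"\n"


def query_queue(server: str, queue: str, port: int = 515, timeout: float = 5.0) -> str:
    """Send one queue-state request and return whatever the server sends back.

    Assumption: the LPD server closes the connection when it is done,
    so we simply read until EOF.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect((server, port))
        sock.sendall(build_lpq_request(queue))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
        return b"".join(chunks).decode("ascii", errors="replace")
    finally:
        sock.close()
```

Calling query_queue("192.168.22.95", "TESTQUEUE") should be roughly equivalent to the lpq command above, as far as I can tell.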
We can also run the same command from any other workstation located on a different VLAN/subnet on our network without any issues. All of our VLANs/subnets terminate at the same L3 core router.
But when a host located on the other side of our firewall, in our DMZ, tries the same command, we get a good response 1 to 3 times in a row, then it just times out and we never get a response back from the queue server.
My firewall (PIX 535 running 7.2-4) has the correct firewall rule to allow lpd traffic through from the workstation to the queue server:
access-list DMZ line 1 permit tcp host 192.168.1.27 host 192.168.22.95 eq lpd
When I look at the firewall logs I can see that the connection is built and then torn down properly when I get a valid response:
Feb 12 2009 14:17:20: %PIX-6-302013: Built inbound TCP connection 901914030 for dmz:192.168.1.27/721 (192.168.1.27/721) to acad:192.168.22.95/515 (192.168.22.95/515)
Feb 12 2009 14:17:20: %PIX-6-302014: Teardown TCP connection 901914030 for dmz:192.168.1.27/721 to acad:192.168.22.95/515 duration 0:00:00 bytes 128 TCP FINs
And when the lpq query fails to return any info to the client I receive the following firewall log messages:
Feb 12 2009 14:17:25: %PIX-6-302013: Built inbound TCP connection 901917200 for dmz:192.168.1.27/721 (192.168.1.27/721) to acad:192.168.22.95/515 (192.168.22.95/515)
Feb 12 2009 14:17:55: %PIX-6-302014: Teardown TCP connection 901917200 for dmz:192.168.1.27/721 to acad:192.168.22.95/515 duration 0:00:30 bytes 0 SYN Timeout
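To compare a larger batch of these messages, a small script can pair each Built line with its Teardown. This is only a sketch: the regular expressions are written against the four log lines above and may need adjusting for other message variants, and the function and field names are my own. It also records how many seconds passed since the previous connection on the same source/destination address and port pair, since both attempts above come from source port 721.

```python
import re
from datetime import datetime

TS = r"(?P<ts>\w{3} +\d{1,2} \d{4} \d{2}:\d{2}:\d{2})"
BUILT = re.compile(
    TS + r": %PIX-6-302013: Built \w+ TCP connection (?P<id>\d+) for "
    r"\w+:(?P<src>[\d.]+)/(?P<sport>\d+) \([^)]*\) to "
    r"\w+:(?P<dst>[\d.]+)/(?P<dport>\d+)"
)
TEARDOWN = re.compile(
    TS + r": %PIX-6-302014: Teardown TCP connection (?P<id>\d+) for .* "
    r"duration (?P<duration>[\d:]+) bytes (?P<bytes>\d+) (?P<reason>.+)"
)


def parse_ts(text):
    # Collapse repeated spaces (e.g. "Feb  2") before parsing.
    return datetime.strptime(" ".join(text.split()), "%b %d %Y %H:%M:%S")


def summarize(lines):
    """Return one dict per connection, in the order the Built lines appear."""
    conns = {}
    last_built = {}  # flow tuple -> time of the previous Built message
    for raw in lines:
        line = raw.strip()
        built = BUILT.match(line)
        if built:
            flow = (built["src"], int(built["sport"]),
                    built["dst"], int(built["dport"]))
            when = parse_ts(built["ts"])
            previous = last_built.get(flow)
            conns[built["id"]] = {
                "id": built["id"],
                "flow": flow,
                "built": when,
                # Seconds since the same address/port pair was last used.
                "gap_s": None if previous is None
                else (when - previous).total_seconds(),
                "bytes": None,
                "reason": None,
            }
            last_built[flow] = when
            continue
        torn = TEARDOWN.match(line)
        if torn and torn["id"] in conns:
            conns[torn["id"]]["bytes"] = int(torn["bytes"])
            conns[torn["id"]]["reason"] = torn["reason"].strip()
    return list(conns.values())
```

Feeding it the saved syslog output, for example summarize(open("pix.log")) with a hypothetical file name, gives one row per connection so the "SYN Timeout" entries can be lined up against the gap since the previous attempt.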
I have no ACLs applied to any of my routing interfaces; the firewall interfaces are the only places where we have ACLs applied.
The weird thing is that it will work for a couple of queries, then not work, then work for 1 query, then stop for 2 or 3 queries, and so on.
I've also taken Wireshark captures by mirroring the switch port the server is connected to. When I receive the timeouts from the query, the server receives the initial SYN from the client, but the server never responds with a SYN/ACK. Then after a few additional queries, it will start responding with SYN/ACKs properly.
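To reproduce the handshake behaviour on demand while a capture is running, something like the following could be used. It is a rough sketch with names of my own choosing: it only opens and closes TCP connections and sends no LPD data. The optional source_port argument is there because RFC 1179 has clients connect from ports 721 to 731, which matches the 721 in the firewall logs; whether the source port has anything to do with the problem is an open question, not something I have confirmed. Binding to a port below 1024 usually needs administrator or root rights, and rebinding the same port in quick succession may itself fail on the client, which would show up as an "error" outcome.

```python
import socket
import time


def probe_handshake(host, port=515, attempts=10, source_port=None,
                    timeout=5.0, delay=1.0):
    """Try the TCP handshake repeatedly and record how each attempt ended.

    Returns a list of (outcome, seconds) tuples, where outcome is
    "connected", "timeout", or "error: <details>".
    """
    results = []
    for _ in range(attempts):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.settimeout(timeout)
        started = time.monotonic()
        try:
            if source_port is not None:
                # Fixed source port, as a stock LPR/lpq client would use.
                sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
                sock.bind(("", source_port))
            sock.connect((host, port))
            outcome = "connected"
        except socket.timeout:
            # No SYN/ACK arrived within the timeout.
            outcome = "timeout"
        except OSError as exc:
            outcome = f"error: {exc}"
        finally:
            sock.close()
        results.append((outcome, round(time.monotonic() - started, 3)))
        time.sleep(delay)
    return results
```

Running probe_handshake("192.168.22.95") and then probe_handshake("192.168.22.95", source_port=721) from the DMZ host would show whether the timeouts follow the fixed source port or happen either way.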
I've replicated this problem on multiple Windows 2003 servers.
We currently accomplish the same task by using Linux and OpenVMS servers and do not experience the same problem (workstations are on the same networks, and servers are on the same networks).
Any ideas? This sounds like it is entirely an issue with the operating system on the server, but I need a better explanation of why it is doing what it is doing. I can provide more details on exactly what is going on if needed.