Re: TCP segment lost between App & DB servers

goutam_04 · 02-20-2010

Hi,

We have been informed by the server team that one server was rebooted due to loss of heartbeat, and they (the server team) believe it is because of a network problem.

My design is like this.

One core switch (6509) and 9 access switches (3750), connected through 10G fibre.

Application and DB contexts are created on the 6509 FWSM.

We have been informed of this problem over the last few days: every time, their server rebooted automatically due to heartbeat loss. Please find the server log below.

153 6.006320 10.128.6.21 10.128.5.35 TCP [TCP Previous segment lost] sdsc-lm > 58953 [PSH, ACK] Seq=1573 Ack=1106 Win=32768 Len=22

Note: 10.128.6.21 = DB Server IP

10.128.5.35 = Application Server IP
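For reference, Wireshark prints "TCP Previous segment lost" when a segment's sequence number is higher than the next sequence number it expected in that direction (previous Seq + Len), i.e. some bytes never appeared in the capture. That tells you the capture point did not see them, not necessarily that the network dropped them. The Python sketch below reproduces that check on (seq, length) pairs; the function names are hypothetical and only the Seq=1573/Len=22 values come from the log above.

```python
import re

# Pulls the Seq= and Len= fields out of a Wireshark summary line.
SUMMARY_RE = re.compile(r"Seq=(\d+).*?Len=(\d+)")


def parse_seq_len(line):
    """Return (seq, length) parsed from a Wireshark-style summary line."""
    match = SUMMARY_RE.search(line)
    if match is None:
        raise ValueError("line has no Seq=/Len= fields")
    return int(match.group(1)), int(match.group(2))


def find_gaps(segments):
    """Return (expected_seq, seen_seq) pairs where bytes never appeared.

    `segments` holds (seq, length) tuples for ONE direction of ONE TCP
    connection, in capture order, using relative sequence numbers.
    """
    gaps = []
    next_expected = None
    for seq, length in segments:
        if next_expected is not None and seq > next_expected:
            # Bytes next_expected .. seq-1 are missing from the capture.
            gaps.append((next_expected, seq))
        end = seq + length
        # Retransmissions (seq below next_expected) do not move the mark back.
        if next_expected is None or end > next_expected:
            next_expected = end
    return gaps
```

For example, if the only earlier segment in that direction were a made-up (1, 1500), then find_gaps([(1, 1500), parse_seq_len(line)]) would return [(1501, 1573)]: 72 bytes the capture never saw.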

Please suggest what I should do to fix this problem.

Reza Sharifi · 02-20-2010

Goutam,

Can you post the sh log output from the 6500 switch? This could be a server application issue with nothing to do with the switches.

HTH

Reza

goutam_04 · 02-20-2010

Hi,

Please find the logs of Core Switch 6509-1 and Core Switch 6509-2. I think there is some problem in the DB or App server, but I have not been able to show them what the problem is or make them understand that this is not a network problem.

Please check and help me.

goutam_04 · 02-21-2010

Hi,

Did you check the logs? Can anyone help with this?

Jon Marshall · 02-21-2010

So the DB server is behind the FWSM, and the app server has to go through the FWSM to get to the DB server?

If so, I have seen an issue with this setup. It was to do with the TCP connection timeout on the FWSM. The FWSM would time out the TCP connection between the app server and the DB server, but the servers were not aware of it, so the application stopped working. The solution was to increase the TCP timeout for that connection.

With FWSM v2.x code you could only increase the TCP timeout globally, i.e. for all connections, which was not ideal. From v3.x code onwards you can increase the timeout for specific TCP connections; see this link for an example:
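Jon's explanation can be illustrated with a toy model of a stateful firewall's connection table. This is not FWSM code: the class name, the one-hour timeout in the usage example, and the rule that a non-SYN packet with no matching entry is dropped are assumptions chosen to show why a connection that stays quiet longer than the idle timeout silently stops working while both servers still consider it open.

```python
from dataclasses import dataclass, field


@dataclass
class IdleTimeoutTable:
    """Toy model of a stateful firewall connection table (illustrative only).

    Each entry is keyed by a connection tuple and remembers when it last saw
    traffic. Entries idle for longer than `idle_timeout` seconds are purged;
    afterwards a mid-stream (non-SYN) packet matches nothing and is dropped.
    """

    idle_timeout: float
    last_seen: dict = field(default_factory=dict)

    def _purge(self, now):
        # Remove every entry that has been idle for longer than the timeout.
        expired = [k for k, t in self.last_seen.items() if now - t > self.idle_timeout]
        for key in expired:
            del self.last_seen[key]

    def packet(self, now, conn, syn=False):
        """Return True if the packet is forwarded, False if it is dropped."""
        self._purge(now)
        if conn in self.last_seen or syn:
            # A SYN creates the entry; any matching packet refreshes it.
            self.last_seen[conn] = now
            return True
        return False
```

With IdleTimeoutTable(idle_timeout=3600.0), a connection opened at t=0 and used at t=1800 keeps working, but a packet at t=5401 (3601 s of silence) is dropped until a new SYN opens a fresh connection.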

FWSM tcp connection timeout

This is not necessarily the issue you are having but it might be worth a try.

Jon

goutam_04 · 02-21-2010

Hi,

We have seen this type of problem (TCP connection session timeout).

So we have already increased the timeout to 7 days, but we are still facing this problem. Is there any other solution?

ohassairi · 02-21-2010

You need to sniff the traffic and capture it on both sides: the app server and the DB server.

Then analyze the captured packets to see whether there really are lost segments.
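A minimal sketch of that two-sided comparison, assuming you have exported (relative sequence number, length) pairs for the same direction of the same connection from both captures. Relative numbers matter because a firewall may rewrite initial sequence numbers, and both captures should include the three-way handshake so that Wireshark's relative numbering starts from the same point. The function names are hypothetical, and keep in mind that a capture point can itself miss packets.

```python
def missing_in_transit(sender_segments, receiver_segments):
    """Segments captured leaving the sender but never captured at the receiver.

    Both arguments are lists of (relative_seq, length) tuples for the same
    direction of the same TCP connection.
    """
    received = set(receiver_segments)
    return sorted(set(sender_segments) - received)


def locate_loss(sender_segments, receiver_segments, seq):
    """Rough verdict on where the segment starting at `seq` went missing."""
    sent = any(s == seq for s, _ in sender_segments)
    got = any(s == seq for s, _ in receiver_segments)
    if sent and got:
        return "delivered"
    if sent:
        # Somewhere on the path between the two capture points (switches, FWSM).
        return "lost between the two capture points"
    return "not seen leaving the sender"
```

If a segment shows up only in the sender-side capture, the loss is on the path, and the firewall logs for that time window become the next place to look; if it never left the sender, the problem is on the host.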

The firewall log is also very useful in this case. Can you see any messages related to this problem in the firewall logs?

Ganesh Hariharan · 02-21-2010

Hi,

Goutam, how often do the disconnections between the App server and the DB server happen? Only the firewall sits between the two servers, which exchange heartbeat messages. Is there a specific port on which the heartbeat communicates? The server log shows a PSH, ACK segment, which means a TCP connection is already established and data is being pushed over the existing connection.
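To make the PSH/ACK point concrete: the flags Wireshark shows in brackets are bits in the TCP header's flags field, and a segment carrying ACK without SYN belongs (or at least claims to belong) to a connection whose handshake has already completed. A small decoder using the standard bit values from the TCP specification; the function names are illustrative.

```python
# Flag bits in the TCP header (values from the TCP specification, RFC 9293).
FLAG_BITS = {
    "FIN": 0x01,
    "SYN": 0x02,
    "RST": 0x04,
    "PSH": 0x08,
    "ACK": 0x10,
    "URG": 0x20,
}


def decode_flags(value):
    """Names of the flags set in a TCP flags byte, in bit order."""
    return [name for name, bit in FLAG_BITS.items() if value & bit]


def claims_established(flags):
    """True for a segment of an already-established connection:
    ACK set, and neither SYN nor RST set."""
    return "ACK" in flags and "SYN" not in flags and "RST" not in flags
```

decode_flags(0x18) gives ["PSH", "ACK"], the same flags as the logged segment, and claims_established returns True for it, whereas a SYN/ACK (0x12) from the handshake returns False.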

The best way to troubleshoot: check which devices sit between the servers, and check the timeout for the port on which heartbeat messages are exchanged. If possible, check the TCP buffers on both servers for any TCP-related issue at either end. Finally, capture a sniffer trace between the servers on the heartbeat port and check the behaviour when the connection drops.
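The comparison of heartbeat behaviour against device timeouts could be scripted roughly as below, assuming you have extracted packet timestamps (in seconds) for the heartbeat port from the sniffer trace. The device names and timeout values in the usage example are hypothetical; substitute the idle timeouts actually configured on each device in the path.

```python
def max_idle_gap(timestamps):
    """Largest silence, in seconds, between consecutive packets."""
    ts = sorted(timestamps)
    return max((b - a for a, b in zip(ts, ts[1:])), default=0.0)


def hops_that_would_expire(timestamps, hop_timeouts):
    """Names of path devices whose idle timeout is shorter than the longest
    silence on the heartbeat connection, i.e. devices that could have
    purged the connection while it was quiet."""
    gap = max_idle_gap(timestamps)
    return sorted(name for name, timeout in hop_timeouts.items() if timeout < gap)
```

For instance, with heartbeat timestamps [0.0, 5.0, 10.0, 4000.0] and made-up timeouts {"fwsm-app": 3600.0, "fwsm-db": 7200.0}, the longest silence is 3990 s and only "fwsm-app" is reported. An empty result means the idle timeouts alone do not explain the drop.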

Hope this helps!

Ganesh.H