CSS 11501 service down - keepalive problem?

Jun 19th, 2008

Our 11501, running 7.20 Build 206, has one service showing a state of Down (the other services are Alive). I ran a packet trace on the service's physical server and I'm not seeing any of the HTTP activity I expected to see from the keepalive.

I can ping the server from the CSS command line, so I know they can communicate (and the packet trace shows the ICMP activity).

I'm new to the CSS, so I'm not sure how to get this service back to Alive status. I've tried changing the keepalive to ICMP, but still no keepalive packets are seen between the CSS and the server.

Thanks

Tom Jones

Diego Vargas Thu, 06/19/2008 - 14:47

Hi,

First of all, I should point out that you are running very old code. 7.20 is an old code train and is affected by many issues, including a few where a service is shown as down even though the keepalive actually works. That might not be the case here, but it is important to keep in mind.

It is very odd that a ping works but changing the keepalive to ICMP won't bring the service alive.

Can you provide a running config and show me the service?

Is the CSS directly connected to the server? If not what device is in between?

Run these commands:

show service summary

show keepalive-summary

tfjonesjr Fri, 06/20/2008 - 03:56

Update:

I can go into debug mode and run icp probe service; it runs just fine.

Debug icp runs fine for the service in question.

I created two named keepalives going against two other servers; these stay in the initializing state.

I spoke to a TAC engineer yesterday; she recommended a software upgrade due to bugs regarding keepalives. After reading over the bugs, our problem may be due to any of several scenarios that could apply to our situation.

We will be rebooting at least the backup CSS this weekend and will see what happens (the service is down on both CSSes, primary and backup). Note that I had suspended the service on both due to timeouts we were having in applications; this is not a failure of the server, since I can connect to the physical server without problems.

BTW - I also ran packet traces against a working service and never saw any evidence of keepalive activity. I did see activity when running an icp probe, however. Definitely a problem with keepalives.

I'll update after we reboot.

Here's the service summary:

===========================

CSS11501# show service summary

Service Name  State      Conn  Weight  Avg Load  State Transitions
qa-woeasaus1  Down       0     1       255       3997
qa-woeaus2    Alive      0     1       2         20
woeasaus1     Suspended  0     1       255       11
woeasaus2     Suspended  0     1       255       113
woeaus1       Alive      0     1       2         20
woeaus2       Alive      0     1       2         20
woecsaus1     Alive      0     1       2         8390
woecsaus2     Alive      0     1       2         6508
woecsaus3     Suspended  0     1       255       1
Janice_PROD   Suspended  0     1       255       1
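
For anyone who wants to scan this kind of output programmatically, here is a minimal Python sketch that parses the data rows of a `show service summary` capture and picks out services that are Down or that have an unusually high State Transitions count. The column order is taken from the output above; the names `ServiceRow` and `parse_service_summary` and the threshold of 1000 transitions are illustrative assumptions, not anything defined by the CSS.

```python
from typing import List, NamedTuple


class ServiceRow(NamedTuple):
    name: str
    state: str
    conn: int
    weight: int
    avg_load: int
    transitions: int


def parse_service_summary(text: str) -> List[ServiceRow]:
    """Parse data rows of pasted `show service summary` output.

    Prompt, header, and blank lines are skipped: only lines with exactly
    six whitespace-separated fields, the last four numeric, are kept.
    """
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 6 or not all(p.isdigit() for p in parts[2:]):
            continue
        conn, weight, avg_load, transitions = map(int, parts[2:])
        rows.append(ServiceRow(parts[0], parts[1], conn, weight, avg_load, transitions))
    return rows


# A few rows copied from the output in this thread.
SAMPLE = """\
CSS11501# show service summary
Service Name State Conn Weight Avg Load State Transitions
qa-woeasaus1 Down 0 1 255 3997
qa-woeaus2 Alive 0 1 2 20
woeasaus1 Suspended 0 1 255 11
woecsaus1 Alive 0 1 2 8390
"""

rows = parse_service_summary(SAMPLE)
down = [r.name for r in rows if r.state == "Down"]
# Hypothetical threshold: what counts as "too many" transitions is a judgment call.
flapping = [r.name for r in rows if r.transitions > 1000]
print("Down:", down)
print("Many state transitions:", flapping)
```

A high transition count on a service that is currently Alive (woecsaus1 here) suggests its keepalive state has changed many times, which may be worth checking alongside the service that is Down.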

CSS11501# show keepalive-summary

Keepalives:
AUTO_nexthop00001  State: Alive      169.137.110.51
AUTO_qa-woeasaus1  State: Down       169.137.110.199
AUTO_qa-woeaus2    State: Alive      169.137.110.198
AUTO_woeasaus1     State: Suspended  169.137.110.199
AUTO_woeasaus2     State: Suspended  169.137.110.210
AUTO_woeaus1       State: Alive      169.137.110.197
AUTO_woeaus2       State: Alive      169.137.110.198
AUTO_woecsaus1     State: Alive      169.137.110.209
AUTO_woecsaus2     State: Alive      169.137.110.210
AUTO_woecsaus3     State: Suspended  169.137.110.199
AUTO_Janice_PROD   State: Suspended  169.137.120.225
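
Several keepalives in this listing point at the same server address, which is easy to miss by eye. The following Python sketch groups the lines of a `show keepalive-summary` capture by IP address so that keepalives targeting the same server are shown together. The line layout is assumed from the output above, and the helper names `parse_keepalives` and `group_by_ip` are hypothetical.

```python
import re
from collections import defaultdict

# Matches lines such as: AUTO_qa-woeasaus1 State: Down 169.137.110.199
KEEPALIVE_RE = re.compile(r"^(\S+)\s+State:\s+(\S+)\s+(\d+\.\d+\.\d+\.\d+)$")


def parse_keepalives(text):
    """Return (name, state, ip) tuples for every keepalive line in the text."""
    entries = []
    for line in text.splitlines():
        match = KEEPALIVE_RE.match(line.strip())
        if match:
            entries.append(match.groups())
    return entries


def group_by_ip(entries):
    """Map each IP address to the list of (name, state) keepalives probing it."""
    by_ip = defaultdict(list)
    for name, state, ip in entries:
        by_ip[ip].append((name, state))
    return dict(by_ip)


# A few lines copied from the output in this thread.
SAMPLE = """\
Keepalives:
AUTO_qa-woeasaus1 State: Down 169.137.110.199
AUTO_qa-woeaus2 State: Alive 169.137.110.198
AUTO_woeasaus1 State: Suspended 169.137.110.199
AUTO_woeaus2 State: Alive 169.137.110.198
AUTO_woecsaus3 State: Suspended 169.137.110.199
"""

groups = group_by_ip(parse_keepalives(SAMPLE))
shared = {ip: members for ip, members in groups.items() if len(members) > 1}
for ip, members in shared.items():
    print(ip, members)
```

This only reorganizes what the CSS already reports; it does not say anything about why a given keepalive is Down.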

tfjonesjr Fri, 06/20/2008 - 15:40

We rebooted both CSS boxes this afternoon and all servers are now showing status of Alive.

Packet traces on the previously "Down" server are showing keepalive activity now.

We'll schedule a software update in the near future to address the keepalive bugs.
