CSS service stays Down even though the server responds to ICMP probes

Feb 9th, 2010

Hi,

A server responds to pings, but when it is configured as a service on the CSS, the service stays in the "Down" state. I have also tried adding services on the CSS for other valid destinations, and none of them become Alive.

-----------------------------------

CSS01#

ping 10.10.101.98
Pinging 10.10.101.98 1 time(s)...
Working(-) 1/1
100% Success.

CSS01#

service test_service
ip address 10.10.101.98
keepalive type icmp
active

show keepalive AUTO_test_service
Name: AUTO_test_service  Index: 66  State: Down
Description: Auto generated for service test_service
Address: 10.10.101.98  Port: Any
Type:            ICMP
Frequency:        5
Max Failures:     3
Retry Frequency:  5
Dependent Services:
    test_service

-----------------------------------
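For anyone who wants to check many services for this condition, here is a minimal sketch that parses output shaped like the "show keepalive" text above into a dictionary. The function name `parse_keepalive` and the parsing rules are illustrative assumptions based only on the sample shown here, not on any documented output format.

```python
import re

# Sample copied from the "show keepalive" output in this post.
SAMPLE = """\
Name: AUTO_test_service  Index: 66  State: Down
Description: Auto generated for service test_service
Address: 10.10.101.98  Port: Any
Type:            ICMP
Frequency:        5
Max Failures:     3
Retry Frequency:  5
Dependent Services:
    test_service
"""

# Matches "Key: value" pairs when several share one line.
PAIR_RE = re.compile(r"([A-Z][A-Za-z ]*?):\s*(\S+)")


def parse_keepalive(text):
    """Parse keepalive output into a dict (hypothetical helper)."""
    info = {"Dependent Services": []}
    in_deps = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if in_deps:
            # Every line after the "Dependent Services:" header is a service name.
            info["Dependent Services"].append(line)
        elif line == "Dependent Services:":
            in_deps = True
        elif line.count(":") == 1:
            # Single pair: keep the whole remainder as the value.
            key, _, value = line.partition(":")
            info[key.strip()] = value.strip()
        else:
            # Several pairs on one line, e.g. "Name: ...  Index: ...  State: ...".
            for key, value in PAIR_RE.findall(line):
                info[key] = value
    return info


if __name__ == "__main__":
    info = parse_keepalive(SAMPLE)
    print(info["Name"], info["Type"], info["State"])
```

A script could then flag every keepalive whose "State" is "Down" while the address still answers a manual ping, which is the symptom described here.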

Is there anything to check on the CSS that might indicate what the issue is? system-resource does not indicate that memory or CPU is exhausted.

Regards

Sean Merrow Tue, 02/09/2010 - 13:36

Hello,

I know there were some older bugs where keepalives would stay "down".  Can you upload the output of "script play showtech" from the CSS as an attachment?  If this is not possible, the config, hardware model and software running on the device should be enough.

Thanks,

Sean

I am currently having this issue myself - very annoying.  No new services will show "Alive" no matter which keepalive type is configured.

I suspect a reload will "fix all", but I would rather not reload my production CSS.  Is there a reason/explanation and a solution?

CSS11503(debug)# show uptime ${output}

Uptime:

CSS5-SCM-2GE G0        :  1281 days 23:59:08

CSS5-IOM-2GE E0        :  1281 days 23:59:05

CSS5-SSL-K9 G0         :  1281 days 23:59:05

CSS503-SM-INT          :  1281 days 23:59:05


CSS11503(debug)# echo "show disk" ${output}
show disk
CSS11503(debug)# show disk ${output}
PCMCIA Slot: 0

          total # of clusters:  62544
            bytes per cluster:  16384
                free clusters:  57668
                 bad clusters:  0
                   free bytes:  944832512 (944 MB)
    max contiguous free bytes:  876724224 (876 MB)
                        files:  707
                      folders:  40
         total bytes in files:  71821754
                  lost chains:  0
   total bytes in lost chains:  0

CSS11503(debug)# echo "show running-config" ${output}
show running-config
CSS11503(debug)# show running-config ${output}
!Generated on 03/18/2010 14:24:02
!Active version: sg0810106

Sean Merrow Thu, 03/18/2010 - 08:12

Hello Andrew,

Have you taken a network capture to confirm that the CSS is actually sending out the keepalive and, if so, whether it is getting the required response?  If the keepalive is working as it should and the CSS is still reporting the service as down, then it sounds like you're hitting a bug.  You'll also want to make sure you are not exceeding the keepalive limits of the CSS.
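The capture check above boils down to a small decision table. Here is a rough sketch of it; the function name `diagnose`, its parameters, and the wording of the results are illustrative assumptions, and the "no probes sent" branch is one reading of the advice rather than a confirmed root cause.

```python
def diagnose(probes_sent, replies_seen, css_state):
    """Suggest where to look next, given what a capture on the server shows.

    probes_sent  -- keepalive requests from the CSS seen in the capture
    replies_seen -- responses from the server seen in the capture
    css_state    -- keepalive state reported by the CSS, e.g. "Down"
    """
    if css_state.lower() == "alive":
        return "service is Alive; nothing to diagnose"
    if probes_sent == 0:
        # Assumption: nothing on the wire points at the CSS itself.
        return "CSS is not sending keepalives; check keepalive limits or suspect a bug"
    if replies_seen == 0:
        return "keepalives are sent but not answered; check the server and the path"
    return "keepalives are answered but the state stays Down; likely a CSS bug"


if __name__ == "__main__":
    # Example: a capture that shows no keepalives at all.
    print(diagnose(probes_sent=0, replies_seen=0, css_state="Down"))
```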

While you are quite possibly correct that a reload will resolve the issue, at least temporarily, if you do take that route you might want to make it more worthwhile by doing an upgrade.  There have been several fixes for keepalives being stuck down since the 8.10.1.06 release you're running (released 4 years ago).  Plus, given the uptime on your CSS, if you're in a redundant configuration, then you are lucky not to have hit one of the 828 days of uptime bugs.
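As a rough illustration of where a figure like 828 days can come from: a 32-bit counter incremented 60 times per second wraps after about 828.5 days. Both the counter width and the tick rate below are assumptions for the arithmetic, not something confirmed for the CSS in this thread.

```python
TICK_RATE_HZ = 60        # assumption: system clock ticks per second
COUNTER_BITS = 32        # assumption: unsigned 32-bit tick counter
SECONDS_PER_DAY = 86400


def wrap_days(bits=COUNTER_BITS, hz=TICK_RATE_HZ):
    """Days until a counter of the given width wraps at the given tick rate."""
    return (2 ** bits) / hz / SECONDS_PER_DAY


def uptime_in_days(days, hms):
    """Convert an uptime such as (1281, "23:59:08") to fractional days."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return days + (hours * 3600 + minutes * 60 + seconds) / SECONDS_PER_DAY


if __name__ == "__main__":
    print(f"counter wraps after about {wrap_days():.1f} days")
    uptime = uptime_in_days(1281, "23:59:08")  # SCM uptime from the output above
    print(f"reported uptime is about {uptime:.1f} days")
```

Under those assumptions, the 1281-day uptime reported above is already past one wrap period.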

HTH,

Sean

Hello Sean,

Thanks for replying - I have indeed taken a capture on the new server I tried to install today, and I see no keepalives from the CSS at all.  I do see the ICMP from the CSS on the server when I initiate a ping.  I also see content being served from the server when the HTTP connection is established directly (the CSS is also a layer2/layer3 device when not being used as an LB).

I know the server is working/providing content, as I can reach it/browse it, yet the CSS is not showing the service as Alive.  However, the 6 other services that are configured are all OK; only my new ones are affected.

I have 5 other CSSs in my organisation - all the same make/model, same modules & software.  However, the other 5 all have uptimes lower than 600 days; this is the only one that has a high uptime.  This device is in a redundant pair - its mate has an uptime of 324 days (power issue).

To be honest, upgrading has never crossed our minds, as they have been so reliable.  However, scheduling an upgrade window would not be easy!

Do you know of any other way of solving this apart from upgrading/rebooting?

Thanks,

Andrew.

Sean Merrow Thu, 03/18/2010 - 08:34

Andrew,

Unfortunately, I am not aware of any way to recover from this without a reload.  Maybe bouncing (suspend/active) the service, or removing/readding the service from the config?  Maybe changing the keepalive type, then setting it back?

Wish I could add more.  :- (

Sean

Hi Sean,

Thanks - I was afraid you were going to say that.  I have tried suspending/activating the services and adding/removing them, to no avail.

OK - reboot it is then.  Just one more thing: do you have the URL for the "828 days of uptime bugs" you mentioned?

I could use this as justification for upgrading all 6 units.

Thanks for the help.
