CSS messages Service Down/Alive

vdidier · 04-17-2003

Hello,

We use a CSS 11503, and for some services we use an HTTP keepalive with the values (10 3 5). We sometimes see DOWN/ALIVE messages only one second apart:

APR 17 23:29:17 1/1 574 NETMAN-2: Enterprise:Service Transition:Test01 -> down

APR 17 23:29:18 1/1 575 NETMAN-5: Enterprise:Service Transition:Test01 -> alive
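To spot these short DOWN/ALIVE pairs in a longer log, a small script can pair each "down" with the next "alive" for the same service and report the gap. This is a minimal sketch that assumes every line follows the format of the two lines above. Because the log omits the year, the script assumes one (the `year` parameter, defaulting to the 2003 of this post). The function name `find_flaps` is hypothetical.

```python
import re
from datetime import datetime

# Assumed format, taken from the two log lines in this post:
# APR 17 23:29:17 1/1 574 NETMAN-2: Enterprise:Service Transition:Test01 -> down
LINE_RE = re.compile(
    r"^(?P<mon>[A-Za-z]{3}) +(?P<day>\d+) (?P<time>\d\d:\d\d:\d\d) .*"
    r"Service Transition:(?P<service>\S+) -> (?P<state>down|alive)\s*$"
)


def find_flaps(lines, max_gap_seconds=5, year=2003):
    """Return (service, gap_seconds) for each down->alive pair closer
    together than max_gap_seconds."""
    last_down = {}  # service name -> timestamp of its most recent "down"
    flaps = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # not a service transition line
        # The log has no year, so one is assumed to build a full timestamp.
        ts = datetime.strptime(
            f"{year} {m['mon'].title()} {m['day']} {m['time']}",
            "%Y %b %d %H:%M:%S",
        )
        service, state = m["service"], m["state"]
        if state == "down":
            last_down[service] = ts
        elif service in last_down:
            gap = (ts - last_down.pop(service)).total_seconds()
            if gap < max_gap_seconds:
                flaps.append((service, gap))
    return flaps


sample = [
    "APR 17 23:29:17 1/1 574 NETMAN-2: Enterprise:Service Transition:Test01 -> down",
    "APR 17 23:29:18 1/1 575 NETMAN-5: Enterprise:Service Transition:Test01 -> alive",
]
print(find_flaps(sample))  # [('Test01', 1.0)]
```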

Why? I think that, with the values we use for this keepalive service, the minimum time between the DOWN and ALIVE status should be 5 seconds.

Could you explain why there is only 1 second between the DOWN and ALIVE status? Thanks a lot.

Didier

vkasacavage · 04-18-2003

Hi, it sounds like we are having the same problem.

The service goes down for 1 second and then comes back active. We have an open TAC case on this issue. The services do not actually go down; here is TAC's current thinking:

Usually we see this type of log entry when the CS-800 is receiving a high volume of extraneous traffic. Because we act as both a router and a bridge, we must examine each packet. This particular queue handles all non-specific and non-IP traffic, including Spanning Tree BPDU, non-IP bridgeable traffic, ICMP, ARP, UDP fragments, and packets with expired TTL. Under situations where the CS-800 receives a high number of these packets, such as during a DOS attack or where other network anomalies exist, there may be occasional drops in this queue. This should not have any impact on user TCP traffic, as TCP is sent to a different queue.

If this error appears in the log occasionally, then check the "show dos" commands to make sure the site is not under attack. Also, check the network topology to make sure the routing is solid. If the log is filling rapidly with these errors, then a packet capture may be helpful in isolating the cause.

bhose · 04-23-2003

Hi Didier,

It sounds like you are getting keepalive timeouts due to network load. I think you are seeing 3 failed keepalives, one sent every 10 seconds, and after the 5-second retry period (a total of 35 seconds) the service is declared down. If a response to one of the keepalives is received 1 second after the service is declared down, the service is reported as up again. This could be why you see a service go down and then, 1 second later, come back up.
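The timeline described above can be modeled in a few lines. This is a sketch under stated assumptions, not documented CSS internals: it reads (10 3 5) as frequency 10 s, maxfailure 3, and retryperiod 5 s; it uses the 35-second estimate from this reply as the moment the service is declared down; and the function names are hypothetical. It shows that, in this model, the DOWN/ALIVE gap is simply how late the straggling reply was, not the retry period.

```python
def poster_detection_time(frequency, maxfailure, retryperiod):
    # Estimate from this reply: maxfailure probes spaced `frequency` seconds
    # apart, plus one retryperiod, before the service is declared down.
    return maxfailure * frequency + retryperiod


def transitions(frequency, maxfailure, retryperiod, reply_arrivals):
    """Hypothetical model, not documented CSS internals.

    Times are seconds after the last successful keepalive. A reply arriving
    before the detection time resets the failure count, so the service never
    goes down. Otherwise the service goes down at the detection time, and the
    first reply arriving afterwards (even one answering an older probe) marks
    it alive immediately, with no hold-down period.
    """
    down_at = poster_detection_time(frequency, maxfailure, retryperiod)
    if any(t < down_at for t in reply_arrivals):
        return []  # a reply arrived in time; no transition is logged
    events = [(down_at, "down")]
    if reply_arrivals:
        events.append((min(reply_arrivals), "alive"))
    return events


# OP's (10 3 5) settings; a reply to an earlier probe straggles in at t=36 s.
print(transitions(10, 3, 5, reply_arrivals=[36]))
# [(35, 'down'), (36, 'alive')]
```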

As I'm sure you are aware, having a service go down because of slow keepalive responses is not desirable. If you see this regularly, I would suggest increasing either the retryperiod or the maxfailure value.
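To see how much slack each change buys, the same rough estimate from this reply (maxfailure × frequency + retryperiod) can be evaluated for a few settings. The formula is the poster's approximation, not a figure from Cisco documentation, and the alternative values below are illustrative only, not tuned recommendations; `detection_window` is a hypothetical name.

```python
def detection_window(frequency, maxfailure, retryperiod):
    # Estimate used in this thread, not a figure from Cisco documentation:
    # maxfailure keepalives at `frequency` spacing plus one retryperiod.
    return maxfailure * frequency + retryperiod


# (frequency in s, maxfailure count, retryperiod in s); illustrative values.
settings = {
    "current (10 3 5)": (10, 3, 5),
    "higher maxfailure (10 5 5)": (10, 5, 5),
    "longer retryperiod (10 3 15)": (10, 3, 15),
}
for label, (freq, maxfail, retry) in settings.items():
    print(f"{label}: about {detection_window(freq, maxfail, retry)} s before DOWN")
```

Under this estimate, raising maxfailure widens the window in steps of one keepalive interval, while raising retryperiod adds time directly; either way, a slow but healthy server has longer to answer before it is marked down.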

Hope this helps !!