CAVEAT CSCeb77354 for signature release IDSK9-sp-3.1-4-S50

pgomes
Level 1

CAVEAT for signature release IDSK9-sp-3.1-4-S50

- CSCeb77354: Daemons not running after approx 20 hours

Symptom: Some daemons, e.g. filexferd, loggerd, sapd, packetd, may not be running on a 3.1(4) sensor.

Condition:

The 3.1(4) sensor is in an oversubscribed condition for an extended period of time. While the sensor is oversubscribed, packetd uses more and more resources (CPU and RAM) until essentially all available resources are being utilized. As a result, postofficed may be unable to communicate with the other daemons. A watchdog process attempts to restart those daemons, but the same inability to communicate with them (due to all available resources being used by packetd) prevents them from being restarted.

Workaround:

Reduce the amount of traffic being directed to the sensor. Execute nrstop followed by nrstart.

***********************************************

What would you describe as an oversubscribed environment?

At what point has this bug been reproduced (events/sec) ?

How problematic is this?


wardwalk
Cisco Employee

Hi Peter,

As far as I'm aware, this symptom has only been observed during development testing of the 3.1(4)S50 service pack on the 4250-TX platform. It was observed on a 4250-TX after approximately 20 consecutive hours of monitoring a steady 300 Mbps and 3000 connections per second of "background" web traffic, i.e. traffic that contained no attacks.

The symptom observed was that not all the daemons were reported after executing the nrstatus command on the sensor appliance. Additionally, the nrvers command reported problems communicating with some daemons. See below for an example of this symptom:

***** nrstatus and nrvers output when daemons running:

netrangr@sensor:/usr/nr

>nrstatus

netrangr 3140 1 0 15:44:10 ttyd0 0:00 /usr/nr/bin/nr.loggerd

netrangr 3164 1 0 15:44:11 ttyd0 0:12 /usr/nr/bin/nr.packetd

netrangr 3156 1 0 15:44:11 ttyd0 0:00 /usr/nr/bin/nr.fileXferd

netrangr 3148 1 0 15:44:11 ttyd0 0:00 /usr/nr/bin/nr.sapd

netrangr 3131 1 0 15:44:10 ttyd0 0:00 /usr/nr/bin/nr.postofficed

netrangr@sensor:/usr/nr

>nrvers

Application Versions for sensor.cisco

The Version of the Sensor is: 3.1(4)S50

postoffice v220 (Release) 01/12/14-20:01

logger v220 (Release) 01/12/14-19:59

sap v220 (Release) 01/12/14-20:01

fileXfer v175 (Release) 01/07/11-21:48

sensor v322 (Release) 03/07/14-21:56

netrangr@sensor:/usr/nr

***** nrstatus and nrvers output when some daemons (filexferd, loggerd, sapd) NOT running:

netrangr@sensor:/usr/nr

>nrstatus

netrangr 3164 1 0 15:44:11 ttyd0 0:12 /usr/nr/bin/nr.packetd

netrangr 3131 1 0 15:44:10 ttyd0 0:00 /usr/nr/bin/nr.postofficed

netrangr@sensor:/usr/nr

>nrvers

Application Versions for sensor.cisco

The Version of the Sensor is: 3.1(4)S50

postoffice v220 (Release) 01/12/14-20:01

Error timeout waiting for

Error timeout waiting for

Error timeout waiting for

sensor v322 (Release) 03/07/14-21:56

netrangr@sensor:/usr/nr
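
If you want to check for this symptom automatically, the comparison between the two nrstatus listings above can be sketched in a few lines of Python. This is only an illustration, not a Cisco tool: it assumes you have captured the nrstatus output as text (for example over a remote session) and are running the check somewhere Python is available. The function name missing_daemons and the EXPECTED_DAEMONS list are made up for this example; the daemon names are taken from the healthy output shown above.

```python
# Daemon binaries seen in the healthy nrstatus output in this thread.
EXPECTED_DAEMONS = (
    "nr.postofficed",
    "nr.loggerd",
    "nr.sapd",
    "nr.fileXferd",
    "nr.packetd",
)


def missing_daemons(nrstatus_output, expected=EXPECTED_DAEMONS):
    """Return the expected daemon names that are absent from nrstatus output.

    Hypothetical helper: assumes each process line ends with the daemon's
    path, e.g. /usr/nr/bin/nr.packetd, as in the listings above.
    """
    running = set()
    for line in nrstatus_output.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        # Keep only the file name part of the last field on the line.
        running.add(fields[-1].rsplit("/", 1)[-1])
    return [name for name in expected if name not in running]


# Captured text from the degraded case shown above.
DEGRADED_OUTPUT = """\
netrangr 3164 1 0 15:44:11 ttyd0 0:12 /usr/nr/bin/nr.packetd
netrangr 3131 1 0 15:44:10 ttyd0 0:00 /usr/nr/bin/nr.postofficed
"""

if __name__ == "__main__":
    print(missing_daemons(DEGRADED_OUTPUT))
```

Run against the degraded listing, the helper reports loggerd, sapd and fileXferd as missing, which matches the three nrvers timeouts.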

If you're seeing this symptom and the rate of monitored traffic cannot be reduced, you can disable the daemon restart, i.e. watchdog, process. This will prevent the watchdog process from attempting to stop/re-start daemons that aren't responding to the watchdog queries.

To disable the watchdog process, you'll need to set the number of process restarts to zero. To do this, edit the file /usr/nr/etc/postofficed.conf on your sensor. Change the value for the WatchDogNumProcessRestarts from 3 to 0. For example, change as follows:

From:

WatchDogNumProcessRestarts 3

To:

WatchDogNumProcessRestarts 0

Note: Setting the number of daemon restarts to zero is not recommended unless you're seeing the symptom, because it will prevent daemon restarts regardless of why the daemon stopped running. For example, if a daemon stops running for any reason, that daemon would not be restarted automatically; someone would need to manually do a restart on that sensor.
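
If you have several sensors to change, the edit described above could be scripted roughly as follows. Treat this as a sketch under assumptions rather than a supported procedure: it assumes the token appears on its own line in the form shown above (name, whitespace, integer), and that you run it against a copy of /usr/nr/etc/postofficed.conf on a machine with Python rather than on the sensor itself. The function name set_watchdog_restarts is made up for this example.

```python
import re

# Matches a line such as "WatchDogNumProcessRestarts 3" (assumed format,
# based only on the From/To example in this post).
_WATCHDOG_LINE = re.compile(
    r"^([ \t]*WatchDogNumProcessRestarts[ \t]+)\d+[ \t]*$",
    re.MULTILINE,
)


def set_watchdog_restarts(conf_text, restarts):
    """Return conf_text with the WatchDogNumProcessRestarts value replaced.

    Hypothetical helper: raises ValueError if the token is not found, so a
    file in an unexpected format is never silently left unchanged.
    """
    if restarts < 0:
        raise ValueError("restarts must be zero or a positive integer")
    new_text, count = _WATCHDOG_LINE.subn(
        lambda m: m.group(1) + str(restarts), conf_text
    )
    if count == 0:
        raise ValueError("WatchDogNumProcessRestarts token not found")
    return new_text


if __name__ == "__main__":
    sample = "WatchDogNumProcessRestarts 3\n"
    print(set_watchdog_restarts(sample, 0), end="")
```

Keep a backup of the original file, and remember the note above: with the value at 0, a stopped daemon stays stopped until someone restarts it manually. Passing 3 to the same helper restores the original setting.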