RME 4.0.4: Can't purge config archive

Apr 2nd, 2007

Hi,

A customer is running LMS 2.5.1 SP4 on Solaris 9 on two servers, set up as DCR master and slave. The master server can no longer purge the config archive: the job fails after 16 hours without giving any details.

On the slave server the archive purge job runs properly.

The /var directory is nearly full, and CiscoWorks is not able to run all daily jobs properly; a lot of them fail. A Solaris engineer found that CiscoWorks creates a large number of small files (inodes) on the Solaris server. It looks as if RME is not able to delete the old config files.

Has anybody else had the same experience?


Thanks


HR

Joe Clarke Mon, 04/02/2007 - 08:10

Config Purge can take a long time. Sixteen hours is not uncommon for large networks, but it should make some progress. The best way to troubleshoot this problem is to enable ConfigJob debugging under RME > Admin > System Preferences > Loglevel Settings, then run a new purge job. Under /var/adm/CSCOpx/files/rme/jobs/ArchivePurge, there will be a directory that corresponds to the purge job ID. Inside it are numbered directories starting from 1, one for each instance of the job. Each instance directory contains a log file. Scanning that log file should reveal why the configs are not being successfully purged.
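A minimal sketch of that log hunt, assuming a POSIX-compatible shell (ksh or bash rather than the legacy Solaris /bin/sh) and a grep that accepts -E (on Solaris 9 that would be /usr/xpg4/bin/grep). The function names and the search pattern are illustrative only and are not part of RME; the job directory is passed in as an argument.

```shell
# Print the highest-numbered instance directory under a purge job directory.
# $1 is the job directory, e.g. .../jobs/ArchivePurge/<job ID>
latest_instance_dir() {
    job_dir=$1
    latest=
    for d in "$job_dir"/*; do
        [ -d "$d" ] || continue
        name=$(basename "$d")
        case $name in
            ''|*[!0-9]*) continue ;;    # skip names that are not purely numeric
        esac
        if [ -z "$latest" ] || [ "$name" -gt "$latest" ]; then
            latest=$name
        fi
    done
    [ -n "$latest" ] || return 1
    printf '%s/%s\n' "$job_dir" "$latest"
}

# List lines that look like errors in the files of the latest instance.
# The pattern is a guess at what a failure might look like,
# not a known RME message format.
scan_latest_instance() {
    inst=$(latest_instance_dir "$1") || return 1
    grep -i -n -E 'error|exception|fail' "$inst"/*
}
```

For example, scan_latest_instance /var/adm/CSCOpx/files/rme/jobs/ArchivePurge/1234 would scan the newest instance of a job whose ID is 1234 (a made-up ID).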

anywebsbb Fri, 04/06/2007 - 12:17

Hi Joe


Thanks for your investigation.

I did as you said and sent the log to the TAC.

The case is SR605610155, if you are interested.

I would appreciate it.


Kind regards


Hans

anywebsbb Sat, 04/07/2007 - 05:49

Hi Joe


I put the file on ftp-sj.cisco.com in the incoming directory. The file name is log_SR605610155_2.tar


Regards


Hans

anywebsbb Mon, 04/09/2007 - 01:54

Hi Joe


I put a new file on the server ftp-sj.cisco.com.

The file is under /incoming and its name is archivepurgejob_SR605610155.tar.

I checked this file beforehand. It is a tarred directory. I was able to see the content with the Unix command strings.


Regards


Hans

Joe Clarke Mon, 04/09/2007 - 20:17

Sorry, this archive is also corrupt. I cannot simply use strings on it to get what I need. I need to be able to properly untar it. When you upload this file to the FTP server, make sure you are using a BINARY transfer. You might also try compressing the tar file with zip or bzip2.
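One way to avoid another round trip is to verify the archive locally before uploading it. The sketch below is an assumption about workflow, not a TAC procedure: it checks that a gzip-compressed tar file decompresses and lists cleanly. verify_tgz is a hypothetical helper name, and the packing step in the comments pipes tar through gzip so it works even where tar has no compression option of its own.

```shell
# Return 0 only if the gzip stream is intact and tar can list every member.
verify_tgz() {
    gzip -t "$1" || return 1
    gzip -dc "$1" | tar tf - > /dev/null || return 1
}

# Example packing step (paths are placeholders):
#   (cd /path/to/ArchivePurge && tar cf - <job ID>) | gzip > /tmp/archivepurge.tar.gz
#   verify_tgz /tmp/archivepurge.tar.gz && cksum /tmp/archivepurge.tar.gz
```

In a command-line ftp client, issue the binary command before put. An ASCII-mode transfer rewrites line-ending bytes, which is enough to corrupt a tar or gzip file. The cksum output gives a checksum and byte count that can be compared with what arrives on the other side.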

anywebsbb Tue, 04/10/2007 - 08:40

Sorry for the inconvenience.

I put a file on the server ftp-sj.cisco.com.

It is in the directory /incoming and its name is archivepurgeSR605610155.tar.gz.

I compressed the file with gzip and transferred it in binary mode.


Hope it works.


Regards


Hans


Joe Clarke Tue, 04/10/2007 - 19:48

I was able to extract this file, but unfortunately, the error messages do not give me enough of a clue to know exactly what is wrong. At first glance, it seems your RME database is out of sync with the file system as far as the config archive is concerned. Since this is Solaris, a two-minute truss of the running job may provide more clues. Once the job is running, use /usr/ucb/ps -auwwx and grep for the job ID. Then, when you have the PID, run:


truss -a -f -vall -rall -wall -o /tmp/purge.truss -p <PID>


Kill the truss after about two minutes, then compress and post the resulting purge.truss.
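The PID lookup and the two-minute cutoff can be scripted. This is a sketch under assumptions: pids_for_job and truss_two_minutes are hypothetical helper names, a POSIX-compatible shell (ksh or bash) is assumed, and the job ID is assumed to appear somewhere in the process's ps line. The ps output is saved to a file first so the filter cannot match its own pipeline. truss is stopped with a hangup signal, which its manual page describes as releasing the traced process; check the man page on your system before relying on that.

```shell
# Read ps output on stdin; print the PID (second column) of each line
# whose remaining columns contain the job ID given as $1.
pids_for_job() {
    while read -r user pid rest; do
        case "$rest" in
            *"$1"*)
                case $pid in
                    ''|*[!0-9]*) ;;          # header or malformed line
                    *) echo "$pid" ;;
                esac ;;
        esac
    done
}

# Trace one PID for about two minutes, then stop truss and compress the output.
truss_two_minutes() {
    truss -a -f -vall -rall -wall -o /tmp/purge.truss -p "$1" &
    truss_pid=$!
    sleep 120
    kill -HUP "$truss_pid"
    wait "$truss_pid"
    gzip /tmp/purge.truss
}

# Usage (1234 is a made-up job ID):
#   /usr/ucb/ps -auwwx > /tmp/ps.out
#   pids_for_job 1234 < /tmp/ps.out
#   truss_two_minutes <PID>
```

A numeric job ID can also match other numeric columns in the ps output, so check the candidate PIDs by eye before attaching truss.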

anywebsbb Fri, 04/27/2007 - 09:53

Hi

In the meantime, the TAC engineer told me to clear the devices in the DCR and reimport them. After that and a reboot, the purge job is working again. Anyway, I put the file on the server: ftp-sj.cisco.com/incoming/purge.truss.gz. Maybe you can see something.

Thank you very much.


Best regards


Hans

Joe Clarke Fri, 04/27/2007 - 09:57

If you've removed and re-added the devices, anything obtained from the truss will not be relevant. Hopefully with regular purging from the start, you won't see a problem again.
