RME 4.x: Config Archive issues

Unanswered Question
Jun 29th, 2007

Just noticed a never-before-seen error, "CM0012: Unable to create new version on archive $1 Action: Check if disk space is available and directory has required permissions", reported for a few devices' VLAN collections. There is plenty of disk space. Permissions look fine in shadow:

/var/adm/CSCOpx/files/rme/dcma/shadow/Routers/VLAN

drwxrwx--- 2 casuser casusers 96 Jun 29 02:07 VLAN

Directories in /var/adm/CSCOpx/files/rme/dcma/devfiles all have the same permissions:

drwxr-x--- 4 casuser casusers

I looked at the permissions on the VLAN directory of one of the reported devices, and they are OK too:

drwxr-x--- 3 casuser casusers 96 Apr 26 15:30 VLAN
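As a rough way to double-check the two things the CM0012 message points at (free disk space and directory permissions), a short Python sketch could report them for any archive directory. The function names describe_dir and owner_can_create_files are hypothetical, and nothing here is part of RME itself; it only reads what the operating system reports.

```python
import grp
import os
import pwd
import shutil
import stat


def describe_dir(path):
    """Return (mode string, owner, group, free bytes) for a directory."""
    st = os.stat(path)
    mode = stat.filemode(st.st_mode)  # same notation that ls -l prints
    try:
        owner = pwd.getpwuid(st.st_uid).pw_name
    except KeyError:
        owner = str(st.st_uid)  # uid has no passwd entry
    try:
        group = grp.getgrgid(st.st_gid).gr_name
    except KeyError:
        group = str(st.st_gid)  # gid has no group entry
    free = shutil.disk_usage(path).free  # bytes free on the containing filesystem
    return mode, owner, group, free


def owner_can_create_files(path):
    """True if the mode bits let the owning user create entries in the directory."""
    mode = os.stat(path).st_mode
    return (
        stat.S_ISDIR(mode)
        and bool(mode & stat.S_IWUSR)
        and bool(mode & stat.S_IXUSR)
    )
```

Calling describe_dir on the shadow and devfiles paths quoted above would return the same mode strings shown in the listings, plus the free space, so the check can be repeated across many device directories at once.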

A second issue is that RME seems to have its sessions refused by certain devices, mostly when fetching running-configs, and in one rare instance when getting VLAN data, but *never* with startup-configs:

1. rtr2 PRIMARY STARTUP Jun 29 2007 02:07:35 Successful

2. rtr2 PRIMARY RUNNING Jun 29 2007 10:18:45 Failed to detect SSH version running on the device. TELNET: Failed to establish TELNET connection to xx.xxx.xx.xxx - Cause: connect timed out. PRIMARY-RUNNING config Fetch Operation failed for TFTP. Failed to fetch config using SCP.connect timed out Increase Telnet Timeout in RME Device Attributes and try again.

3. rtr2 VLAN RUNNING Jun 29 2007 10:19:27 Successful

43. rtr1 PRIMARY STARTUP Jun 29 2007 03:22:36 Successful

44. rtr1 VLAN RUNNING Jun 29 2007 03:23:02 SSH: Failed to establish SSH connection to xx.xx.xx.xx - Cause: Connection refused. TELNET: Failed to establish TELNET connection to xx.xx.xx.xx - Cause: Connection refused. VLAN Config fetch is not supported using TFTP.VLAN Config fetch is not supported using SCP.

45. rtr1 PRIMARY RUNNING Jun 29 2007 03:23:01 Successful

Are startup-configs retrieved using a different mechanism? Is RME not terminating the startup-config retrieval session before going for the running-config, given the limited number of vty lines configured on each device?

Joe Clarke Fri, 06/29/2007 - 08:26

The first problem is related to CSCsh39475. A patch for this is available on Cisco.com under the RME Software Center section.

The connection-refused problem could be caused by these devices running out of available VTY lines. All sessions should be properly terminated in LMS 2.6. You will need to look at the output of show users from the console to see who is occupying these lines.

Joe Clarke Fri, 06/29/2007 - 08:28

Actually, there is one of my bugs that was still not fixed in RME 4.0.5. If you are using SCP, those sessions can remain open for a while after the config fetch is done. A patch for CSCsg48261 is available by calling the TAC. This is fixed in LMS 3.0.

yjdabear Fri, 06/29/2007 - 08:40

Wait, but I applied rme4.0.5-sol-CSCsh394751.0.tar as a preventive measure weeks ago. I do admit I didn't bother further with the two scripts since 1) I encountered an identical error trying to execute either of them; 2) they seemed to deal with existing symptoms that I wasn't seeing:

RemoveMismatchPrimaryConfigurations_Script.tar

RemoveMismatchVLANConfigurations_Script.tar

I will post up the error text later.

Joe Clarke Fri, 06/29/2007 - 08:56

Try running the scripts now, and see if that takes care of the problem with the affected devices.

yjdabear Mon, 07/02/2007 - 07:16

Do you think flaky TACACS could be causing the following?

Failed devices under Config Archive

1. catos-switch PRIMARY RUNNING Jul 02 2007 02:44:26 CM0002: Could not archive config Cause: Device may not be reachable, may be in suspended state or credentials may be incorrect. Action: Verify that device is managed, credentials are correct and file system has correct permissions. Increase timeout value, if required.

40. rtr2ios PRIMARY STARTUP Jul 02 2007 02:02:46 SSH: Failed to establish SSH connection to xxx.xxx.xxx.xxx - Cause: Connection refused. TELNET: Failed to establish TELNET connection to xxx.xxx.xxx.xxx - Cause: Connection refused. PRIMARY-STARTUP config Fetch Operation failed for TFTP. Failed to fetch config using SCP.Connection refused Verify SCP is enabled or not.

41. rtr2ios PRIMARY RUNNING Jul 02 2007 02:02:48 SSH: Failed to establish SSH connection to xxx.xxx.xxx.xxx - Cause: Connection refused. TELNET: Failed to establish TELNET connection to xxx.xxx.xxx.xxx - Cause: Connection refused. PRIMARY-RUNNING config Fetch Operation failed for TFTP. Failed to fetch config using SCP.Connection refused Verify SCP is enabled or not.

42. rtr2ios VLAN RUNNING Jul 02 2007 02:02:49 SSH: Failed to establish SSH connection to xxx.xxx.xxx.xxx - Cause: Connection refused. TELNET: Failed to establish TELNET connection to xxx.xxx.xxx.xxx - Cause: Connection refused. VLAN Config fetch is not supported using TFTP.VLAN Config fetch is not supported using SCP.

52. rtr3ios PRIMARY STARTUP Jul 02 2007 02:07:13 SSH: Failed to establish SSH connection to xxx.xxx.xxx.xxx - Cause: Connection refused. TELNET: Failed to establish TELNET connection to xxx.xxx.xxx.xxx - Cause: Connection refused. PRIMARY-STARTUP config Fetch Operation failed for TFTP. Failed to fetch config using SCP.(truncated MOTD banner)

53. rtr3ios PRIMARY RUNNING Jul 02 2007 02:07:19 SSH: Failed to establish SSH connection to xxx.xxx.xxx.xxx - Cause: Connection refused. TELNET: Failed to establish TELNET connection to xxx.xxx.xxx.xxx - Cause: Connection refused. PRIMARY-RUNNING config Fetch Operation failed for TFTP. Failed to fetch config using SCP.SSH2 Connection terminated Verify SCP is enabled or not.

54. rtr3ios VLAN RUNNING Jul 02 2007 02:07:20 SSH: Failed to establish SSH connection to xxx.xxx.xxx.xxx - Cause: Connection refused. TELNET: Failed to establish TELNET connection to xxx.xxx.xxx.xxx - Cause: Connection refused. VLAN Config fetch is not supported using TFTP.VLAN Config fetch is not supported using SCP.
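To make failure lists like the one above easier to compare from day to day, a small Python sketch could group the rows by device and label each one as refused, timed out, successful, or other. The row layout is inferred only from the lines pasted in this thread, and the names ROW, classify, and summarize are hypothetical; this is not an RME export format or API.

```python
import re
from collections import defaultdict

# Layout assumed from the pasted rows:
# "<n>. <device> <PRIMARY|VLAN> <STARTUP|RUNNING> <Mon DD YYYY HH:MM:SS> <message>"
ROW = re.compile(
    r"^\s*(?P<row>\d+)\.\s+(?P<device>\S+)\s+(?P<kind>PRIMARY|VLAN)\s+"
    r"(?P<state>STARTUP|RUNNING)\s+"
    r"(?P<when>[A-Z][a-z]{2} \d{2} \d{4} \d{2}:\d{2}:\d{2})\s+(?P<message>.*)$"
)


def classify(message):
    """Map a status message to a coarse label based on its wording."""
    text = message.lower()
    if text.startswith("successful"):
        return "ok"
    if "connection refused" in text:
        return "refused"
    if "timed out" in text:
        return "timeout"
    return "other"


def summarize(lines):
    """Group (kind, state, label) tuples by device name; skip unparsable lines."""
    summary = defaultdict(list)
    for line in lines:
        m = ROW.match(line)
        if m is None:
            continue
        summary[m["device"]].append((m["kind"], m["state"], classify(m["message"])))
    return dict(summary)
```

Running summarize over each day's list would show at a glance whether a device flips between refused and timed-out failures, which is the pattern being asked about here.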

Joe Clarke Mon, 07/02/2007 - 09:15

I would need to see a full debug to know exactly why these fetches are failing. What protocol should be used to fetch these configs?

yjdabear Mon, 07/02/2007 - 09:49

Everything is configured to try the following protocols in order: ssh/telnet/tftp/scp. Is it the debug of the ArchiveMgmt Archive Client, rather than the Archive Server, that's needed?

Joe Clarke Mon, 07/02/2007 - 10:03

EXACTLY what protocol should succeed for these devices? The only log of interest here will be dcmaservice.log with ArchiveMgmt Service debugging enabled.

yjdabear Mon, 07/02/2007 - 10:10

All four should, but that's not the reality due to a variety of outstanding issues.

Joe Clarke Mon, 07/02/2007 - 10:15

No, one and only one protocol should succeed for each failing device. Since SSH is first, is that the one that should work for these? Since DCMA is reporting that the telnet connection is refused, I'm guessing telnet will never work for these devices. This is important as it will help focus the troubleshooting.

yjdabear Mon, 07/02/2007 - 10:28

I know that. But SSH was refused first, then telnet was tried and refused too, and so on, when ALL four should work when tried (and if supported by ArchiveMgmt). I've seen some of these protocols succeed one day (see my first post in this thread), fail the next, then fail in some other combination on the third day. That's the question mark there. CSCsg48261 could be one explanation, but I hadn't seen anything like this until ACS started acting flaky, probably as early as a month ago.

Joe Clarke Mon, 07/02/2007 - 10:32

Okay, if SSH should be the primary protocol, and it's being refused, then it's either no longer configured, or the VTY lines have been exhausted. Have you confirmed that the problem is the VTY exhaustion (i.e. using show users)? If so, is the CiscoWorks server the offending IP, and what port is being used?
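One quick way to tell a refused connection from a timeout, independently of RME, is a plain TCP probe of the SSH and telnet ports from the CiscoWorks server. The Python sketch below is only an illustration under that assumption: probe is a hypothetical helper, and an "open" result shows that the TCP handshake completed, not that a login would succeed.

```python
import socket


def probe(host, port, timeout=5.0):
    """Attempt a TCP connection and report how it ended."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The far end actively rejected the connection.
        return "refused"
    except socket.timeout:
        # No answer at all within the timeout.
        return "timeout"
    except OSError as exc:
        # Anything else, e.g. no route to host or name resolution failure.
        return f"error: {exc}"
```

Probing ports 22 and 23 on an affected device while the archive job is failing would show whether the refusal is visible at the TCP level; show users on the device is still what identifies who holds the lines.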

yjdabear Mon, 07/02/2007 - 11:42

For the two sample devices I posted about above, I did just find out the vty lines were full (my interactive session had the last spot). The other users apparently have stayed on for months, which I find strange, because the devices do have "exec-timeout ## 0" after "line vty 0 #".

Joe Clarke Mon, 07/02/2007 - 11:57

Okay, good to know there's not another problem with LMS not closing its sockets.

We have seen some issues with SSHv2 in IOS in general that could be causing some sockets to linger if they are not properly closed. The workaround is to manually clear the lines.

yjdabear Mon, 07/02/2007 - 12:06

Is SSH v1.99 implicated as well? That seems to be what I run into more often, at least according to "show ip ssh".

Could CSCsg48261 still be at play here though, since there's only one spot left for LMS or anyone else that comes along?

Joe Clarke Mon, 07/02/2007 - 12:24

SSH 1.99 means the server speaks both SSHv1 and SSHv2; the client decides which version to use.

If CiscoWorks isn't occupying the VTY lines, then CSCsg48261 is not at play.
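The 1.99 convention can be read straight from the identification string an SSH server sends on connect, which has the form SSH-protoversion-softwareversion. As a minimal sketch, the hypothetical helper below maps that string to the protocol versions the server offers; the sample banners in the comment are illustrative, not captured from the devices in this thread.

```python
import re

# Identification string: "SSH-<protoversion>-<softwareversion>[ comments]"
_BANNER = re.compile(r"^SSH-(\d+\.\d+)-(\S+)")


def server_protocols(banner):
    """Return the set of SSH major versions implied by a server banner."""
    m = _BANNER.match(banner.strip())
    if m is None:
        raise ValueError(f"not an SSH identification string: {banner!r}")
    proto = m.group(1)
    if proto == "1.99":
        return {1, 2}  # compatibility mode: server accepts v1 and v2 clients
    if proto == "2.0":
        return {2}
    if proto.startswith("1."):
        return {1}
    return set()  # unrecognized protoversion


# Illustrative inputs: "SSH-1.99-Cisco-1.25", "SSH-2.0-OpenSSH_8.9"
```

A server announcing 1.99 leaves the choice to the client, so which version is actually negotiated depends on how the connecting side is configured.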

yjdabear Thu, 07/05/2007 - 05:56

I attached a debug dcmaservice.log to TAC case 606284225. This is concerning the failed CatOS Config Archives:

1. catos-switch PRIMARY RUNNING Jul 02 2007 02:44:26 CM0002: Could not archive config Cause: Device may not be reachable, may be in suspended state or credentials may be incorrect. Action: Verify that device is managed, credentials are correct and file system has correct permissions. Increase timeout value, if required.

Does the following snippet from dcmaservice.log mean this is a sign of CSCsh39475 too?

[ Tue Jul 03 16:58:23 EDT 2007 ],ERROR,[Thread-2579],com.cisco.nm.rmeng.dcma.configmanager.DeviceArchiveManager,addNewConfigFileVersion,1004,CM0012: Unable to create new version on archive $1 Action: Check if disk space is available and directory has required permissions

[ Tue Jul 03 16:58:23 EDT 2007 ],ERROR,[Thread-2579],com.cisco.nm.rmeng.dcma.configmanager.DeviceArchiveManager,archiveNewVersionIfNeeded,1226,CM0002: Could not archive config Cause: Device may not be reachable, may be in suspended state or credentials may be incorrect. Action: Verify that device is managed, credentials are correct and file system has correct permissions. Increase timeout value, if required.CM0012: Unable to create new version on archive $1 Action: Check if disk space is available and directory has required permissions
    at com.cisco.nm.rmeng.dcma.configmanager.DeviceArchiveManager.addNewConfigFileVersion(DeviceArchiveManager.java:1006)
    at com.cisco.nm.rmeng.dcma.configmanager.DeviceArchiveManager.archiveNewVersionIfNeeded(DeviceArchiveManager.java:1178)
    at com.cisco.nm.rmeng.dcma.configmanager.ConfigManager.updateArchiveForDevice(ConfigManager.java:666)
    at com.cisco.nm.rmeng.dcma.configmanager.ConfigManager.performCollection(ConfigManager.java:1529)
    at com.cisco.nm.rmeng.dcma.configmanager.CfgUpdateThread.run(CfgUpdateThread.java:29)

[ Tue Jul 03 16:58:23 EDT 2007 ],ERROR,[Thread-2579],com.cisco.nm.rmeng.dcma.configmanager.ConfigManager,updateArchiveForDevice,710,Error archiving config for catos6513sw1

[ Tue Jul 03 16:58:23 EDT 2007 ],DEBUG,[Thread-2579],com.cisco.nm.rmeng.dcma.configmanager.ConfigManager,updateArchiveForDevice,711,Exception..CM0002: Could not archive config Cause: Device may not be reachable, may be in suspended state or credentials may be incorrect. Action: Verify that device is managed, credentials are correct and file system has correct permissions. Increase timeout value, if required.
    at com.cisco.nm.rmeng.dcma.configmanager.DeviceArchiveManager.archiveNewVersionIfNeeded(DeviceArchiveManager.java:1228)
    at com.cisco.nm.rmeng.dcma.configmanager.ConfigManager.updateArchiveForDevice(ConfigManager.java:666)
    at com.cisco.nm.rmeng.dcma.configmanager.ConfigManager.performCollection(ConfigManager.java:1529)
    at com.cisco.nm.rmeng.dcma.configmanager.CfgUpdateThread.run(CfgUpdateThread.java:29)

[ Tue Jul 03 16:58:23 EDT 2007 ],DEBUG,[Thread-2579],com.cisco.nm.rmeng.inventory.InvAPIs,isValidDeviceID,3450,ResourceBundle set for the logger
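For sifting a large dcmaservice.log for entries like these, a Python sketch along the following lines could pull out the level, thread, class, method, and any CM error codes. The comma-separated field layout is inferred only from the entries quoted in this post, the names parse_entry and codes_in_errors are hypothetical, and the sketch assumes each entry sits on a single line (that is, any 80-column wrapping from a pasted copy has been undone).

```python
import re

# Layout assumed from the quoted entries:
# "[ <timestamp> ],<LEVEL>,[<thread>],<class>,<method>,<source line>,<message>"
ENTRY = re.compile(
    r"^\[ (?P<when>.+?) \],(?P<level>[A-Z]+),\[(?P<thread>[^\]]+)\],"
    r"(?P<cls>[\w.$]+),(?P<method>\w+),(?P<line>\d+),(?P<message>.*)$"
)
CODE = re.compile(r"\bCM\d{4}\b")


def parse_entry(line):
    """Return a dict of fields for a log entry, or None for other lines."""
    m = ENTRY.match(line)
    if m is None:
        return None  # e.g. a stack-trace frame or a blank line
    entry = m.groupdict()
    entry["line"] = int(entry["line"])
    entry["codes"] = CODE.findall(entry["message"])
    return entry


def codes_in_errors(lines):
    """Collect (thread, method, codes) for ERROR entries that carry CM codes."""
    found = []
    for raw in lines:
        entry = parse_entry(raw.rstrip("\n"))
        if entry and entry["level"] == "ERROR" and entry["codes"]:
            found.append((entry["thread"], entry["method"], entry["codes"]))
    return found
```

Applied to the snippet above, this would list CM0012 under addNewConfigFileVersion and both CM0002 and CM0012 under archiveNewVersionIfNeeded, which makes it easier to see that the CM0002 report wraps an underlying CM0012.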
