I am getting the following error in several devices in our network.
10.1.10.25 is the CiscoWorks server. I have reset the SNMP community strings several times and cannot trace where the error is coming from.
Has anyone experienced this?
May 18 14:04:15.333: %SNMP-3-AUTHFAIL: Authentication failure for SNMP
If you're absolutely sure the community strings in LMS are correct, the most likely cause of this problem is Campus Manager and community string indexing. What type of device is this? I'm betting it's an IOS switch. If you gather a packet capture of the SNMP traffic triggering this trap, it may be possible to prevent it.
You are right j,
I am seeing this same message in several switches (6509).
What in Campus Manager and community string indexing causes this problem?
User Tracking uses community string indexing to get end hosts from each VLAN. ANI uses community string indexing to get STP information for each instance of spanning tree. A sniffer trace is needed to see which it is.
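For anyone unfamiliar with community string indexing: on Catalyst IOS switches, per-VLAN data is read by appending @<VLAN ID> to the base community string. Here is a minimal Python sketch of how those indexed strings are formed. The community string and VLAN IDs are made-up examples, not values from this network.

```python
def indexed_communities(base_community, vlan_ids):
    """Return the per-VLAN community strings a poller would send.

    Community string indexing appends '@<vlan>' to the base string so the
    agent answers from that VLAN's context (for example, its bridge table).
    """
    return [f"{base_community}@{vlan}" for vlan in vlan_ids]


# Hypothetical values for illustration only.
for community in indexed_communities("example-ro", [1, 10, 20]):
    print(community)
```

If the poller builds an index for a VLAN the switch does not know about, the request can be rejected even though the base string itself is correct.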
You don't. You have to manipulate options in ANIServer.properties (typically to disable polling of invalid VLANs). The root strings are most likely correct, but the index suffix is not valid. This of course is just a guess at this point. A sample trace of this would give a very clear picture of what is going on.
You might need to query TAC and ask why this isn't available to you.
Here's some of the correspondence history:
Subject: 604897427 CiscoWorks %SNMP-3-AUTHFAIL:
Sent: 15-JAN-2007 15:21:35
*** Service Request LOG 2007-01-16 15:10:43.0 GMT, EDUAMOLI, Action Type: Email Out ***
Subject: 604897427 CiscoWorks %SNMP-3-AUTHFAIL:
Sent: 16-JAN-2007 15:10:43
*** Service Request LOG 2007-01-18 13:42:23.0 GMT, EDUAMOLI, Action Type: Email Out ***
Subject: 604897427 CiscoWorks %SNMP-3-AUTHFAIL:
Sent: 18-JAN-2007 13:42:23
*** Service Request LOG 2007-01-19 14:19:18.0 GMT, W-NAKAMA, Action Type: Web Update ***
Thank you to Eduardo and Luis for helping on this problem. I have not received
a failed SNMP authentication since we disabled Periodic Polling. Will this
workaround be posted for everyone to see? On the NetPro community, other
people are reporting the same problem.
Thank you again,
*** Service Request LOG 2007-01-19 15:21:02.0 GMT, xxxxx, Action Type: Phone Log ***
Hit VM
*** Service Request LOG 2007-01-19 15:22:51.0 GMT, xxxxx, Action Type: Email Out ***
Subject: 604897427 CiscoWorks %SNMP-3-AUTHFAIL:
Sent: 19-JAN-2007 15:22:51
*** Service Request LOG 2007-01-19 15:23:54.0 GMT, EDUAMOLI, Action Type: Problem Description ***
%SNMP-3-AUTHFAIL emails from CW
*** Service Request LOG 2007-01-19 15:23:54.0 GMT, EDUAMOLI, Action Type: Resolution Summary ***
Disable Periodic Polling
Thanks again I appreciate it,
But where the heck do you disable the periodic polling?
I can't find it anywhere in CiscoWorks.
I have a polling job in RME but it is only scheduled every 24 hours.
The authentication failures I am seeing are every two minutes.
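One way to pin down the failure interval, so it can be compared against the polling schedules, is to measure the gaps between AUTHFAIL entries in the switch log. This is a rough Python sketch. It assumes the log lines start with a timestamp in the same format as the message quoted at the top of this thread, and because IOS timestamps carry no year, the year has to be supplied by the caller.

```python
import re
from datetime import datetime

# Matches the timestamp that starts a log line such as
# "May 18 14:04:15.333: %SNMP-3-AUTHFAIL: ..."
AUTHFAIL_RE = re.compile(
    r"^(\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}\.\d{3}): %SNMP-3-AUTHFAIL"
)


def authfail_intervals(log_lines, year):
    """Return the gaps, in seconds, between consecutive AUTHFAIL entries."""
    stamps = []
    for line in log_lines:
        match = AUTHFAIL_RE.match(line.strip())
        if match:
            # The log timestamp has no year, so prepend the one supplied.
            stamps.append(
                datetime.strptime(f"{year} {match.group(1)}", "%Y %b %d %H:%M:%S.%f")
            )
    return [
        (later - earlier).total_seconds()
        for earlier, later in zip(stamps, stamps[1:])
    ]
```

A steady gap of about 120 seconds would point at something that polls every two minutes, rather than at a job that runs once a day.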
Disabling config archive periodic polling is not an ideal solution. The real solution will be to track down exactly what is causing the AUTHFAIL messages, and simply correct the bad polling. I looked at the sniffer traces in this SR, but they are only uni-directional. That is, I only see traffic bound to the CiscoWorks server.
I have also not seen a case where periodic polling causes AUTHFAILs unless the community string in DCR for the device is wrong. I would still like to see a good bi-directional sniffer trace showing the problem, and I am sure I can isolate the true cause.
If I do a sniffer trace on the switch, which interface do I look at?
Would it be the interface defined in ACS?
Also, on the switch, how is it decided which interface is the one used for TACACS authentication if it is not defined in the config?
Since the AUTHFAILs are being triggered by CiscoWorks polling, do a sniffer trace on the CiscoWorks server, and filter on the IP address of one of the switches as it appears in DCR. So, yes, it would have to be the IP address defined in ACS.
If you are not specifying a source address for TACACS+, the switch will use the interface closest to the AAA server to source its packets.
I have tried to capture on several IP addresses for the two 6509 switches and cannot get the authentication failure trap captured.
I tried the addresses in ACS and several others that are configured.
I see in the logs that the failures occur while I am capturing, but I did not capture them. I have a ton of other packets, but not what I am looking for.
What I did notice is that in Campus Manager the name of some of the devices is totally different than in the other modules.
For example, the two 6509 switches are discovered as 6509-RTR-01 and 02.
In Campus Manager, they show up as 9509-RTR1, RTR2.
Is there any way to find out for sure which interface CiscoWorks is using for the request?
If it's CiscoWorks that is responsible for the polling, then the IP address will be the one listed in DCR for the device. If the IP address field is empty in DCR, then the IP address will be whatever the hostname (in DCR) resolves to. If the hostname field is empty in DCR, then the IP address will be whatever the display name (in DCR) resolves to.
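The fallback order described above can be sketched in a few lines of Python. This only illustrates the order of precedence. The function and parameter names are made up for the example and are not part of CiscoWorks.

```python
import socket


def polling_address(ip_address, hostname, display_name, resolve=socket.gethostbyname):
    """Pick the address to filter on, following the order described above.

    1. the IP address field, if it is set
    2. otherwise whatever the hostname resolves to
    3. otherwise whatever the display name resolves to
    """
    if ip_address:
        return ip_address
    if hostname:
        return resolve(hostname)
    if display_name:
        return resolve(display_name)
    raise ValueError("device has no IP address, hostname, or display name")
```

Run the lookup on the CiscoWorks server itself, since name resolution there may differ from what a workstation returns.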
Of course, this will only capture the CiscoWorks traffic. The actual trap/syslog sent from the device may come from a different address. However, you should be able to determine that address from the log in which you see the AUTHFAIL message. You will need to filter on both addresses to see the whole picture.
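To filter on both addresses at once, a capture filter along these lines should work. This is a small Python sketch that builds a pcap-style filter string covering SNMP polls (UDP 161), traps (UDP 162), and syslog (UDP 514). The device addresses in the usage comment are placeholders, not addresses from this network.

```python
def capture_filter(server_ip, device_ips):
    """Build a pcap-style capture filter for traffic between the server and devices.

    Covers SNMP polls (UDP 161), SNMP traps (UDP 162), and syslog (UDP 514).
    """
    if not device_ips:
        raise ValueError("at least one device address is required")
    devices = " or ".join(f"host {ip}" for ip in device_ips)
    return (
        f"host {server_ip} and ({devices}) "
        "and udp and (port 161 or port 162 or port 514)"
    )


# Placeholder device addresses: the DCR address and the address seen in the log.
print(capture_filter("10.1.10.25", ["192.0.2.1", "192.0.2.2"]))
```

The resulting string can be used as the capture filter in tcpdump or Wireshark on the CiscoWorks server.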
Thanks for the reply,
First off, sorry, but what is DCR?
I can see in the switch logs that the authentication failure is definitely coming from the CiscoWorks server address.
It is like one of the modules is trying to poll the device and is misconfigured.
DCR is the Device Credential Repository. Go to Common Services > Device and Credentials > Device Management. This is the master list of devices and credentials for all CiscoWorks applications. No CiscoWorks application will manage a device that is not listed here.
Here is what I see; maybe you could give me your thoughts on this:
The devices (the switches) are in DCR with the same ip address and hostname as in the Switches group on the ACS. I have reset the credentials here.
It looks like all devices are showing up here with the proper Display Name and IP Address that is in ACS.
When I do a "Device List" report, they show up with the correct IP Address and Display Name, but the hostname is incorrect.
When I do a "Devices not configured in ACS" report, the switches show up, but with an IP address that is a VLAN interface on the switch and is not configured anywhere that I can see in CiscoWorks (maybe discovered? I noticed the discovery settings are to use the IP address, not the loopback interface, but this is not the lowest address on the switch).
The address in the "Not Configured in ACS" report is the one that CiscoWorks is using to SNMP poll the switch. I saw nothing but successful connections in the capture, even though an authentication failure appeared in the switch log while I was capturing. They did not happen at the same time, so the failure is coming from the CiscoWorks server, but not on that address.
One of the switches is being successfully polled for User Tracking information, the other is not.
I have gotten the second switch to have User Tracking info pulled before by changing Authenticated user in the Device Credentials, but it stopped for some reason.
First things first: your DCR is a mess. You have duplicate devices, but you can't see those duplicate devices because you're integrated with ACS, and those IP addresses aren't known to ACS. I think the best solution for you is to not use Discovery. All your devices are already known to ACS, and discovery is only confusing things. You don't need discovery to use RME, Campus Manager, etc.
I think what would be best at this point is to break the ACS integration (temporarily), then clean up all of the duplicate devices in DCR (basically remove all of those entries that are now showing up in the Devices not in ACS report). Once you have a good DCR, delete all of the scheduled Discoveries, and re-establish ACS integration (when doing this DO NOT check the box to register applications with ACS).
At this point, your DCR will be clean, and you will only be managing the devices by the correct IP/hostname with the correct credentials. If the AUTHFAIL problem persists at this point, capturing a sniffer trace should be easier given that there shouldn't be any "mystery" devices in the Devices not in ACS report.
I agree it is a mess, I don't think it has ever worked properly. Just from the information I have seen on your posts, it was not set up correctly to begin with.
But, I was able to find the interface that is producing the authentication error.
I found it by going to:
selecting the device and then the report. There is an errors section, and clicking errors shows the interface which is generating the error. The packet is actually a syslog packet and doesn't really tell me anything.
I wanted to start over with cleaning up the ACS devices first, then cleaning up the Ciscoworks stuff.
How do I break the integration?
How do I remove the "Not Showing" devices?
All I see is a report; I didn't see where to remove them.
To break the integration, go to Common Services > Security > AAA Mode Setup, and set the Type back to Non-ACS. You will then need to restart dmgtd. When you do, you will see all of those devices that were showing up in the report in DCR. You will then be able to remove them.
I am pretty sure I found the source of the SNMP authentication errors:
I stopped the integration and was able to delete the items as you suggested.
I also deleted the scheduled discovery jobs, but I did not apply the settings, so the devices were discovered again.
I saw that one of the devices is discovered in a subnet different than anything I have seen configured in CiscoWorks.
I see an SNMP packet going from the server to the device on that subnet, trying to get sysUpTime with community name public (incorrect).
The source port is 39542 from the server. Do you know if any of the CiscoWorks modules use this port?
The SNMP client will use a random UDP port as a source. This is most likely Discovery, though. Like I said, in your case, you should consider stopping Discovery, or at least tightening down the discovery filters.
Note: Discovery does have a default SNMP community string it will use if it's made to discover a device that is not already in DCR, or specified in the discovery SNMP Settings. This string is public by default, and can be changed in NMSROOT/campus/etc/cwsi/DeviceDiscovery.properties. But if Discovery is using this string, you should probably consider changing your discovery filters, or adding more entries to your discovery SNMP Settings.
It looks like things are getting a little straightened out.
I have a clean list (0 items) when I do a "Items not in ACS".
I reorganized the ACS devices and removed everything from the DCR list, then re-entered the devices by the names that were in the ACS.
I tried to do discovery but it seems that there are problems when I do that, even when making the filter more restrictive.
After doing the discovery, I used your procedure again to break the integration and removed everything.
My questions are:
1. It looks like everything that was entered into DCR manually is working fine, but I see the two 6509 switches showing up in Campus Manager with the lowest IP address of the device as the Device Name and not the name entered in DCR.
RME is using the same name as DCR.
2. Where do I set up a new job to archive configs? I can't find where to create a new job.
Thanks for all of the help.
1. Where in Campus?
2. The system config collection job can be setup under RME > Admin > Config Mgmt > Collection Settings. Ad hoc jobs can be created under RME > Config Mgmt > Sync Archive.
1. When I go to Campus Manager Administration, on the Home tab, under System Status it shows the results of the Device Discovery, Data Collection and User Tracking Acquisition.
The device discovery is showing 41 devices from the old previous discovery (it is dated).
The data collection is showing 24 devices (that I manually entered), if I click on the "24 devices" it takes me to the device list, where the 6509 Device Name is not what is in DCR.
Also on the User Tracking, the end hosts number is correct now, but it shows the 6509 switches Device Name as the lowest interface IP Address, not the name entered in DCR.
Is it possibly remnants of the old discovery? If so, how do I get rid of those 41 devices listed under "Device Discovery"?
Also I am showing 0 IP Phones on the "User Tracking Acquisition". The phones are showing up under "End Hosts", is this normal?
For the Archive management, I see a job created by someone else that looks like it has never been successful to copy configs. It is scheduled to run every night. Do I just delete this job and set the schedule (without entering a device list), to archive configs?
If you have DCR the way you like it, go ahead and reinitialize your ANI database:
NMSROOT/bin/perl NMSROOT/bin/dbRestoreOrig.pl dsn=ani dmprefix=ANI
The "41" number next to Discovery won't change, but that's purely cosmetic. At this point, the Data Collection and User Tracking data is gone. Then, perform a new Data Collection, and check the results. They should agree with what you have in DCR. Once Data Collection is complete, a new User Tracking acquisition will start automatically.
IP phones will always show up under End Hosts. A phone has to be an end host before it can be considered an IP phone. Then, if it's a Cisco IP phone, and the Call Manager to which it is registered has been properly Data Collected, then it will be added to the phone table in User Tracking.
Since the phones are only showing up as end hosts, your CCM is most likely not properly Data Collected. A properly collected CCM shows up as green with the proper icon on the Topology Map.
For your Config Archive question, just reschedule the periodic collection job using RME > Admin > Config Mgmt > Collection Settings. But in order to avoid a failure, verify all the credentials are correct in DCR for your devices. One failing device will mark the job as failed.
It looks like the Data Collection is correct, although I am not 100% sure of every port.
What exactly will the re-initializing of the ANI database do?
What are the chances of corruption when doing this?
And is NMSROOT typed literally as part of the command, or does it stand for the install path on C:\ of the CiscoWorks server?
Also, what about defragmentation of the hard drives, is this ok to do with the Cisco Databases?