*** Conflicting device support info for DFM 3.2 ***

Answered Question
Aug 26th, 2010

Hellows...   ;-)


The helpful folks at TAC have been trying to troubleshoot one of my last and biggest pending items, which was the perceived inability of DFM to manage the devices on our network.  This was a rather puzzling issue, as the other LMS components (CS, CM, RME, etc.) had no apparent issues whatsoever doing everything I asked them to do.  After countless hours trying to troubleshoot DFM discovery errors ("questioned" with SNMP timeout despite the fact that all other LMS modules manage the same devices perfectly fine), an alert TAC engineer finally asked whether or not these devices were, in fact, actually supported by DFM 3.2 - lo and behold, a can of worms opened up!


The best current guess is rather confusing to me:  There is a Cisco document out there suggesting that NONE of our devices are among those supported by DFM, while I did find another Cisco document that somewhat contradicts that notion.  I’d like to think that this must be confusing (or at least very little known) to TAC as well, since nobody over there considered this a potential culprit for the first almost three weeks of troubleshooting around the globe during countless WebEx sessions.  We basically went through everything imaginable (process monitoring with full debugging, complete removal and new installation of DFM only, complete clean-up and re-initialization of all module databases – and in the process tearing down most of my configurations and settings –, to a midnight conference call with developers in India).

The end result appears to be that DFM functionality will not be available to me – please confirm.  What are the alternatives?  Any rhyme or reason to Cisco not supporting these device types?  Any plans to ever do so?


I run a variety of devices on my network, most of them being 3560G, 3560E and 6504E switches, pretty much bread-and-butter variety of basic Cisco devices.  Why on earth would there even be a question that these are or are not supported by all LMS modules?


Argument AGAINST support in DFM 3.2:

http://www.cisco.com/en/US/docs/net_mgmt/ciscoworks_lan_management_solution/3.2/device_support/table/lms32sdt.html#3.2table


Argument IN FAVOR of support in DFM 3.2:

http://www.cisco.com/en/US/docs/net_mgmt/ciscoworks_device_fault_manager/3.2/device_support/table/dfm3_2os.html

According to that list, our 6504E with IOS is fully supported by DFM 3.2 with LMS 3.2, and so are the 3560G and 2950 series switches and the 2500 series router.  However, the 3560E series switches are not listed as supported.

Are we seeing ghosts here or have other people had device support issues with DFM?

Thanks,

Matthias




Joe Clarke Thu, 08/26/2010 - 14:40
User Badges:
  • Cisco Employee,
  • Hall of Fame,

    Founding Member

DFM 3.2 supports a number of 3560 switches including 3560E.  I would have to know the specific sysObjectID to confirm whether or not support exists, though.  Typically, when we see devices in a Questioned state in DFM because of an SNMP timeout, I would think that DFM is using SNMPv3, and the engine ID has been duplicated across a number of devices.  For SNMPv1 and v2c, if other applications on the same server are working, then so should DFM.  Troubleshooting this is best done with a sniffer trace to confirm DFM is sending the requests and the devices are replying.

schm196 Thu, 08/26/2010 - 16:33

Hi Joseph -

Many thanks for your reply.  We do use SNMP version v2c; all other modules use the same default credentials and work fine with the same devices, and we performed all connectivity tests (nslookup, dmctl -s DFM invoke SM_System::SM-System nameToAddr/addrToName, as well as snmpwalk from server to several devices) without problems - it is just DFM that has issues.

Also, I'd love to take your word re/ support of 3560 and 3560E but looking at the Cisco documents I referenced you will find that that's potentially not the case.

Matthias

P.S.

In case you have access to TAC info:  SR#615028875

schm196 Thu, 08/26/2010 - 16:39

Oh, and here are some representative sysObjectID responses:



sysObjectID

RFC1213-MIB::sysObjectID.0 = OID: CISCO-PRODUCTS-MIB::catalyst3560G48PS


sysObjectID

RFC1213-MIB::sysObjectID.0 = OID: CISCO-PRODUCTS-MIB::catalyst3560E24TD


sysObjectID

RFC1213-MIB::sysObjectID.0 = OID: CISCO-PRODUCTS-MIB::ciscoWSC6504E

Joe Clarke Fri, 08/27/2010 - 08:10

All three of these device types are supported by DFM 3.2.  You may need to go to Common Services > Software Center > Device Update to download the latest DFM device support update, though.

schm196 Fri, 08/27/2010 - 08:33

I frequently run the device updates...  I double-checked and I have 887 listed as supported in DFM, among them all those in question.

Joe Clarke Fri, 08/27/2010 - 08:47

Yep, you should have support for all of them.  Again, lack of device support would not cause a device to go to Questioned.  Since you're using v2c, I would deploy a sniffer trace, and see if you can capture a rediscovery cycle where a device goes to Questioned.

schm196 Fri, 08/27/2010 - 08:51

I'll have to ask TAC for help with this; it's slightly beyond my skill/experience level.  I'll let you know about the results as soon as I have them...


Thanks again for taking the time to advise.

schm196 Fri, 08/27/2010 - 14:32

I ran a packet capture using the LMS Device Center tool for UDP ports 161 and 162.  Just after beginning the capture process, initially set for 5 minutes, I submitted all 59 devices for rediscovery in DFM.  All turned to status learning, then back to status questioned, at which time I ended the packet capture.  Attached is the output file.

Joe Clarke Sat, 08/28/2010 - 17:20

This isn't a packet capture.  This is the Tomcat stdout.log.

schm196 Mon, 08/30/2010 - 13:04

Well, with all due respect, this is the Cisco LMS - Device Center - Tools - Packet Capture feature that I used.


Okay, so now I dumped Wireshark onto that server (mind you I am not a gearhead who's very familiar with it) and tried to capture/sniff what's going on there.  I simply start a live capture on the physical interface and set the filter to "SNMP present" - I do see a handful of incoming SNMP traps every once in a while when an interface on a switch goes up or down somewhere, but starting a rediscovery does not create any visible traffic with this filter.  What should I be specifically looking for as a filter setting?

Joe Clarke Mon, 08/30/2010 - 15:50

The built-in packet capture is fine.  When you stop your capture, the window that shows the list of packet captures should display the new capture file.  When you click on this, you'll get a file that ends with ".jet."  What you posted was a log file from the Tomcat servlet engine.  The .jet file will be a binary file that can be opened in something like Wireshark.


In Wireshark, you'll want to set up a capture filter of "udp port 161".  That can be done under Capture > Options.  When you do a rediscover in DFM, you should see a few packets (at least four) to/from the device being rediscovered.

Joe Clarke Wed, 09/01/2010 - 21:14

All of these packets are SNMP traps.  There are no polling packets here.  Did you have a filter enabled for udp/161?  What error do you see in DFM after the discovery fails?

schm196 Thu, 09/02/2010 - 08:40

As you can see in the attached screenshot, the filter was set to ports udp/161 and udp/162, so both polling and trap traffic should have been captured.  DFM always does the same during rediscovery attempts... Questioned->Learning->Questioned.


I verified the results using Wireshark - with filter "SNMP present" (presumably that means any udp/161 and udp/162 traffic) all I see during rediscovery are unrelated traps sent by switches on the network (interface up/down, etc.) but no outbound polling packets.
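
To make the distinction concrete: polling requests leave the server toward udp/161 on the device, replies come back from source port 161, and traps arrive at udp/162 on the server.  Below is a minimal Python sketch of that sorting, assuming the packet summaries have already been exported as (src, sport, dst, dport) tuples.  The function names and the sample rows (RFC 5737 documentation addresses) are made up for illustration and are not taken from the attached capture.

```python
from collections import Counter

SNMP_PORT = 161       # agent port: polls go to it, replies come from it
SNMP_TRAP_PORT = 162  # trap receiver port on the management server


def classify(packet, server_ip):
    """Label one UDP packet summary as poll, reply, trap, or other."""
    src, sport, dst, dport = packet
    if src == server_ip and dport == SNMP_PORT:
        return "poll"
    if dst == server_ip and sport == SNMP_PORT:
        return "reply"
    if dst == server_ip and dport == SNMP_TRAP_PORT:
        return "trap"
    return "other"


def summarize(packets, server_ip):
    """Count packets per category for a whole capture."""
    return Counter(classify(p, server_ip) for p in packets)


# Hypothetical sample: two traps from switches and no polling at all
sample = [
    ("192.0.2.20", 50001, "198.51.100.9", 162),
    ("192.0.2.31", 50002, "198.51.100.9", 162),
]
counts = summarize(sample, "198.51.100.9")
print(counts["trap"], counts["poll"])
```

A capture in which the "poll" count stays at zero during a rediscovery would mean the server never sent an SNMP request in the first place.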

Joe Clarke Thu, 09/02/2010 - 08:44

But what is the Questioned reason?  DFM will first attempt to ping the IP address of the device as it is in DCR.  If that fails, the device will be put into a questioned state.  Only after the ping succeeds does DFM attempt SNMP.  Since I'm not seeing any SNMP traffic, I'm thinking the ping may be failing.
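
The order described here can be sketched as a small state function.  This is only a toy model of the sequence in the paragraph above (ping first, SNMP only after the ping succeeds); the function and parameter names are invented, and nothing here is DFM's actual code.

```python
from typing import Callable, List


def discover(ip: str,
             ping: Callable[[str], bool],
             snmp_poll: Callable[[str], bool],
             log: List[str]) -> str:
    """Return the resulting state for one device, recording each probe in log."""
    log.append("ping")
    if not ping(ip):
        # ICMP check failed: SNMP is never attempted, so no udp/161 traffic
        return "Questioned"
    log.append("snmp")
    if not snmp_poll(ip):
        return "Questioned"
    return "Known"


# Simulate a failing ping against a device that would answer SNMP
probes: List[str] = []
state = discover("192.0.2.20", ping=lambda ip: False,
                 snmp_poll=lambda ip: True, log=probes)
print(state, probes)
```

With a failing ping the log never contains an SNMP probe, which would match a capture that shows no udp/161 requests.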

schm196 Thu, 09/02/2010 - 08:51

That is what is so puzzling...  *all* devices return with the same reason - SNMP timeout.  However, we tested the SNMP walk several times and it appears to be fine.  Again, only DFM has this issue - everything else we use (CS, CM, RME, etc.) appears to be working just fine.

Joe Clarke Thu, 09/02/2010 - 22:39

Expand your packet capture filter to capture all traffic to the IP address of one of the problem devices as it appears in DCR (i.e. filter all traffic to the management IP of the device).  Rediscover the device in DFM, then post the capture file.
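
One way to express such a filter is in libpcap capture-filter syntax, which Wireshark accepts under Capture > Options.  The helper below is a sketch; the function name and the example address are placeholders, not values from this case.

```python
import ipaddress


def host_capture_filter(device_ip: str, narrow: bool = False) -> str:
    """Build a libpcap capture filter for one device's management IP."""
    ip = ipaddress.ip_address(device_ip)  # raises ValueError on a bad address
    base = f"host {ip}"  # matches traffic to or from that address
    if narrow:
        # Keep only the ping and SNMP polling traffic for that host
        return f"{base} and (icmp or udp port 161)"
    return base


print(host_capture_filter("192.0.2.20"))
print(host_capture_filter("192.0.2.20", narrow=True))
```

The unrestricted "host" form is the one that fits the request here, since it also shows ICMP and anything else exchanged with the device.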

schm196 Fri, 09/03/2010 - 09:01

Attached are the two captures, one with LMS and the other one with Wireshark, both taken during the same rediscovery attempt for all devices, both without filters.  Note that all network devices in DCR have 192.168.1.x IP addresses, while 10.10.2.9 is the LMS server address and 10.16.5.25 is the address of my desktop during the remote connection, in case you'd like to filter out some garbage.

Joe Clarke Sat, 09/04/2010 - 10:01

All of these 192.168.1.X devices moved to a Questioned state?

schm196 Sun, 09/05/2010 - 17:15

Yes, all of them return to "questioned" state - from the smallest wireless access point through switches and routers all the way to 6500-series devices.  All network devices are in the 192.168.1.0/24 subnet, and there are only about 60 of them.

Joe Clarke Sun, 09/05/2010 - 20:34

Okay.  I have a feeling the DFM Servers may be in a bad state.  If you haven't done so already, install the consolidated patch for CSCtb87449 from http://tools.cisco.com/support/downloads/go/ImageList.x?relVer=3.2.0&mdfid=282640771&sftType=CiscoWorks+Device+Fault+Manager+Patches&optPlat=Windows&nodecount=2&edesignator=null&modelName=CiscoWorks+Device+Fault+Manager+3.2&treeMdfId=268439477&treeName=Network+Management&modifmdfid=&imname=&hybrid=Y&imst=N&lr=Y .  Then REBOOT the server.


When the server comes back, try to rediscover your devices.  If that fails, post the DFM.log and DFM1.log under NMSROOT/objects/smarts/local/logs.

schm196 Tue, 09/07/2010 - 10:18

Hi Joseph -


Thanks for your assistance thus far.  I applied the consolidated patch for CSCtb87449 successfully, restarted the LMS server, and performed another rediscovery.  Unfortunately, all devices still go from Questioned to Learning and back again to Questioned state.  As before, all devices still cite "SNMP timeout" as the reason.


Attached are the DFM logs.


Matthias

Joe Clarke Tue, 09/07/2010 - 21:42

Follow the instructions at https://supportforums.cisco.com/docs/DOC-8796 to reinitialize the DFM databases (dfmEpm, dfmInv, dfmFh, and delete the two rps files).  When LMS starts back up, add one device to DFM and verify it goes to a Known state.  If it does, sync the rest of your devices from DCR.

schm196 Wed, 09/08/2010 - 08:47

I clearly recall that TAC already went through the reinitialization of databases, first for DFM only, then for all LMS databases, and the subsequent attempts to add just a single device.  No success.

Martin Ermel Thu, 09/09/2010 - 07:43
User Badges:
  • Blue, 1500 points or more

You reported that you have successfully tested SNMP RO access to the devices in question. Can you run the same test with the DFM built-in snmpwalk tool and enable "debug snmp packets" on the device? With this I expect you will see whether the packets make their way to the device and whether the sm_snmpwalk program is working (hopefully this is not only a CLI program but also the code used internally):


Step 1 Go to NMSROOT/objects/smarts/bin

Step 2 Enter the following command for:

SNMP v1 and SNMP v2 devices:

    For Solaris:    ./sm_snmpwalk --community= deviceIp

    For example:    ./sm_snmpwalk --community=cisco 4.1.1.1

    For Windows:    sm_snmpwalk --community= deviceIp

    For example:    sm_snmpwalk --community=cisco 4.1.1.1

The above command will generate three files,

    xxxxx.walk,

    xxxxx.mimic, and

    xxxxx.snap files

[where xxxxx is the device IP] in the same location, that is in NMSROOT/objects/smarts/bin.
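
The steps above can be collected into a small helper that builds the command line and the names of the three files it should leave behind.  This is a sketch only: the helper name is invented, the NMSROOT value shown is an assumed install path that you should replace with your own, and the community string and address are the example values from the steps.

```python
import ntpath
import posixpath


def sm_snmpwalk_plan(nmsroot: str, community: str, device_ip: str,
                     windows: bool = False):
    """Return (working_dir, argv, expected_files) for the DFM snmpwalk test."""
    path = ntpath if windows else posixpath
    workdir = path.join(nmsroot, "objects", "smarts", "bin")
    exe = "sm_snmpwalk" if windows else "./sm_snmpwalk"
    argv = [exe, f"--community={community}", device_ip]
    # The tool is described as writing <ip>.walk, <ip>.mimic and <ip>.snap
    expected = [path.join(workdir, f"{device_ip}.{ext}")
                for ext in ("walk", "mimic", "snap")]
    return workdir, argv, expected


# "/opt/CSCOpx" is an assumed NMSROOT; substitute the real one
workdir, argv, files = sm_snmpwalk_plan("/opt/CSCOpx", "cisco", "4.1.1.1")
print(workdir)
print(" ".join(argv))
for name in files:
    print(name)
```

Run the printed command from the printed directory, then check that the three listed files exist afterwards.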

Joe Clarke Sun, 09/12/2010 - 01:19

The devices are responding to SNMP, but DFM doesn't appear to be querying them during the Discovery process.  When you reinitialized the databases before, did you destroy the DFM rps files?

schm196 Mon, 09/13/2010 - 12:30

I am not 100% certain but I watched TAC perform these steps and I believe I recall them also addressing the need to delete these rps files... so my answer would be yes.

Correct Answer
Joe Clarke Mon, 09/13/2010 - 16:39

There appears to be a problem with the DFM engines.  This is not an issue with device support.  If your devices were unsupported, you would not be seeing the symptoms you are seeing.  You'll need to go back to TAC as I'm betting EMC will need to get involved to look into the DFM server operation.

schm196 Tue, 09/14/2010 - 09:40

Hi Joseph -


Thanks for spending time trying to solve our issues.  Even though we didn't find out what exactly was wrong and how to solve the problem by fixing it, TAC seems to agree with you - their course of action at this time is to completely rebuild our LMS server.


Best regards,

Matthias

jackson.ku Mon, 10/25/2010 - 01:48

Hi,


I seem to be facing the same problem as you. Can you please tell me how you resolved it?


Best Regards,


Jackson Ku

schm196 Mon, 10/25/2010 - 09:26

There appears to be a documentation problem for LMS 3.2 - it all came down to undocumented ports not being open in the Windows firewall.


The DFM engine seems to be performing its own connectivity testing during the discovery process.  It is expecting simple ICMP return packets, and while the LMS installation routine added dozens of ports to the Windows firewall configuration it did not add ports required for this type of traffic.  If you completely disable the Windows firewall (not an option for us) then the discovery works fine.


Took Cisco TAC two months to figure this out.

Joe Clarke Mon, 10/25/2010 - 09:42
User Badges:
  • Cisco Employee,
  • Hall of Fame,

    Founding Member

Thanks for following up back to the community.  I was helping your engineer identify the root cause here.  Yes, ICMP was not being allowed back into the server.  DFM requires ICMP connectivity between itself and the management IP of each device before it will manage the device.

jackson.ku Mon, 10/25/2010 - 18:06

Hi,


I tried disabling the Windows firewall, and it works... Before disabling the firewall, I added inbound / outbound firewall rules to allow SNMP and SNMP traps (udp ports 161 and 162), but that did not work. Can you please tell me which inbound / outbound rules to add so that DFM works?


Best Regards,


Jackson Ku

Joe Clarke Mon, 10/25/2010 - 18:43
User Badges:
  • Cisco Employee,
  • Hall of Fame,

    Founding Member

You need to allow IPv4 ICMP (i.e. ping traffic) on the inbound path.  That is, enable inbound ICMP echo replies.

jackson.ku Mon, 10/25/2010 - 22:38

Hi,


Ping between the CiscoWorks server and the network device is OK, the Windows firewall outbound rule is allow all, and I have allowed udp ports 161 and 162 in the inbound rules, but the import still fails. I installed Wireshark on the CiscoWorks server, and I can see several ICMP request / reply packets, but I cannot see the subsequent SNMP query packets sent from the CiscoWorks server. (If I disable the Windows firewall, I can see SNMP query packets sent from the CiscoWorks server, and the import succeeds.)


Best Regards,


Jackson Ku

Joe Clarke Mon, 10/25/2010 - 22:40
User Badges:
  • Cisco Employee,
  • Hall of Fame,

    Founding Member

You must add an explicit rule allowing inbound ICMP.  Windows ping will work, but DFM will not until you add this rule.
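
As a sketch of what such a rule could look like: the snippet below builds a `netsh advfirewall` command that allows inbound ICMPv4 echo replies (type 0).  It assumes the server runs a Windows release that has `netsh advfirewall` (Windows Server 2008 or later); this thread does not say which Windows version is in use, and the rule name and function name are made up.

```python
def icmp_echo_reply_rule(rule_name: str = "DFM-ICMPv4-echo-reply-in"):
    """Build a netsh command that allows inbound ICMPv4 echo replies."""
    return [
        "netsh", "advfirewall", "firewall", "add", "rule",
        f"name={rule_name}",      # hypothetical rule name, no spaces
        "protocol=icmpv4:0,any",  # ICMPv4 type 0 (echo reply), any code
        "dir=in",
        "action=allow",
    ]


cmd = icmp_echo_reply_rule()
print(" ".join(cmd))
```

The printed command would be run from an elevated command prompt on the LMS server; review it against your own firewall policy before applying it.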

jackson.ku Tue, 10/26/2010 - 20:11

Hi Joseph,


Thanks for your help. DFM works fine now.


Best Regards,


Jackson Ku
