Switch memory issues since upgrading to DFM 2.0.13

Unanswered Question
Aug 7th, 2009

Hi,

We upgraded DFM to 2.0.13 last week, and since completing its weekly rediscovery it has been reporting memory issues on several switches.

DFM is reporting that 3 Cisco 2950s have less than 15% CPU memory available. It is also reporting that 2 Cisco 2948G-GE-TX switches have fragmented memory.

We have done a cold reboot on two of the Cisco 2950 switches and the problem remains in DFM.

Is this a bug, or is DFM now reporting more accurately on devices that have had a problem all along?

We are trying to check the memory utilisation directly on the switch through the CLI but have yet to reach any conclusions.

Any help would be appreciated.

Joe Clarke Fri, 08/07/2009 - 07:44

Post an SNMP Walk of the ciscoMemoryPoolMIB from one affected device.

cbeswick Mon, 08/10/2009 - 00:27

Hi,

I have attached an SNMP walk from the CiscoWorks Device Center for one of the 2950s with high memory utilisation.

One thing we have noticed is that some of our Cisco 2950G-48s have more memory than others. Of the 7 we have, 2 have 5003488 bytes of total processor memory, while the 5 that DFM is reporting errors on have only 3891296 bytes, the same amount as a normal 24-port 2950. It's almost as if they have been shipped with insufficient memory to handle the 48 ports.

The three 2948G-GE-TX switches with the excessive fragmentation error continue to elude us as the only pattern we can find is that they are the same model type.

Joe Clarke Tue, 08/11/2009 - 09:08

The memory fragmentation does appear to be valid here for the CLUSTER and MALLOC pools. However, I don't know if that is really an issue. What is the sysObjectID from this switch?

Joe Clarke Mon, 08/10/2009 - 08:42

These numbers don't look bad. The memory pools are about 72% and 85% contiguous, respectively. What is your fragmentation threshold for the Switches and Hubs device group?
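The thread does not say exactly how DFM turns a walk into a "contiguous" percentage. A plausible reading, and it is only an assumption, is the largest free block divided by total free memory, which CISCO-MEMORY-POOL-MIB exposes per pool as ciscoMemoryPoolLargestFree and ciscoMemoryPoolFree. The sketch below computes that ratio; `contiguous_percent`, `is_fragmented` and the `min_contiguous_percent` parameter are hypothetical names, not DFM settings.

```python
# Assumption: "contiguous %" = largest free block / total free memory in a pool.
# This is an illustrative guess, not DFM's documented formula.
def contiguous_percent(free_bytes: int, largest_free_bytes: int) -> float:
    """Return the largest free block as a percentage of all free memory."""
    if free_bytes <= 0:
        return 0.0
    return 100.0 * largest_free_bytes / free_bytes


def is_fragmented(free_bytes: int, largest_free_bytes: int,
                  min_contiguous_percent: float) -> bool:
    """Hypothetical check: flag the pool when contiguity drops below the minimum."""
    return contiguous_percent(free_bytes, largest_free_bytes) < min_contiguous_percent


# Example pools (Processor and I/O rows of the 'show mem' output in this thread).
for name, free, largest in (("Processor", 480504, 374544), ("I/O", 732068, 611676)):
    print(f"{name}: {contiguous_percent(free, largest):.1f}% contiguous")
```

Under this assumption the Processor pool works out to roughly 78% contiguous and the I/O pool to roughly 84%, the same ballpark as the figures quoted above, though those came from a different sample.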

Joe Clarke Mon, 08/10/2009 - 23:27

This is the default threshold. I've tried to reproduce this locally with a 2950, but could not. You may have found a new bug, but you'd have to contact TAC so EMC could have a look at the polling data. As a workaround, try removing one of the problem switches from DFM, then re-adding it, and see if the event clears.

cbeswick Mon, 08/10/2009 - 23:34

We have already tried this. We have also rebooted the switches to see if the problem is switch-related, but DFM still reports the same switches with insufficient free memory and excessive memory fragmentation.

Joe Clarke Tue, 08/11/2009 - 09:01

I don't see how either of these thresholds could be violated based on these numbers. What is the sysObjectID of this particular 2950? Post the details of the events you're seeing from this switch.

cbeswick Wed, 08/12/2009 - 23:43

The object ID is .1.3.6.1.4.1.9.1.429

We aren't seeing anything in syslog.

I have attached a screenshot of the DFM alert.

I have also looked at the memory on the switch:

ci_t2_cctv_sw2#sh mem
            Head      Total(b)  Used(b)   Free(b)  Lowest(b)  Largest(b)
Processor   80C86740  3885248   3404744   480504   0          374544
I/O         A0A723A0  2179968   1447900   732068   549064     611676

The figures just do not add up. If the total Processor memory is 3885248 bytes and the used memory is 3404744 bytes, that gives 87.63% utilisation, or 12.37% free. This is obviously below the 15% free-memory threshold, but it certainly isn't the 9.8% free reported by DFM.
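The arithmetic above can be checked with a few lines of Python. This is only a sketch that reuses the Processor row of the `show mem` output; `memory_utilisation` is a hypothetical helper name.

```python
def memory_utilisation(total: int, used: int) -> tuple[float, float]:
    """Return (used %, free %) for a memory pool from 'show memory' byte counts."""
    used_pct = 100.0 * used / total
    return used_pct, 100.0 - used_pct


# Processor pool figures from the 'show mem' output above.
used_pct, free_pct = memory_utilisation(3885248, 3404744)
print(f"used {used_pct:.2f}%, free {free_pct:.2f}%")  # used 87.63%, free 12.37%
```

Used (3404744) plus Free (480504) equals Total (3885248) in that output, so computing free memory directly as 480504 / 3885248 gives the same 12.37%.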

Maybe the threshold is being breached, but DFM isn't correctly reporting on the memory.

Joe Clarke Thu, 08/13/2009 - 04:47

I checked, and this sysObjectID does appear to be correctly instrumented in DFM. Additionally, from the show mem output, this switch currently has only ~ 12% free memory, so conceivably the event was correct at the time it was generated. Certainly, the value DFM has for total processor memory is correct. I would need to have seen the SNMP Walk or a sniffer trace taken around the time of the event to know for certain. But in any event, the event should still be active for this switch given a 15% threshold.

If you are not experiencing any problems with switch functionality, maybe you should decrease your threshold to 10% or maybe 5-8% to avoid false positive events.
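To see how the suggested thresholds would behave, here is a small sketch comparing both free-memory figures in this thread (about 12.37% from `show mem`, and 9.8% as reported by DFM) against 15%, 10%, 8% and 5%. It assumes the event fires when free memory is strictly below the threshold; `low_memory_event` is a hypothetical name, not a DFM function.

```python
# Assumption: a low-free-memory event fires when free % drops below the threshold.
def low_memory_event(free_pct: float, threshold_pct: float) -> bool:
    """Return True if the free-memory percentage breaches the threshold."""
    return free_pct < threshold_pct


# Free % from 'show mem' (about 12.37%) and the 9.8% DFM reported at event time.
observations = {"show mem": 100.0 * 480504 / 3885248, "DFM event": 9.8}

for threshold in (15, 10, 8, 5):
    fired = [name for name, pct in observations.items()
             if low_memory_event(pct, threshold)]
    print(f"threshold {threshold}%: {', '.join(fired) or 'none'}")
```

With a 10% threshold, the 9.8% reading DFM saw would still raise an event. At 8% or below, neither figure would.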
