LMS 3.2 DFM Alerts don't match up with real life

orsonjoon
Level 1

We get a lot of DFM environmental alerts like this:

EventID: 000H769
Property                      Value
Component                     TEMP-switch1/7030 [Te2/1/2 Module Temperature Sensor-TenGigabitEthernet2/1/2 Module Temperature Sensor]
ComponentClass                TemperatureSensor
ComponentEventCode            1079
Status                        OK
entSensorValue                280
CurrentValue                  280.0 DEGC
RelativeTemperatureThreshold  10.0 %
HighThreshold                 45.0 DEGC

But in real life the values are like this:

            Temperature  Voltage  Current   Tx Power  Rx Power
Port        (Celsius)    (Volts)  (mA)      (dBm)     (dBm)
----------  -----------  -------  --------  --------  --------
Te2/1/2     28.0         0.00     7.9 --    -2.2      -2.8

The thresholds for the interface are this:


                        High Alarm  High Warn  Low Warn   Low Alarm
           Temperature  Threshold   Threshold  Threshold  Threshold
Port       (Celsius)    (Celsius)   (Celsius)  (Celsius)  (Celsius)
---------  -----------  ----------  ---------  ---------  ---------
Te2/1/2    28.0         74.0        70.0       0.0        -4.0

So nothing DFM tells us about the TenGig interfaces is actually true, and as a result we get a flood of false alerts.

Is this a bug or a settings mismatch somewhere? Please help.
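
For what it's worth, the 280 in the alert is exactly ten times the 28.0 shown on the switch. In CISCO-ENTITY-SENSOR-MIB, entSensorValue is a fixed-point integer that is meant to be read together with entSensorPrecision (the number of decimal places). The sketch below assumes a precision of 1 for this sensor, which nothing in this thread confirms, and ignores entSensorScale. The function name `scaled_reading` is hypothetical; it only illustrates how the same raw value lands on either side of the 45.0 DEGC HighThreshold.

```python
from decimal import Decimal


def scaled_reading(raw_value: int, precision: int) -> Decimal:
    """Shift the decimal point of a fixed-point sensor integer left by `precision` places."""
    return Decimal(raw_value).scaleb(-precision)


RAW = 280                         # entSensorValue from the DFM alert
HIGH_THRESHOLD = Decimal("45.0")  # HighThreshold from the DFM alert

unscaled = scaled_reading(RAW, 0)  # the value DFM reports as CurrentValue
scaled = scaled_reading(RAW, 1)    # ASSUMPTION: precision of 1, not confirmed in this thread

for label, value in (("unscaled", unscaled), ("scaled", scaled)):
    state = "above" if value > HIGH_THRESHOLD else "below"
    print(f"{label}: {value} DEGC is {state} HighThreshold {HIGH_THRESHOLD}")
```

If that assumption holds, the scaled reading matches the 28.0 that the switch itself reports, while the unscaled one triggers the alert.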

Accepted Solutions

Joe Clarke
Cisco Employee

It looks like this is a VSS.  In that case, I think you're seeing CSCta08882 which will require you to exclude the problematic entities from your SNMP view.



Yes, you're right, it's a VSS. Will this bug be fixed in an update soon?

It's still waiting on a fix from EMC.  An ETA is currently not available.

EMC? Do you mean the storage supplier, or something else? And if you do, what do they have to do with this?

Yes, EMC the storage company.  They acquired Smarts, which writes the back-end device management and fault engine for DFM.  The problem is with their engine, and we are awaiting a fix from them.  As of now, a fix is slated to be in DFM 4.0, due out next summer.

And I pasted the wrong bug before.  There are actually two very similar VSS bugs.  The one concerning temperature problems is CSCta18610.  The fix is the same in that EMC will need to provide it, but there is a slightly different workaround.  The easiest solution is to unmanage the problematic sensor in DFM.  However, in some cases, the temperature is high, but not a problem for the device.  In that case, there is a more tactical workaround which can be done in DFM.  I don't think this applies to you, though, because DFM is seeing a value of 280 C.

Hi Joe, thanks for clearing this up ;), but are you actually saying that I have to disable the temperature sensor element for each port?

Doing this manually is far too time-consuming, because we are talking about thousands of sensors.

Of course we can use the bulk manage/unmanage method for this, but on the other hand we would like to receive real environmental messages about the VSS hardware.

DFM.jpg

Another thing is the quality of the email messages we receive from DFM: it's almost impossible to link each port to the sensor element in DFM.

For example this is the email notification:

EVENT ID = 000H760

ALERT ID = 00005GO

TIME = Tue 24-Nov-2009 13:33:55 CET

STATUS = Active

SEVERITY = Critical

MANAGED OBJECT = switch1

MANAGED OBJECT TYPE = Switches and Hubs

EVENT DESCRIPTION = OutOfRange::Component=TEMP-switch1/6051 [Te2/5/4 Module Temperature Sensor-TenGigabitEthernet2/5/4 Module Temperature Sensor];ComponentClass=TemperatureSensor;ComponentEventCode=1079;Status=OK;entSensorValue=280;CurrentValue=280.0

CUSTOMER IDENTIFICATION = All devices

CUSTOMER REVISION = 1

Here 6051 is the element name linked to the specific port TenGigabitEthernet2/5/4. In my view it's just not logical to use different names for basically the same thing.
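
To make that port-to-element link easier to read off, the EVENT DESCRIPTION line can be split mechanically. The following is only a rough sketch: the field layout is inferred from the single notification above, other DFM event types may look different, and `parse_event_description` and `COMPONENT_RE` are hypothetical names, not part of DFM or LMS.

```python
import re

# EVENT DESCRIPTION copied from the notification above.
EVENT_DESCRIPTION = (
    "OutOfRange::Component=TEMP-switch1/6051 "
    "[Te2/5/4 Module Temperature Sensor-TenGigabitEthernet2/5/4 Module Temperature Sensor];"
    "ComponentClass=TemperatureSensor;ComponentEventCode=1079;Status=OK;"
    "entSensorValue=280;CurrentValue=280.0"
)

# ASSUMPTION: the Component value looks like "TEMP-<device>/<element> [<port> ...]".
COMPONENT_RE = re.compile(r"^TEMP-(?P<device>[^/]+)/(?P<element>\d+) \[(?P<port>\S+) ")


def parse_event_description(text: str) -> dict:
    """Split 'Event::key=value;key=value' and pull device, element and port out of Component."""
    event_name, _, body = text.partition("::")
    fields = {"EventName": event_name}
    for pair in body.split(";"):
        key, sep, value = pair.partition("=")
        if sep:  # skip fragments that carry no '='
            fields[key.strip()] = value.strip()
    match = COMPONENT_RE.match(fields.get("Component", ""))
    if match:
        fields.update(match.groupdict())
    return fields


info = parse_event_description(EVENT_DESCRIPTION)
print(f"{info['device']} {info['port']} (element {info['element']}): "
      f"{info['EventName']}, CurrentValue={info['CurrentValue']}")
```

For the notification above this yields one line that names the switch, the port, and the element number together, which is the information that otherwise has to be dug out by hand.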

The email message does not make clear at a glance what exactly the problem is. That goes not only for this specific issue but for all email alerts we get from DFM.

You always have to put a lot of time and effort into working out what the problem is and what could have caused it.

I wish we could actually save time using LMS instead of putting a lot of needless time into it.

Is there another way (a patch or update) to clear up this problem and the millions of false email notifications from DFM?

Unfortunately not.  This issue is not yet resolved, and the only workaround in your case is to unmanage each bogus sensor.
