11-24-2009 06:57 AM
We get a lot of DFM environmental alerts like this:
But in real life the values are like this:
            Temperature  Voltage  Current   Tx Power  Rx Power
Port        (Celsius)    (Volts)  (mA)      (dBm)     (dBm)
----------  -----------  -------  --------  --------  --------
Te2/1/2     28.0         0.00     7.9  --   -2.2      -2.8
The thresholds for the interface are:
                                High Alarm  High Warn  Low Warn   Low Alarm
            Temperature         Threshold   Threshold  Threshold  Threshold
Port        (Celsius)           (Celsius)   (Celsius)  (Celsius)  (Celsius)
----------  ------------------  ----------  ---------  ---------  ---------
Te2/1/2     28.0                74.0        70.0       0.0        -4.0
So nothing DFM tells us about the TenGig interfaces is actually true, and we get a bunch of false alerts.
Is this a bug or a settings mismatch somewhere? Please help.
11-24-2009 09:29 AM
It looks like this is a VSS. In that case, I think you're seeing CSCta08882, which will require you to exclude the problematic entities from your SNMP view.
11-24-2009 10:58 PM
Yes, you're right, it's a VSS. Will this bug be fixed in an update soon?
11-25-2009 10:02 AM
It's still waiting on a fix from EMC. An ETA is currently not available.
11-25-2009 10:54 PM
EMC? Do you mean the storage supplier, or something else? And if you do, what do they have to do with this?
11-25-2009 11:59 PM
Yes, EMC the storage company. They acquired Smarts, which writes the back-end device management and fault engine for DFM. The problem is with their engine, and we are awaiting a fix from them. As of now, a fix is slated to be in DFM 4.0, due out next summer.
11-26-2009 12:03 AM
And I pasted the wrong bug before. There are actually two very similar VSS bugs. The one concerning temperature problems is CSCta18610. The fix is the same in that EMC will need to provide it, but there is a slightly different workaround. The easiest solution is to unmanage the problematic sensor in DFM. However, in some cases, the temperature is high, but not a problem for the device. In that case, there is a more tactical workaround which can be done in DFM. I don't think this applies to you, though, because DFM is seeing a value of 280 C.
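One possible reading of the numbers in this thread, offered as an assumption rather than as the confirmed cause of the bug: the switch reports 28.0 C while DFM reports 280, which is what you would see if a fixed-point sensor reading with one decimal place of precision were used without scaling. The Python sketch below only illustrates that arithmetic. The function names are hypothetical, the precision of 1 is assumed, and the thresholds are the ones shown earlier for Te2/1/2.

```python
# Sketch only: shows how an unscaled fixed-point reading could trip a
# temperature threshold. Function names are hypothetical, not DFM code.

def scale_sensor_value(raw_value: int, precision: int) -> float:
    """Convert a fixed-point reading to a real value.

    `precision` is the number of decimal places encoded in `raw_value`
    (assumed to be 1 here; the thread does not state it).
    """
    return raw_value / (10 ** precision)


def classify(value: float, high_alarm: float, high_warn: float,
             low_warn: float, low_alarm: float) -> str:
    """Compare a reading against the four thresholds."""
    if value >= high_alarm:
        return "high alarm"
    if value >= high_warn:
        return "high warning"
    if value <= low_alarm:
        return "low alarm"
    if value <= low_warn:
        return "low warning"
    return "normal"


# Thresholds taken from the Te2/1/2 output earlier in the thread.
THRESHOLDS = dict(high_alarm=74.0, high_warn=70.0, low_warn=0.0, low_alarm=-4.0)

raw = 280  # value carried in the DFM alert
print("unscaled:", classify(float(raw), **THRESHOLDS))
print("scaled:  ", classify(scale_sensor_value(raw, 1), **THRESHOLDS))
```

Under these assumptions the unscaled value lands in the high alarm range, while the scaled value of 28.0 is normal.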
11-26-2009 01:00 AM
Hi Joe, thanks for clearing this up ;), but are you actually saying that I have to disable the temperature sensor element for each port?
That is far too time-consuming to do manually, because we are talking about thousands of sensors.
Of course we can use the bulk manage/unmanage method for this, but on the other hand we would like to receive real environmental messages about the VSS hardware.
Another issue is the quality of the email messages we receive from DFM: it's almost impossible to link each port to its sensor element in DFM.
For example this is the email notification:
EVENT ID = 000H760
ALERT ID = 00005GO
TIME = Tue 24-Nov-2009 13:33:55 CET
STATUS = Active
SEVERITY = Critical
MANAGED OBJECT = switch1
MANAGED OBJECT TYPE = Switches and Hubs
EVENT DESCRIPTION = OutOfRange::Component=TEMP-switch1/6051 [Te2/5/4 Module Temperature Sensor-TenGigabitEthernet2/5/4 Module Temperature Sensor];ComponentClass=TemperatureSensor;ComponentEventCode=1079;Status=OK;entSensorValue=280;CurrentValue=280.0
CUSTOMER IDENTIFICATION = All devices
CUSTOMER REVISION = 1
Here 6051 is the element name linked to the specific port TenGigabitEthernet2/5/4. In my view it's just not logical to use different names for basically the same thing.
The email message does not make it clear at a glance what exactly the problem is. This applies not only to this specific issue but to all email alerts we get from DFM.
You always have to put a lot of time and effort into working out what the problem is and what could have caused it.
I wish we could actually save time using LMS instead of putting a lot of needless time into it.
Is there another way (a patch or update) to clear up this problem and the flood of false email notifications from DFM?
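In the meantime, one way to line up the port and the sensor element is to parse the notification text. The Python below is a minimal sketch, assuming every EVENT DESCRIPTION follows the layout of the sample above. The function names are hypothetical and not part of DFM or LMS, and other event types may need a different pattern.

```python
import re

# Sample copied from the notification quoted in this thread.
SAMPLE = """EVENT ID = 000H760
ALERT ID = 00005GO
TIME = Tue 24-Nov-2009 13:33:55 CET
STATUS = Active
SEVERITY = Critical
MANAGED OBJECT = switch1
MANAGED OBJECT TYPE = Switches and Hubs
EVENT DESCRIPTION = OutOfRange::Component=TEMP-switch1/6051 [Te2/5/4 Module Temperature Sensor-TenGigabitEthernet2/5/4 Module Temperature Sensor];ComponentClass=TemperatureSensor;ComponentEventCode=1079;Status=OK;entSensorValue=280;CurrentValue=280.0
CUSTOMER IDENTIFICATION = All devices
CUSTOMER REVISION = 1"""

# Assumed layout: <event>::Component=<name> [<label>];key=value;key=value...
DESC_RE = re.compile(
    r"^(?P<event>\w+)::Component=(?P<component>\S+) "
    r"\[(?P<label>[^\]]*)\];(?P<rest>.*)$"
)


def parse_notification(text: str) -> dict:
    """Split 'KEY = value' lines into a dict (splits on the first ' = ')."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" = ")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def parse_description(desc: str) -> dict:
    """Pull the event, sensor element, interface and values out of the description."""
    m = DESC_RE.match(desc)
    if not m:
        return {}  # layout differs from the sample; nothing extracted
    info = {
        "event": m["event"],
        "component": m["component"],
        # Element number after the last "/", e.g. 6051.
        "element": m["component"].rsplit("/", 1)[-1],
        # Short interface name is the first word of the label, e.g. Te2/5/4.
        "interface": m["label"].split(" ", 1)[0],
    }
    for pair in m["rest"].split(";"):
        key, sep, value = pair.partition("=")
        if sep:
            info[key] = value
    return info


fields = parse_notification(SAMPLE)
info = parse_description(fields["EVENT DESCRIPTION"])
print(f"{fields['MANAGED OBJECT']} {info['interface']} "
      f"(element {info['element']}): {info['event']}, "
      f"CurrentValue={info['CurrentValue']}")
```

The one-line summary it prints puts the device, the interface, and the sensor element next to each other, which is the mapping that is hard to read out of the raw email.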
11-26-2009 08:50 AM
Unfortunately, no. This issue is not yet resolved, and the only workaround in your case is to unmanage each bogus sensor.