Where is the list of alerts/events that will be triggered by the UCS infrastructure?

Mar 21st, 2010

Hi all,


Over the last few days I've been frantically trying to find a list of all the events, alerts and alarms that will be triggered by the UCS infrastructure (5108 chassis, half-width blades, I/O modules, 6120 fabric interconnects, etc.), but to no avail: this information doesn't seem to be readily available on the Cisco website.


What I'm really looking for is a list of the SNMP traps and syslog messages that will be generated. Categorization by severity level would be a massive bonus.


Something like this would be ideal: http://www.veeam.com/nworks/overview/data/collectorEvents.html


Thanks in advance for your time.


Kindest Regards


Phil

phil_vell Wed, 04/14/2010 - 18:07

Hi Adam,


Thank you for your reply.


Unfortunately, my requirements are more in-depth: I need to know which SNMP traps will be sent (the 6120s use the Nexus 5000 MIBs) and which syslog events will be triggered.


As a result, I was looking for something closer to http://www-europe.cisco.com/en/US/docs/switches/datacenter/sw/system_messages/reference/sl_nxos_book.html in terms of the level of detail needed.


Unfortunately, I've chased this up with Cisco SEs, and UCS Manager does not currently send SNMP traps for the UCS chassis (fans, power). This is expected in an upcoming version due for release in June.

As a result, I need to look at the syslog events generated by the UCS Manager application and write the regexes that will be used to raise events for the important ones. Unfortunately, these messages are not documented either, and that documentation is also expected around the time of the next release.
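As a rough illustration of that regex approach, here is a minimal Python sketch. The message layout it matches (a bracketed fault code, severity, cause and affected object, loosely modelled on the fields in the SCOM sample alert later in this thread) is a hypothetical assumption, not the documented UCS Manager syslog format, and the set of severities that raise a ticket is an illustrative policy only. Validate the pattern against messages actually captured from your fabric interconnects before relying on it.

```python
import re
from dataclasses import dataclass
from typing import Optional

# HYPOTHETICAL layout: "... [F0522][major][equipment-offline][sys/chassis-1/psu-2] text"
# The real UCS Manager syslog format must be confirmed against captured messages.
FAULT_RE = re.compile(
    r"\[(?P<code>F\d{4})\]"       # fault code, e.g. F0522
    r"\[(?P<severity>[a-z]+)\]"   # severity keyword (assumed lowercase)
    r"\[(?P<cause>[a-z-]+)\]"     # cause, e.g. equipment-offline
    r"\[(?P<dn>[^\]]+)\]"         # affected object path
    r"\s*(?P<text>.*)"            # free-text description
)

# Illustrative ticketing policy (assumption, tune to your own process).
TICKET_SEVERITIES = {"critical", "major"}


@dataclass
class FaultEvent:
    code: str
    severity: str
    cause: str
    dn: str
    text: str


def parse_line(line: str) -> Optional[FaultEvent]:
    """Return a FaultEvent if the line matches the assumed layout, else None."""
    m = FAULT_RE.search(line)
    if m is None:
        return None
    return FaultEvent(**m.groupdict())


def should_raise_ticket(event: FaultEvent) -> bool:
    """Decide whether a parsed event is important enough to raise a ticket."""
    return event.severity in TICKET_SEVERITIES


if __name__ == "__main__":
    sample = ("Apr 26 18:57:00 ucsm: [F0522][major][equipment-offline]"
              "[sys/chassis-1/psu-2] Power supply 2 in chassis 1 power: off")
    ev = parse_line(sample)
    print(ev.code, ev.severity, should_raise_ticket(ev))  # F0522 major True
```

Keeping the pattern and the ticketing policy separate means the regex can be adjusted once real messages are available without touching the routing logic.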


It would be great to know how you currently handle monitoring and alerting for the compute infrastructure. I understand that the UCS Manager application displays these events, but automatic ticket generation and event correlation seem unachievable at the moment.



Kindest Regards


Philip Vella

adambaum1 Thu, 04/15/2010 - 07:05

Hi Phil,


We don't currently monitor any UCS components, since we haven't purchased any. My shop is currently evaluating blade systems, so we immersed ourselves fairly deeply in all the documentation. Good to know that the chassis doesn't raise alarms just yet.


adam

simon.geary Thu, 04/15/2010 - 07:29

I'm using the Microsoft SCOM management pack for monitoring, which gets its information over HTTP. I couldn't find a way to export all the different alerts it produces, but there are plenty of them, covering hardware failures, down links, discoveries, etc., and even errors at the VMware level if you have Microsoft SCVMM. It might be useful for you.


http://www.cisco.com/en/US/docs/unified_computing/ucs/sw/scom/quick/start/guide/ucsMPQS.html


Here is a sample alert.




Date and Time: 4/15/2010 10:48:00 AM

Property Name   Property Value
IsEmpty         false
Affected        sys/chassis-1/psu-2
Ack             no
Cause           equipment-offline
Code            F0522
Created         2010-03-30T15:26:53.476
Description     Power supply 2 in chassis 1 power: off
Moniker         sys/chassis-1/psu-2/fault-F0522
Id              368048
Rule            equipment-psu-offline
Severity        1
Tag             network,server
Type            environmental
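The Affected and Moniker values in the alert above share a slash-separated path (sys/chassis-1/psu-2, with fault-F0522 appended in the Moniker). If you want to group or de-duplicate alerts per chassis or component, a small Python sketch like the following could split those paths. It assumes every segment is either a bare name or "<kind>-<id>", which holds for this sample but is only an assumption for other components; parse_dn and split_moniker are hypothetical helper names, not part of any Cisco or SCOM API.

```python
import re
from typing import Dict, Optional, Tuple

# Assumed segment shape: "<kind>-<id>" (e.g. "chassis-1") or a bare name (e.g. "sys").
SEGMENT_RE = re.compile(r"^(?P<kind>[a-z]+)-(?P<id>.+)$")


def parse_dn(dn: str) -> Dict[str, Optional[str]]:
    """Split a path such as 'sys/chassis-1/psu-2' into {kind: id}; bare names map to None."""
    parts: Dict[str, Optional[str]] = {}
    for segment in dn.split("/"):
        m = SEGMENT_RE.match(segment)
        if m:
            parts[m.group("kind")] = m.group("id")
        else:
            parts[segment] = None
    return parts


def split_moniker(moniker: str) -> Tuple[str, Optional[str]]:
    """Separate the affected object path from a trailing 'fault-<code>' segment, if present."""
    head, _, last = moniker.rpartition("/")
    if last.startswith("fault-"):
        return head, last[len("fault-"):]
    return moniker, None


print(parse_dn("sys/chassis-1/psu-2"))
# {'sys': None, 'chassis': '1', 'psu': '2'}
print(split_moniker("sys/chassis-1/psu-2/fault-F0522"))
# ('sys/chassis-1/psu-2', 'F0522')
```

With the path split out, the chassis number and fault code can serve as a correlation key, so repeated alerts for the same power supply collapse into one ticket.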
phil_vell Mon, 04/26/2010 - 18:57

Hi Simon,


Thank you for your reply.


Are you currently capturing the syslog messages generated by your platform?


In the environment I'm working in, I have no capability other than syslog or SNMP. I've been told that syslog will be more useful and has coverage beyond the 6120s, but I need to validate that.


If anyone has any useful syslog messages, that would be great. I'm really looking for things like:

  • 5108 Power tray addition / removal
  • 5108 Fan tray addition / removal
  • 5108 Fan issues
  • UCS Manager Active / Standby role changes
  • Anything from the 2104 I/O modules


Examples would be greatly appreciated.


Kindest Regards


Philip

simon.geary Mon, 05/03/2010 - 11:39

We actually ended up pulling the plug on SCOM. It was hitting the management port so hard that it froze the HTTP service, and the interconnect had to be rebooted.
