I just found a device with a crashed primary SUP that sent an "SNMP-5-WARMSTART:Warm Start Trap", but the device didn't get posted in the Syslog Reloads report. Why are warmstarts not reported by default? Isn't this an oversight?
Maybe. This message is not part of the standard reloads report filter list. However, one of the problems we see with the syslog-based reloads report is network convergence. Since UDP is best-effort, some syslog messages are lost in the ether, and never make it to the LMS server (e.g. they are generated before routing has initialized on the device).
Do you see this WARMSTART message in the LMS server's syslog.log (syslog_info)?
Yes, it made it to the syslog server. To rephrase, my question was "Why isn't WARMSTART included in the Reload Report out-of-the-box?"
The problem you refer to is IMO an IOS/CatOS design flaw with regard to syslog generation. The OS should wait until the IP stack and routing protocol are fully initialized, or the syslogs have no chance to go out. It feels as though 90% of the syslogs concerning reloads fail to get delivered due to this flaw. That's why I was surprised this one made it out, yet still didn't show up in the Reload Report.
The WARMSTART message is not included in the reloads report because it can be sent when the device has not reloaded. That is, if you configure the following, a WARMSTART message will be sent:
Router(config)#snmp-server community public RO
This would result in a false positive. Were there no other messages generated by this device that would indicate a reload?
That's an interesting bit to know. Is that a bug or a feature?
There are other syslogs that point to a crash of the SUP in slot 1, but none more obvious than the WARMSTART itself:
SYS-1-SYS_ENABLEPS: Power supply 2 enabled
SYS-5-SUP_HASWOVER: Supervisor becoming active(HA switchover)
SPANTREE-5-ROOTCHANGE: Root change for Vlan/Instace ###: New root port #/#. New Root mac addr is xx-xx-xx-xx-xx-xx
ETHC-5-PORTFROMSTP: Port 1/1 left bridge port 1/1-2, 2/1-2
SYS-1-SYS_CRASHINFO_LESS_SPACE: May not have enough space to save the crash info on bootflash of Switch on module: 2
SYS-5-MOD_OK: Module # (Module serial) is online
SNMP-5-WARMSTART: Warm Start Trap
The switch shows uptime of 90 days, so strictly speaking it did not reload, but the WARMSTART should put it on the map since it's running on one leg now.
There is no bug on this since RME is working as designed. While adding the message to the report would be trivial, it would result in false positives. To that end, we do have the COLDSTART message in the list of messages.
If you're willing to deal with the potential false positives, go to RME > Reports > Custom Report Templates, and modify the Reload Report to add the WARMSTART message.
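If you want to see how much noise adding WARMSTART would create before changing the template, you could run a rough check over syslog.log first. The following is only a sketch of mnemonic-based filtering and says nothing about how RME implements its filter internally. The function name `reload_candidates` and the default list containing only COLDSTART are assumptions made for illustration; the real Reload Report filter list contains other messages.

```python
import re

# Matches FACILITY-SEVERITY-MNEMONIC, e.g. "SNMP-5-WARMSTART".
MSG_ID = re.compile(r"([A-Z0-9_]+)-(\d)-([A-Z0-9_]+)")

def reload_candidates(lines, mnemonics=("COLDSTART",)):
    """Return the lines whose mnemonic is in the given set (hypothetical helper)."""
    wanted = set(mnemonics)
    hits = []
    for line in lines:
        match = MSG_ID.search(line)
        if match and match.group(3) in wanted:
            hits.append(line)
    return hits

# Sample lines taken from the messages quoted earlier in this thread.
sample = [
    "SYS-5-SUP_HASWOVER: Supervisor becoming active(HA switchover)",
    "SYS-5-MOD_OK: Module # (Module serial) is online",
    "SNMP-5-WARMSTART: Warm Start Trap",
]

default_hits = reload_candidates(sample)
with_warmstart = reload_candidates(sample, ("COLDSTART", "WARMSTART"))
print(len(default_hits), len(with_warmstart))
```

With only COLDSTART in the list, none of the sample lines match; adding WARMSTART picks up the last one, along with any WARMSTART caused by an snmp-server configuration change.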
The bug or feature question was referring to the fact that IOS would send a WARMSTART syslog just because someone configures a couple of lines of "snmp-server".
I don't think it's a good decision to minimize false positives this way. After all, we lived with CSCdy02471 (32-bit sysUpTime rollover) in LMS 2.2 and prior. Given a choice between continuing to get bogus reload reports and the current situation of missing real reloads (syslogs getting lost in the ether, or WARMSTART not being reported out-of-the-box), I'd much prefer the former.
IOS is working as designed and this is a correct design. When you restart the SNMP process (which is what those commands do) a warmstart should be registered. Really, the device should be sending a syslog message that indicates a definitive restart of its CPU. I have a feeling a message is generated, but it is only seen in the logging buffer on the device.
While you might find an out-of-the-box WARMSTART in the reload report a good thing, others may not. I can certainly see arguments on both sides. I agree with one of your earlier posts that an ideal solution to this would be to batch up syslog messages, and wait for the network to be up before sending them. While this is doable today with ESM, it is not available on all platforms, and it is not a default config.
My advice would be to follow up on this problem with your account team, and build a business case for the ideal solution.
ESM would make a better tool here since it could "cache" the messages, and send them out when the syslog server is reachable. Of course, a definitive RELOAD message would need to be generated by the device first.
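To make the "cache and send later" idea concrete, here is a minimal sketch of the general technique. It is not ESM code and not how ESM is implemented; the class `BufferedSyslogSender` and the `send` and `is_reachable` callables are hypothetical names. Messages are held in a bounded queue and only released once the reachability check passes.

```python
from collections import deque

class BufferedSyslogSender:
    """Queue messages and forward them only while the server is reachable."""

    def __init__(self, send, is_reachable, max_buffered=1000):
        self._send = send                  # callable that delivers one message
        self._is_reachable = is_reachable  # callable returning True or False
        # Bounded queue: once full, the oldest message is dropped.
        self._pending = deque(maxlen=max_buffered)

    def log(self, message):
        """Queue a message, then try to deliver whatever is pending."""
        self._pending.append(message)
        return self.flush()

    def flush(self):
        """Send queued messages in order; return how many were sent."""
        sent = 0
        while self._pending and self._is_reachable():
            # Remove a message only after send() returned without raising.
            self._send(self._pending[0])
            self._pending.popleft()
            sent += 1
        return sent

# Simulate a device that logs before the network has converged.
delivered = []
state = {"up": False}
sender = BufferedSyslogSender(delivered.append, lambda: state["up"])
sender.log("SYS-5-SUP_HASWOVER: Supervisor becoming active(HA switchover)")
sender.log("SNMP-5-WARMSTART: Warm Start Trap")
state["up"] = True   # routing is now up
sender.flush()
```

Nothing leaves the queue while `state["up"]` is False; after the flag flips, `flush()` delivers both messages in their original order.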
There's no syslog created pertaining to a CPU restart. There's a "SNMP-5-MODULETRAP:Module 1 [Down] Trap" immediately after the WARMSTART, however.