
ronshuster · ‎12-07-2015

Hi there,

I am currently using SolarWinds to monitor a network. It provides visibility into interfaces going up/down, high CPU thresholds, high throughput, etc. However, I'm looking for more granular information, such as:

1) CRC errors on the interface

2) packet loss

3) ICMP packet loss from my router to a given destination (or even multiple destinations)

And obviously I need to be alerted by email.

I had an issue a few days ago where one of my WAN circuits was "misbehaving", yet my ASR CPE router was in perfect working order, so I was unable to proactively determine that there was a problem on the WAN.

Is that something EEM can help me out with?

Pls assist

Thanks!

Jason Kopacko · ‎12-07-2015

Ron,

I'm actually working on a pretty in-depth EEM right now to monitor interfaces for such issues. What router, IOS, and EEM version are you running?

ronshuster · ‎12-07-2015

ASR1001

IOS-XE

System image file is "bootflash:asr1001-universalk9.03.10.04.S.153-3.S4-ext.bin"

Router#sh event manager version
Embedded Event Manager Version 4.00
Component Versions:
eem: (rel5)1.0.0
eem-gold: (rel1)1.0.2
eem-call-home: (rel2)1.0.3

asigachev · ‎12-09-2015

For CRC errors, you can write a script that is launched under various conditions: when the errors start to grow, when they exceed a certain count, or when they exceed a certain rate (count over time). See the event interface command in this doc: http://www.cisco.com/c/en/us/td/docs/ios/netmgmt/command/reference/nm_book/nm_05.html#wp1180297 It has two parameters related to CRC errors that you can monitor and react to:

• input_errors_crc: Number of packets with a CRC generated by the originating LAN station or remote device that do not match the checksum calculated from the data received.

• input_errors_frame: Number of packets received incorrectly that have a CRC error and a noninteger number of octets.
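
As a minimal sketch of the CRC case, an applet along these lines could watch input_errors_crc, log a message, and send an email. The applet name, interface, threshold, mail server address, and email addresses are all made-up placeholders, not values from this thread, so check the exact keywords against the command reference for your release.

```
! Hypothetical applet name, interface, threshold, and addresses; adjust for your router.
event manager applet WATCH-CRC
 ! Fire when the CRC counter has grown by 10 or more since the last trigger,
 ! sampling the counter every 60 seconds.
 event interface name GigabitEthernet0/0/0 parameter input_errors_crc entry-op ge entry-val 10 entry-type increment poll-interval 60
 ! Log locally, then email the alert through an SMTP relay (placeholder address).
 action 1.0 syslog priority warnings msg "CRC errors increasing on GigabitEthernet0/0/0"
 action 2.0 mail server "192.0.2.25" to "noc@example.com" from "asr1001@example.com" subject "CRC errors on GigabitEthernet0/0/0" body "input_errors_crc grew by 10 or more since the last trigger"
```

Using entry-type increment rather than an absolute value means the applet keeps firing as the counter keeps growing, without anyone having to clear the interface counters.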

For ICMP packet loss from the router to some destination, there is the event ipsla detector:

http://www.cisco.com/c/en/us/td/docs/ios/netmgmt/command/reference/nm_book/nm_05.html#wp1177337

It allows you to monitor and react to various IP SLA parameters.
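
A rough sketch of how this could fit together for the ICMP case: an IP SLA icmp-echo probe, a reaction on timeout, and an applet that reacts to it. The operation number, destination address, source interface, timers, applet name, and email details are assumptions for illustration only, and the reaction keywords should be verified on your IOS-XE release.

```
! Hypothetical probe: ping 198.51.100.1 every 30 seconds (placeholder values).
ip sla 10
 icmp-echo 198.51.100.1 source-interface GigabitEthernet0/0/0
 frequency 30
ip sla schedule 10 life forever start-time now
! Raise a reaction after 3 consecutive probe timeouts.
ip sla reaction-configuration 10 react timeout threshold-type consecutive 3 action-type trapOnly
! Let IP SLA reactions be delivered to EEM.
ip sla enable reaction-alerts
!
event manager applet WATCH-SLA-10
 event ipsla operation-id 10 reaction-type timeout
 action 1.0 syslog priority warnings msg "IP SLA 10: ICMP probes to 198.51.100.1 are timing out"
 action 2.0 mail server "192.0.2.25" to "noc@example.com" from "asr1001@example.com" subject "ICMP loss to 198.51.100.1" body "IP SLA operation 10 reported consecutive timeouts"
```

For multiple destinations, one option is to repeat the ip sla and applet pair per destination, each with its own operation number.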

As for general packet loss, it depends on your situation. If you are monitoring a WAN link and you manage the routers on both sides, you can set up an IP SLA operation with a responder and monitor packet loss on it. You can also use the event interface parameters input_packets_dropped and output_packets_dropped, but these only count drops on your own interface, not drops inside the WAN link.

NetFlow also has some configurations that allow you to monitor packet loss based on information in TCP packets and/or application-layer data.

Jason Kopacko · ‎12-09-2015

Ron,

The EEM I am writing would monitor for interface errors, using multiple events like the ones below:

 event tag E1 interface name GigabitEthernet0/0 parameter input_errors entry-op ge entry-val 2 entry-type increment poll-interval 30
 event tag E2 interface name GigabitEthernet0/0 parameter output_errors entry-op ge entry-val 2 entry-type increment poll-interval 30

That would generate a syslog message that other EEMs are watching for. They would kick off based on that message, check the interface, and generate a full syslog message of information using something like this:

sh int g0/0 | i line pro|Hardware|BW|reliab|media type|Last clear|bits/sec|bytes|runts|error|pause|collision|drop|fail

The results would look like:

Router#
GigabitEthernet0/0 is up, line protocol is up
 Hardware is iGbE, address is xxxx.xxxx.xxxx (bia xxxx.xxxx.xxxx)
 MTU 1500 bytes, BW 250000 Kbit/sec, DLY 10 usec,
 reliability 255/255, txload 76/255, rxload 26/255
 Full Duplex, 1Gbps, media type is SX
 Last clearing of "show interface" counters never
 Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 5748690
 Output queue: 0/1000/5747487 (size/max total/drops)
 30 second input rate 26355000 bits/sec, 6070 packets/sec
 30 second output rate 74741000 bits/sec, 10521 packets/sec
 5515532105 packets input, 3508452199238 bytes, 0 no buffer
 0 runts, 0 giants, 0 throttles
 0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
 0 watchdog, 0 multicast, 0 pause input
 8701025128 packets output, 6871856368883 bytes, 0 underruns
 0 output errors, 0 collisions, 1 interface resets
 0 unknown protocol drops
 0 babbles, 0 late collision, 0 deferred
 2 lost carrier, 0 no carrier, 0 pause output
 0 output buffer failures, 0 output buffers swapped out

Then the EEM would regexp those lines into specific variables and generate targeted syslog messages. My syslog servers would then send me an email alert based on those messages.
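
The two-stage idea described above could be sketched roughly as follows. This is not the finished EEM, only an illustration: the applet names, the marker string IF-ERR-DETECTED, and the variable names in_err and crc are hypothetical, and only the input errors line is parsed here.

```
! Stage 1: raise a marker syslog message when either error counter grows.
event manager applet IF-ERR-DETECT
 event tag E1 interface name GigabitEthernet0/0 parameter input_errors entry-op ge entry-val 2 entry-type increment poll-interval 30
 event tag E2 interface name GigabitEthernet0/0 parameter output_errors entry-op ge entry-val 2 entry-type increment poll-interval 30
 trigger
  correlate event E1 or event E2
 action 1.0 syslog msg "IF-ERR-DETECTED GigabitEthernet0/0"
!
! Stage 2: react to the marker, pull the counters, and log a targeted message.
event manager applet IF-ERR-REPORT
 event syslog pattern "IF-ERR-DETECTED GigabitEthernet0/0"
 action 1.0 cli command "enable"
 action 2.0 cli command "show interfaces GigabitEthernet0/0 | include input errors"
 ! Capture the "N input errors, M CRC" counters from the CLI output.
 action 3.0 regexp "([0-9]+) input errors, ([0-9]+) CRC" "$_cli_result" match in_err crc
 action 4.0 if $_regexp_result eq 1
 action 5.0  syslog priority warnings msg "GigabitEthernet0/0 input_errors=$in_err crc=$crc"
 action 6.0 end
```

The stage 2 message deliberately omits the marker string so that the report applet cannot retrigger itself.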

Also, since I use BGP, I am also going to pull this:

sh ip bgp nei | i neighbor is|BGP state|Last read|Route map|Prefixes C|Prefixes T|Address tr|Connections e|host:|RTT|uptime:|IP Pre|Rcvd:|Sent:

And the results would look like:

BGP neighbor is x.x.x.x, remote AS 12345, external link
 BGP state = Established, up for 2w3d
 Last read 00:00:09, last write 00:00:22, hold time is 180, keepalive interval is 60 seconds
 Route map for incoming advertisements is route-map-IN
 Route map for outgoing advertisements is route-map-OUT
 Prefixes Current: 24 72 (Consumes 7600 bytes)
 Prefixes Total: 691 98
 Address tracking is enabled, the RIB does have a route to x.x.x.x
 Connections established 1; dropped 0
Local host: x.x.x.x, Local port: 28420
Foreign host: x.x.x.x, Foreign port: 179
SRTT: 1000 ms, RTTO: 1003 ms, RTV: 3 ms, KRTT: 0 ms
minRTT: 0 ms, maxRTT: 1000 ms, ACK hold: 200 ms
uptime: 1519229228 ms, Sent idletime: 9588 ms, Receive idletime: 9788 ms
IP Precedence value : 6
Rcvd: 53415 (out of order: 0), with data: 25462, total data bytes: 492754
Sent: 53421 (retransmit: 1, fastretransmit: 0, partialack: 0, Second Congestion: 0), with data: 28062, total data bytes: 555940

Then the EEM would regexp those lines into specific variables and generate targeted syslog messages. My syslog servers would then send me an email alert based on those messages.

How to identify poor performance on the WAN