I have been troubleshooting this problem for 8 hours and have not figured it out yet.
Basically, I have a 2811 whose Fa0/0 keeps dropping packets at the interface now and
then, seemingly when traffic bursts; anything over 5 megs and the errors seem to
increase. The issue is not severe enough to affect users yet, but that could change as
more users generate more traffic.
Here is the interface configuration:
- 100 mbits/sec full duplex
- MTU is 1375
- GRE Tunnels are used at this interface
- No access-lists
- Proxy Arp is enabled
- Local Proxy Arp is disabled
- Fast switching is enabled
- IP CEF is enabled
- No CRC errors
- Ignore Errors increment at the same time as Throttle Errors
- Input queue: drops increment
- The input queue climbs to 55 or higher, and that's when we see the drops
The CPU also skyrockets at this time, hitting 99%. I had a thorough look at the processes with:
show proc cpu | exclude 0.00
The processes that increase dramatically are:
- IP Input
- Pool Manager
When the CPU spikes, IP Input is at about 60% and Pool Manager at 25 - 35%, plus the rest
of the processes.
If you suspect it's some type of bursty traffic, that's fine but all SNMP stats show a
maximum rate of 6 mbits/sec - a far cry from what this router can handle.
Is it possible that there is some type of traffic affecting the router? Applications that
are running: E-mail, voice, video (OCS), ftp, mapped drives, www
The IOS is:
I reviewed all release notes / bugs and can't find anything that could be the cause.
Is it possible that the router is process-switching and slowing to a crawl, CPU-wise? But
even if it were process-switching, we're talking about a maximum of 6 mbits/sec. Besides, I
see CEF working properly (or so I think, anyway).
Can you post your running configuration here? Also, post the output of the
"show interface <interface id>" command.
I can, but it's a pain as it's a classified system on a closed network - the config has to be vetted, IP addresses changed, etc. before submitting.
Well, the MTU is low because there are actually two tunnels - there's the GRE tunnel, but inside the GRE tunnel is another military-grade tunnel. We had issues with fragmentation (ICMP Path MTU Discovery).
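For what it's worth, a rough overhead budget makes a value around 1375 plausible. GRE encapsulation costs 24 bytes (20-byte outer IP header plus 4-byte GRE header); the inner tunnel's overhead isn't stated in the thread, so the figure below is purely an assumption:

```
! Hypothetical MTU budget - the inner-tunnel overhead is assumed, not from the thread:
!   1500   physical Ethernet MTU
!    -24   GRE encapsulation (20-byte outer IP + 4-byte GRE header)
!  -~100   assumed overhead for the inner encrypted tunnel
!   ----
!  ~1376   usable inner MTU, rounded down to 1375
```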
I thought about that... but why would... hmmm... maybe if there's a packet over 1375 bytes with the DF bit set? Possible...
Here is a link that I have used in the past that was helpful.
There is no reason to lower the MTU on the "outer" interface just because
it is the source/destination for a tunnel interface running "tunnel mode gre".
If the traffic is not VPN or UDP,
I strongly recommend using "ip tcp adjust-mss 1024" or lower,
so the routers can catch the TCP handshake
and inject values that avoid fragmentation.
Configure this on the incoming, outgoing, and tunnel interface(s).
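A minimal sketch of what that could look like (the interface names and the 1024 value are illustrative, not from the thread; apply it on whichever interfaces the TCP SYNs actually traverse):

```
! Illustrative only - interface names are assumptions
interface FastEthernet0/0
 ip tcp adjust-mss 1024
!
interface Tunnel0
 ip tcp adjust-mss 1024
```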
And you may also want to clear the DF bit on the tunneled traffic so it can be transported
(even if it gets fragmented) in spite of the DF bit. This is normally
done with a (CPU-intensive) route-map.
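As a sketch, clearing the DF bit via policy routing might look like this (route-map and interface names are assumptions; bear in mind the CPU cost of policy routing):

```
! Illustrative sketch - names are assumptions, not from the thread
route-map CLEAR-DF permit 10
 set ip df 0
!
interface FastEthernet0/0
 ip policy route-map CLEAR-DF
```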
BTW, are you sure CEF is running?
(sh ip cef)
Sometimes it gets switched off silently by an out-of-memory condition.
Yes, it's not a duplex/speed issue, and I have the statistics for CPU processes in my original post.
We may look at adjusting the MSS to 1024 or lower. And I do agree, clearing the DF bit may be a good idea.
CEF is running.
Thank you for the suggestion.
Have you tried configuring "ip flow ingress" and "ip flow egress"
on the Fa0/0 interface, so you can look at the incoming traffic with "show ip cache flow",
detect unwanted IP traffic,
and also get some nice statistics on protocol and packet-size distribution?
"show ip traffic" also gives some hints on why packets got dropped.
Hope this helps locate the problem.
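The NetFlow setup suggested above is just a couple of lines (the interface name matches the one in the thread; no export destination is configured since only the on-box cache is needed):

```
interface FastEthernet0/0
 ip flow ingress
 ip flow egress
!
! then inspect with:
!   show ip cache flow
!   show ip traffic
```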
I'll add those right now and try .. Thanks.
It's towards the end of the day here, so traffic has gone down and the errors are not happening anymore. However, I decided to look at "debug ip icmp" and I am getting:
ICMP: dst (188.8.131.52) frag. needed and DF set unreachable sent to 184.108.40.206
Maybe I get a lot of those during the day, and those packets get dropped, incrementing the counters.
Have you found out to which process the cpu load is related
(show proc cpu sorted) ?
Have you also checked whether the switch port also
believes it is at 100/full?