Cisco 877 crashing?

Answered Question
Oct 29th, 2010

Hi,


I have been having some issues with my Cisco 877 lately. It is connected to my ADSL2+ line and also serves as the internal DHCP server. The only two devices connected to its Ethernet ports are a Linksys wireless access point and an eight-port gigabit switch, to which all the LAN devices are connected.


For the past week or so it has been crashing at random: I cannot ping the router, and if I am watching the console I see it start back up. After logging in again and running 'show hardware', the last reload reason is shown as unknown. I have tried to diagnose the problem myself but I don't know what else to try. Nothing seems to be wrong with the configuration, and there do not seem to be any special circumstances that trigger the problem.


I have included a few files, including my running-config, so if anybody has any ideas to diagnose this further that would be great!


Thanks,

Aaron Trout


EDIT:


I also just noticed some entries in the syslog [attached] which seem quite interesting. NAT seems to be hogging the CPU around the time of a crash, and I am not sure how to go about diagnosing or fixing this. I could put a machine in between the router and the rest of the network and use that as a gateway / NAT box, but that seems impractical, and it works around the problem rather than fixing it. Any ideas?

David Aicher Fri, 10/29/2010 - 07:12
User Badges:
  • Cisco Employee,

# Oct 29 08:15:07 10.0.10.1 94: cyberdyneSystems: Oct 29 07:15:17.863:  %SYS-2-WATCHDOG: Process aborted on watchdog timeout, process = IP NAT  Ager.


This is what is causing the reload. The IP NAT Ager process is invoked to clean up stale NAT translations, that is, expired translations that are no longer in use. For the router to spend so long in IP NAT Ager that the watchdog fires indicates that a large number of NAT translations are being created and then expiring.


I would capture the output of "show ip nat statistics" and "show ip nat translations". The 877 is typically a small-business router, so there should not be hundreds of users. The number of NAT translations should be fairly low; I would not expect more than 1000, although it could spike. Scan through the output of "show ip nat translations" and look for the host generating the most translations.
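
To find the busiest host without reading the table by eye, the captured output can be tallied offline. The following is a minimal Python sketch, assuming the usual five-column layout of "show ip nat translations" (Pro, Inside global, Inside local, Outside local, Outside global). The function name and the sample addresses are made up for illustration and are not taken from the files attached to this thread.

```python
from collections import Counter

# Made-up sample in the usual five-column layout; not output from the router in this thread.
SAMPLE_OUTPUT = """\
Pro Inside global      Inside local       Outside local      Outside global
tcp 203.0.113.5:1025   10.0.10.21:1025    198.51.100.7:80    198.51.100.7:80
udp 203.0.113.5:6881   10.0.10.21:6881    198.51.100.9:6881  198.51.100.9:6881
udp 203.0.113.5:6882   10.0.10.21:6882    192.0.2.44:51413   192.0.2.44:51413
icmp 203.0.113.5:12    10.0.10.30:12      192.0.2.1:12       192.0.2.1:12
--- 203.0.113.5        10.0.10.5          ---                ---
"""


def count_translations_per_host(show_output: str) -> Counter:
    """Count dynamic NAT entries per inside local address (hypothetical helper)."""
    counts = Counter()
    for line in show_output.splitlines():
        fields = line.split()
        # Entry rows have exactly five whitespace-separated columns;
        # the header row splits into more fields and is skipped here.
        if len(fields) != 5:
            continue
        # Rows starting with "---" are static mappings, not sessions.
        if fields[0] == "---":
            continue
        # Inside local is "address:port"; keep only the address part.
        host = fields[2].rsplit(":", 1)[0]
        counts[host] += 1
    return counts


if __name__ == "__main__":
    for host, entries in count_translations_per_host(SAMPLE_OUTPUT).most_common():
        print(host, entries)
```

A host that accounts for most of the entries is the first one to investigate.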


By default the timeout for idle TCP sessions is 24 hours. You may have to lower this timeout if you see a lot of TCP sessions in the output. The UDP timeout is 5 minutes, and DNS and ICMP have a 60-second timeout.
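
As a rough way to see why these timeouts matter, the steady-state table size can be estimated as the rate of new translations multiplied by how long each one lingers. The Python sketch below uses the default timeouts quoted above. The function name and the example rate of 10 new flows per second are assumptions for illustration, and the estimate assumes each translation is used once and then sits idle until its timeout expires.

```python
# Default idle timeouts quoted above, in seconds.
DEFAULT_TIMEOUTS_S = {
    "tcp": 24 * 60 * 60,
    "udp": 5 * 60,
    "dns": 60,
    "icmp": 60,
}


def steady_state_entries(new_per_second: float, timeout_s: int) -> float:
    """Rough estimate: arrival rate times entry lifetime (hypothetical helper)."""
    return new_per_second * timeout_s


if __name__ == "__main__":
    # Assumed example rate, not a measurement from this thread.
    rate = 10
    for proto, timeout_s in DEFAULT_TIMEOUTS_S.items():
        print(proto, steady_state_entries(rate, timeout_s))
```

Under these assumptions, 10 short-lived UDP flows per second would hold about 3000 entries at the 5-minute default, so the ager has a lot of expired entries to clean up even on a small network.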


It is not uncommon to see a rogue device or a virus on the network generate thousands of translations. This is just one possible cause of this type of problem, especially if the device is sending ICMP or DNS requests, whose translations time out quickly after the requests are sent.


Regards

Dave Aicher

aarontrout Fri, 10/29/2010 - 08:31

Thanks, this was very helpful. It restarted again about half an hour ago, so after it came back up I analysed the NAT situation. There were a couple of thousand active translations, a good 60 - 70% of which were coming from a single machine. At the moment we believe the cause was Vuze running on that machine and opening a huge number of connections.


After turning Vuze off and clearing the current translations (and waiting 5 or 10 minutes), we are now at about 250 connections. I will ask the user to keep Vuze turned off and see if the router becomes more stable.


A friend of mine who uses a Cisco 1700 router with an ADSL WIC at his house just had a look at his NAT stats and they showed nearly 5000 connections, yet the router has been online for a good few months (and the last reload was probably from the reload command)! I assume that the 800 series is not designed to handle as many connections as other routers like the 1700?


I will continue to monitor the situation and report back, but thanks again for your input, it was very useful!


Aaron

Correct Answer
David Aicher Fri, 10/29/2010 - 08:55

Yes, the 1700 is more powerful. It may also be that the NAT translations are stable in the network with the 1700, while the translations with Vuze "churn" more. BitTorrent tends to generate a lot of sessions that come and go quickly. It will also use up your available bandwidth.


You can limit the total translations or the translations per host using the "ip nat translation max-entries" command.


http://www.cisco.com/en/US/docs/ios/ipaddr/command/reference/iad_nat.html#wp1075721


Hope this helps


Dave Aicher

aarontrout Fri, 10/29/2010 - 13:27

Still having problems, it just crashed again! What do you think are sensible values for the NAT translation timeout and the maximum translations per host? I have just set a maximum of 250 translations per host, 2500 translations in total, and a 2-hour timeout.
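
For reference, limits like these can be written out as configuration lines. The Python sketch below only builds the command strings. The function name is hypothetical, the command forms follow the IOS NAT command reference as best understood here and should be checked against the IOS release in use, and treating the 2-hour timeout as the TCP idle timeout is an assumption.

```python
from typing import List


def nat_limit_commands(max_total: int, max_per_host: int, tcp_timeout_s: int) -> List[str]:
    """Build global-config lines for NAT limits (hypothetical helper)."""
    return [
        # Cap on the whole translation table.
        f"ip nat translation max-entries {max_total}",
        # Cap applied to each inside host individually.
        f"ip nat translation max-entries all-host {max_per_host}",
        # Idle timeout for TCP translations, in seconds
        # (assumed to be the timeout meant in the post).
        f"ip nat translation tcp-timeout {tcp_timeout_s}",
    ]


if __name__ == "__main__":
    # Values quoted above: 2500 total, 250 per host, 2 hours.
    for line in nat_limit_commands(2500, 250, 2 * 60 * 60):
        print(line)
```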


Thanks

Aaron



UPDATE: As a consequence of this there are also a lot of VFR (virtual fragmentation reassembly) table overflows; would it be better for me to turn VFR off?

David Aicher Sat, 10/30/2010 - 05:15

Those values seem pretty good, although the right numbers may vary a lot. There is probably no single "best" value; it will be more or less an educated guess.


I would say that if you continue to see crashes with the max-entries limits in place, you need to open a case with TAC to investigate in more depth.


Dave Aicher

aarontrout Sun, 10/31/2010 - 04:47

It seems to have been stable since I applied those limits (I have raised the per-host limit to 500 and it still seems to be fine). I can also now tell at a glance which IP address is creating loads of translations, which is useful. It is usually one host that uses all 500, and everybody else is using fewer than 25!


Thanks very much for your help, I will now mark this as answered.


Aaron
