Software forced crash

Sep 28th, 2008

Hi,

I'm not sure how best to ask this, so let me explain our problem first.

We are using a 7200 VXR as a PDSN, but the router reloaded by itself.

This is the message: "System was restarted by error - a Software forced crash, PC 0x60A00A04", and the IOS version is "12.3(14)YX4".

One more thing: "%SYS-3-CPUHOG: Task is running for (104004)msecs, more than (2000)msecs (0/0),process = AAA ACCT Proc." What does this error message mean? Is the main problem coming from the CPU?

I have also attached our "show tech-support" output.

Please give me a solution.



Giuseppe Larosa Sun, 09/28/2008 - 03:47

Hello Yongsik,


The log message means that the process AAA ACCT Proc held the CPU for about 104 seconds (104004 msec), far more than the 2000 msec threshold.

This is probably the reason for the software crash: the system decided to reload because this process was using the CPU for too long. 104 seconds is like ten thousand years for a modern CPU.
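
To make the numbers in that message concrete, here is a minimal Python sketch that pulls the run time, the threshold, and the process name out of a CPUHOG line. The regular expression is only an assumption based on the single message quoted in this thread (other IOS releases may format it differently), and `parse_cpuhog` is a hypothetical helper name, not a Cisco tool.

```python
import re

# Pattern modelled on the one CPUHOG line quoted in this thread (an assumption,
# not an official format specification).
CPUHOG_RE = re.compile(
    r"%SYS-3-CPUHOG: Task is running for \((?P<ran>\d+)\)msecs, "
    r"more than \((?P<limit>\d+)\)msecs \(\d+/\d+\),\s*process = (?P<process>[^.]+)\."
)

def parse_cpuhog(line):
    """Return (process, ran_ms, limit_ms) for a CPUHOG line, or None if it does not match."""
    match = CPUHOG_RE.search(line)
    if match is None:
        return None
    return match.group("process").strip(), int(match.group("ran")), int(match.group("limit"))

line = ("%SYS-3-CPUHOG: Task is running for (104004)msecs, "
        "more than (2000)msecs (0/0),process = AAA ACCT Proc.")
process, ran_ms, limit_ms = parse_cpuhog(line)
# Report the run time in seconds and as a multiple of the threshold.
print(f"{process}: ran {ran_ms / 1000:.1f} s, about {ran_ms / limit_ms:.0f}x the {limit_ms} ms limit")
```

Run over a saved log, the same helper could be used to count how often each process appears in CPUHOG messages.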


We can also see:

Last reset from watchdog reset


The watchdog process is the one that monitors CPU usage, so this confirms what I've written above.


In the log you can see several CPUHOG messages.

Near the software reload you can see:


AAA/ACCT(0145C159): Accouting method=AAA (RADIUS)

AAA/ATTR(0145C159): cursor init: 65653348 64532294 none unknown

AAA/ATTR(0145C159): find: 64532324 0 00000009 username(345) 3 wap

AAA/SG: Server group ref count for public group AAA raised to 25

AAA/SG: Server group wrapper ref count for public group AAA raised to 25


AAA/ACCT is accounting.

There were 146 concurrent CDMA users when the system reloaded.


AAA/ACCT is the process that uses the most CPU:


61 0 701290344 0 6980 1278432 1278432 AAA ACCT Proc

62 0 1772314480 0 6980 0 0 ACCT Periodic Pr


You may have hit a bug:


CPUHOG at the Time of Normal Router Operation


Most of the time, these error messages are due to an internal software bug in the Cisco IOS Software.


The first step to troubleshoot this sort of error message is to look for a known bug. You can use the Bug Toolkit (registered customers only) to find a bug that matches the error. In the Bug Toolkit page, click Launch Bug Toolkit, and select Search for Cisco IOS-related bugs. In order to narrow your search, you can select your Cisco IOS software version under number 1. Under number 3, you can perform a keyword search for "CPUHOG, <process>", where <process> is the corresponding process, such as Virtual Exec or IP Input.


You can upgrade to the latest Cisco IOS Software image in your release train to eliminate all fixed CPUHOG bugs.


See:

http://www.cisco.com/en/US/products/hw/iad/ps397/products_tech_note09186a00800a6ac4.shtml#topic6



Hope to help

Giuseppe

ahnyongsik Thu, 10/02/2008 - 03:20

Hi Mr. Giuslar,


Thank you for your reply.

Your information is very useful to me.

I asked about the software forced crash.

In this case, most of the issue seems to come from excessive CPU usage by AAA ACCT.

Actually, we have three PDSNs (7206 VXR), and they run different IOS versions.

The problem occurs on IOS versions 12.3(14)YX4 and 12.3(11)YF4, but PDSN#3 does not show this error message even though it also runs IOS version 12.3(11)YF4.

I also tried to find a suitable IOS version on your site, but I could not find one. I know you already explained how to find a fixed IOS, but when I searched for the error message, the website did not return anything.

If you can recommend a way to solve this problem, please let me know which IOS version is suitable for our PDSN systems.

I have also attached PDSN#2's tech-support file, because PDSN#2 has the same problem as PDSN#1.

Thank you for your BEST support!




Giuseppe Larosa Thu, 10/02/2008 - 12:06

Hello Yongsik,

The second router was also reloaded by the watchdog process.


I think it is wise to open a service request with TAC to find out directly which IOS release to upgrade to.

I wasn't able to find an exact match for your issue, but there are several bugs that mention CPUHOG in their description.


If you have one router that doesn't experience the problem even though it runs the same IOS image, you should investigate the differences in configuration or in usage level between the two devices. A config difference could point to a workaround, or you could find an indication that the problem is triggered by the volume of connections served.
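
One way to do that comparison is to diff the two running configurations. The sketch below uses Python's standard `difflib`; the function name `config_diff` and the two sample configs are made up for illustration (they are not taken from the attached show tech-support files), so treat it as a starting point only.

```python
import difflib

def config_diff(cfg_a, cfg_b, name_a="router-a", name_b="router-b"):
    """Return unified-diff lines between two config texts, ignoring blank and '!' lines."""
    def clean(text):
        return [line.rstrip() for line in text.splitlines()
                if line.strip() and not line.lstrip().startswith("!")]
    # n=0 drops context lines so only the differing lines are listed.
    return list(difflib.unified_diff(clean(cfg_a), clean(cfg_b),
                                     fromfile=name_a, tofile=name_b,
                                     lineterm="", n=0))

# Hypothetical sample configs, for illustration only.
cfg_pdsn1 = "hostname pdsn1\n!\naaa new-model\naaa accounting update periodic 5\n"
cfg_pdsn3 = "hostname pdsn3\n!\naaa new-model\naaa accounting update periodic 30\n"

diff = config_diff(cfg_pdsn1, cfg_pdsn3, "pdsn1", "pdsn3")
print("\n".join(diff))
```

Lines prefixed with "-" appear only in the first config and lines prefixed with "+" only in the second. Anything AAA-related that differs between the affected and unaffected router would be the first place to look.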


Hope to help

Giuseppe


