We have a Cisco 2821 router at one of our premier customers. A couple of days ago the router rebooted, and on investigation we found that the reboot was caused by a bug. The show version output is below:
uptime is 23 hours, 11 minutes
System returned to ROM by bus error at PC 0x4038F110, address 0x2FFFFFF at 12:03:29 GMT Tue Dec 22 2009
System restarted at 12:04:40 GMT Tue Dec 22 2009.
On further diagnosis on the Cisco site, we learned that there is a Cisco bug associated with such failures.
The stack trace decoded symbols are:
Possible bug matches are listed below. Bugs with a score of .90 or more
are the most likely candidates:
|0.97||CSCsi71675||D||CSCsk04674||ISR Crashed on configuring QoS in System Test scenario|
Can anyone help us identify a workaround for this bug? We need to resolve the issue as a priority to avoid impact on the customer's business. The vendor FE is at the site and waiting for our instructions.
Your quick response is appreciated
The bug report does not say that there is a workaround.
If it happened just once, monitor.
If it happens frequently, upgrade or change IOS.
If it keeps happening, replace router.
Please do not solicit urgency, as this is a freely contributed forum. For guaranteed support, use the TAC.
Thanks a lot for the quick response.
Sorry for pressing on the urgency; I will make sure it won't happen in future.
Can you please suggest a specific IOS release, or an upgrade from the current one, that will work?
Agree with Paolo. The bug you posted doesn't show a version in which it is fixed. Also, you didn't tell us whether the description of the bug (ISR crashed on configuring QoS) makes sense in your environment. Was anyone configuring QoS when the router crashed? (This sounds strange to me, because if I were configuring something and the router crashed right away, I would know what I was doing before the reboot and would try to avoid it for the time being.) If so, maybe you could just avoid touching the router for a while until you receive a more definitive answer. I know this doesn't sound like the best workaround anyone ever came up with, but it might do for an urgent case during the holidays. If the description doesn't make any sense, then try to see whether any other bug matches your case better and whether a resolution or workaround exists for it.
Where did you decode the tracebacks? Please upload the original tracebacks.
From the decodes you've provided, it seems like CSCsg90243, resolved in 12.4(15)T
Thanks for the inputs. I will get the original tracebacks and update you. However, I just want to note that this is the second time the router has rebooted with a similar error.
Cisco IOS Software, 2800 Software (C2800NM-ADVIPSERVICESK9-M), Version 12.4(11)T, RELEASE SOFTWARE (fc2)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2006 by Cisco Systems, Inc.
Compiled Sat 18-Nov-06 17:16 by prod_rel_team
ROM: System Bootstrap, Version 12.4(13r)T, RELEASE SOFTWARE (fc1)
xxxxxxx uptime is 1 hour, 12 minutes>>>>>>>>>>>>>>>>>
System returned to ROM by bus error at PC 0x4038F110, address 0x3FFFFFF at 14:58:10 GMT Mon Jul 13 2009 >>>>>>>>>>>>>>>>>>>
System restarted at 14:59:22 GMT Mon Jul 13 2009
System image file is "flash:c2800nm-advipservicesk9-mz.124-11.T.bin"
At that time we replaced the router, but now the problem has recurred.
I will come up with additional inputs to help in investigations.
Bus errors are mostly due to bugs. So if you're running the same IOS and a similar configuration, chances are that the router will crash again.
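To make the comparison concrete, here is a minimal Python sketch that pulls the reload reason, PC and address out of the two "System returned to ROM" lines quoted in this thread and checks whether both crashes hit the same PC. The regular expression and the `parse_crash` helper are my own illustration, written only against the line format shown above; they are not part of any Cisco tool.

```python
import re

# Matches the reload-reason line in the form quoted in this thread.
# Assumption: other IOS releases may word this line differently.
CRASH_RE = re.compile(
    r"System returned to ROM by (?P<reason>.+?) at PC (?P<pc>0x[0-9A-Fa-f]+), "
    r"address (?P<addr>0x[0-9A-Fa-f]+)"
)


def parse_crash(line):
    """Return reason, PC and address from a reload-reason line, or None."""
    match = CRASH_RE.search(line)
    if match is None:
        return None
    return {
        "reason": match.group("reason"),
        "pc": int(match.group("pc"), 16),
        "address": int(match.group("addr"), 16),
    }


# The two lines as posted (July 2009 and December 2009).
july = parse_crash(
    "System returned to ROM by bus error at PC 0x4038F110, "
    "address 0x3FFFFFF at 14:58:10 GMT Mon Jul 13 2009"
)
december = parse_crash(
    "System returned to ROM by bus error at PC 0x4038F110, "
    "address 0x2FFFFFF at 12:03:29 GMT Tue Dec 22 2009"
)

# Same PC on both crashes means the same instruction faulted each time.
same_pc = july["pc"] == december["pc"]
print("Same PC on both crashes:", same_pc)
```

Both crashes report PC 0x4038F110 with different addresses, which fits a repeatable software fault on the same image better than a one-off hardware fault.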
The tracebacks you decoded seem to be correct, I got the same results.
Seems like CSCsg90243, resolved in 12.4(15)T
Seems like something related to QoS. Do you have QoS or IPsec configured?
12.4(11)T is such an old image, likely riddled with bugs.
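As a rough illustration of checking the running image against the release where the fix landed, the Python sketch below compares 12.4(11)T with 12.4(15)T. The helpers `parse_ios_version` and `is_older` are hypothetical names of my own, and the comparison is deliberately simplified: it only compares releases on the same train and ignores rebuild numbers and interim letters, so treat it as a sanity check rather than a rule for choosing an image.

```python
import re

# major.minor(maintenance)TRAIN, e.g. "12.4(11)T".
# Assumption: a trailing rebuild number (as in "12.4(15)T1") and an interim
# letter inside the parentheses are accepted but ignored.
VERSION_RE = re.compile(r"^(\d+)\.(\d+)\((\d+)[a-z]?\)([A-Z]*)\d*$")


def parse_ios_version(text):
    """Split an IOS version string into ((major, minor, maint), train)."""
    match = VERSION_RE.match(text.strip())
    if match is None:
        raise ValueError("unrecognised IOS version: " + text)
    major, minor, maint, train = match.groups()
    return (int(major), int(minor), int(maint)), train


def is_older(running, fixed):
    """True if `running` is an earlier release than `fixed` on the same train."""
    run_numbers, run_train = parse_ios_version(running)
    fix_numbers, fix_train = parse_ios_version(fixed)
    if run_train != fix_train:
        # Releases on different trains are not directly comparable here.
        raise ValueError("versions are on different trains")
    return run_numbers < fix_numbers


# Running image from the show version output vs. the release named in the thread.
needs_upgrade = is_older("12.4(11)T", "12.4(15)T")
print("Running image predates the fix:", needs_upgrade)
```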
Whoever suggested replacing the router instead is not a professional in this industry.
You typically speak the tough, unbearable truth. Such things can happen because customer pressure can be even more unbearable. Customers sometimes want the engineer to take the most extreme and costly measures so they can be satisfied that they are receiving the best possible treatment. Engineers, on the other hand, sometimes feel they have to do something, anything, to show the customer they are actively trying to resolve the issue. Also, in this case you have to admit that to non-programmers the term 'bus error' can sound a lot like a hardware problem.
I am not saying that we should. I was implicitly describing some 'feelings' that we need to fight against. In any case, the doctor metaphor for this profession causes a lot of stress to people, sometimes for no reason. Unless of course the customer in this case is some doctor performing remote surgery by sending traffic through this router! Good luck to the patient in this case!
The link given below might help you sort out the issue.
Thanks a ton for all your inputs and suggestions. We have upgraded the IOS, and the router is currently under monitoring.
I will come back in case of further issues.