
Improving ASR9k resilience - Looking for feedback.

Bryan Garland
Cisco Employee

Asking all our customers.

Looking for other examples where an issue on the a9k exposed a gap in system resiliency. In particular, cases where we saw a HW or SW issue and the router didn't handle it in the way you would have preferred, so that network redundancy could not take over. Please provide as much information about the situation as you can, as well as the behavior you believe should take place. If you have SRs or bug IDs, please include them as well.

Here are some past examples that we have already addressed:

CSCuc04493 - Disable LC interfaces if online-diags reports datapath error

This allows us to shut down ports where datapath errors occur, so that network redundancy can kick in.

CSCun00493 - Need recovery mechanism for Punt/FPGA CRC errors in RSP440 

This has the RSP perform a failover or reload when it loses communication to the fabric.  

 

Thanks,

Bryan Garland 

Bryan Garland  CCIE#1942
Technical Leader, Engineering
HERO BU- Deployment & Escalation

3 Replies

Hi Bryan,

 

I see NP performance as an area for improvement. There are certain NP lock conditions where automatic action is taken to recover, but there is no equivalent for an NP overload scenario. Certain NP counters do indicate an NP performance overload, but it is cumbersome for customers to monitor these values. So it would be nice to at least have log entries when the NP is overloaded, or the ability to have actions taken. A typical example is netflow, which is intensive for the NP because it has to create a frame copy for every sampled packet. There is a punt policer that protects the line card CPU, but with a very low sampling interval (aggressive sampling) and small packets at a high rate, the NP itself might get overloaded, leading to intermittent packet loss (rare but possible).
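To make the request a bit more concrete, here is a minimal Python sketch of the kind of check I have in mind, run off-box against counter samples. Everything in it is an assumption for illustration: `NpSample`, its `drops` field and the threshold values are hypothetical, not real ASR9k counter names, and how the samples are collected (CLI scrape, SNMP, telemetry) is not shown. It only flags an NP when a cumulative drop counter grows faster than a threshold for several polling intervals in a row.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class NpSample:
    """One poll of a single NP. Field names are hypothetical."""
    timestamp: float  # seconds since some epoch
    drops: int        # cumulative overload-related drop counter


def drop_rate(prev: NpSample, curr: NpSample) -> float:
    """Return drops per second between two consecutive polls."""
    dt = curr.timestamp - prev.timestamp
    if dt <= 0:
        raise ValueError("samples must be in increasing time order")
    delta = curr.drops - prev.drops
    if delta < 0:
        # Counter went backwards (e.g. cleared or reloaded): count from zero.
        delta = curr.drops
    return delta / dt


def overload_alerts(
    samples: Sequence[NpSample],
    threshold_pps: float,
    consecutive: int = 3,
) -> List[Tuple[float, float]]:
    """Return (timestamp, rate) once per sustained overload episode.

    An episode is reported when the drop rate has exceeded threshold_pps
    for `consecutive` polling intervals in a row.
    """
    alerts: List[Tuple[float, float]] = []
    streak = 0
    for prev, curr in zip(samples, samples[1:]):
        rate = drop_rate(prev, curr)
        streak = streak + 1 if rate > threshold_pps else 0
        if streak == consecutive:
            alerts.append((curr.timestamp, rate))
    return alerts


if __name__ == "__main__":
    polls = [
        NpSample(0, 0),
        NpSample(10, 0),
        NpSample(20, 500),
        NpSample(30, 1500),
        NpSample(40, 3000),
        NpSample(50, 3000),
    ]
    for ts, rate in overload_alerts(polls, threshold_pps=40):
        print(f"t={ts}s: NP drop rate {rate:.0f} pps above threshold")
```

Requiring several consecutive intervals above the threshold is a deliberate choice here: a single short burst should not raise an alert, since the loss I described is intermittent and the interesting case is sustained overload. Something equivalent done on-box, with a syslog message as the result, is what I would like to see.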

 

Cheers,

Florian

Florian,

 

Thanks for the feedback. This is indeed an area where we can probably do some work.

 

Thanks,

Bryan Garland 


 

Bryan Garland
Cisco Employee

Any other feedback?  
