ASK THE EXPERT - DATA CENTER TESTING AT CISCO

Nov 2nd, 2007

Welcome to the Cisco Networking Professionals Ask the Expert conversation. This is an opportunity to learn, with Cisco expert Steve Young, about the Data Center Assurance Program, which puts Cisco's robust data center product offerings through the rigors of testing in a multi-platform, multi-vendor environment. Steve has been with Cisco Systems for six years. He began his career with Cisco doing systems testing on the Catalyst 5000 switches. He then transitioned to testing the Catalyst 6500 as a founding member of the Safe Harbor team. Now a manager on the Safe Harbor team, Steve is currently overseeing the Data Center Assurance Program. He lives in RTP, NC with his wife and two daughters.

Remember to use the rating system to let Steve know if you have received an adequate response.

Steve might not be able to answer each question due to the volume expected during this event. Our moderators will post many of the unanswered questions in other discussion forums shortly after the event. This event lasts through November 16, 2007. Visit this forum often to view responses to your questions and the questions of other community members.

francisco_1 Mon, 11/05/2007 - 06:08

Hi Steve,

Not sure if I'm supposed to ask this kind of question here, but we have just built two new datacenters. In each datacenter we have two fully redundant 6509 multilayer switches running Sup 720 engines. Each 6509 has two Sup 720s, and the 6509s are connected locally via 48-port line cards using 4 Gbps EtherChannels. We have 8 blade switches (WS-CBS3020) trunked (4 Gbps channels) to the 6509s at each datacenter. Between the datacenters we have 2 Gbps dark fiber terminating into CWDM. The 6509s are connected directly into the CWDM through a 24-port SFP line card, so each 6509 uses 2 channels on the CWDM at each datacenter. We are also running dynamic routing (EIGRP) between the datacenters. Spanning tree is local to each datacenter since we have L3 connectivity between the datacenters, and so is VTP.

My question is: what is the most effective and easiest way to test spanning tree locally at each datacenter for a loop-free topology, and also to test redundancy between the datacenters?

smyoung Tue, 11/06/2007 - 05:03

Hi Franco,

What distance is between your two data centers? What sort of latency does that impose?

In our testing, we test DC failover by breaking the links that connect the two DCs. In our case, there are several: 2 IP WAN links and two SAN fiber links. We do this while we have Oracle & Exchange application traffic running (generated via various test tools). We look for things such as how long it takes until our branch clients can successfully request a page from the secondary DC, or how long it takes them to get email access returned. On the storage side, we look at how the backup LUNs take over.
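As a rough illustration of the "how long until clients get service back" measurement described above, here is a minimal Python sketch. It is not the DCAP tooling; the function name `time_to_recover` and the `probe` callable (for example, a function that requests a page and returns True on success) are hypothetical and supplied by whoever runs the test.

```python
import time

def time_to_recover(probe, interval_s=1.0, timeout_s=300.0,
                    clock=time.monotonic, sleep=time.sleep):
    """Poll probe() until it succeeds and return the elapsed seconds.

    Start this right after breaking the inter-DC links. probe is any
    callable returning True once the service answers again (hypothetical,
    supplied by the tester). Returns None if timeout_s passes first.
    """
    start = clock()
    while True:
        if probe():
            return clock() - start
        if clock() - start >= timeout_s:
            return None
        sleep(interval_s)
```

The resolution of the result is limited by `interval_s`, so a shorter polling interval gives a tighter estimate at the cost of more probe traffic.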

You can find details on our recent failover testing at http://www.cisco.com/go/dcap. Scroll down to find the DCAP 3.0 results section. I believe the failover testing is in Chapter 10.

As for testing spanning-tree, what we do in Safe Harbor and DCAP testing is to break each of the links in the L2 domain serially, each time checking to ensure STP has reconverged properly, according to the rules. Running traffic helps validate the convergence times. In Safe Harbor, we use pretty much all packet blaster traffic. In DCAP, we use some application traffic, and I would recommend this for really testing STP well. If there is an established TCP connection over one L2 path and a link is broken, it's important to measure how the application responds to the STP topology change. Does it cause the switches to get in some weird state?
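One common way to turn packet-blaster results into a convergence figure, sketched here in Python under the assumption that the test stream is sent at a constant rate and that every lost frame is due to the link break: divide the number of lost frames by the send rate. The function name and parameters are illustrative, not part of any Cisco tool.

```python
def convergence_seconds(frames_sent, frames_received, rate_pps):
    """Estimate the outage duration from frame loss in a constant-rate stream.

    Assumes all loss was caused by the topology change under test.
    """
    if rate_pps <= 0:
        raise ValueError("rate_pps must be positive")
    lost = frames_sent - frames_received
    if lost < 0:
        raise ValueError("received more frames than were sent")
    return lost / rate_pps
```

For example, 1,500 frames lost from a 10,000 pps stream would suggest roughly 0.15 seconds of interruption for that flow.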

If you're using STP enhanced features such as portfast or root guard, you may want to test those out as well, perhaps by sending a BPDU on a root-guarded port, or connecting a "rogue" switch with a low bridge priority into the network and see what happens. Is it isolated correctly?

Regards,

Steve

francisco_1 Tue, 11/06/2007 - 11:55

The distance is about 5 kilometers. You have given me ideas on how to test spanning tree and redundancy. I also need to go through the DCAP 3.0 test plan and results to get more ideas. Thanks.

francisco_1 Tue, 11/06/2007 - 12:16

Also, Steve,

Something else about the crossbar on the 6509 has been stuck in my mind for ages. My understanding is that the 6509 comes with the Sup 720, giving us 720 Gbps of backplane bandwidth and 400 Mpps for IPv4 (good stuff), and that each card will take about 40 Gbps. I believe the 6509 also uses a 256 Gbps crossbar switching fabric (the switching bus). All of the line cards we are using are fabric-enabled, and when I type switch#sh fabric switching-mode, the mode is crossbar for all line cards and the standby sup engine, while the active sup engine is using dCEF (express forwarding). Does that mean the line cards are not using the 720 Gbps backplane bandwidth of the Sup 720 and are instead using the 256 Gbps switching bus? I hope I am not confusing myself here.

Also, the switching modes I know are Truncated, Compact, and Bus. I have never seen crossbar! Maybe those modes are only available on the 4500s.

Thanks in advance.

Franco,

francisco_1 Thu, 11/08/2007 - 05:00

Thanks for the information. I have gone through it, and it is all making sense now.

My apologies if I'm asking too many questions.

In relation to the information provided, is it possible to stop the Sup 720 engine from processing L2 packets like broadcasts? One of the problems at the datacenters is that every time we run broadcasts on the 6509 within a VLAN, the CPU on the 6509 spikes to 99%. I believe the problem is that the Sup 720 engine is processing the broadcast traffic at layer 2 while the MSFC is doing layer 3. Can I get the 6509 to handle broadcast in software instead of hardware?

Also, we downloaded the native IOS image s72033-ipservicesk9_wan-mz.122-18.SXF10a.bin from the Cisco website. The image is only available as an Early Deployment release and a Deferred release. Is the image available as a General Deployment release?

Thanks..

Franco.

smyoung Thu, 11/08/2007 - 05:39

Too many questions? That's what this forum is here for ;)

I'd have to know more about your broadcast traffic. It almost sounds like you are purposefully using bcast to transmit data. Is this the case? If the bcast is L2 (MAC=all F's), then you will likely have the problems you are seeing. There are broadcast suppression features on the 6k, but if you're purposefully using bcast, you may not want to suppress it. I need a bit more info to answer your question better.

Regarding SXF10a...the original SXF10 release was deferred due to some link up/down SNMP trap issues. SXF10a has the fixes for these issues and is not deferred. Further, SXF10a is not a GD (General Deployment) release. There are no releases in the 12.2(18)SXF train that are GD.

GD is a label that is applied to software to imply a certain level of stability, based on a number of metrics including velocity of customer-found defects. I believe the last available GD release for the cat6k was 12.1(26)E-something-or-other. I don't know of any plans to give any 12.2(18)SXF releases a GD label.

You should note that 12.2(18)SXF releases for the cat6k are subjected to extensive customer-focused testing through various test organizations within Cisco (including Safe Harbor and DCAP). The latest Safe Harbor recommended release is 12.2(18)SXF9 and SXF11 testing is nearing completion (we did not validate SXF10a).

Regards,

Steve

francisco_1 Thu, 11/08/2007 - 06:38

Good stuff. I will upgrade to 12.2(18)SXF11 instead, since Safe Harbor recommended it and you guys have tested it properly. I don't want to end up having issues in six months' time because of an IOS bug, so I need to install a reliable release at the datacenters now.

The broadcast traffic is generated by an application we are testing, Tibco RV. What it does is generate lots of broadcast messages within the VLAN.

smyoung Thu, 11/08/2007 - 06:47

I think I muddled my words a bit. I meant to communicate that the latest SH recommended release is 12.2(18)SXF9. SXF11 testing is nearing completion, but SH won't call it recommended until the final test has been executed and evaluated.

We are using Tibco RV in DCAP (just started). I'll check into whether we have the same experiences and get back to you.

Regards,

Steve

francisco_1 Thu, 11/08/2007 - 07:21

ok steve.

Thanks.

I will install 12.2(18)SXF9, or might just wait for SXF11 testing to complete.

Please let me know what you find about Tibco.

Franco.

smyoung Thu, 11/15/2007 - 17:48

Hi Franco,

I checked and we have not seen the type of broadcast behavior that you described to me with your use of Tibco. We have plenty of mcast, though. I'd like to hear more about what you're doing.

Not being totally familiar with the capabilities and functionality of the application, it would make more sense to chat offline about this.

I believe my email will show up in my profile. Feel free to unicast me and we can talk further.

Regards,

Steve

andrew.burns Tue, 11/06/2007 - 00:48

Hello,

In the DCAP program there is a recommended IOS version of 12.2(18)SXF6 for the 6500 platform. As this image is officially "deferred" is it still the recommended version?

Andrew.

smyoung Tue, 11/06/2007 - 04:45

Hi Andrew,

12.2(18)SXF6 is not deferred. That is a common conclusion when seeing the software advisory notice on CCO when trying to download the s/w. The advisory has begun to accompany download attempts on all new s/w, so that customers are aware of potential issues.

That said, 12.2(18)SXF7 is actually the latest DCAP-recommended NativeIOS version. You can find the latest results from DCAP 3.0 testing on the DCAP page:

http://www.cisco.com/go/dcap

These results will be available in our interactive tool within 2 weeks.

(NOTE: If you try to download 12.2(18)SXF7 from CCO, you will again run into a software advisory.)

Regards,

Steve

andrew.burns Tue, 11/06/2007 - 07:23

Hi Steve,

That's a very useful clarification - I've been assuming (incorrectly) that a "serious software issue" automatically meant deferment.

Another question for you - DCAP seems to be aimed mainly at new data center deployments, so what would you see as the greatest value of the program to existing, mature, data centers?

regards,

Andrew.

smyoung Tue, 11/06/2007 - 10:57

Hi Andrew,

While DCAP is not aimed directly at new DC deployments, I'd think that it would be very useful there. There's also a good deal of value (and potential value) for customers with established DCs.

Cisco has a pretty good set of data center "best practice" designs (http://www.cisco.com/go/srnd) that the DCAP program leverages. In talking to customers, I find that most existing DC deployments have many similarities. The DCAP test topology tries to cover these similarities and "fill the gaps" if there are any.

If you have an existing DC deployment, I'd say that the best way to leverage the DCAP program is to look through the results of our testing and our scope of coverage. If there are areas where the DCAP designs and your designs overlap, the testing in those areas might be very useful to you. If there are such similar areas and the testing is not to your satisfaction, please let us know where you'd like to see improvements. If there are no such similarities, we'd like to know where the differences are.

If you send a "subscribe" email to [email protected], we'll add you to our external alias where you can make such suggestions and ask other questions with regards to scope & coverage. Or you can send me an email directly. It should be listed under my profile here.

Regards,

Steve

sbaddipu Fri, 11/09/2007 - 15:00

Hi Steve,

I have a question which I have been thinking about for a long time. It may not be an issue for me yet, but I may run into it at some point of time later.

For one of our accounts we have two redundant CAT6k systems with FWSMs, CSMs and IDSMs, one in each (redundant). All is well now, but since the same infrastructure has been logically partitioned into layers, the packets go through the same line cards and service modules multiple times and take up resources. I do not have a quantitative way to do capacity analysis. For example, if my CSM has 4G of max throughput, how much of it am I using now? You see, the connections are using the backplane. The same goes for the FWSM.

Do you have any suggestions for me?

Thanks

Satya

smyoung Fri, 11/09/2007 - 18:17

Hi Satya,

Let me offer some suggestions off the top of my head, that I have used in testing in the past:

1) Use the 'show fabric utilization' command for fabric-enabled cards. You'll be able to see the %age used for certain cards. For instance, here's the output for one of our FWSMs, in slot 1:

dcb-ss-1#sh fabric utilization
 slot    channel      speed    Ingress %     Egress %
    1          0         8G            0            0

(of course, no traffic running on our test bed tonight...)

2) Use the "show interface [counters]" command repeatedly over timed intervals, counting the bytes in and out. The interface in this case will be the Port-channel interface that connects the service module to the system. In the same test device, we've got a CSM in slot 2:

dcb-ss-1# show eth sum
Group  Port-channel  Protocol    Ports
------+-------------+-----------+-----------------------------------------------
2      Po2(SU)       LACP        Te6/1(P)   Te6/2(P)
3      Po3(SU)       LACP        Te6/3(P)   Te6/4(P)
258    Po258(SU)      -          Gi2/1(P)   Gi2/2(P)   Gi2/3(P)   Gi2/4(P)
270    Po270(SU)      -          Gi1/1(P)   Gi1/2(P)   Gi1/3(P)   Gi1/4(P)
                                 Gi1/5(P)   Gi1/6(P)

So Po258 is the channel connecting the CSM. I can do a "show int po258 [counters]" to see the bit rate to and from the card.
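To make the "sample the counters over timed intervals" idea concrete, here is a small Python sketch that turns two octet-counter readings into a bit rate and a percentage of a nominal capacity. The function name is hypothetical, the 64-bit counter width used for wrap handling is an assumption, and the capacity figure is whatever the operator believes the module's maximum to be (for example, the 4G mentioned for the CSM).

```python
def utilization(octets_t0, octets_t1, interval_s, capacity_bps, counter_bits=64):
    """Return (bits_per_second, percent_of_capacity) between two counter samples.

    octets_t0 / octets_t1 are readings of the same octet counter taken
    interval_s seconds apart. counter_bits is an assumed counter width,
    used only to tolerate a single wrap between samples.
    """
    if interval_s <= 0 or capacity_bps <= 0:
        raise ValueError("interval_s and capacity_bps must be positive")
    delta_octets = (octets_t1 - octets_t0) % (2 ** counter_bits)
    bps = delta_octets * 8 / interval_s
    return bps, 100.0 * bps / capacity_bps
```

Run it separately on the In and Out octet counters; since traffic that crosses the module several times is counted each time, the totals reflect the load actually placed on the module rather than end-to-end throughput.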

Like I said, these are just a couple of ways I've measured this stuff myself in testing. I'll see if I can dig up any better ways.

Regards,

Steve

malkova Sat, 11/10/2007 - 20:35

Hello Steve,

With the above "show int po258" command, how do we set the load-interval to 30 seconds instead of 5 minutes?

We are trying to troubleshoot why packets are getting lost (more than 50%) between the output and input of the FWSM crossbar channel. In our case it is Po275:

sw05#sho eth sum | include SU
101    Po101(SU)     PAgP        Gi1/1(P)   Gi2/1(P)
102    Po102(SU)     PAgP        Gi1/2(P)   Gi2/2(P)
103    Po103(SU)     PAgP        Gi1/3(P)   Gi2/3(P)
275    Po275(SU)      -          Gi6/1(P)   Gi6/2(P)   Gi6/3(P)   Gi6/4(P)
sw05#
sw05#sho int po275 coun

Port            InOctets   InUcastPkts   InMcastPkts   InBcastPkts
Po275        15717460051      10974493             0           181

Port           OutOctets  OutUcastPkts  OutMcastPkts  OutBcastPkts
Po275        44324642882      56738802        422129         16548

sw05#sho firewall mod 6 traf | include rate
  Queueing strategy: fifo
  5 minute input rate 105820000 bits/sec, 8715 packets/sec
  5 minute output rate 310765000 bits/sec, 49749 packets/sec

The FWSM VLAN interfaces have open ACL access between each other. There are no deny logs on the FWSM.

[The load test on a different switch/FWSM (no connections to the first at all) with a similar configuration gives no packet loss.]

How do we troubleshoot in depth where the packets are getting lost inside the FWSM?

Thanks for your valuable help.

-malkova.

smyoung Tue, 11/13/2007 - 07:53

Hi Malkova,

Your question is outside the scope of this discussion, but I didn't want to leave you hanging, so I asked one of my co-workers. See below for his reply. You may want to try asking on the Firewalling forum, accessible from here:

http://forum.cisco.com/eforum/servlet/NetProf?page=main

Regards,

Steve

========================

I am not sure how to change the load interval for the internal FWSM port channel.

Troubleshooting packet loss IMO would be easiest on the FWSM itself. I am not sure what level they are debugging with. They said there are open ACLs between the two interfaces, but just to be sure, I would add an explicit ip deny any any to the end of the ACL. When the deny is implicit and traffic is denied, it isn't always logged (this has changed between releases and has been filed as a bug both ways, and I do not know the current behavior). Since the implicit deny is there anyhow, making it explicit means the counter will rise if the seemingly open ACL isn't being hit.

Also, if changes are made to an ACL but xlates already exist, the xlates must go away before the changes are seen on the output. This can be done with a clear xlate to force a flush, which is disruptive to all the traffic through the FWSM.

Finally, ACLs are not enough to permit traffic through the FWSM, depending on the mode. What needs to be specified differs slightly between transparent and routed mode, but in either case xlates need to be formed for traffic to pass.

In short, if you want to inspect deeply, it should be done on the FWSM and not the switch itself.

justuniversity Mon, 11/12/2007 - 10:25

Hello,

I have the following two issues with my 6509E core switch, and I would appreciate help in solving them:

1) I would like to limit the inbound traffic coming in via a gigabit interface so that it does not exceed 2 Mbps. Will the command "bandwidth 2000" be enough, or do I need to do something else?

2) I would like to mirror 10/100/1000 UTP trunk ports to another 10/100/1000 UTP port and connect an IDS sensor, but I don't know how to do so.

Thank you in advance

smyoung Mon, 11/12/2007 - 11:31

Hi,

I have not personally messed around with the bandwidth command in IOS to see that it will actually restrict available bandwidth. I have, however, messed a bit with QoS and offer the following rate limiter solution:

!
policy-map 2MB-POLICER
 description 2 Mbs policer
 class RATE_LIMITING
  police cir 2048000 bc 2048000 be 2048000 conform-action transmit exceed-action drop violate-action drop
!
class-map match-all RATE_LIMITING
 description dedicated customer default class-map
 match access-group name MATCH-EVERYTHING
!
interface
 service-policy input 2MB-POLICER
!
ip access-list extended MATCH-EVERYTHING
 permit ip any any
!
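For readers unfamiliar with how a policer decides what to drop, the following Python sketch models a generic single-rate token bucket. It is an illustration of the general technique only, not a model of the Catalyst 6500 hardware policer; the class name is hypothetical, the burst is assumed to be expressed in bytes, and because the config above drops both exceed and violate traffic, only a single conforming bucket is modeled.

```python
class SingleRatePolicer:
    """Generic token-bucket policer: conforming packets pass, the rest drop."""

    def __init__(self, cir_bps, burst_bytes):
        self.rate_bytes_per_s = cir_bps / 8   # refill rate in bytes per second
        self.burst = float(burst_bytes)       # bucket depth (assumed bytes)
        self.tokens = float(burst_bytes)      # bucket starts full
        self.last = 0.0                       # time of the previous packet

    def offer(self, now, size_bytes):
        """Return True to transmit the packet, False to drop it."""
        elapsed = now - self.last
        self.last = now
        # Refill for the elapsed time, never exceeding the bucket depth.
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate_bytes_per_s)
        if size_bytes <= self.tokens:
            self.tokens -= size_bytes
            return True
        return False
```

With a 2,048,000 bps rate the bucket refills at 256,000 bytes per second, so sustained traffic above that rate is dropped once the initial burst allowance is spent.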

As for your second question, have you looked at the IDS configuration guides?

http://www.cisco.com/en/US/partner/products/hw/vpndevc/ps4077/products_installation_and_configuration_guides_list.html

Regards,

Steve

justuniversity Mon, 11/12/2007 - 11:45

hello

Thank you for your help, but I would like to make sure that this will limit all the traffic in and out of the interface to 2 Mbps or less.

Regarding the second issue, I don't need to do anything with the IDS. I am asking how to configure port mirroring from a 10/100/1000 UTP trunk to another port.

Thank you again for your prompt response

yours

smyoung Mon, 11/12/2007 - 12:22

Hi,

You initially asked for a way to limit the inbound traffic to 2 Mbps. If you want to limit the outbound traffic as well, just augment the interface config to include an output service-policy (the same one) as well.

If you look in the IDS config guide I sent, there is text in there on how to configure port mirroring (or SPAN on the c6k) to accommodate deployment of IDS in promiscuous mode.

More specifically:

http://cisco.com/en/US/partner/products/hw/vpndevc/ps4077/products_configuration_guide_chapter09186a00807517eb.html#wp1030752

Being a Cat6k guy, I'm answering from that perspective. This will be applicable for both the integrated service module (IDSM) and the appliance (IDS). If you're thinking of "port mirroring" on a different product, or one not covered by the SPAN feature, let me know and I'll dig some more.

Regards,

Steve

andrew.burns Wed, 11/14/2007 - 01:23

Hi,

In the DCAP program do you use any modelling tools for validation (e.g. Cisco's Network Planning Solution) or is it mainly lab-based with customer feedback?

Andrew.

smyoung Wed, 11/14/2007 - 06:04

Hi Andrew,

We do not currently use any modeling tools for either design or management of the test topology. Is this (Cisco NPS) something that you would find useful to have run against the DCAP topology?

The basis for the designs we test come from Cisco's Enterprise Solution Engineering (ESE) data center team. They develop design guides for the data center which we leverage in DCAP. Of course, our customers may not be following these DGs explicitly which is why we work with them to see how their real-world deployments differ.

More on ESE design guides here:

http://www.cisco.com/go/srnd

Regards,

Steve

andrew.burns Fri, 11/16/2007 - 00:27

Hi Steve,

We follow the SRNDs as far as we can, but as they don't match our DC architecture their use is often limited, and the same appears to apply to DCAP as well.

We frequently run into issues with validation. Initially I thought DCAP might help with this, but I've been looking into NPS and it looks like it might be more appropriate.

There's obviously a clear cost attached to NPS - is there a cost involved with DCAP too?

regards,

Andrew.

smyoung Fri, 11/16/2007 - 03:54

Hi Andrew,

There is no cost to the DCAP program.

DCAP doesn't follow the SRNDs religiously. We temper them with information from the field where it makes sense.

What sort of issues do you run into with validation? If the issues have to do with software, there may be a lot we can do to help there. We often have customers share their configurations and designs with us, and even their internal test plans. If there are overlaps, we can provide coverage to help with validation.

If you'd like to have a quick chat about going this route, please drop me an email (address in profile).

Regards,

Steve
