Nexus 7000 - Fabric Failure and VOQ

mlouis
Level 1

I have been doing some research on the Nexus 7k, and from what I am reading the following occurs:

1. Fabric module failure - causes all traffic being sent across that fabric module's crossbar to be lost

2. VOQ (Virtual Output Queuing) - protects against lack of buffer availability on the egress interface

Neither of these provides reliable transmission over the crossbar or acknowledgement of data crossing the crossbar fabric.

So my question is: if I have storage traffic (unicast-based FCIP) crossing the fabric when a fabric module fails, is my understanding correct that those frames are lost on the portion of the fabric controlled by the failed fabric module?

Even though the rest of the fabric is intact for other traffic, this still means I have loss in what is supposed to be a system built for zero loss to support storage traffic.

Am I way off here, or is this accurate?

Thanks.

5 Replies

inch
Level 3

G'day,

I'm tipping it would work in the same way as the Cisco MDS switches, which are 100% lossless.

Fabric modules (or sups on the 9506/9509) on the MDS are dual redundant, with a 1:1 ratio for each line card.

So every line card is connected to each xbar via two channels (one active and one standby); if one xbar fails, the standby becomes active so as not to reduce throughput.
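Purely to illustrate that active/standby idea (my own toy model, not how the ASICs actually behave):

def active_channel(channels):
    # channels: list of (name, is_up) pairs; the first healthy channel
    # carries traffic, so the standby takes over when the active fails.
    for name, is_up in channels:
        if is_up:
            return name
    return None  # both channels down

print(active_channel([("xbar-1", True),  ("xbar-2", True)]))   # xbar-1 active
print(active_channel([("xbar-1", False), ("xbar-2", True)]))   # xbar-2 takes over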

Now, this is for the MDS, but I'm guessing all Cisco xbar-based switches would be this redundant :)

Thanks for the response. From what I have read, the control plane and data plane are completely isolated in the Nexus 7k. The supervisor modules handle the control plane and the central arbiter, and the fabric modules handle the VOQ and the xbar communication.

It works like this, as I understand it:

1. A packet arrives at the ingress of a line card and is passed to the port ASIC.

2. The port ASIC does its thing and forwards the packet to the replication engine.

3. The replication engine passes the packet on to the L2 and L3 forwarding engines; they do their dance and pass the packet on to the fabric engine.

4. The fabric engine and VOQ manager consult the central arbiter to get credits to send traffic on the fabric.

5. The central arbiter checks the egress line card to ensure buffer space is available. If it is, it grants credit to the fabric engine and VOQ manager to send the packet on the fabric (sketched just below).
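To make that credit exchange in steps 4 and 5 concrete, here is a toy sketch in Python. This is purely my own illustration, not Cisco code: the real arbiter is a hardware ASIC, and the class and names here are made up.

class CentralArbiter:
    # Toy model of central-arbiter credit granting (illustrative only).
    def __init__(self, egress_credits):
        # Buffer credits available per egress line card, e.g. {"lc2": 8}.
        self.credits = dict(egress_credits)

    def request_credit(self, egress_lc):
        # Step 5: grant a credit only if the egress card has buffer space.
        if self.credits.get(egress_lc, 0) > 0:
            self.credits[egress_lc] -= 1
            return True   # VOQ manager may send the packet onto the fabric
        return False      # packet waits in its virtual output queue

    def release_credit(self, egress_lc):
        # The credit is returned once the egress buffer drains.
        self.credits[egress_lc] += 1

arbiter = CentralArbiter({"lc2": 2})
print(arbiter.request_credit("lc2"))  # True  - credit granted, send
print(arbiter.request_credit("lc2"))  # True
print(arbiter.request_credit("lc2"))  # False - VOQ holds the packet

Note the point that matters for my question: the credits only say the egress buffer has room; they say nothing about the frame surviving the crossbar itself.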

The fabric crossbar bandwidth is determined by the number of fabric modules installed: 1 FM = 23 Gbps x 2. When two or more FMs are installed to create more fabric bandwidth, forwarding across the fabric for unicast traffic acts like an EtherChannel, using some sort of hashing algorithm to send the packet across the fabric.
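As a rough picture of that EtherChannel-style hashing (the real hash inputs and algorithm are Cisco internals I don't know, so this is only a guess at the shape of it):

import zlib

def pick_fabric_channel(src, dst, num_channels):
    # Hash the flow so every frame of one flow rides the same crossbar
    # channel; zlib.crc32 stands in for whatever hash the hardware uses.
    return zlib.crc32(f"{src}-{dst}".encode()) % num_channels

# Assuming 3 fabric modules x 2 channels each = 6 channels in the group.
print(pick_fabric_channel("10.0.0.1", "10.0.0.2", 6))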

Let's say you have a 9216-byte packet and 3 fabric modules installed. From what I am reading, the packet would be broken up into 4 pieces of around 2304 bytes each (I think they might be 2460, I can't recall) and passed across the fabric.

So you have 1 large packet, fragmented across the fabric cards, sent to the destination I/O card.
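Quick sanity check on that arithmetic (the exact segment size is the part I can't recall, so treat the numbers as illustrative):

packet_bytes = 9216              # jumbo frame from the example above
segments = 4                     # pieces the fabric engine cuts it into
print(packet_bytes // segments)  # 2304 bytes per segment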

While in transit, let's say one of the fabric modules in the LB group dies. My understanding is that the traffic on that trace goes with it.

The traffic is lost in this case, since there is no acknowledgement of traffic sent across the fabric. I would think that in a high-bandwidth situation this could be a lot of traffic, considering the speeds we are talking about here.

Is this a possibility, or am I missing some redundancy here that will protect the traffic that would be lost crossing the fabric?

Is this the case on the 6500 as well for traffic crossing the fabric?

Thanks in advance.

Mike

Taking it right from the manual....

http://cisco.com/en/US/customer/prod/collateral/switches/ps9441/ps9402/ps9512/Data_Sheet_C78-437760.html

"The fabric delivers up to 10 channels per I/O module and five channels per supervisor module for a scalable capacity of more than 4.1 Tbps for forwarding performance that can be increased as your needs grow."

It would seem the 10 channels per I/O module line up with the 1:1 ratio, based on the 5 fabric modules installed.

With regard to loss of data in the event of a failure, it would seem the key ingredient is a credit-based system which would not consider a frame received until it reaches the destination.

It would be good to hear something from the guys at Cisco to clarify this.

Cheers

Andrew

Andrew,

I get that part clearly, but what happens to the data in transit if one of those 10 channels fails while data is still on that channel? With no acknowledgement on the fabric that the data has been received or lost, there is no way to retransmit the lost data across the fabric. So it's lost.

That's the root of my question. Is that a correct statement, and what is the impact in a zero-loss storage environment? Will the application retransmit?

Thanks

Mike

Howdy,

In the MDS, and I am assuming in this gear as well, the frame is sent over each fabric; if one goes down while a frame is in transit, the other fabric has the same frame, so it does not matter.
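A toy picture of that send-on-both-fabrics idea (again my own illustration of the concept, not the actual mechanism):

def frame_delivered(fabrics_up):
    # The frame is replicated over both fabrics, so it is delivered as
    # long as at least one fabric stays up while it is in flight.
    return any(fabrics_up)

print(frame_delivered([True, True]))    # both fabrics healthy -> delivered
print(frame_delivered([False, True]))   # one dies mid-flight -> still delivered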

Now, that is in the MDS, but I am sure the N7K will _have_ to do something similar to ensure it is lossless...

http://www.cisco.com/en/US/prod/collateral/modules/ps5991/prod_white_paper0900aecd8044c7e3.html

Anyone from Cisco want to confirm? :)

Cheers

Andrew
