I have been doing some research on the Nexus 7k, and from what I am reading the following applies:
1. Fabric module failure - causes all traffic being sent across that fabric module's crossbar to be lost
2. VOQ - protects against lack of buffer availability on the egress interface
Neither of these provides reliable transmission over the crossbar or acknowledgement of data crossing the crossbar fabric.
So my question is: if I have storage traffic (unicast-based FCIP) crossing the fabric when a fabric module fails, is my understanding correct that the frames on the portion of the fabric controlled by the failed fabric module are lost?
Even though the rest of the fabric is intact for other traffic, this still means I have loss in a system that is supposed to be built for zero loss to support storage traffic.
Thanks for the response. From what I have read, the control plane and data plane are completely isolated in the Nexus 7k. The supervisor modules control the control plane and the central arbiter and the fabric modules handle the VOQ and the xbar communication.
It works like this, as I understand it:
1. A packet arrives at the ingress of a line card and is passed on to the port ASIC
2. The port ASIC does its thing and forwards the packet to the replication engine
3. The replication engine passes the packet on to the L2 and L3 forwarding engines - they do their dance and pass the packet on to the fabric engine
4. The fabric engine and VOQ manager consult the central arbiters to get credits to send traffic on the fabric
5. The central arbiter checks the egress line card to ensure buffer space is available. If it is available, it grants credit to the fabric engine and VOQ engine to send the packet on the fabric.
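To make the credit step concrete, here is a minimal Python sketch of steps 4 and 5 as I picture them. It is a toy model only: the class names, the port names, and the one-credit-per-packet rule are my own assumptions, not the actual hardware behaviour.

```python
from collections import deque


class CentralArbiter:
    """Toy arbiter: tracks free egress buffer slots per destination port."""

    def __init__(self, egress_buffers):
        self.free = dict(egress_buffers)  # port -> free buffer slots

    def request_credit(self, port):
        # Grant a credit only if the egress port has buffer space.
        if self.free.get(port, 0) > 0:
            self.free[port] -= 1
            return True
        return False

    def release(self, port):
        # The egress port drained a packet, so one buffer slot is free again.
        self.free[port] = self.free.get(port, 0) + 1


class IngressVoq:
    """Toy VOQ: one queue per egress port, held on the ingress line card."""

    def __init__(self, arbiter):
        self.arbiter = arbiter
        self.queues = {}

    def enqueue(self, port, packet):
        self.queues.setdefault(port, deque()).append(packet)

    def service(self):
        """Send the head packet of every queue that is granted a credit."""
        sent = []
        for port, queue in self.queues.items():
            if queue and self.arbiter.request_credit(port):
                sent.append((port, queue.popleft()))
        return sent


# Hypothetical ports: e1/1 has one free buffer, e2/1 has none.
arbiter = CentralArbiter({"e1/1": 1, "e2/1": 0})
voq = IngressVoq(arbiter)
voq.enqueue("e1/1", "pkt-A")
voq.enqueue("e1/1", "pkt-B")
voq.enqueue("e2/1", "pkt-C")

first_pass = voq.service()   # only pkt-A is granted a credit
arbiter.release("e2/1")      # egress e2/1 frees a buffer
second_pass = voq.service()  # pkt-C can now cross; pkt-B still waits
```

Note what the model does and does not cover: the credit protects against a full egress buffer, but nothing in it confirms that a packet actually arrived once it was sent, which is the gap I am asking about.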
The fabric crossbar bandwidth is determined by the number of fabric modules installed: 1 FM = 23 Gbps x 2. When two or more FMs are installed to create more fabric bandwidth, forwarding across the fabric for unicast traffic acts like an EtherChannel and uses some sort of hashing algorithm to send the packet across the fabric.
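As a rough illustration of those two ideas (bandwidth scaling with the number of fabric modules, and an EtherChannel-style hash picking a path), here is a minimal Python sketch. The 23 Gbps x 2 figure is the one quoted above. The CRC32 hash, the flow-key string, and the function names are my own assumptions and say nothing about how the hardware really hashes.

```python
import zlib

GBPS_PER_FM = 23 * 2  # per fabric module, using the figure quoted above


def fabric_bandwidth_gbps(fabric_modules):
    """Total fabric bandwidth for the given number of installed modules."""
    return fabric_modules * GBPS_PER_FM


def pick_fabric_module(flow_key, active_fms):
    """Hash a flow key onto one of the active fabric modules (assumed policy)."""
    index = zlib.crc32(flow_key.encode()) % len(active_fms)
    return active_fms[index]


total = fabric_bandwidth_gbps(3)  # 3 modules installed

flow = "hostA->hostB"             # hypothetical flow key
chosen = pick_fabric_module(flow, [1, 2, 3])

# If the chosen module fails, new packets rehash onto the survivors,
# but anything already in flight on the failed module is not covered here.
survivors = [fm for fm in [1, 2, 3] if fm != chosen]
rehashed = pick_fabric_module(flow, survivors)
```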
Let's say you have a 9216-byte packet and 3 fabric modules installed. From what I am reading, the packet would be broken up into 4 packets of around 2304 bytes each (I think they might be 2460, can't recall) and passed across the fabric.
So you have 1 large packet, fragmented across the fabric cards and sent to the destination I/O card.
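The arithmetic behind that example can be sketched in a few lines of Python. The equal-size split and the round-robin placement across fabric modules are assumptions for illustration only; I do not know the real fragment size or distribution policy.

```python
def split_packet(total_bytes, pieces):
    """Split total_bytes into `pieces` chunks that differ by at most 1 byte."""
    base, extra = divmod(total_bytes, pieces)
    return [base + (1 if i < extra else 0) for i in range(pieces)]


def spread(chunks, fabric_modules):
    """Assign chunks to fabric modules round-robin (an assumed policy)."""
    plan = {fm: [] for fm in range(1, fabric_modules + 1)}
    for i, size in enumerate(chunks):
        plan[1 + i % fabric_modules].append(size)
    return plan


chunks = split_packet(9216, 4)  # four fragments of 2304 bytes
plan = spread(chunks, 3)        # fragments placed on FM 1, 2, 3, then 1 again

# Bytes of this one packet that were riding on FM 1 if it fails mid-transfer.
lost_if_fm1_fails = sum(plan[1])
```

Under these assumptions, losing a single fabric module takes part of the packet with it, and the destination cannot reassemble the whole packet from what is left.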
While in transit, let's say one of the fabric modules in the load-balancing group dies. My understanding is that the traffic on the trace goes with it.
The traffic is lost in this case since there is no acknowledgement of traffic sent across the fabric. I would think in a high-bandwidth situation this could be a lot of traffic, considering the speeds we are talking about here.
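To put a rough number on "a lot of traffic": the data at risk is approximately the bandwidth of the failed module multiplied by the time it takes the system to stop sending to it. In the Python sketch below, the 46 Gbps figure comes from the 23 Gbps x 2 quoted above, and the detection windows are hypothetical values I picked for illustration, not Cisco numbers.

```python
def bytes_at_risk(gbps, window_us):
    """Bytes carried by a `gbps` Gbit/s path during `window_us` microseconds."""
    bits = gbps * 1000 * window_us  # 1 Gbit/s equals 1000 bits per microsecond
    return bits // 8


FM_GBPS = 23 * 2  # one fabric module, using the figure quoted above

# Hypothetical failure-detection windows in microseconds.
estimates = {window_us: bytes_at_risk(FM_GBPS, window_us)
             for window_us in (1, 100, 1000)}

for window_us, lost in estimates.items():
    print(f"{window_us} us window: up to {lost} bytes at risk")
```

So if the module were fully loaded and the window were as long as 1 ms, several megabytes could be in play; a microsecond-scale window would bring that down to a few kilobytes.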
Is this a possibility, or am I missing some redundancy here that will protect the traffic that would be lost crossing the fabric?
Is this the case on the 65k as well for traffic crossing the fabric?
"The fabric delivers up to 10 channels per I/O module and five channels per supervisor module for a scalable capacity of more than 4.1 Tbps for forwarding performance that can be increased as your needs grow."
It would seem the 10x channels per I/O module lines up to the 1:1 based on the 5x fabric modules installed.
With regard to loss of data in the event of a failure, it would seem the key ingredient is a credit-based system which would not accept that a frame has been received until it reaches the destination.
It would be good to hear something from the guys at Cisco to clarify this.
I get that part clearly, but what happens to the data in transit if one of those 10 channels fails while data is still on that channel? With no acknowledgement on the fabric that the data has been received or lost, there is no way to retransmit the lost data across the fabric. So it's lost.
That's the root of my question. Is that a correct statement, and what is the impact of this situation in a zero-loss storage environment? Will the application retransmit?