3750-X stack stops forwarding traffic in iSCSI SAN

MicronasPICA
Level 1

I have 4 x Cisco 3750X-48-T-L switches (2 Stacks).

These stacks are used in an iSCSI SAN:

- IOS is 12.2(55)SE5 (ipbasek9)

- only one VLAN

- standalone iSCSI SAN

- 5 x DELL Equallogic storage boxes per stack

- Hyper-V hosts

- VMware ESX hosts

From time to time the switch seems to stop forwarding traffic for a few seconds.

We opened a case with DELL and they said that the Equallogic tries to ping something (different hosts/Equallogics) on the iSCSI network and doesn't get a reply. It runs into a timeout, and all iSCSI initiators lose their connections and have to log in again.

The switch configuration was done according to a Best Practices guide from DELL (http://en.community.dell.com/dell-groups/dtcmedia/m/mediagallery/20087983.aspx).

I modified the QoS config for better performance:

Queueset: 1
Queue     :       1       2       3       4
----------------------------------------------
buffers   :       1      95       1       3
threshold1:     100    3200     100     100
threshold2:     100     100     100     100
reserved  :      10      90      10      90
maximum   :     400    3200     400     400
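For anyone comparing several stacks, the table above can be checked mechanically. The following is a minimal Python sketch, not a Cisco tool: `parse_queue_set` is a hypothetical helper, the text is simply the table pasted from this post, and the check assumes the `buffers` row is a percentage split across the four queues that should total 100.

```python
import re

# Table as pasted in this post (the text the switch printed for queue-set 1).
SHOW_OUTPUT = """\
Queueset: 1
Queue     :       1       2       3       4
----------------------------------------------
buffers   :       1      95       1       3
threshold1:     100    3200     100     100
threshold2:     100     100     100     100
reserved  :      10      90      10      90
maximum   :     400    3200     400     400
"""


def parse_queue_set(text):
    """Return {row_name: [q1, q2, q3, q4]} for the numeric rows of the table."""
    rows = {}
    for line in text.splitlines():
        match = re.match(r"^(\w+)\s*:\s*([\d\s]+)$", line.strip())
        if not match:
            continue  # separator line or anything that is not "name: numbers"
        name, values = match.group(1), [int(v) for v in match.group(2).split()]
        # Skip the "Queue" header row and the one-value "Queueset" line.
        if name.lower() != "queue" and len(values) == 4:
            rows[name] = values
    return rows


rows = parse_queue_set(SHOW_OUTPUT)
# Assumption: the buffer allocations are percentages that should add up to 100.
print("buffer total:", sum(rows["buffers"]))
for queue in range(4):
    print(f"queue {queue + 1}:", {name: vals[queue] for name, vals in rows.items()})
```

Here queue 2 holds 95% of the buffer space, which is the queue the modification favours.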

Default port config:

interface GigabitEthernet1/0/1
 description -
 switchport access vlan 5
 switchport mode access
 shutdown
 speed 1000
 duplex full
 flowcontrol receive desired
 spanning-tree portfast

After some troubleshooting, which led to the QoS changes, I rebooted one of the stacks and the problem disappeared (for now).


Reza Sharifi
Hall of Fame

Is flow control enabled on the server?

If not, you may want to delete this command on the switch and test again.

no flowcontrol receive desired

HTH

Flow control is enabled on the server and active (Oper-Status ON).
However, it shouldn't be needed, since the switch has only 1G interfaces and the servers should be able to handle this amount of traffic easily.

jsurak
Level 1

This leads me to believe that you have a misconfiguration somewhere: in the physical layer, the host iSCSI timeouts, the switch configuration, or older array firmware.

Just to clarify one point: it's a bit more than just “tries to ping” the host. What happens is an iSCSI redirect:

When the host iSCSI initiator connects to the EqualLogic Group, you specify the Group IP address as the target portal in the initiator (this is only a virtual IP, also known as the WKA – Well Known Address).

When the array group receives a target connection request from the host, the host is redirected to at least one* physical eth interface in the group (this would be on an array that actually contains the data the host is requesting). (*If using MPIO, one for each interface set up for MPIO.)

Troubleshooting:

To isolate your problem, first ensure you have the correct host timeouts set. The settings are in the document titled “iSCSI Initiator and Operating System Considerations”, which can be downloaded from the FW download page on the support site (eqlsupport.dell.com). Note that you need to apply the settings not only on the ESX or Hyper-V host, but also on any VM that matches the configurations listed.

Next you need to ensure that from each host you can ping and traceroute to every eth interface on every array (Windows example: ping -S host_interface eth_interface; on ESX use -I [capital eye]). You must do this for all combinations, so if the host has 3 interfaces and each of your 5 arrays has 3 interfaces, you need to test every pair (3 x 15 = 45 different ways).
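To avoid missing a pair, the host-to-array test list can be generated rather than typed. This is a minimal Python sketch: the IP addresses and the `ping_commands` helper are made up for illustration, and it only builds the Windows-style `ping -S source target` form quoted above. With 3 host interfaces and 5 arrays of 3 interfaces each, it yields 3 x 15 = 45 commands.

```python
from itertools import product

# Hypothetical addresses for illustration only; substitute your own.
HOST_IFACES = ["10.0.5.11", "10.0.5.12", "10.0.5.13"]
# 5 arrays with 3 eth interfaces each -> 15 array addresses.
ARRAY_IFACES = [f"10.0.5.{100 + 10 * a + e}" for a in range(5) for e in range(3)]


def ping_commands(host_ifaces, array_ifaces):
    """Build one Windows-style 'ping -S <source> <target>' command per pair."""
    return [f"ping -S {src} {dst}" for src, dst in product(host_ifaces, array_ifaces)]


commands = ping_commands(HOST_IFACES, ARRAY_IFACES)
print(len(commands), "commands to run")
for cmd in commands[:3]:
    print(cmd)
```

The same list, with source and target swapped, gives the set of pairs to check from the array side.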

Then test from each array to every host: telnet to the array using the eth0 IP, and ping each host interface through each eth interface on the member, for example ping -I eth0_ipaddr host_nic1_addr (-I is a capital eye). Repeat this for every possible combination.

Finally you need to test ping from the switch to every possible IP. Once you have all possible combinations working, you need to fail over the arrays to ensure you can still ping all combinations.

(The ping testing is explained in greater detail on the support site: go to the KB and search for “Ping from the PS array using a specific ETH port interface”. The support site is eqlsupport.dell.com, login required.)

Ensure you follow this portal for host and other configuration guidelines: http://en.community.dell.com/techcenter/storage/w/wiki/2632.storage-infrastructure-and-solutions-team-publications.aspx

Pay particular attention to the Configuration Guide and the Rapid EqualLogic Configuration Portal links.

(BTW, EqualLogic recommends enabling flow control; you may want to set it to rx and tx on the hosts and the switch. However, flow control “typically” would not cause disconnections, only performance-related issues.)

Another item of interest is that the EqualLogic compatibility matrix (on the link listed above) lists the tested IOS for the 3750X as 12.2(35)SE51. You may want to try a downgrade (yuck, I hate downgrades, but if everything else pans out, you may need to try this).

Lastly, you should consider updating your firmware to the latest version (either 5.x or 6.x).

-joe
