Possible bug - SG500 to SG500 - link fail - LAG fail - LACP fail.-Web Fail

Unanswered Question
Jul 26th, 2012

I just purchased two SG500-52 switches to replace two old SRW-2048's and have found two SERIOUS issues....

PROBLEM #1:

The old switches are running a static 4xGig. etherchannel between two areas of our building.  Having had experience with Catalyst 3650's, I just pulled the new 500's out of the box, assigned each a separate IP and updated the password via console.  Then logged in to each via the web panel, created port-channel 1 as gi1/1/1-gi1/1/4 and allowed all vlans.  Simple, no ?

Well, the two switches immediately went to high CPU utilization and after a few minutes, even the console port was unresponsive.  Power cycled the switches and they seemed to work, but traffic was very slow.  Plugging back in to the console, I found that the gig links were completely dropping the link lights and resetting in random order.  Of course, the po1 was adding and dropping the links constantly.  No individual link would stay up more than 30-45 seconds.

So - Reset switches back to factory - this time configure as a static LAG like the older SRW-2048's.  Same result.... the links won't stay up.

Next, reset to factory again, and just try a simple switch to switch link on a standard port.  STILL no go.

Next, replace one end with a Dlink 8port gig switch.  The link holds and traffic runs normally.

Now, up the ante..... Create port-channel2 as a 2xgig LACP link to a Dell R610 server running Windows 2008-R2.  Works like a champ.

Finally - connect the old SRW-2048 (Unmodified from earlier, running config) via the 4xgig Static LAG.  It works like a champ as well!

Problem statement - the SG500 does not seem to be able to maintain an ethernet link (link LEDs completely go out) when connected to another out of the box SG500.  I tried the connection of gi1/1/1 and gi1/1/15.  I am NOT trying to stack the units. This seems independent of LAG and LACP.  If you replace EITHER end with any other switch over the same link then all is well.

Further, I tried disabling all smart-ports, auto macros, cdp, and as much intelligence as possible.  The links still cycle.  Incidentally, the links are 228 feet long and are certified.  Just for grins, I re-ran the link cert and the wire is fine.  Other switches are fine.  This is VERY frustrating.

PROBLEM #2:

Finally - These things exhibit a problem similar to the SRW-2048's that they are replacing, in that the web interface seems to die without logging or posting any errors.  The SRW's used to be terrible about it, but the last couple of firmware updates reduced the problem to once every few months.  So far, the SG500 that is mostly full (5 servers and 40 or so clients) has not lasted more than 24 hours.  The problem starts when I begin noticing some dropped packets and the 5 servers start logging errors about clients unexpectedly disconnecting.  After a while longer, the switch will pass some ping packets, but virtually nothing larger than about 150 bytes or so.  Connecting to the switch via http yields a browser error about no response from host.  If I connect via console cable, there is no response via serial.

After power cycling the unit, there are no errors in the logfile other than normal messages.

FYI: The Linksys SRW-2048 switches have been doing the same job for 5 years.  After a couple of years of crappy support from Linksys, there was finally a firmware update that fixed many of the memory leak problems in the management interface.  This SG500 costs four times as much and really smells like the original Linksys with a redesigned web interface!  I bought these because I was tired of dealing with shoddy hardware and convinced the money people that Cisco was worth it.  Our network has been down every morning for the last 10 days and I'm losing patience quickly.

Hopefully, someone can help me out here ! 

BTW:  Try googling for this problem and you will find that there are a number of posts about this on 200 and 300 series switches and they all indicate that the problem started last November (2011) or so with a firmware update.  Apparently, it was not an issue prior to that time.

Tom Watts Thu, 07/26/2012 - 18:33

Hello Duane,

Please send me a PM or email towatts at cisco.com with your Cisco ID and the serial number of the SX500 switch. I will open a service request for you and we can check this out together.

-Tom

duanes1967 Thu, 08/02/2012 - 13:37

SOLUTION FOUND - workaround.  Future versions may not have this problem.

FYI: Running head to head SG-500-52-K9 switches between two parts of our building.

Both units are running SW version 1.2.0.97, boot version 1.2.0.12, HW version: V01.

I have found that this is NOT a LAG or LACP issue.  It is a physical link layer issue.

This problem has been traced to some sort of issue with the Energy Saving Feature negotiation.... (eee enable).  That is why it is only happening when the two SG-500's are interconnected via a relatively long Cat-5e cable (180-190 feet).  The switches are being too aggressive and drop the voltage too far, resulting in a drop at the physical link layer.

This may be an issue between other green hardware with Link power management as I only have a limited number of PC's that support eee type features, but we have not seen it as of yet. I have not tried this with a longer cable, but the problem also does NOT occur on a 1 meter Cat-5e patch cable.  It may even be unique to my particular cable or run, but the problem does NOT occur if eee is disabled.  The cables are Cat-5e and the end to end links are professionally certified for gig-ethernet standards.... so, take this as you will.

Cisco support is looking at the issue in more depth, but the workaround is simple.  If you have an interface that is unstable or repeatedly drops the link, and it is not being caused by spanning tree or other issues (macros, etc.), then disable eee either on the specific interface or globally (if you just don't trust it... but I have seen no reason to do this globally).

REMEDIAL INSTRUCTIONS.

So simple a Caveman could do it !

This can be done via Webconfig or by serial, telnet, ssh. (Command line rules !!)

Easy way - Connect to the switch via telnet or ssh.

To globally disable eee do this:

config t

no eee enable

end

To disable eee ONLY on a given interface (example port 32):

config t

int gi1/1/32

no eee enable

end
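
If the unstable links are all members of one LAG (like the gi1/1/1 through gi1/1/4 uplink described above), the same change can probably be made in one pass with an interface range. This is only a sketch: the range syntax shown is an assumption based on the usual Sx500 CLI form, so check it with "?" on your firmware before relying on it.

```
config t
interface range gi1/1/1-4
no eee enable
end
```

This leaves eee enabled everywhere else, which fits the advice above to avoid turning it off globally unless you have a reason to.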

Finally, if you're happy, don't forget to save your changes like this:

write

y

Then just type "exit" to close the session.
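
Before closing the session, it may be worth confirming that the change took effect. The commands below are a sketch and assume the "show eee" command from the Sx500 Green Ethernet command set is available on your firmware; port 32 is just the example port used above, and the exact output will vary by version.

```
show eee
show eee gi1/1/32
```

The first form should report the global eee state, and the second the state for the single interface.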

This Discussion

Posted July 26, 2012 at 4:04 PM