Flaky 2950 switches

gkuzmowycz
Level 1

We have a regional office with two WS-C2950G-48-EI switches, both close to fully populated. Every 1-2 weeks, all workstations on one switch or the other (but not both at the same time) lose all connectivity to the network. There are no obvious problems with the affected switch, which remains powered on. The only (easy) way to re-establish connectivity is to power-cycle the switch.

This location is on a serious UPS system, and we think it's not power-related, since the 2621 router that's in the same rack and on the same circuit has been up with no problems for about 2 1/2 years. We've had a consulting electrician look at grounding and phase problems, and he found none. We thought of a possible problem in the Cat5 plant, but the problem occurs randomly in one or the other switch, not consistently in one of them.

Given that this is a remote location with no qualified IT staff, can anyone offer a suggestion for next troubleshooting steps?

5 Replies

thomuff
Level 3

Do you have a monitoring solution? Are you running a NetFlow analyzer? The reason I ask is that I am wondering whether you can ping the switch from another location during the issue. Can you ping it from the router? Are you seeing any errors on the router's Ethernet interface? Is it locked down to 100/full, or are you running auto-negotiation? Do you have logging configured at the debugging level? Is there anything showing up in the logs when this happens? Do you have time to console into the switch when this happens? Is there any out-of-band access? Do you have visitors coming in? Does anyone have a hub or another switch linked to this switch? Any routing loops?

I would definitely set up some sort of out-of-band access, so that you can look at the switch when the problem occurs.

Checking the logs on the switch and the router usually points out the problem. Try pinging the workstations. Try to avoid rebooting the switches until you figure out the problem. Also, can you still connect to the router when the problem happens? If so, another option would be to connect the AUX port of the router to the console port of one of the switches. Which IOS version are you running?

Sorry about the brain dump, just typing my thoughts

Hope this helps
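
The reachability checks suggested above (can you ping the router, the switch, and a workstation during an outage?) could be automated from a monitoring host, so that the answer is already on record when a switch hangs. The following is a minimal Python sketch, not a finished tool. The function names, the three-device layout, the placeholder addresses and the hint wording are all assumptions made for illustration. The ping flags follow Linux conventions, with a Windows fallback.

```python
import platform
import subprocess


def ping_command(host, timeout_s=2):
    """Build a single-probe ping command for the local OS."""
    if platform.system().lower() == "windows":
        # Windows ping: -n = probe count, -w = timeout in milliseconds.
        return ["ping", "-n", "1", "-w", str(timeout_s * 1000), host]
    # Linux iputils ping: -c = probe count, -W = timeout in seconds.
    return ["ping", "-c", "1", "-W", str(timeout_s), host]


def is_reachable(host):
    """Return True if the ping command exits with status 0 (a rough signal only)."""
    result = subprocess.run(
        ping_command(host),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def diagnose(router_up, switch_up, workstation_up):
    """Turn three reachability results into a rough hint (hypothetical wording)."""
    if not router_up:
        return "router unreachable: check the WAN link or the router first"
    if not switch_up:
        return "router up, switch management IP down: switch may be hung"
    if not workstation_up:
        return "switch answers pings but the host does not: check ports, VLANs, STP"
    return "all reachable"


if __name__ == "__main__":
    # Placeholder addresses (TEST-NET-1 range); replace with the real devices.
    router, switch, workstation = "192.0.2.1", "192.0.2.2", "192.0.2.50"
    print(diagnose(is_reachable(router),
                   is_reachable(switch),
                   is_reachable(workstation)))
```

Run from cron or a scheduler every minute or so and appended to a file, the output gives a timeline showing which device dropped first.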

rgnwcco
Level 1

I used to have exactly the same problem on at least three 3500-XL switches. I raised a call with Cisco TAC, who recommended replacing the switches. I replaced them with new ones as part of Smartnet Advanced Replacement, and the problems never came back.

I am currently going through a similar cycle with two of my 2950 switches, but limited to the Gigabit Ethernet ports only. They periodically lose connectivity, and only a reset fixes the problem. Replacing one switch with my spare has so far given stable results. I am still trying to figure out the differences between the two boxes.

Hello,

in addition to the other posts, you might want to check the logs of the switch for the following entries:

SCHAN ERROR INTR: SRC=6 DST=5 OPCODE=20 ERRCODE=5

If you see those entries, the following bug might apply:

CSCdv83336 Bug Details

Under a certain level of traffic load, the (2950) switch will start logging the following messages on the console:

SCHAN ERROR INTR: SRC=6 DST=5 OPCODE=20 ERRCODE=5

and after a few seconds, the switch will stop passing any traffic. In some cases, the switch still seemed to be forwarding broadcast and multicast traffic, which will cause an STP problem if the switch has a redundant link and is not supposed to be the root for the VLAN, as both ports will go forwarding.

The same error message has been identified in CSCdu87836.

An assessment of the impact

Unit stops passing any traffic.

WORKAROUND

Several units were returned to Cisco. The units were re-screened with the latest test program and failed the SDRAM memory test.

Customer should RMA unit back to Cisco.

HTH,

GNT
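
If the switch logs are being captured to a syslog server, the check for the message quoted above can be scripted. This is a small Python sketch under stated assumptions: the function name `find_schan_errors` is made up for illustration, and the pattern assumes the fields always appear in the order shown in the bug text.

```python
import re

# Pattern for the console message quoted in the bug details.
SCHAN_RE = re.compile(
    r"SCHAN ERROR INTR: SRC=(\d+) DST=(\d+) OPCODE=(\d+) ERRCODE=(\d+)"
)


def find_schan_errors(lines):
    """Return one dict per log line that contains a SCHAN ERROR message."""
    hits = []
    for lineno, line in enumerate(lines, start=1):
        match = SCHAN_RE.search(line)
        if match:
            src, dst, opcode, errcode = (int(g) for g in match.groups())
            hits.append({
                "line": lineno,
                "src": src,
                "dst": dst,
                "opcode": opcode,
                "errcode": errcode,
            })
    return hits


if __name__ == "__main__":
    # The second line is the message from the bug text; the first is filler.
    sample = [
        "unrelated log line",
        "SCHAN ERROR INTR: SRC=6 DST=5 OPCODE=20 ERRCODE=5",
    ]
    for hit in find_schan_errors(sample):
        print(hit)
```

To scan a real capture, pass an open file instead of the sample list, for example `find_schan_errors(open("switch.log"))`, where `switch.log` is a hypothetical file name.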

Thanks to all for your responses. I actually hadn't checked this forum in a while, since I knew my description was vague and wouldn't generate a lot of responses. But one of the early ones did cause me to start capturing the logs, and it finally bore fruit when one of the switches hung over the weekend, with the following messages:

*****

5 2006/05/29 04:28:56.042 EDT nnn.nnn.nnn.252 2288: May 29 04:28:54.999: %SYS-2-MALLOCFAIL: Memory allocation of 21268 bytes failed from 0x8053E128, alignment 0

6 2006/05/29 04:28:56.042 EDT nnn.nnn.nnn.252 2289: Pool: Processor Free: 1064 Cause: Not enough free memory

7 2006/05/29 04:28:56.042 EDT nnn.nnn.nnn.252 2290: Alternate Pool: I/O Free: 535340 Cause: Memory fragmentation

8 2006/05/29 04:28:56.057 EDT nnn.nnn.nnn.252 2291:

9 2006/05/29 04:28:56.057 EDT nnn.nnn.nnn.252 2292: -Process= "Cluster Base", ipl= 0, pid= 53

10 2006/05/29 04:28:56.057 EDT nnn.nnn.nnn.252 2293: -Traceback= 801CB320 801CD320 8053E130 8053DC5C 8053C128 801C51CC 801C51B8

*****
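
For anyone collecting similar captures, the fields of interest in the excerpt above (requested size, memory pools, failing process) can be pulled out with a few regular expressions. This is a hedged Python sketch: the function name `summarize_mallocfail` and the dictionary keys are invented for illustration, and the patterns are written only against the lines shown in this post, so other IOS versions may format them differently.

```python
import re

MALLOC_RE = re.compile(
    r"%SYS-2-MALLOCFAIL: Memory allocation of (\d+) bytes failed"
)
# Matches both "Pool: ..." and "Alternate Pool: ..." lines.
POOL_RE = re.compile(
    r"(Alternate )?Pool: (\S+)\s+Free: (\d+)\s+Cause: (.+?)\s*$"
)
PROCESS_RE = re.compile(r'-Process= "([^"]+)", ipl= \d+, pid= (\d+)')

# Lines copied from the capture in this thread.
SAMPLE = [
    "5 2006/05/29 04:28:56.042 EDT nnn.nnn.nnn.252 2288: May 29 04:28:54.999: "
    "%SYS-2-MALLOCFAIL: Memory allocation of 21268 bytes failed from 0x8053E128, alignment 0",
    "6 2006/05/29 04:28:56.042 EDT nnn.nnn.nnn.252 2289: Pool: Processor Free: 1064 "
    "Cause: Not enough free memory",
    "7 2006/05/29 04:28:56.042 EDT nnn.nnn.nnn.252 2290: Alternate Pool: I/O Free: 535340 "
    "Cause: Memory fragmentation",
    '9 2006/05/29 04:28:56.057 EDT nnn.nnn.nnn.252 2292: -Process= "Cluster Base", ipl= 0, pid= 53',
]


def summarize_mallocfail(lines):
    """Collect the key fields of a MALLOCFAIL report into one dict."""
    summary = {}
    for line in lines:
        match = MALLOC_RE.search(line)
        if match:
            summary["requested_bytes"] = int(match.group(1))
            continue
        match = POOL_RE.search(line)
        if match:
            key = "alternate_pool" if match.group(1) else "pool"
            summary[key] = {
                "name": match.group(2),
                "free": int(match.group(3)),  # the reported "Free:" value
                "cause": match.group(4),
            }
            continue
        match = PROCESS_RE.search(line)
        if match:
            summary["process"] = match.group(1)
            summary["pid"] = int(match.group(2))
    return summary


if __name__ == "__main__":
    print(summarize_mallocfail(SAMPLE))
```

On the sample, the summary shows a request for 21268 bytes against a Processor pool reporting only 1064 free, raised from the "Cluster Base" process.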

Could you rate the replies please!

I am glad you were able to figure it out. Hope you have Smartnet to RMA that bad switch.

Thanks
