C3750G memory leak(ing)

Jul 8th, 2009
I have a campus network made up of 4 buildings connected in a ring. The buildings are connected by fiber, and the topology is shown in the attachment campuslayout.jpg.

There are multiple fiber connections between buildings. Originally I had 2 fiber lines between each pair of switches (buildings), with one set of VLANs allowed on one fiber line and the remaining VLANs allowed on the other. This worked as expected, and Spanning Tree had both links between one pair of buildings blocked.

About 2 months ago I changed the links between the buildings to use EtherChannel in LACP mode. The unexpected outcome was that all links of the ring were active. I was sure there was a problem with this, but I couldn't find anything complaining about it in performance or in the logs.

About 30 days after that initial change, I was unable to manage one of the 3750 stacks. It was passing traffic fine and responding to ping, but it would not accept a web, SSH or telnet session. I hooked up the console cable and got a message from the switch that memory was low, please try again later (the session would wait and then display that message again). Later that week, the other 3750 stack exhibited the same behavior: it could not be managed and displayed the same message. The 3560 switches never had any problem being managed during that time.

I rebooted the 3750 switches and updated the firmware, hoping that would address this oddity, but earlier this week (~30 days after the last reboot) the same symptoms came back.

So, is there a concept that I am confusing here? Is it unreasonable to expect EtherChannel with LACP and STP to work together? Does anyone have suggestions on how I might get this to work using both technologies?
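Since the symptom returns roughly 30 days after each reboot, it may help to capture memory usage at intervals while the stack is still manageable and compare the snapshots. A minimal sketch of the privileged EXEC commands that could be used for this, assuming the IOS release on the 3750 stacks supports them (output format varies by release, and nothing below comes from the original post):

```
! Assumed to be run from privileged EXEC on the 3750 stack master
! Overall free and used memory per pool
show memory statistics
! Per-process memory, largest holders first; a process whose
! "Holding" value keeps growing between snapshots is a leak suspect
show processes memory sorted
! Records the exact IOS image in use, useful when searching for known bugs
show version
```

Comparing two or three snapshots taken a week apart should show whether a single process is steadily accumulating memory.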



Giuseppe Larosa Fri, 07/10/2009 - 06:31
Hello Brian,

As long as each LACP bundle runs between the same pair of stacks, you are fine.
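One way to check this on each stack is sketched below, assuming standard Catalyst IOS show commands are available on the installed release (the comments describe what to look for, not output from this network):

```
! Each port-channel and its member ports; bundled members are flagged (P)
show etherchannel summary
! The LACP partner seen on each member port; all members of one
! bundle should report the same neighbor system ID
show lacp neighbor
! Ports that Spanning Tree is currently blocking, per VLAN
show spanning-tree blockedports
```

Spanning Tree treats each port-channel as a single logical link, so in a closed ring one would normally expect a port-channel to appear as blocked somewhere for each VLAN, even though every physical member link shows as up.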

I think you have configured LACP and STP correctly and are hitting a software bug that is still present in the new IOS image.

We had a problem on a stack that, after some hours of uptime, could no longer handle TCP sessions: it still answered SNMP queries, but it was not possible to telnet to it.

There was no memory issue in our case; TCP sessions were being lost inside the stack.

In the end we decided to split up the member switches and run them as standalone units, because we realized that for access-layer use the stack gave us no advantage.

Hope this helps.



