CEF causing Memory Fragmentation?

bsomali
Level 1

Recently our Cat3550 logged an error saying CEF was disabled due to a low-memory condition.

TAC diagnosed a hardware failure, and the hardware was replaced.

After a while, new symptoms appeared: the switch reported MALLOCFAIL (Memory Fragmentation and No Alternate Pool).

This time TAC suggested upgrading IOS to 12.1(19).

We did, but the same problem persisted.

I suspected that CEF was consuming too much memory for the static routes on one of the VLANs we have configured.

I copied "sh adjacency summary" from the Cat3550:

Adjacency Table has 1526 adjacencies
Table epoch: 0 (1526 entries at this epoch)

Interface    Adjacency Count
Vlan1        94
Vlan109      1422
Vlan254      3
Vlan252      3
Vlan131      4

That is the current situation, which works normally without problems.

Before, we had a Class B static route via Vlan109.

I didn't have the chance to check the adjacency list, but "sh ip cef" listed all the hosts from the Class B subnet (10.1.0.0-10.1.150.0; there could have been more, but the switch ran out of memory).

When the error happened, "sh mem summary" listed a lot of dead, fragment, and coalesced blocks.

Does the switch need more memory, or is this a software bug?

Appreciate any feedback.

Thanks.

6 Replies

ruwhite
Level 7

Don't put a class B static route pointing to a broadcast interface; use a next hop instead. The problem is, most likely, that the router is ARPing for every possible address within the class B range and then building a separate adjacency table entry for each one, which is going to suck up a _lot_ of memory, enough even to crash the router (I've seen it happen many times).
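To put a rough number on that, here is a minimal Python sketch of the worst case. It assumes the static route covered 10.1.0.0/16 (the thread only says "Class B", so the exact prefix is an assumption), and the helper name `max_adjacencies` is made up for illustration.

```python
import ipaddress


def max_adjacencies(prefix: str) -> int:
    """Upper bound on per-host adjacencies when a route points at a broadcast interface.

    Every usable host address in the prefix can end up with its own ARP
    and adjacency entry, so the bound is the usable host count.
    """
    network = ipaddress.ip_network(prefix)
    # Subtract the network and broadcast addresses; never go below zero.
    return max(network.num_addresses - 2, 0)


# Assumed prefix: the thread describes a Class B sized route via Vlan109.
worst_case = max_adjacencies("10.1.0.0/16")
print(worst_case)  # 65534
```

With a next-hop route the switch needs only one adjacency for the next hop, compared with up to 65534 for the interface route.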

:-)

Russ.W

You were right, Russ.

I did "sh arp" and most of the entries were IPs from Vlan109.

I will change the static routes as suggested and check the result again.

Another question: if the Cat3550 receives continuous topology changes, will that suck up memory as well?

Thanks.

What routing protocol is this? If it's EIGRP (you said topology, which implies it is), take a look at show ip eigrp events, and see where most of the topology changes are coming from. Are these topology changes things that you are expecting, or are they something that seems like a problem? If they seem like a problem, let's look at them, see if we can isolate where they are coming from, and figure out some way to reduce the load on this router.

Summarization and route filters are the two best techniques to reduce the scope of topology changes in a network.
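As a small illustration of what summarization buys you, the sketch below collapses four contiguous /24 prefixes into the single /22 that would be advertised in their place. The prefixes are hypothetical and the helper name `summarize` is made up; it only shows the address arithmetic, not any router configuration.

```python
import ipaddress


def summarize(prefixes):
    """Collapse a list of prefixes into the smallest set of covering networks."""
    networks = [ipaddress.ip_network(p) for p in prefixes]
    # collapse_addresses merges adjacent and overlapping networks.
    return [str(n) for n in ipaddress.collapse_addresses(networks)]


# Hypothetical example prefixes, not taken from the thread.
specifics = ["10.1.0.0/24", "10.1.1.0/24", "10.1.2.0/24", "10.1.3.0/24"]
print(summarize(specifics))  # ['10.1.0.0/22']
```

A flap in any one of the /24s stays hidden behind the /22, so routers beyond the summarization point never see the change.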

:-)

Russ.W

Hi Russ,

A quick update on the first issue.

After the change, "sh arp" and "sh adjacency" look OK now.

I hope the problem is resolved now.

The topology changes issue was related to Spanning Tree Protocol.

One of the Cat3550 interfaces seemed to be flooded with "STP: VLAN0001 Topology Change rcvd on Gi0/2".

How do I investigate that?

Will that cause the "out of memory" condition?

Why were there so many fragment/coalesced/dead (free block) entries shown in "sh mem sum"?

Many thanks.

I would say that you should investigate the spanning tree issues by trying to determine what links on the spanning tree might be changing status on a regular basis. I would think that you could do some sort of debug that would tell you what spanning tree change took place, which would lead you to the source of the changes, but I'll leave a more detailed answer to those who know spanning tree better than I do.
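One low-tech way to start narrowing it down is to tally the topology change messages per VLAN and port from a saved log. This is only a sketch: the regex is built from the single message quoted earlier in the thread, the Gi0/1 line is a made-up sample, and real logs may carry timestamps or other prefixes (the pattern uses a search, so leading text is tolerated).

```python
import re
from collections import Counter

# Pattern modeled on the one message quoted in the thread.
TC_PATTERN = re.compile(r"STP: (VLAN\d+) Topology Change rcvd on (\S+)")


def count_topology_changes(lines):
    """Count topology change notifications per (vlan, port) pair."""
    counts = Counter()
    for line in lines:
        match = TC_PATTERN.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


sample_log = [
    "STP: VLAN0001 Topology Change rcvd on Gi0/2",
    "STP: VLAN0001 Topology Change rcvd on Gi0/2",
    "STP: VLAN0001 Topology Change rcvd on Gi0/1",  # hypothetical line
    "some unrelated log message",
]

counts = count_topology_changes(sample_log)
for (vlan, port), seen in counts.most_common():
    print(vlan, port, seen)
```

The port with the highest count points toward the part of the network where the changes originate, which is where to look next.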

On the memory issues, fragmentation is generally caused by the router using most of its memory while doing a lot of allocs and frees in a very low memory condition. When this happens, the outcome is, sometimes, severe memory fragmentation, evidenced by a very small block in the largest column of show memory, with a much larger number in the free column. The only real way to recover from this is a reload, and then working on things to prevent it from happening again.

As for dead memory, that's a little trickier. Suppose a process allocates some memory, then terminates without freeing that memory. This would be considered "dead" memory. Note that dead memory is most often actually in use by some process other than the one that allocated the memory. The reason it's called "dead" is that IOS doesn't know which processes are actually using the memory, but the process that allocated it is dead. Generally, dead memory doesn't represent any sort of memory leak; it's just part and parcel of the normal operation of IOS.
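The "small largest block, much larger free total" symptom can be turned into a quick check. The sketch below assumes you have already read the free and largest values (in bytes) off the show memory output by hand; the numbers and the function name `fragmentation_ratio` are made up for illustration.

```python
def fragmentation_ratio(free_bytes: int, largest_block_bytes: int) -> float:
    """Fraction of free memory that is NOT available as one contiguous block.

    Near 0.0: most free memory sits in one large block.
    Near 1.0: free memory is scattered across many small blocks.
    """
    if free_bytes <= 0:
        return 0.0
    return 1.0 - largest_block_bytes / free_bytes


# Hypothetical readings: plenty of free memory, but only a tiny largest block.
ratio = fragmentation_ratio(free_bytes=4_000_000, largest_block_bytes=20_000)
print(f"{ratio:.3f}")  # 0.995
```

A ratio that close to 1 means a large allocation can fail even though the free column looks healthy, which matches the MALLOCFAIL symptom.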

:-)

Russ.W

Hi Russ,

Sorry, I've been away for a few days.

Thanks for the memory explanation.

I'd appreciate it if you or anybody else could advise further on how to debug the spanning tree issue.

Regards,

Benny