Re: 11503, ssl, VIP redundancy, high cpu

robersona · ‎12-05-2005

Hi, I have redundant 11503s (sg0730106 standard feature) configured with VIP redundancy, accepting incoming SSL requests and then talking to backend SSL servers. The CSSs are deployed in a "one-armed" network: packets go in and out of the same interface. The two CSSs are connected to two different switches, which are connected to the network backbone. The servers are connected to a third switch and must also talk across the backbone. The CSS VIP and the servers are in the same broadcast domain.

The CSS keepalives (type ssl, port 443) to the servers are regularly failing, and I am seeing a lot of state transitions.

CSS02# sho service summary

Service Name         State   Conn  Weight  Avg   State
                                           Load  Transitions
quantum_ssl01        Alive   3     1       2     60
quantum_ssl02        Alive   0     1       2     54
quantum_ssl_client   Alive   6     1       2     0

This coincides with a spike in CPU. The problem is that, at the same time, I am able to connect to the servers directly via ping and SSL without any issues.

CSS02# sho sys cpu

Chassis CPU Utilizations
Name              Slot  Sub  CPU%
----------------------------------
CSS5-SCM-2GE F0   1     1    91%
CSS5-SSL-K9 D0    2     1    0%

The CPU on my SSL module rarely moves off 0%. I am getting very slow ping response times from the CSS VIP and interface when these CPU spikes occur.

Output from the CPU HOG command shows top talkers (fmapmsg...

Checking CPU Hog
TID          Name             Milliseconds
---          ----             ------------
0x8de4e330   OndmLTickTxTask  0
0x8e10ef40   tDcacheUpd       0
0x8a82edf0   fmapmsg          63
0x8dfea5e0   tImmRx           1
0x8dfe5350   ImmGetAgent      0

All connected switch ports look fine, and we are not servicing many connections at all.

Any advice or help would be greatly appreciated. Is my CSS and server deployment not localised enough?

Thanks AR

Gilles Dufour · ‎12-06-2005

Could you capture more CPU hog info?

I'd like to see if the processes involved are always the same.

The SSL module is not showing high CPU because the problem is not traffic related.

The CPU is high on the SCM because of some events.

From the CPU hog output, it looks like the culprit is fmapmsg, which is responsible for handling ARP/route events.

Is your subnet a big one, like a /16?

Or do you have OSPF or RIP running?

Thanks,

Gilles.


robersona · ‎12-06-2005

Thanks for the reply, Gilles. The CSSs and the servers exist in a large network with a /16 subnet. The CPU HOG output captured while the problem occurs is attached:

I changed the backup CSS to send ICMP keepalives. These also fail at the same time as the SSL keepalives on the primary.

The second CSS displays almost identical CPU HOG output.

The other point is that I am getting a lot of state changes on the virtual router of my secondary CSS. I am not preempting. The master CSS reports no failure and no state change. The backup CSS reports many state changes, two at a time, and remains the backup. The failure reason shown is Preempted. These state changes also coincide with the high CPU and the state changes on the services.

Thanks again

Anthony

Gilles Dufour · ‎12-07-2005

Anthony,

This really is a problem due to ARP traffic on your /16 subnet.

We always recommend not using the CSS in a large environment like this.

The CSS only has an ARP cache of 5000 entries.

Your /16 can potentially hold 64k hosts, which is far more than the 5k entries the cache can store.

I would therefore suggest creating a dedicated subnet for the CSS and placing it behind a router.
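To put numbers on the comparison above, here is a rough Python sketch. It only uses the 5000-entry ARP cache figure quoted in this thread; the function names and the 10.0.0.0 base address are made up for illustration, and it says nothing about how the CSS actually ages or evicts ARP entries.

```python
import ipaddress

# ARP cache size as quoted in this thread (not read from a device).
ARP_CACHE_ENTRIES = 5000


def usable_hosts(prefix_len: int) -> int:
    """Usable host addresses in an IPv4 subnet with this prefix length."""
    # 10.0.0.0 is an arbitrary example base address.
    net = ipaddress.ip_network(f"10.0.0.0/{prefix_len}")
    # Subtract the network and broadcast addresses.
    return net.num_addresses - 2


def largest_subnet_that_fits(cache_entries: int) -> int:
    """Shortest prefix (largest subnet) whose hosts all fit in the cache."""
    for prefix_len in range(8, 31):
        if usable_hosts(prefix_len) <= cache_entries:
            return prefix_len
    raise ValueError("no prefix between /8 and /30 fits")


if __name__ == "__main__":
    print("/16 usable hosts:", usable_hosts(16))  # 65534
    fit = largest_subnet_that_fits(ARP_CACHE_ENTRIES)
    print(f"largest subnet that fits: /{fit} ({usable_hosts(fit)} hosts)")
```

By this arithmetic a /16 can hold roughly 13 times more hosts than the cache has entries, while anything from a /20 (4094 hosts) downwards fits entirely. In practice the subnet placed behind the router would normally be much smaller than that.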

Regards,

Gilles.