12-05-2005 05:45 PM
Hi, I have redundant 11503s (sg0730106 standard feature) configured with VIP redundancy, accepting incoming SSL requests and then talking to back-end SSL servers. The CSSs are deployed in a "one-armed" network: packets go in and out the same interface. The two CSSs are connected to two different switches, which are connected to the network backbone. The servers are connected to a third switch and must also talk across the backbone. The CSS VIP and the servers are in the same broadcast domain.
The CSS keepalives (type ssl, port 443) to the servers are regularly failing, and I am seeing a lot of state transitions.
CSS02# sho service summary
Service Name        State  Conn  Weight  Avg   State
                                         Load  Transitions
quantum_ssl01       Alive  3     1       2     60
quantum_ssl02       Alive  0     1       2     54
quantum_ssl_client  Alive  6     1       2     0
This coincides with a spike in CPU. The puzzling part is that, at the same time, I am able to connect to the servers directly via ping and SSL without any issues.
CSS02# sho sys cpu
Chassis CPU Utilizations
Name Slot Sub CPU%
----------------------------------
CSS5-SCM-2GE F0 1 1 91%
CSS5-SSL-K9 D0 2 1 0%
The CPU on my SSL module rarely moves off 0%. I am getting very slow ping response times from the CSS VIP and interface when these CPU spikes occur.
Output from the CPU HOG command shows top talkers (fmapmsg...
Checking CPU Hog
TID Name Milliseconds
--- ---- ------------
0x8de4e330 OndmLTickTxTask 0
0x8e10ef40 tDcacheUpd 0
0x8a82edf0 fmapmsg 63
0x8dfea5e0 tImmRx 1
0x8dfe5350 ImmGetAgent 0
All connected switch ports look fine, and we are not servicing many connections at all.
Any advice/help would be greatly appreciated. Is my CSS and server deployment not localised enough?
Thanks AR
12-06-2005 03:13 AM
Could you capture more CPU hog info?
I'd like to see if the processes involved are always the same.
The SSL module is not showing high CPU because the problem is not traffic related.
The CPU is high on the SCM because of some events.
From the CPU hog output, it looks like the culprit is fmapmsg, which is responsible for handling ARP/route events.
Is your subnet a big one, like a /16?
Or do you have OSPF or RIP running?
Thanks,
Gilles.
Thanks for rating this answer.
12-06-2005 03:47 PM
Thanks for the reply, Gilles. The CSSs and the servers exist in a large network with a /16 subnet. The CPU HOG output captured while the problem occurs is attached.
I changed the backup CSS to send ICMP keepalives. These also fail at the same time as the SSL keepalives on the primary.
The second CSS displays almost identical CPU HOG output.
The other point is that I am getting a lot of state changes on my secondary CSS virtual router. I am not preempting. The master CSS reports no failure and no state change. The backup CSS reports many state changes, two at a time, and remains the backup. The failure reason is Preempted. These state changes also coincide with the high CPU and the state changes on the services.
Thanks again
Anthony
12-07-2005 12:57 AM
Anthony,
this really is a problem due to ARP traffic on your /16 subnet.
We always recommend not to use the CSS in a large environment like this.
The CSS has an ARP cache of only 5000 entries.
Your /16 can potentially hold about 64k hosts (65,534 usable addresses), which is far more than the 5000 entries.
I would therefore suggest creating a dedicated subnet for the CSS and placing it behind a router.
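As a rough illustration of the sizing argument above, the following Python sketch compares subnet sizes against the 5000-entry ARP cache figure quoted in this thread. The 10.0.0.0 base address and the helper names are placeholders chosen for the example, not anything taken from the CSS or from the poster's network, and the assumption that every host could end up in the cache is a worst case.

```python
import ipaddress

# ARP cache size quoted in this thread; treated here as a hard limit.
ARP_CACHE_ENTRIES = 5000


def usable_hosts(prefix_len: int) -> int:
    """Usable IPv4 host addresses for a prefix length (network and broadcast excluded)."""
    # 10.0.0.0 is only a placeholder base address for the calculation.
    net = ipaddress.ip_network(f"10.0.0.0/{prefix_len}")
    return net.num_addresses - 2


def largest_subnet_that_fits(limit: int) -> int:
    """Return the shortest prefix length whose usable hosts do not exceed the limit."""
    for prefix_len in range(8, 31):
        if usable_hosts(prefix_len) <= limit:
            return prefix_len
    raise ValueError("no prefix between /8 and /30 fits the limit")


print("Hosts in a /16:", usable_hosts(16))
print("Largest subnet within the cache limit: /%d" % largest_subnet_that_fits(ARP_CACHE_ENTRIES))
```

Under these assumptions, a /16 exceeds the cache by more than a factor of ten, while a /20 (4094 usable hosts) is the largest subnet that stays within it. In practice a much smaller dedicated subnet behind a router, as suggested above, leaves far more headroom.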
Regards,
Gilles.