I have run into a new problem with the ACE module and SSL termination.
There are four separate contexts: one admin context and three server farm contexts. All three server farms terminate SSL on the ACE front side.
At first everything worked as configured. After a while, however, the active ACE stops responding to SSL handshakes. Switching a context from the active module to the hot standby module fixes the situation. Switching it back produces the same behavior again: no SSL handshake/setup. The only workaround I know of is to reload the primary/active ACE module.
The behavior started after I enabled the third context with SSL termination. It works on all server farm contexts for about 30 minutes under traffic and then suddenly stops. This is really strange!
I captured sniffer traces of the SSL setup on the ACE context itself, in both the working and the non-working state, and also captured the same setup from a client with Wireshark.
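Alongside the traces, a small client-side probe can show when handshakes stop completing and how many fail in a given window. The following is only a minimal sketch in Python using the standard library; the function names `try_handshake` and `count_failures` are made up for illustration, and the host and port are whatever VIP terminates SSL on the ACE. Certificate verification is switched off on purpose, because the probe only checks whether the handshake completes. Depending on the Python/OpenSSL build, old protocol versions may be refused by the client, which would also be counted as a failure here.

```python
import socket
import ssl


def try_handshake(host, port, timeout=5.0):
    """Return True if an SSL/TLS handshake with host:port completes, else False."""
    context = ssl.create_default_context()
    # The probe only cares whether the handshake completes,
    # not whether the certificate is trusted.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            # wrap_socket performs the handshake before returning.
            with context.wrap_socket(raw, server_hostname=host):
                return True
    except OSError:
        # Covers ssl.SSLError, timeouts and refused connections.
        return False


def count_failures(host, port, attempts=10, timeout=5.0):
    """Try `attempts` handshakes in a row and return how many failed."""
    return sum(1 for _ in range(attempts) if not try_handshake(host, port, timeout))
```

Running `count_failures` against the VIP at regular intervals would give a rough timestamp for the point where the module stops answering.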
I don't think my config is the problem, because everything works at first. The certificate stores are exactly the same on both ACEs for each context. In my opinion this is a serious bug.
Is this already known? I couldn't find anything in the bug tool. Is there any way to get around this?
I have already upgraded my modules to 3.0(0)A1(4a), so there is no newer release to move to.
Thanks for reading
Could you capture a 'show tech' before the failure and another at the time of the failure?
Then post them here.
You should also open a service request, as this is a serious issue that should get full attention.
Thanks for having a look, Gill. Is there a chance I can mail you the "show tech"? I am not really happy to post this information in the forum.
I upgraded the SUP-720 in my 6513 chassis from 12.2(18)-SXF5 (monolithic) to 12.2(18)-SXF7 (modular). Since then the primary ACE hasn't shown this behavior anymore.
I will keep an eye on this and hope it was a problem with the SUP-720 in the chassis.
The ACE is totally independent of IOS.
So the upgrade would have no effect.
Did you open a service request with the TAC?
I got your mail with the show tech, but I have not had time to look into it yet.
I looked at the 'show tech', but unfortunately you only captured one.
I need at least two 'show tech' outputs from the active module, captured at the time of the problem, that is, while SSL connections are coming in.
The reason is that I want to see how the different counters are increasing.
If possible, tell me how many connections failed between the two 'show tech' captures.
Try to have at least 10 failures or so.
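For anyone wanting to do the same comparison themselves, the idea of watching counters move between two captures can be sketched in a few lines of Python. This is only an illustration under a loud assumption: it treats every line of the form `<counter name> : <integer>` as a counter, which may not match the real 'show tech' layout, and the function names are hypothetical. If the same counter name appears in several sections, the last occurrence wins, so the text may need to be split per section first.

```python
import re

# Assumed line shape: "<counter name> : <integer>".
# Real 'show tech' output may be laid out differently.
COUNTER_LINE = re.compile(r"^\s*(?P<name>[A-Za-z][^:]*?)\s*:\s*(?P<value>\d+)\s*$")


def parse_counters(text):
    """Return {counter name: value} for every line matching the assumed shape."""
    counters = {}
    for line in text.splitlines():
        match = COUNTER_LINE.match(line)
        if match:
            counters[match.group("name")] = int(match.group("value"))
    return counters


def counter_deltas(before, after):
    """Return {name: after - before} for counters in both captures that changed."""
    old = parse_counters(before)
    new = parse_counters(after)
    return {
        name: new[name] - old[name]
        for name in new
        if name in old and new[name] != old[name]
    }
```

Feeding it the text of the two captures leaves only the counters that moved, which is the part worth reading when roughly 10 handshakes are known to have failed in between.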
I actually sent you one per ACE, with ACE 1 being the one with failed SSL handshakes and ACE 2 the one with working SSL handshakes.
I will try to get a "PRE and POST" show tech per module.
Regarding the TAC case, I haven't been able to open one yet because there is no SMARTnet contract for the modules available right now.
Thanks for reading
What's the aggregate number of SSL connections that you send in order to see the problem?
Is it more than 1000 TPS?
By default you're only allowed 1000 TPS.
When you reach the limit, the ACE will stop accepting new SSL connections.
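To check a rate against that limit, the average TPS can be worked out from two readings of a connection counter and the time between them. The snippet below is a rough sketch in Python; the function names are hypothetical, the counter values are whatever total of new SSL connections is available, and 1000 is simply the default limit mentioned above. An average over a long interval can hide short bursts above the limit.

```python
def average_tps(count_before, count_after, seconds):
    """Average new SSL connections per second between two counter readings."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return (count_after - count_before) / seconds


def exceeds_limit(count_before, count_after, seconds, limit_tps=1000):
    """Return True if the average rate over the interval is above limit_tps."""
    return average_tps(count_before, count_after, seconds) > limit_tps
```

With a larger license the same check applies with a different `limit_tps` value.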
Currently it is way below 1000 TPS, because we are about to migrate the production environment from the CSS to the ACE. We also have the 5000 TPS license, so that shouldn't be an issue. It is still a test environment and was more or less about to be moved when the problem showed up. I will be in the office again tomorrow and will have a look at the TPS and also get hold of the show techs.
The issue seems to be related to the latest certificate that I added to the crypto file store. After removing the certificate, the behavior was back to normal again. I will now try to reproduce the error by importing the certificate again. If that is the case, there must be an issue with some fields in the certificate itself or the way the ACE stores them.
The reason for the SSL problem is described in CSCsh64662.
No handshakes and frequent cores are the result, in case anybody else ever stumbles upon this issue.