MSE 7.6.120 HA Experiences

After abandoning HA on the 7.4 releases because it was very unstable, I've now attempted HA yet again with our virtual MSE environment.  I had been running 7.6.120 since the day it was released without any problems.  In the last week I decided to pair it with a secondary.  After two days I noticed the primary service was down.  'service msed start' brought the pair back up without problems.  However, this happens every day or so.  Cisco TAC has been 'researching' for two days now.

I also noticed that on failing over I can get the secondary active within about 20 seconds.  Failback never works because the primary health monitor service goes down every time I do a failover.  This is another TAC case that is currently underway.  I find that if I try to fail back and let Prime give me the failure message about the primary being down, I can then reboot the primary MSE and the pair will come back up with the primary active.  I just don't understand how HA can be so broken on an MR release.  The open caveats in the release notes show only 4 entries, as if this is a rock solid release.  I am also going to echo other comments on TAC: it seems they just can't resolve anything when it comes to the MSE.  If they do, it's after multiple internal escalations and days later.  Both MSEs were clean builds with 16 vCPUs and 32GB RAM each, with dedicated VMware hosts running the latest hardware.

I gave up on HA for MSE. TAC had several cases, and the last case has been sitting for a couple of months.

