I recently did an audit at a new customer site and found a PROGGER deployment split across two sites without a private network. Each Progger had only a single NIC with a single IP address. Now I'm fairly sure the SRND spells out pretty clearly that UCCE must have a public and a private network; however, while following up on the audit I came across a forum post stating that in a PROGGER environment the private network was not needed.
Still fairly sure this was not the case, I raised a PDI helpdesk case, only to be told that while not desirable, it would still be supported by TAC, but wasn't the best design for high availability. Now I'm really confused: after years of working with ICM, I've never come across a deployment without a private network. Also, I'm pretty sure the CCBU would not allow this during A2Q - the questionnaire offers no option for anything less than a minimum of two networks - and the SRND is clear. Can someone please put my mind at ease and give me a definitive answer?
I have a customer with an older 6.x ICM system without private links - everything goes out on the same interface, and both high-priority and "normal" messages share the same path. The contact center has been working fine for years.
Of course, you will never pass the A2Q process if you don't include private links between ICM sides. My guess is that the CCBU wants to ensure that if the contact center's quality of service degrades, the WAN link bandwidth cannot be blamed.
I guess it's a historical thing: using private paths, they wanted to ensure that the messages which are supposed to be prioritized are physically separated. With today's high-speed MPLS links this may no longer be the case.
However, I definitely _would_not_ use this as an argument when discussing the details during the A2Q process. I did, and they turned us down.
I know this post offers no definitive clarification, only a few points, but I hope it helps.
It's not really a question of WAN bandwidth, but more the problem of losing both the private and public links between a pair of Roggers at the same time, with each Rogger able to see enough duplexed PGs to assume it's in control - the split-brain issue. If you don't have any other PGs - just the PGs on the Roggers/Proggers - then perhaps you can argue the case.
This is exactly it. The public and private networks are used in failure scenarios to detect whether there is a network failure or a component failure, and to prevent the split-brain scenario. Now in a PROGGER deployment this is slightly different, in that the PGs reside on the same server as the Routers.
The SRND does state that if both the visible and private networks fail, even for a moment, it can cause database corruption in the Loggers. I guess this again refers to a split-brain scenario where the two Loggers record different transactions. Could a split brain still occur during a WAN failure? A WAN failure would knock out both the visible and the private network, so both local PGs could see their local UCM but could not communicate with one another - potentially both PGs could stay connected and both sides could be active at the same time.
I really just need to understand what recommendation, if any, to make to my client.
You need a majority in order for a side to take over. 50% is not a majority, so you would not see a split brain with just two PGs.
Good point, so I have been thinking through failure scenarios and working out what the system will do. The SRND says that if neither ICM side can communicate with the other via either the visible or the private network, the ICM Router / Logger that becomes active, in simplex mode, is the one that has communication with the majority of active PGs - typically side A. Side B will go into isolated-disabled mode. Now let's say side A is the active side, and the switch connected to side A dies, knocking it out of action. Side B has no communication with side A - does this mean side B would go into isolated-disabled mode, while side A would be unable to connect to its UCM and therefore would also be isolated? Meaning we have complete loss of service? That's the scenario I'm worried about.
Ah yes, time for the infamous dummy PG. The dummy PG is a PG which does nothing, but is installed in a place where only a single side would be able to reach it in case of a LAN/WAN failure. This forces the non-dummy side to go inactive, while the dummy side has majority (2/3) and stays active or goes active.
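Just to make the arithmetic explicit, here's a minimal sketch of the device-majority rule as described above (the function name is mine, and this is a simplification of the real ICM synchronizer behavior, not Cisco's implementation): a side needs a strict majority - more than half - of the enabled PGs to go active in simplex, which is exactly why a dummy PG shifts the count from a 1/2 deadlock to a 2/3 majority.

```python
from fractions import Fraction

def has_device_majority(reachable_pgs: int, total_pgs: int) -> bool:
    """A side may go active in simplex only if it can reach a strict
    majority (more than half) of the enabled PGs."""
    return Fraction(reachable_pgs, total_pgs) > Fraction(1, 2)

# Two real PGs, WAN down: each side reaches only its local PG.
print(has_device_majority(1, 2))   # False: 50% is not a majority

# Add a dummy PG on side A only. On a WAN failure, side A reaches 2 of 3 PGs.
print(has_device_majority(2, 3))   # True: side A stays (or goes) active
print(has_device_majority(1, 3))   # False: side B goes isolated-disabled
```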
How many PGs are there? If there's only one, i.e. PG1A is co-resident with RTRA, and PG1B is co-resident with RTRB, then this is a VERY BAD design.
If the WAN between the two sides of the central controller goes down, then both sides will remain isolated-enabled, and they WILL run split-brain. Each side will be routing independently of the other, and they will write different data to their loggers & HDSes. If they have AWs pointed at each side, they could conceivably make different config or script updates, too.
Side A will run simplex with only half the PGs. Side B would need half+1 of the PGs to be connected to it, and also to report that side A is unreachable, before it would run simplex. If there's only one PG to start with, then both sides would have 1/1 of PGs connected, which would constitute device majority.
Adding a dummy PG to either side will provide *some* measure of help, e.g. add PG2A to side A only. If the WAN goes down then side A will survive in simplex, and side B will stay idle.
BUT - what if side A really DOES go down? Side B wouldn't be able to run simplex without manual intervention (to set it to simplex).
I HATE HATE HATE geographically split duplex PGs. Combining that with no private network, and a converged public network, is a recipe for disaster.
I don't know who said that nightmare would be supported, but whoever said it obviously isn't the person who's going to get paged in the middle of the night when all hell breaks loose.
Do it right, or don't do it at all, is my suggestion.
That's pretty much what I thought; however, I was asked by the IPT partner and the customer to do an independent audit. Before I unleashed the bad news (this is not the only thing done completely wrong) I thought I'd get Cisco's comments, at which point I was told it would be supported but not desirable, and would probably fail A2Q. I closed my PDI case, as I was obviously talking to someone who doesn't understand the product or the A2Q process. I've since raised a TAC case to get a formal Cisco response.
I HATE HATE HATE geographically split duplex PGs.
Agreed. But when customers demand clustering across the WAN, what can we do?
(Not commenting on this particular case)