New VCS Cluster - odd error regarding Cluster lost communication.

Chris Swinney
Level 5

Hi all,

VCS Expressway - x8.2.1

We are creating a cluster of our VCS Expressways. They are in different geographical locations but tied together by a very high speed network - the approximate round trip time is just a couple of milliseconds. However, we got an odd error last night where the peers failed a communication test but then wouldn't re-establish communication.

The error was "Cluster communication failure: The system is unable to communicate with one or more of the cluster peers", but when you navigated into the clustering configuration, the warning that was showing against the inactive peer was something like:

"The other peer has different IP address configured or is running a different software version. The peer lists should be identical and the peers should be running the same software".

This looked like both peers could see each other, but something else was stopping them from peering properly. Of course, the peer lists ARE identical, as are the software revisions (x8.2.1) on both. It looked as though an alarm was initially raised at 04:46:58, then lowered at 04:47:28, then raised again (and never lowered) at 04:48:40. Rebooting the additional peer resolved the problem.

Even though the alarm was raised, both devices could actually reach each other and of course the peer list and software revision were OK. I can understand that network glitches may throw such communication errors, but I don't understand the warnings on the inactive peer, or why the devices did not regain communication.

Any ideas?

Cheers

Chris

 

2 Replies

Adam Wamsley
Cisco Employee

Hey Chris,

Here are some things you can check to narrow down the root cause.

- Grab a full system snapshot from the master and peer.

- In the harddisklogs folder look at the developer logs around the time the issue occurred. There should be some events with clusterdb here that may provide some more information.

- Also look at the sysinit.log and make sure the service was started and did not stop.

- And finally the network log to see if anything strange was occurring or recorded with network connectivity at this time.
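If it helps, the log checks above can be narrowed to the alarm window with a small script. This is only a rough sketch: it assumes each log line contains an HH:MM:SS timestamp somewhere in it, which may not match the actual format of the logs in the snapshot, and the function name `lines_in_window` is made up for illustration.

```python
import re
from datetime import time

# Assumption: each log line carries an HH:MM:SS timestamp somewhere.
# The real snapshot logs may use a different format; adjust the regex if so.
TIME_RE = re.compile(r"(?<!\d)(\d{2}):(\d{2}):(\d{2})(?!\d)")


def lines_in_window(lines, start, end, keyword=None):
    """Return lines whose first HH:MM:SS stamp lies in [start, end].

    If keyword is given, only lines containing it (case-insensitive) are kept.
    Lines without a recognisable timestamp are skipped.
    """
    matches = []
    for line in lines:
        m = TIME_RE.search(line)
        if m is None:
            continue
        hour, minute, second = (int(g) for g in m.groups())
        if hour > 23 or minute > 59 or second > 59:
            continue  # looks like a time but is not a valid one
        stamp = time(hour, minute, second)
        if not (start <= stamp <= end):
            continue
        if keyword is not None and keyword.lower() not in line.lower():
            continue
        matches.append(line.rstrip("\n"))
    return matches
```

For the incident in the original post, something like `lines_in_window(open(path), time(4, 45), time(4, 50), keyword="clusterdb")` would pull out the clusterdb events around the three alarm times, where `path` is wherever the developer log ends up after the snapshot is unpacked.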

 

Adam

 

I've just had exactly the same, confusing, error with our VCS-C cluster x8.6.1.

In the end it turned out to be the MTU setting on the link between our new Juniper network and the firewalls. It still had the default MTU of 1500 instead of 9600.

Cluster communication requires an MTU of around 4000, and since it's going through an IPSec tunnel, it won't fragment, so it fails. The old Avaya network had 9600 configured, so everything worked nicely until we changed over. Live and learn....
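For anyone wanting to test for this, one common approach is to ping across the path with the Don't Fragment bit set and a payload sized to the MTU in question. The sketch below only builds the command; it assumes a Linux host with iputils `ping` (where `-M do` prohibits fragmentation and `-s` sets the payload size) and plain IPv4 with no IP options. The address used in the note underneath is a placeholder, and the 4000 figure is the rough number from this post rather than a documented requirement.

```python
IPV4_HEADER_BYTES = 20  # IPv4 header without options
ICMP_HEADER_BYTES = 8   # ICMP echo header


def max_icmp_payload(mtu):
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    payload = mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES
    if payload < 0:
        raise ValueError("MTU is smaller than the IPv4 + ICMP headers")
    return payload


def df_ping_command(host, mtu, count=3):
    """Build a Linux iputils ping command that sends DF-marked packets of size mtu.

    Assumption: iputils ping on Linux; other platforms use different flags.
    """
    return [
        "ping",
        "-M", "do",            # set Don't Fragment, never fragment locally
        "-c", str(count),      # number of echo requests
        "-s", str(max_icmp_payload(mtu)),
        host,
    ]
```

With these assumptions, `df_ping_command("192.0.2.10", 4000)` gives a command with `-s 3972`. If that ping fails while the same command built for an MTU of 1500 (`-s 1472`) succeeds, something on the path is limited below 4000. The sending host's own interface MTU needs to be at least as large as the size being tested, otherwise the ping fails locally and says nothing about the path.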

/jens

Please rate replies and mark question(s) as "answered" if applicable.
