
IronPort Cluster problem

IFR
Level 1

Hi,

We have an IronPort C160 and a C170 that worked fine together in a cluster. Since updating them to the latest version, we get the following error:

Error connecting to cluster machine mail1.domain.de (Serial #: xxxxxxxx) at IP 192.168.10.11 - Connection failure - ('l4_transport/coro_socket_transport.py read_line|87', "<type 'exceptions.EOFError'>", '', '[cluster/connection_pool.py _create_cluster_connection|518] [cluster/cluster_command_client.py connect|123] [_coro.pyx coro._coro.sched.with_timeout (coro/_coro.c:11742)|1099] [cluster/cluster_command_client.py _ssh_connect|95] [transport/client.py connect|29] [transport/client.py _connect|56] [l4_transport/coro_socket_transport.py read_line|87]')

Last message occurred 60 times between Tue May 13 11:23:31 2014 and Tue May 13 12:22:32 2014.

Version: 8.5.5-280

 

 

The connstatus shows the following:

Cluster mailcluster
===================
  Group Main_Group:
    Machine mail1.domain.de (Serial #: xxxxxxxx)
    Machine mail2.domain.de (Serial #: xxxxxxxx)
        Machine mail1.domain.de (Serial #: xxxxxxxxxx)  - disconnected: mail2.domain.de -> mail1.domain.de: Internal server error:  (Tue May 13 12:40:32 2014 CEST)


Cluster mailcluster

 

Both machines are in the same network on the management side, and the switch is fine. We set up the cluster via SSH over port 22. We have already tried port 2222, with no difference. As mentioned, it worked fine before the update to 8.5.5. We came from 8.0.1 or thereabouts; I don't remember exactly.

 

Any ideas ?

1 Accepted Solution

Accepted Solutions

For continuity: I worked directly with Wolfgang on this issue. We were able to work around the EOF application fault/error by creating the cluster on mail2 and then joining mail1 to it. At that point, creating the cluster succeeded and both appliances were in the cluster together. Next, we saw problems with the appliances communicating on port 22:

[]> connstatus

Cluster mailcluster
==========
  Group Main_Group:
    Machine mail1.domain.de (Serial #: XXX)  - disconnected: mail2.domain.de -> mail1.domain.de: Internal server error:  (Thu May 15 16:28:11 2014 CEST)
    Machine mail2.domain.de (Serial #: YYY) 
        Machine mail1.domain.de (Serial #: XXX)  - disconnected: mail2.domain.de -> mail1.domain.de: Internal server error:  (Thu May 15 16:28:11 2014 CEST)

 

After troubleshooting, we decided to reboot the appliances to ensure clean communication between them. This cleared the network errors, and the cluster was able to establish successfully:

[]> connstatus

Cluster mailcluster
==========
  Group Main_Group:
    Machine mail1.domain.de (Serial #: XXX) 
    Machine mail2.domain.de (Serial #: YYY) 
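
The two connstatus outputs above differ only in the "- disconnected:" suffix on the affected machine lines. For anyone who wants to watch for this automatically, here is a minimal sketch that scans pasted connstatus text for that suffix. It assumes the line layout shown in this thread (it is not based on a documented output format), and the names `MACHINE_RE` and `disconnected_machines` are made up for illustration.

```python
import re

# Assumed line layout, taken only from the connstatus output pasted in this thread:
#   Machine <hostname> (Serial #: <serial>)  - disconnected: <detail>
MACHINE_RE = re.compile(
    r"^\s*Machine\s+(?P<host>\S+)\s+\(Serial #:\s*(?P<serial>[^)]*)\)\s*"
    r"(?:-\s*disconnected:\s*(?P<detail>.*))?$"
)


def disconnected_machines(connstatus_output):
    """Map each machine reported as disconnected to the first detail text seen."""
    found = {}
    for line in connstatus_output.splitlines():
        match = MACHINE_RE.match(line)
        if match and match.group("detail") is not None:
            # A machine can be listed more than once; keep the first report.
            found.setdefault(match.group("host"), match.group("detail").strip())
    return found
```

An empty dictionary means no machine line carried the "- disconnected:" suffix; it does not prove the cluster is healthy, so 'clustercheck' is still worth running.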

 

In the end, real-time troubleshooting and support were needed, and a solution was provided.

-Robert


7 Replies

Robert Sherwin
Cisco Employee

The EOF notation in the error indicates that the appliance(s) are having issues reading the end of the file/configuration. My suggestion would be to remove both machines from the cluster (CLI: 'clusterconfig' -> 'removemachine'), re-configure the cluster on mail1, and then join mail2 to the cluster. I would suggest using only IP addresses, and be sure that you are using only SSH on port 22. When the prompt for "would you like to start CCS/2222" appears, be sure to select no.

Once mail2 is in the cluster, run 'clustercheck' from the CLI to verify health and connectivity.

Also, make sure that you have PTR records in place for all cluster members:

DNS and Hostname Resolution

DNS is required to connect a machine to the cluster.  Cluster communication is normally initiated using the DNS hostnames of the machines (not the hostname of an interface on the machine).  A machine with an unresolvable hostname would be unable to actually communicate with any other machines in the cluster, even though it is technically part of the cluster.

Your DNS must be configured to have the hostname point to the correct IP interface on the appliance that has SSH or CCS enabled. This is very important.  If DNS points to another IP address that does not have SSH or CCS enabled it will not find the host.  Note that centralized management uses the "main hostname," as set with the sethostname command, not the per-interface hostname.

If you use an IP address to connect to another machine in the cluster, the machine you connect to must be able to perform a reverse lookup of the connecting IP address.  If the reverse lookup times out because the IP address isn't in DNS, the machine cannot connect to the cluster.
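
The forward and reverse lookup requirements described above can be checked from any host that uses the same DNS servers as the appliances. The following is a minimal sketch using only Python's standard `socket` module; the function name `check_cluster_dns` and its parameters are made up for illustration, and it only tests DNS agreement, not whether SSH or CCS is enabled on that interface.

```python
import socket


def check_cluster_dns(hostname, expected_ip,
                      resolve=socket.gethostbyname,
                      reverse=socket.gethostbyaddr):
    """Return a list of problems; an empty list means A and PTR records agree.

    `resolve` and `reverse` are injectable so the check can be tried
    without live DNS (an illustration choice, not an appliance feature).
    """
    problems = []

    # Forward lookup: the main hostname should point at the cluster interface.
    try:
        ip = resolve(hostname)
        if ip != expected_ip:
            problems.append(f"{hostname} resolves to {ip}, expected {expected_ip}")
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        problems.append(f"forward lookup of {hostname} failed: {exc}")

    # Reverse lookup: the connecting IP needs a PTR record.
    try:
        ptr_name = reverse(expected_ip)[0]
        if ptr_name.rstrip(".").lower() != hostname.rstrip(".").lower():
            problems.append(f"{expected_ip} has PTR {ptr_name}, expected {hostname}")
    except OSError as exc:  # socket.herror is a subclass of OSError
        problems.append(f"reverse lookup of {expected_ip} failed: {exc}")

    return problems
```

For example, `check_cluster_dns("mail1.domain.de", "192.168.10.11")` would run both lookups for the first appliance named in this thread and list any mismatch it finds.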

For best practices, and full review of cluster setup --- please see Ch. 38 in the 8.5.5 guide: http://www.cisco.com/c/dam/en/us/td/docs/security/esa/esa8-5-5/ESA_8-5-5_User_Guide.pdf

I hope this helps!

-Robert

(*If you have received the answer to your original question, and found this helpful/correct - please mark the question as answered, and be sure to leave a rating to reflect!)

Thanks Robert, for this extensive explanation !

I have now deleted the cluster and set it up from scratch, but I'm not able to join the second machine again.

On the second machine, I got the following messages when I tried to join the cluster:

 

Choose the interface on which to enable the Cluster Communication Service:
1. DMZ (192.168.88.12/24: svspam2.local)
2. Internet (192.168.92.13/24: mail2.domain.de)
[1]>

Enter the port on which to enable the Cluster Communication Service:
[2222]> 22

That port is already in use.
Enter the port on which to enable the Cluster Communication Service:
[22]>

 

I changed the port to 2222, but then the connection fails:

Enter the IP address of a machine in the cluster.
[192.168.88.11]>

Enter the remote port to connect to.  This must be the normal admin ssh port, not the CCS port.
[22]>

Enter the name of an administrator present on the remote machine
[admin]>

Enter password:
Failed to join the cluster.
Error was: 'Unexpected EOF on connect'
Enter the IP address of a machine in the cluster.

 

Any further ideas ?

Thanks in advance.

Wolfgang

It is probably best at this point to open a support case, if you can, so that we can work with you directly to address this. Depending on when you get it opened, I will be glad to help once I am on shift, and we can work via WebEx.

-Robert

There has already been a case open via our local distributor since April 24, and one of your colleagues has already done a WebEx session.

I'll send you more information via pm. Thanks so far.

Thanks again, Robert, for your quick help. Everything works fine now.

That's what I call customer support !

Kind regards

Wolfgang

 

I just did some more testing.

 

I guess I don't need the Cluster Communication Service on my second appliance (mail2).

So the setup now looks like this:

mail1:

Set up a cluster with communication over IP.

-->

Cluster FG
==========
  Group Main_Group:
    Machine mail1.domain.de (Serial #: xxxxx)

 

mail2:

Tried to connect to mail1 over SSH on port 22.

But I still get the error:

Failed to join the cluster.
Error was: 'Unexpected EOF on connect'
 

A telnet from mail2 to mail1 on port 22 works perfectly.
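
A telnet connection that opens only shows that the TCP port is reachable. An SSH server normally sends an identification line starting with "SSH-" right after the connection is accepted (RFC 4253), so one further check is whether that line arrives or the peer closes the connection first. Below is a minimal sketch using Python's standard `socket` module, to be run from a host that can reach mail1; the function name `read_ssh_banner` is made up for illustration, and it is only a guess that an early close like this is related to the 'Unexpected EOF on connect' message.

```python
import socket


def read_ssh_banner(host, port=22, timeout=5.0):
    """Connect and return the first line the server sends, or None on EOF.

    Simplification: RFC 4253 allows a server to send other lines before
    its "SSH-" identification string; this sketch returns only the first line.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        data = b""
        # The identification string is at most 255 bytes including CR LF.
        while b"\n" not in data and len(data) < 255:
            chunk = sock.recv(255 - len(data))
            if not chunk:
                break  # peer closed the connection
            data += chunk
    if not data:
        return None  # EOF before any byte arrived
    return data.split(b"\n", 1)[0].decode("ascii", errors="replace").strip()
```

A return value of `None` means the remote side accepted the TCP connection and then closed it without sending anything, which a plain telnet test can easily hide.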