Over the last month, we've run into a problem on some CM servers.
What happens is that a customer reports that something has gone wrong with one of the CM servers in the cluster, and we verify that:
- there's no web access (or sometimes, when we browse to the CM IP address, instead of the usual web page we get one with just the word Platform; if we follow the link, we get a Tomcat error: HTTP Status 404 - /iptplatform - Apache Tomcat/5.5.28)
- there's no SSH access
- with a keyboard and monitor attached, nothing is displayed on the console
- it replies to ping requests (this is the only thing that works)
- telephones register to the next CM server in the CM list
- SIP traffic sent to this server is lost (it seems that none of the services are up and running)
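
In case it helps anyone check their own nodes for the same symptoms, here is a minimal Python sketch of the reachability checks listed above. It only tests whether TCP ports accept connections. The port map (22 for SSH, 443 for the web interface, 5060 for SIP over TCP) and the function names are my own assumptions and not anything taken from the CM platform, so adjust them to your deployment. It does not test ping, since ICMP needs raw sockets and elevated privileges.

```python
import socket

# Assumed port map for illustration only; change to match your CM nodes.
PORTS = {"ssh": 22, "https": 443, "sip": 5060}


def tcp_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def probe(host, ports=PORTS, timeout=3.0):
    """Probe each named port and return a dict of name -> reachable."""
    return {name: tcp_open(host, port, timeout) for name, port in ports.items()}


def looks_hung(results):
    """True when none of the probed services accepted a connection."""
    return not any(results.values())
```

Calling `probe("<CM IP address>")` on a node that still answers ping and getting `looks_hung(...) == True` would match the state described above, although by itself it cannot tell a hung server apart from a firewall blocking those ports.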
So far, we've seen this on 7 servers:
Cluster 1 - Pub CM v 22.214.171.1240 7816I4
Cluster 2 - Pub CM v 126.96.36.19900 7825I4
Cluster 3 - Pub and, after that, Subs CM v 188.8.131.520 7825I4
Cluster 4 - Pub and, after that, Subs CM v 184.108.40.2060 7825I4
Cluster 5 - Pub CM v 220.127.116.1100 7825I4
As you can see, the CM version is not always the same, but all of them are IBM I4 servers.
This is the only common factor we've found among them, as we have installed many other clusters on other server types, and they are not showing this issue.
Usually we recover the server by rebooting it or, if that doesn't work, by using the recovery disk, but we are afraid this is a hardware bug that could recur after some time or appear in other new deployments.
Has anyone faced something similar?
Does anyone know about any problem with these platforms?
Thanks in advance