I'm able to reproduce a slight variation with the following steps:
Set discovery.zen.minimum_master_nodes = 2 and node.max_local_storage_nodes = 1. These nodes are deployed on NFS so that if a machine fails, an ES process can be started against the same data path from a different machine. Each machine has access to the data directories of all ES nodes.
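For reference, a sketch of the per-node settings described above, expressed with Elasticsearch's Java `Settings` builder (the actual deployment used elasticsearch.yml; on 1.x the builder is `ImmutableSettings.settingsBuilder()`, and the helper class here is only illustrative):

```java
// Sketch of the node settings used in this reproduction, via the Elasticsearch Java API.
// Each node gets its own dataPath, e.g. "/nfs/node1", "/nfs/node2", "/nfs/node3".
import org.elasticsearch.common.settings.Settings;

public class ReproSettings {
    public static Settings forNode(String dataPath) {
        return Settings.builder()
                .put("discovery.zen.minimum_master_nodes", 2) // quorum of 2 of the 3 master-eligible nodes
                .put("node.max_local_storage_nodes", 1)       // allow only one node per data path
                .put("path.data", dataPath)                   // per-node NFS data directory
                .build();
    }
}
```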
The initial deployment was:
ES1 - Node1:9300 (path.data=/nfs/node1)
ES2 - Node2:9300 (path.data=/nfs/node2)
ES3 - Node3:9300 (path.data=/nfs/node3)
Then I took the following steps:
1. Killed the network on Node2, which resulted in a cluster with two nodes (ES1, ES3). The ES2 process was still running on Node2.
2. Started a new ES process on Node1 with path.data=/nfs/node2. I assumed that, since node.max_local_storage_nodes = 1, it would not start because ES2 on Node2:9300 already held the lock, but it started anyway. The cluster now looked like:
ES1 - Node1:9300 (path.data=/nfs/node1)
ES2 - Node1:9301 (path.data=/nfs/node2)
ES3 - Node3:9300 (path.data=/nfs/node3)
ES2 - Node2:9300 (path.data=/nfs/node2) was still running but not part of the cluster.
3. Started the network on Node2 again, which resulted in the following two clusters:
Cluster1:
ES1 - Node1:9300 (path.data=/nfs/node1)
ES2 - Node1:9301 (path.data=/nfs/node2)
ES3 - Node3:9300 (path.data=/nfs/node3)
Cluster2:
ES1 - Node1:9300 (path.data=/nfs/node1)
ES2 - Node2:9300 (path.data=/nfs/node2)
Now Node1:9300 is participating in both clusters.
node.max_local_storage_nodes only applies if you use the same installation directory. As you are using a different path.data, this setting does not apply.
You should not use NFS at all when working with Elasticsearch. Your search and indexing performance will be dramatically slower. Also, native locking is not supported on NFS. You might want to read this thread for further information: http://lucene.472066.n3.nabble.com/Lucene-index-on-NFS-td4011301.html
You should especially make sure that data directories are not written to by two separate processes, no matter whether those processes run on one machine or on different machines that share the same network storage.
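To see why both points matter, here is a minimal sketch (not the actual NodeEnvironment code) of how a per-data-path lock typically works: each process takes a native file lock on a marker file inside its own path.data, so two processes only contend when they point at exactly the same directory, and on NFS such native locks may not be enforced across machines at all.

```java
// Minimal sketch of a per-data-path lock, assuming an ordinary java.nio file lock on a
// marker file inside path.data. Illustrative only; Elasticsearch's real logic is more involved.
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class DataPathLock {

    public static FileLock lockDataPath(Path dataPath) throws IOException {
        // One lock file per data directory: two processes only contend if they
        // share the exact same path.data.
        Path lockFile = dataPath.resolve("node.lock");
        FileChannel channel = FileChannel.open(lockFile,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        FileLock lock = channel.tryLock();
        if (lock == null) {
            channel.close();
            throw new IOException("data path already locked by another process: " + dataPath);
        }
        // On NFS, this native lock may silently fail to protect against processes on other hosts.
        return lock;
    }

    public static void main(String[] args) throws IOException {
        lockDataPath(Paths.get(args.length > 0 ? args[0] : "/nfs/node2"));
    }
}
```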
Hope this helps (and ensures you have a fast and reliable search engine up and running).
This is related to #2117
This happens because of the way NodeDoesNotExistOnMasterException is handled at https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/discovery/zen/fd/MasterFaultDetection.java#L315
When Node2 comes back onto the network, it sees that it is no longer part of the cluster and starts a master election, resulting in a new master.
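A simplified, hypothetical sketch of that handling (class and method names are illustrative, not the actual Elasticsearch internals): when a ping to the master fails with NodeDoesNotExistOnMasterException, the fault-detection layer reports a master failure, and the node drops out of its current cluster and rejoins, which can kick off a fresh election.

```java
// Illustrative sketch only: MasterPinger and MasterFaultListener are hypothetical names,
// standing in for the behaviour referenced in MasterFaultDetection.java.
interface MasterFaultListener {
    void onMasterFailure(String masterId, String reason);
}

class NodeDoesNotExistOnMasterException extends Exception {}

class MasterPinger {
    private final MasterFaultListener listener;

    MasterPinger(MasterFaultListener listener) {
        this.listener = listener;
    }

    void handlePingResponse(String masterId, Exception failure) {
        if (failure instanceof NodeDoesNotExistOnMasterException) {
            // The master no longer knows about this node: treat it as a master failure,
            // so the node rejoins the cluster, which can trigger a new master election.
            listener.onMasterFailure(masterId, "node does not exist on master");
        }
    }
}
```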