
NodeDoesNotExistOnMasterException Handling #3663


Closed
anandnalya opened this issue Sep 11, 2013 · 4 comments

@anandnalya

This is related to #2117

I'm able to reproduce a slight variation with the following steps:
Set discovery.zen.minimum_master_nodes = 2 and node.max_local_storage_nodes = 1. The nodes are deployed on NFS so that if a machine fails, an ES process can be started on a different machine using the same data path. Each machine has access to the data directories of all ES nodes.
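For reference, the setup above corresponds to an elasticsearch.yml roughly like this (the paths are the ones from this report; per-node values differ only in path.data):

```yaml
# Require 2 master-eligible nodes to be visible before electing a master
discovery.zen.minimum_master_nodes: 2

# Allow only one node process per data path
node.max_local_storage_nodes: 1

# Per-node data directory on the shared NFS mount (this is Node1's value)
path.data: /nfs/node1
```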

The initial deployment was:

  • ES1 - Node1:9300 (path.data=/nfs/node1)
  • ES2 - Node2:9300 (path.data=/nfs/node2)
  • ES3 - Node3:9300 (path.data=/nfs/node3)

Then I took the following steps:

  1. Killed the network on Node2, which resulted in a two-node cluster (ES1, ES3). The ES2 process was still running on Node2.
  2. Started a new ES process on Node1 with path.data=/nfs/node2. I assumed that, since node.max_local_storage_nodes = 1, it would not start because ES2 on Node2:9300 already held the lock on that path, but it started anyway. The cluster now looked like:
  • ES1 - Node1:9300 (path.data=/nfs/node1)
  • ES2 - Node1:9301 (path.data=/nfs/node2)
  • ES3 - Node3:9300 (path.data=/nfs/node3)

ES2 - Node2:9300 (path.data=/nfs/node2) was still running but was no longer part of the cluster.

  3. Brought the network on Node2 back up, which resulted in the following two clusters:

  • Cluster1:
    • ES1 - Node1:9300 (path.data=/nfs/node1)
    • ES2 - Node1:9301 (path.data=/nfs/node2)
    • ES3 - Node3:9300 (path.data=/nfs/node3)
  • Cluster2:
    • ES1 - Node1:9300 (path.data=/nfs/node1)
    • ES2 - Node2:9300 (path.data=/nfs/node2)

Now, Node1:9300 is participating in both clusters.

This happens because of the way NodeDoesNotExistOnMasterException is handled at https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/discovery/zen/fd/MasterFaultDetection.java#L315

When Node2's network comes back up, ES2 sees that it is no longer part of the cluster and starts a master election, resulting in a new master.
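The split brain is easy to see with a simple quorum check: because Node1:9300 is counted on both sides of the partition, each side independently satisfies minimum_master_nodes = 2 and can elect its own master. A minimal sketch of that check (the node sets and the quorum function are illustrative, not Elasticsearch code):

```python
def has_quorum(visible_master_eligible_nodes, minimum_master_nodes):
    """A partition may elect a master only if it can see at least
    minimum_master_nodes master-eligible nodes (the Zen discovery rule)."""
    return len(visible_master_eligible_nodes) >= minimum_master_nodes

MINIMUM_MASTER_NODES = 2

# After step 3, Node1:9300 appears on both sides of the partition.
cluster1 = {"Node1:9300", "Node1:9301", "Node3:9300"}
cluster2 = {"Node1:9300", "Node2:9300"}

# Both sides pass the quorum check, so both elect a master: split brain.
print(has_quorum(cluster1, MINIMUM_MASTER_NODES))  # True
print(has_quorum(cluster2, MINIMUM_MASTER_NODES))  # True
```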

@anandnalya
Author

I'm using the following workaround for this: anandnalya@8aba868

@spinscale
Contributor

Hey,

a couple of notes:

  • node.max_local_storage_nodes only applies if you use the same installation directory. As you are using a different path.data for each process, this setting does not apply.
  • You should not use NFS at all when working with Elasticsearch. Your search and indexing performance will be dramatically slower, and native locking is not supported on NFS. You might want to read this thread for further information: http://lucene.472066.n3.nabble.com/Lucene-index-on-NFS-td4011301.html
  • You should especially make sure that data directories are not written to by two separate processes, whether those processes run on the same machine or on different machines via shared network storage.
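The locking caveat above can be sketched in a few lines: on a local filesystem, an exclusive flock on a lock file keeps a second would-be node out of a data path, but flock semantics over NFS are historically unreliable, which is how two processes on different machines can end up writing the same directory. (The node.lock file name and this mechanism are a simplification for illustration, not Elasticsearch's actual implementation.)

```python
import fcntl
import os
import tempfile

def try_lock_data_path(data_path):
    """Try to take an exclusive, non-blocking lock on <data_path>/node.lock.
    Returns the open lock file on success, or None if the lock is held."""
    lock_file = open(os.path.join(data_path, "node.lock"), "w")
    try:
        fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return lock_file
    except BlockingIOError:
        lock_file.close()
        return None

data_path = tempfile.mkdtemp()

first = try_lock_data_path(data_path)   # first "node": acquires the lock
second = try_lock_data_path(data_path)  # second "node" on the same path

print(first is not None)  # True: the lock was acquired
print(second is None)     # True on Linux: the second flock is denied
```

Over NFS, the second call might succeed anyway, which is exactly the failure mode being warned about.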

Hope this helps (and ensures you have a fast and reliable search engine up and running).

@anandnalya
Author

Hi,

I wanted to know whether there are any other complications of running ES over NFS apart from the Lucene gotchas.

Thanks,
Anand

@bleskes
Contributor

bleskes commented Jan 26, 2015

I think this should be solved by the improvements to Zen Discovery we made in 1.4. If this is not the case, please feel free to re-open.

@bleskes bleskes closed this as completed Jan 26, 2015