Accessing Elasticsearch clusters via a localhost node

2014-12-06

I’m a regular at the Elasticsearch meetup here in Berlin and there are always lots of recent converts who are trying to wrap their heads around the ins and outs of what it means to run an Elasticsearch cluster. One issue that seems to baffle a lot of new users is the question of which node in the cluster has the master role. The correct answer is that it depends on what you mean by master. Yes, there is a master node in Elasticsearch, but it does not mean what you might think it means: it merely means that a single node is elected to hold the truth about which nodes have which data and, crucially, where the primary copies of shards live. It does NOT mean that this node has the master copy of all the data in the cluster. It also does NOT mean that you have to talk to this specific node when writing data. Data in Elasticsearch is sharded and replicated; shards and their replicas are spread all over the cluster, and out of the box clients can talk to any of the nodes for both read and write traffic. You can literally put a load balancer in front of your cluster and round robin all the requests across all the nodes.

When nodes go down or are added to the cluster, shards may be moved around. All nodes synchronize information about which nodes have which shards and replicas of those shards; the Elasticsearch master is merely the ultimate authority on this information. Masters are elected at runtime by the nodes in the cluster, so, by default, any node in the cluster can be elected master. Also by default, every node knows how to look up which shards live where and how to route requests around the cluster.

A common pattern in larger clusters is to reserve the master role for nodes that do not hold any data; you can specialize what nodes do via configuration. Having three or more such nodes means that if one of them goes down, the remaining ones can elect a new master and the rest of the cluster can just continue spinning. Having an odd number of master-eligible nodes is a good thing when you are holding elections, since you always have an obvious majority of n/2 + 1. With an even number you can end up with two equally sized network partitions.
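For illustration, here is a sketch of what the elasticsearch.yml for such a dedicated master node could look like on the 1.x series (the exact value of minimum_master_nodes depends on how many master-eligible nodes you run; with three, a majority is two):

```yaml
# dedicated master node: eligible for election, holds no data
node.master: true
node.data: false

# with three master-eligible nodes, require a majority of two before
# a master can be elected, to avoid split-brain scenarios
discovery.zen.minimum_master_nodes: 2
```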

The advantage of not having data on a node is that such nodes are far less likely to get into trouble with e.g. OutOfMemoryExceptions, excessive IO that slows the machine down, or excessive CPU usage due to expensive queries. When that happens, the availability of the node becomes an issue and the risk of bad things happening grows, which is especially problematic on a node that is supposed to hold the authoritative view of your cluster configuration. If it becomes unavailable, the other nodes will elect a new master. There’s a fine line between being unavailable and being slow to respond, which makes this a particularly hard problem. The now infamous Call Me Maybe article highlights several different cluster failure scenarios, and most of these involve some sort of network partitioning due to temporary master node failures or unavailability. If you are worried about this, also be sure to read the Elasticsearch response to this article. The bottom line is that most of the issues have by now been addressed and are far less likely to become a problem. Also, if you have so far declined to update your production setup to Elasticsearch 1.4.x, now might be a good time to read up on the many known ways in which things can go bad for you.

In any case, such empty nodes still do useful work. They can, for example, be used to serve traffic to Elasticsearch clients. Most things that happen in Elasticsearch involve internal node communication, since the data can be anywhere in the cluster. So there are typically two or more network hops involved: one from the client to what is usually called a routing node, and from there to whichever nodes hold the shards needed to complete the request, where the actual logic for writing new data to a shard or retrieving data from it runs.

Another common pattern in the Elasticsearch world is to implement clients in Java and embed a cluster node inside the process. This embedded node is typically configured to be a routing only node. The big advantage of this is that it saves you one network hop: the embedded node already knows where all the shards live, so the application server can talk directly to the nodes that hold those shards, using the more efficient internal protocol that Elasticsearch nodes use to communicate with each other.
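As a rough sketch, this is what the embedded node approach looks like with the 1.x Java API; the cluster name, index, and document are placeholders for whatever your setup uses:

```java
import org.elasticsearch.client.Client;
import org.elasticsearch.node.Node;
import static org.elasticsearch.node.NodeBuilder.nodeBuilder;

public class EmbeddedNodeExample {
    public static void main(String[] args) {
        // Join the cluster as a client node: it holds no shards and is not
        // master-eligible, but it keeps a copy of the cluster state so it
        // knows where every shard lives.
        Node node = nodeBuilder()
                .clusterName("my-cluster")   // placeholder cluster name
                .client(true)                // routing only, no data
                .node();

        Client client = node.client();

        // Requests go straight to the shard-holding nodes over the internal
        // transport protocol, skipping the extra REST hop.
        client.prepareIndex("myindex", "mytype")
              .setSource("{\"field\":\"value\"}")
              .execute()
              .actionGet();

        node.close();
    }
}
```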

A few months ago, at one of the meetups, I was discussing this topic with one of the organizers, Felix Gilcher. He mentioned an interesting variant of this pattern. Embedding a node inside an application only works for Java applications, and it is not possible if you use something else. Besides, dealing with the Elasticsearch internal API can be quite a challenge as well, so it would be convenient if non-Java applications could get similar benefits. He then suggested the obvious solution: you get most of the benefits of an embedded node by simply running a standalone, routing-only Elasticsearch node on each application server. The advantage of this approach is that each application server communicates with Elasticsearch via localhost, which is a lot faster than sending REST requests over the network. You still have a bit of overhead related to serializing and deserializing the REST requests, but all of that happens on localhost and you avoid the network hop. So, effectively, you get most of the benefits of the embedded node approach.
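A minimal sketch of the elasticsearch.yml for such a local routing-only node might look like this (the cluster name is a placeholder; it must match the cluster you want the node to join):

```yaml
# routing-only node running on the application server itself
cluster.name: my-cluster   # must match the name of the actual cluster
node.master: false         # never eligible to become master
node.data: false           # holds no shards, just routes requests
```

The application then simply talks to http://localhost:9200 as it would to any other Elasticsearch node, and the local node forwards each request to the right places in the cluster.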

We recently implemented this at Inbot. We now have a cluster of three Elasticsearch nodes and two application servers that each run two additional nodes that talk to the three cluster nodes. We use a mix of Java, JavaScript and Ruby components on our server, and doing this allows us to keep things simple. The Elasticsearch nodes on the application servers have a comparatively small heap of only 1GB and typically consume few resources. We could probably reduce the heap size further, to 512MB or even 256MB, since all these nodes do is pass requests and data between the cluster and the application server. However, we have plenty of memory and have so far had little need to tune this. Meanwhile, our Elasticsearch cluster nodes run on three fast 32GB machines, where we allocate half of the memory to the heap and reserve the rest for file caching (as per the Elasticsearch recommendations). This works great, and it also simplifies application configuration, since you can simply configure all applications to talk to localhost and Elasticsearch takes care of the cluster management.
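For reference, on the 1.x series the heap size is typically controlled through the ES_HEAP_SIZE environment variable that the startup scripts pick up; something along these lines matches the sizes mentioned above:

```sh
# on the application servers: a small heap for the routing-only node
export ES_HEAP_SIZE=1g

# on the 32GB cluster nodes: half the RAM goes to the heap, the rest is
# left to the OS for file caching, per the Elasticsearch recommendation
export ES_HEAP_SIZE=16g
```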