Running Elasticsearch in a Docker 1.12 Swarm

My last blog post was about running Consul in Docker Swarm. I wanted to do that because I want to run Elasticsearch in swarm, so that other containers can find Elasticsearch through swarm service discovery. However, I’ve been having a hard time getting that up and running because of various issues and limitations in both Elasticsearch and Docker. While Consul is nice, it feels kind of wrong to have two bits of infrastructure doing service discovery. Thanks to Christian Kniep’s article, I know it can be done that way.

However, I eventually managed to do it without Consul. Since doing this is far from trivial, I decided to write up the process as well.

Assuming you have your swarm up and running, this is how you do it:

docker network create es -d overlay

docker service create --name esm1 --network es \
  -p 9201:9200 -p 9301:9301 \
   elasticsearch \ \
  -Des.discovery.zen.minimum_master_nodes=2 \

docker service create --name esm2 --network es \
  -p 9202:9200 -p 9302:9302 \
  elasticsearch \ \
  -Des.discovery.zen.minimum_master_nodes=2 \

docker service create --name esm3 --network es \
  -p 9203:9200 -p 9303:9303 \
  elasticsearch \,esm2:9302 \
  -Des.discovery.zen.minimum_master_nodes=2 \

There is a lot going on here, so let’s look at the approach in a bit more detail. First, we want to be able to talk to the cluster using the swarm registered name rather than an IP address. Second, there needs to be a way for each of the cluster nodes to talk to any of the other nodes. The key problem with both Elasticsearch and Consul is that we have no way of knowing up front what the IP addresses of the swarm containers are going to be. Furthermore, Docker swarm does not currently support host networking, so we cannot use the external IPs of the Docker hosts either.

With Consul, we fired up two clusters that used each other and, via its gossip protocol, all nodes eventually found each other’s IP addresses. Unfortunately, the same strategy does not work for Elasticsearch. There are several issues that make this hard.

The solution to these problems is simple but ugly: an Elasticsearch service can only have one node in Swarm. Since we want multiple nodes in our Elasticsearch cluster, we’ll need to run multiple services: one for each Elasticsearch node. This is why, in the example above, we start three services, each with only one replica (the default). Each of them binds on eth0, which is where the Docker overlay network ends up.

Elasticsearch nodes also rely on the address that each node advertises in order to talk to each other, so the port that a node advertises needs to match the service port. It took me some time to figure this out, but simply doing a -p 9301:9300 is not good enough: it really needs to be -p 9301:9301. Therefore each of the Elasticsearch services is configured with a different transport port. For the HTTP port we don’t need to do this, so we can simply map port 9200 to a different external port.

Finally, the services can only talk to other services that already exist. So, what won’t work is specifying esm1:9301,esm2:9302,esm3:9303 on each of the services. Instead, the first service only has itself to talk to. The second one can talk to the first one, and the third one can talk to the first and second one. This also means the services have to be started in the right order.
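To make the port and ordering pattern concrete, here is a minimal sketch that prints one docker service create command per node instead of running it. The function names seed_hosts and service_cmd are hypothetical helpers of mine. The -Des.network.host, -Des.transport.tcp.port and -Des.discovery.zen.ping.unicast.hosts flags are my assumption, based on Elasticsearch 2.x setting names, of how the behaviour described above could be configured; they are not taken from the commands shown earlier.

```shell
#!/bin/sh
# Sketch only: prints the commands, does not execute them.
# Assumes single-digit node numbers (ports 9301..9309).

# seed_hosts N: hosts that node N is told about. Node 1 only knows itself;
# every later node lists the nodes that were created before it.
seed_hosts() {
  last=$(( $1 - 1 ))
  if [ "$last" -lt 1 ]; then
    last=1
  fi
  hosts=""
  i=1
  while [ "$i" -le "$last" ]; do
    hosts="${hosts:+$hosts,}esm${i}:930${i}"
    i=$((i + 1))
  done
  printf '%s\n' "$hosts"
}

# service_cmd N: one service per Elasticsearch node. The transport port is
# mapped to itself (930N:930N); the HTTP port 9200 is mapped to 920N.
service_cmd() {
  printf 'docker service create --name esm%d --network es' "$1"
  printf ' -p 920%d:9200 -p 930%d:930%d' "$1" "$1" "$1"
  printf ' elasticsearch'
  # Assumed setting names (Elasticsearch 2.x), not from the original post:
  printf ' -Des.network.host=_eth0_'
  printf ' -Des.transport.tcp.port=930%d' "$1"
  printf ' -Des.discovery.zen.ping.unicast.hosts=%s' "$(seed_hosts "$1")"
  printf ' -Des.discovery.zen.minimum_master_nodes=2\n'
}

# Creation order matters, so emit esm1 first.
for n in 1 2 3; do
  service_cmd "$n"
done
```

If the printed commands look right for your setup, they can be run one at a time in that order. Once all three services are up, something like curl 'http://localhost:9201/_cluster/health?pretty' against any swarm node should show whether the three nodes have found each other.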

To be clear, I don’t think that this is a particularly good way of running Elasticsearch. Also, several of the problems I outlined are being worked on and I expect that future versions of Docker may make this a little easier.