My last blog post was on running Consul in Docker Swarm. The reason I wanted to do that is that I want to run Elasticsearch in the swarm so that I can use swarm service discovery to enable other containers to use Elasticsearch. However, I’ve been having a hard time getting that up and running because of various issues and limitations in both Elasticsearch and Docker. While Consul is nice, it feels kind of wrong to have two bits of infrastructure doing service discovery. Thanks to Christian Kniep’s article, I know it can be done that way.
However, I eventually managed to do it without Consul. Since getting this to work is completely non-trivial, I decided to write up the process as well.
Assuming you have your swarm up and running, this is how you do it:
docker network create es -d overlay
docker service create --name esm1 --network es \
  -p 9201:9200 -p 9301:9301 \
  elasticsearch -Des.network.publish_host=_eth0_ \
  -Des.discovery.zen.ping.unicast.hosts=esm1:9301 \
  -Des.discovery.zen.minimum_master_nodes=2 \
  -Des.transport.tcp.port=9301

docker service create --name esm2 --network es \
  -p 9202:9200 -p 9302:9302 \
  elasticsearch -Des.network.publish_host=_eth0_ \
  -Des.discovery.zen.ping.unicast.hosts=esm1:9301 \
  -Des.discovery.zen.minimum_master_nodes=2 \
  -Des.transport.tcp.port=9302

docker service create --name esm3 --network es \
  -p 9203:9200 -p 9303:9303 \
  elasticsearch -Des.network.publish_host=_eth0_ \
  -Des.discovery.zen.ping.unicast.hosts=esm1:9301,esm2:9302 \
  -Des.discovery.zen.minimum_master_nodes=2 \
  -Des.transport.tcp.port=9303
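Once the three services are up, you can do a quick sanity check using the published HTTP ports from the example above. Run this on any of the swarm nodes (published ports are reachable on every node); it should report three nodes:

# Sanity check: the cluster should list all three Elasticsearch nodes
curl -s localhost:9201/_cluster/health?pretty
curl -s localhost:9202/_cat/nodes?v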
There is a lot going on here, so let’s look at the approach in a bit more detail. First, we want to be able to talk to the cluster using the swarm-registered name rather than an IP address. Secondly, there needs to be a way for each of the cluster nodes to talk to any of the other nodes. The key problem with both Elasticsearch and Consul is that we have no way of knowing up front what the IP addresses of the swarm containers are going to be. Furthermore, Docker Swarm does not currently support host networking, so we cannot use the external IPs of the Docker hosts either.
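For example, any other service attached to the same es overlay network can reach the cluster by its service name rather than by IP. The service and image below are made up, purely to illustrate the idea:

# Hypothetical application service that talks to the cluster via the swarm
# DNS name esm1 on the shared overlay network; "myapp-image" is a placeholder.
docker service create --name myapp --network es \
  -e ELASTICSEARCH_URL=http://esm1:9200 \
  myapp-image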
With Consul, we fired up two clusters that used each other, and via its gossip protocol all nodes eventually found each other’s IP addresses. Unfortunately, the same strategy does not work for Elasticsearch. There are several issues that make this hard:
The main problem with running Elasticsearch is that, like other clustered software, it needs to know where some of the other nodes in the cluster are. This means we need a way of addressing the individual Elasticsearch containers in the swarm. We can do this using the IP address that Docker assigns to the containers, which we can’t know until the container is running. Alternatively, we can use the container DNS entry in the swarm, which we also can’t know until the container is running because it includes the container id. This is the root cause of the chicken-and-egg problem we face when bootstrapping the Elasticsearch cluster on top of Swarm: we have no way of configuring it with the right list of nodes to talk to.
Elasticsearch really does not like having to deal with round-robined service DNS entries for its internal nodes. You get a log full of errors, since every time Elasticsearch pings a node it ends up talking to a different node. This rules out what we did with Consul earlier, where we solved the problem by running two Consul services (each with multiple nodes) that talk to each other using their swarm DNS names. Consul is smart enough to figure out the IP addresses of all the containers since its gossip protocol ensures that the information replicates to all the nodes. This does not work with Elasticsearch.
DNS entries of other Elasticsearch nodes that do not resolve when Elasticsearch starts up cause it to crash and exit. Swarm won’t create the DNS entry for a service until after it has started.
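You can see both behaviours by resolving the service names from inside one of the containers. This is only a sketch: it assumes the official (Debian-based) elasticsearch image so that getent is available, and that you run docker exec on the host that actually runs the task. Depending on your Docker version, the special name tasks.&lt;service&gt; may also resolve to the individual task IPs.

# Find the container backing the esm1 service on this host
docker ps --filter name=esm1 --format '{{.ID}} {{.Names}}'
# Open a shell in it (replace <container-id> with the id printed above)
docker exec -it <container-id> bash

# Inside the container: the service name resolves to a single virtual IP,
# not to the individual Elasticsearch containers
getent hosts esm2
# A service that does not exist (yet) simply does not resolve at all, which
# is what makes Elasticsearch bail out during startup
getent hosts esm4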
The solution to these problems is simple but ugly: an Elasticsearch service can only have one node in Swarm. Since we want multiple nodes in our Elasticsearch cluster, we’ll need to run multiple services: one for each Elasticsearch node. This is why, in the example above, we start three services, each with only one replica (the default). Each of them binds to eth0, which is where the Docker overlay network ends up. Finally, Elasticsearch nodes rely on the address and port that each node advertises in order to talk to each other, so the transport port a node advertises needs to match the published service port. It took me some time to figure out that simply doing -p 9301:9300 is not good enough: it really needs to be -p 9301:9301. Therefore each of the Elasticsearch services is configured with a different transport port. For the HTTP port we don’t need to do this, so we can simply map port 9200 to a different external port.
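If you want to see which address a node actually ended up advertising, you can look at the overlay network and at the Elasticsearch startup logs. This is a rough sketch: docker network inspect only lists the containers running on the host you run it on, docker logs has to be run on the host that runs the container, and the exact log line may differ slightly between Elasticsearch versions.

# The es overlay network shows the IP each local container got on eth0
docker network inspect es
# Elasticsearch logs the transport address it advertises at startup; it
# should show the eth0 address together with the service port (e.g. 9301)
docker logs <container-id> 2>&1 | grep publish_address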
Finally, the services can only talk to other services that already exist. So, what won’t work is specifying es.discovery.zen.ping.unicast.hosts=esm1:9301,esm2:9302,esm3:9303 on each of the services. Instead, the first service only has itself to talk to, the second one can talk to the first one, and the third one can talk to the first and second one. This also means the services have to start in the right order.
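Because of this ordering requirement, it helps to script the startup. The sketch below is my own wrapper, not anything Docker provides out of the box: it simply polls docker service ps until the previous service has a running task before creating the next one (the "..." stands for the full options shown earlier).

# Wait until a service reports a running task (simple polling helper)
wait_running() {
  until docker service ps "$1" | grep -q Running; do
    echo "waiting for $1 to start..."
    sleep 5
  done
}

docker service create --name esm1 --network es ...   # full command as shown above
wait_running esm1
docker service create --name esm2 --network es ...   # second node
wait_running esm2
docker service create --name esm3 --network es ...   # third node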
To be clear, I don’t think that this is a particularly good way of running Elasticsearch. Also, several of the problems I outlined are being worked on and I expect that future versions of Docker may make this a little easier.