Cluster Organization in Docker Compose

I'll make a long story short here: this time last year, I knew nothing about containers or orchestration save that Vagrant had sounded like a cool idea but hadn't done much for me in practice. But we had an architect who did know more, and who set up our applications with a really quite fancy Kubernetes- and Docker-based build and deploy system (more on that some other time, perhaps). Our dev and QA environments became Kubernetes clusters, I started learning how it all worked, things were good. Then he moved on, making myself and one other coworker who knew around as much as I did the de facto experts on everything cloud here. Oops.

One thing neither the architect nor I had anticipated was that many of our enterprise clients turned out not to be on board with Kubernetes. At all. Some of them aren't even comfortable with Docker period, but there's not much to do on that count except wait. For the rest, we decided that orchestration made things so much easier we were going to do it wherever we could get away with it, so we needed to have Docker Compose definitions ready to go.

I did inherit some basic Compose configs, but they were badly out of date; in the interim, we'd added a couple of Postgres extensions, integrated a single sign-on service, done a bunch of restructuring -- you know how it goes. So I wound up going back to the drawing board for all the most complicated bits. And in the process I found out that there were some things I'd taken for granted with Kubernetes that I couldn't with Compose.

A Rocky Start

Like jobs. I was about to miss jobs a lot.

In Kubernetes, jobs let you run one-off tasks. We use this functionality to stand up the database, run migrations, and seed initial data for the dev environment since we regenerate that every day. It works alright: deployment pods bounce off until the database comes up initialized. If something unexpected happens, kill the pod and Kubernetes starts another for you. So far, so good.

Docker Compose doesn't do that. In Docker Compose, things that start up are supposed to stay up, or be replaced if they don't stay up. This was a problem. I was looking for a way to issue a single docker-compose up and have a brand new cluster with all the complicated once-off init stuff done for me. It'd be easy to expand the application server image entrypoint to do all the initialization, but each cluster runs two or three of those behind a load balancer, so just doing that could have inconsistent results from the spinup code firing multiple times.

Broken down, here's everything that needs to happen between the application services and the database when the cluster comes up, in order:

  1. If Postgres isn't running yet, don't start any app services.
  2. If the application database roles do not exist, create them.
  3. If the application database does not exist, create it.
  4. Deploy any outstanding migration scripts to the database.
  5. If the database infrastructure for single sign-on does not exist, create it.
  6. Deploy any new content to the database.
  7. Update the locale files for new content in each application container.
  8. Apply configuration to each application container.
  9. Start the application services.

Everything up through #6 needs to happen once and only once. But with Docker Compose, we don't have one-offs. So it all has to go in the entrypoint script, or near enough; we just have to make sure only one of the application services can execute the sensitive parts, and that its peers wait for that to happen before they spin up.

Scheduling Startup

The first problem to solve is making sure nothing tries to come up until the database is there. With Kubernetes, we use init containers for this: both the setup job and the application server deployment declare an init container which tries to select 1 every few seconds until it succeeds. Docker Compose doesn't have anything like that to my knowledge; the most it does is generate a dependency graph from your links and depends_on and a couple other service attributes. This ensures that services are started in a particular order, but since Postgres takes a couple seconds to come up the dependent containers could in fact finish their startup before it's ready.

The way to ensure nothing tries to talk to Postgres until it's good and ready is to wrap the startup command. The Docker Compose documentation recommends a few options; I went with wait-for-it. It looks like this in the Compose config:

    entrypoint: ["./wait-for-it.sh", "postgres:5432", "--", "bash", "./entrypoint.sh"]

Our entrypoint.sh is not run unless and until the Postgres container starts listening on its default port 5432. That's great, but there's one other thing that makes this really useful: since we already have multiple application services defined (Swarm isn't guaranteed so we can't set replicated mode), we can pick one of those to wait for Postgres to come up, and have the rest wait for it to come up in turn. That's our init container.

Secrets

At this point we can ensure that nothing that depends on the database will start up before the database is ready, and that one of our app services will always finish its startup before any others begin theirs. What we need now is a way to distinguish that service from the others so it can execute our once-only tasks. That's where secrets come in.

Secrets are basically the same concept between Kubernetes and Docker Compose: files containing sensitive data which get loaded onto nodes and mounted to the container filesystem. It's more secure than using environment variables. Secrets are defined as a top-level block in the compose config:

secrets:
  db_owner_password:
    file: ./secrets/db_owner_password.txt

And then attached to each service definition:

  appserver:
    image: myimage
    links:
      - postgres
    secrets:
      - db_owner_password
    entrypoint: ["./wait-for-it.sh", "postgres:5432", "--", "bash", "./entrypoint.sh"]

The dependent application services don't need the db_owner_password; that's only required to initialize the database. So we can test for the presence of the secret in our entrypoint script, and kick all that off only if it's present:

if [ -a /run/secrets/db_owner_password ]; then
  # check and create the application roles and database, then run the migrations
fi

Now the appserver service is unique, and we've restricted the ability to stage the database to it. We can't be completely careless -- if appserver blindly emits a createdb every time it starts, it'll fail with a "database already exists" error every time after the first -- but since we've guaranteed there will only ever be one container trying to create the database at a time, we can simply check up front.

That leaves the shared configuration and content, which together are more than secrets are meant to deal with.

Volumes

Mounting information from the host system into containers is a pretty general use case. Secrets cover a specific subset of this. For everything else, there's volumes (and again, Kubernetes' version of the concept is a pretty close analogue). Since volumes can be much larger than secrets, they aren't automatically shared across nodes; you have to create a named volume explicitly, and use a driver which is multi-host aware.

Declare named volumes for config and content:

volumes:
  app_conf:
    driver: local # this is obviously not multi-host aware, but it's good enough for testing
  app_content:
    driver: local

Then in the appserver service definition, add a volumes block:

volumes:
  - app_conf:/home/appserver/app/conf
  - app_content:/home/appserver/app/content

Docker Compose will create the volumes if they do not exist, or you can docker volume create them ahead of time. Better to do the latter, since otherwise the first time you bring up the cluster everything will die horribly since the volumes are empty. If you create them manually, you can docker volume inspect them, find the mountpoint on the host system, and copy the instance configuration and content in before you start spinning things up.

One caveat: the names app_conf and app_content are not actually the names Docker Compose looks for. Compose prepends docker_ to the names you supply, so the volumes should be named docker_app_conf and docker_app_content.

The End

Start to finish, it took me a few weeks to get my first real Compose cluster set up. It's rough getting started, even though the Docker and Compose documentation is quite good; it's a lot to wrap your head around, and there are a lot of concepts you really just have to sort of brute force your way into understanding. I had a lot of other stuff on my plate at the time (still do!), which certainly didn't help matters either.

The good news is, yesterday I had to set up another app with a similar Compose configuration from scratch. This time, I had it up and running within a couple hours. Once you've got the structure down and understand how the pieces fit together it's a lot more manageable.