Migrating to Elastic Beanstalk and Docker
Some time ago we migrated all our services to AWS. This had many benefits, but we were still using the same deployment processes as before. As the number of services grew and our deployment process became more expensive, we decided to explore other options provided by AWS and try out Elastic Beanstalk.
Where we started
A brief overview of our initial AWS setup:
- Infrastructure: Services deployed to two EC2 instances behind an Elastic Load Balancer (ELB) for resilience.
- Configuration: All service configuration was managed in a central Chef repository.
- Deployment Process: `ssh` onto each instance and run `chef-client` (either manually or through an automated script).
- Zero Downtime Deployment Strategy: Full-stack Blue/Green DNS switch at the end of each sprint; mid-sprint patches via rolling updates behind a load balancer.
The coordination required to do a full-stack deployment and DNS switch at the end of each sprint was particularly painful and it was clear that moving away from full-stack blue/green deployments and making each service independently and easily deployable would be a massive win.
Why Elastic Beanstalk
Elastic Beanstalk has a number of benefits that made it a good candidate for our situation:
- Familiar infrastructure - a load balancer with a number of EC2 instances.
- Zero downtime deployments supported out of the box.
- Support for Docker-based distributions.
Docker support wasn’t a requirement, but having it as an option allowed us to consider more sophisticated workflows and deployment pipelines, and to look at solutions beyond Beanstalk, like a Docker cluster.
Beanstalk also has a few peculiarities. A common complaint is around configuration - Beanstalk makes it easy to use default config for environments, but surprisingly complicated to customise certain settings. For example, exposing only port 80 of your service requires no configuration, but exposing more or other ports can’t be done from the web interface. For simplicity, we decided to accept Beanstalk’s defaults wherever possible.
Docker
Before, all our services were published as RPMs and deployed using `yum` via Chef. For our Elastic Beanstalk deployment all of this would be replaced by Docker.
Building a Docker image for one of our services was relatively easy. Since we’re using Scala, we could simply use the `sbt-native-packager` plugin to build and deploy to a private Docker Hub repository. The plugin takes care of creating the `Dockerfile`, with only minor customisation needed in `build.sbt`.
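As an illustration, the Docker-related parts of a `build.sbt` for such a service might look something like the sketch below. The plugin and base image versions are illustrative rather than our exact build; the `ovotech/my-service` image name and port 8080 match the deployment descriptor shown later.

```scala
// Sketch of the sbt-native-packager Docker settings; versions are illustrative.
// Requires e.g. addSbtPlugin("com.typesafe.sbt" % "sbt-native-packager" % "1.0.0")
// in project/plugins.sbt.
enablePlugins(JavaAppPackaging, DockerPlugin)

name    := "my-service"
version := "1.0.0"

dockerBaseImage    := "openjdk:8-jre"  // base image for the generated Dockerfile
dockerExposedPorts := Seq(8080)        // the container port Beanstalk maps to port 80
dockerRepository   := Some("ovotech")  // publish as ovotech/my-service on Docker Hub
```

With this in place, `sbt docker:publish` builds the image from the generated `Dockerfile` and pushes it to the private repository.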
The biggest challenge was handling service configuration. Generally there are two solutions:
- Use an external configuration service. The Dockerised service then fetches config on startup.
- Include all config in the Docker image. Use environment variables for sensitive config, such as passwords.
We chose the second option, since this required the fewest changes to code.
Note: When deploying a service that depends on environment variables to Elastic Beanstalk, it must still be able to start up in the absence of those variables. This is because Beanstalk doesn’t (easily) let you specify environment variables during environment creation; you can only set their values once the environment is running.
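In practice this just means giving every environment-driven setting a harmless default. A minimal sketch of the pattern in Scala (the variable names and defaults are made up for illustration):

```scala
// Sketch: read sensitive settings from the environment, but fall back to a
// placeholder so the service can still start before Beanstalk has the real
// values configured. The variable names here are illustrative.
object ServiceConfig {
  private def envOrElse(name: String, default: String): String =
    sys.env.getOrElse(name, default)

  val dbPassword: String = envOrElse("DB_PASSWORD", "not-set")
  val apiKey: String     = envOrElse("API_KEY", "not-set")
}
```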
Finally we needed a strategy for log files - all files generated during the life of a Docker container are lost when a new version of the service is deployed, unless created on a volume outside of the container. Since we were already using Graylog for logging, we decided to simply disable file logging and rely solely on Graylog.
Deploying to Elastic Beanstalk
Once we’d published an image to Docker Hub, we could deploy it to Elastic Beanstalk. Setting up our first Elastic Beanstalk environment involved several steps and a few gotchas:
Create a deployment descriptor on S3
To deploy an application to Beanstalk, you need a `Dockerrun.aws.json` file that specifies which Docker image to pull down, Docker Hub credentials if applicable, and the port to expose. For example:
```json
{
  "AWSEBDockerrunVersion": "1",
  "Image": {
    "Name": "ovotech/my-service:1.0.0",
    "Update": "true"
  },
  "Authentication": {
    "Bucket": "docker-bucket",
    "Key": "my-service/dockercfg.json"
  },
  "Ports": [
    {
      "ContainerPort": "8080"
    }
  ]
}
```
The `Authentication` section refers to a credentials file on S3 that can be used to access private Docker Hub repositories. To get this file we used the `dockercfg` file that is generated when executing `docker login` locally, e.g.:

```json
{"https://index.docker.io/v1/": {"auth": "xxxxxxxxxddltoken==","email": "dockeruser@domain.com"}}
```
The `Ports` section allows you to specify one (and only one) container port that will be exposed on port 80 of the Beanstalk environment. You can get around this restriction with some effort, but we decided to accept it and have all our endpoints on port 80.
With the deployment descriptor in hand, we could create our Elastic Beanstalk application.
Create the application and environment in Elastic Beanstalk
Beanstalk allows you to create a logical application and define one or more environments (e.g. test, production) for that application.
The environment defines the EC2, Load Balancer and Auto Scaling settings for each instance of your service. Setting it up through the web UI is straightforward enough, as long as you know your VPC ID, subnets, security groups, etc.
Note: AWS has soft caps on a number of resources. Depending on how many environments you create in Elastic Beanstalk, you might start running into some of these limits. They can be raised on request, but it’s worth being aware of them beforehand. The limits we’ve encountered so far have been on the number of Security Groups, Load Balancers and Auto Scaling groups.
Once you’ve created one environment, creating a second for the same application is easily done with the “Clone Environment” action. Creating new environments for new applications can be time-consuming, however. It’s worth looking into the AWS CLI or SDK, or at CloudFormation, for automation if you’re planning on setting up a lot of services.
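We haven’t automated this ourselves yet, but as a rough illustration of what it involves, here is a sketch using the AWS SDK for Java from Scala. The application and environment names, the solution stack string and the option values are placeholders, and the SDK dependency is assumed rather than part of our build.

```scala
import com.amazonaws.services.elasticbeanstalk.AWSElasticBeanstalkClientBuilder
import com.amazonaws.services.elasticbeanstalk.model.{ConfigurationOptionSetting, CreateEnvironmentRequest}

// Sketch: create a Beanstalk environment programmatically instead of via the web UI.
// All names, the solution stack string and option values below are placeholders.
object CreateTestEnvironment extends App {
  val eb = AWSElasticBeanstalkClientBuilder.defaultClient()

  eb.createEnvironment(new CreateEnvironmentRequest()
    .withApplicationName("my-service")
    .withEnvironmentName("my-service-test")
    .withSolutionStackName("64bit Amazon Linux 2015.03 v1.4.3 running Docker 1.6.2")
    .withOptionSettings(
      new ConfigurationOptionSetting("aws:autoscaling:launchconfiguration", "InstanceType", "t2.micro"),
      new ConfigurationOptionSetting("aws:ec2:vpc", "VPCId", "vpc-xxxxxxxx")))
}
```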
Creating a Pipeline with Go.CD
For Continuous Deployment we chose Go.CD for its first-class pipeline support and the fact that it can easily be self-hosted. Although several excellent hosted solutions exist, we preferred not to have the limits on concurrent builds that most of them impose.
Compared to Jenkins, Go.CD has a bit more initial complexity, but once you grasp the basics of Jobs, Tasks and Stages, the rest is pretty simple.
Our pipeline steps are defined with `sbt` commands for building, testing and publishing to Docker Hub, and AWS CLI commands to deploy to Beanstalk.
Each commit triggers a pipeline build, which bumps to the next fixed version and publishes a new image to Docker Hub.
Deployments to UAT and Production are triggered manually, with a single click.
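Concretely, a deploy boils down to two Beanstalk operations: register a new application version pointing at a `Dockerrun.aws.json` bundle on S3, then switch the environment to that version. We drive this through the AWS CLI from Go.CD, but the equivalent calls sketched via the AWS SDK for Java from Scala look roughly like this (application, environment and bucket names and the S3 key layout are illustrative):

```scala
import com.amazonaws.services.elasticbeanstalk.AWSElasticBeanstalkClientBuilder
import com.amazonaws.services.elasticbeanstalk.model.{CreateApplicationVersionRequest, S3Location, UpdateEnvironmentRequest}

// Sketch of the two-step deploy; names, bucket and key layout are illustrative.
object DeployToBeanstalk extends App {
  val eb      = AWSElasticBeanstalkClientBuilder.defaultClient()
  val version = "1.0.0"

  // 1. Register a new application version from the Dockerrun.aws.json on S3.
  eb.createApplicationVersion(new CreateApplicationVersionRequest()
    .withApplicationName("my-service")
    .withVersionLabel(version)
    .withSourceBundle(new S3Location("docker-bucket", s"my-service/$version/Dockerrun.aws.json")))

  // 2. Point the environment at the new version; Beanstalk handles the rollout.
  eb.updateEnvironment(new UpdateEnvironmentRequest()
    .withEnvironmentName("my-service-prod")
    .withVersionLabel(version))
}
```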
Conclusion
We’ve successfully migrated a number of services to Elastic Beanstalk and have seen a dramatic improvement in the ease of deployment.
There are still some improvements that could be made, mostly around getting more value from our Docker images. As a next step we’d like to start running full service tests against Docker images before deploying to Beanstalk. As a further step we might also explore Docker clustering to get better resource utilisation when running our services.
For now, however, Elastic Beanstalk gives us the simple deployments we were after. Deploying to production several times a day has become trivial.