Migrating to Elastic Beanstalk and Docker
Some time ago we migrated all our services to AWS. This had many benefits, but we were still using the same deployment processes as before. As the number of services grew and our deployment process became more expensive, we decided to explore other options provided by AWS and try out Elastic Beanstalk.
Where we started
A brief overview of our initial AWS setup:
- Infrastructure: Services deployed to two EC2 instances behind an Elastic Load Balancer (ELB) for resilience.
- Configuration: All service configuration was managed in a central Chef repository.
- Deployment Process: `ssh` onto each instance and run `chef-client` (either manually or through an automated script).
- Zero Downtime Deployment Strategy: Full-stack Blue/Green DNS switch at the end of each sprint; mid-sprint patches via rolling updates behind a load balancer.
The coordination required to do a full-stack deployment and DNS switch at the end of each sprint was particularly painful and it was clear that moving away from full-stack blue/green deployments and making each service independently and easily deployable would be a massive win.
Why Elastic Beanstalk
Elastic Beanstalk has a number of benefits that made it a good candidate for our situation:
- Familiar infrastructure - a load balancer with a number of EC2 instances.
- Zero downtime deployments supported out of the box.
- Support for Docker-based distributions.
Docker support wasn’t a requirement, but having it as an option allowed us to consider more sophisticated workflows and deployment pipelines, and to look at solutions beyond Beanstalk, like a Docker cluster.
Beanstalk also has a few peculiarities. A common complaint is around configuration - Beanstalk makes it easy to use default config for environments, but surprisingly complicated to customise certain settings. For example, exposing only port 80 of your service requires no configuration, but exposing more or other ports can’t be done from the web interface. For simplicity, we decided to accept Beanstalk’s defaults wherever possible.
Docker
Before, all our services were published as RPMs and deployed using `yum` via Chef. For our Elastic Beanstalk deployment all of this would be replaced by Docker.
Building a Docker image for one of our services was relatively easy. Since we’re using Scala, we could simply use the `sbt-native-packager` plugin to build and deploy to a private Docker Hub repository. The plugin takes care of creating the `Dockerfile`, with only minor customisation needed in `build.sbt`.
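As an illustration, the Docker-related parts of a `build.sbt` for such a service might look something like the sketch below. The plugin and base image versions are illustrative rather than our exact build; the `ovotech/my-service` image name and port 8080 match the deployment descriptor shown later.

```scala
// Sketch of the sbt-native-packager Docker settings; versions are illustrative.
// Requires e.g. addSbtPlugin("com.typesafe.sbt" % "sbt-native-packager" % "1.0.0")
// in project/plugins.sbt.
enablePlugins(JavaAppPackaging, DockerPlugin)

name    := "my-service"
version := "1.0.0"

dockerBaseImage    := "openjdk:8-jre"  // base image for the generated Dockerfile
dockerExposedPorts := Seq(8080)        // the container port Beanstalk maps to port 80
dockerRepository   := Some("ovotech")  // publish as ovotech/my-service on Docker Hub
```

With this in place, `sbt docker:publish` builds the image from the generated `Dockerfile` and pushes it to the private repository.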
The biggest challenge was handling service configuration. Generally there are two solutions:
- Use an external configuration service. The Dockerised service then fetches config on startup.
- Include all config in the Docker image. Use environment variables for sensitive config, such as passwords.
We chose the second option, since this required the fewest changes to code.
Note: When deploying a service that depends on environment variables to Elastic Beanstalk, it must still be able to start up in the absence of those variables. This is because Beanstalk doesn’t (easily) let you specify environment variables during environment creation; you can only set their values once the environment is running.
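In practice this just means giving every environment-driven setting a harmless default. A minimal sketch of the pattern in Scala (the variable names and defaults are made up for illustration):

```scala
// Sketch: read sensitive settings from the environment, but fall back to a
// placeholder so the service can still start before Beanstalk has the real
// values configured. The variable names here are illustrative.
object ServiceConfig {
  private def envOrElse(name: String, default: String): String =
    sys.env.getOrElse(name, default)

  val dbPassword: String = envOrElse("DB_PASSWORD", "not-set")
  val apiKey: String     = envOrElse("API_KEY", "not-set")
}
```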
Finally we needed a strategy for log files - all files generated during the life of a Docker container are lost when a new version of the service is deployed, unless created on a volume outside of the container. Since we were already using Graylog for logging, we decided to simply disable file logging and rely solely on Graylog.
Deploying to Elastic Beanstalk
Once we’d published an image to Docker Hub, we could deploy it to Elastic Beanstalk. Setting up our first Elastic Beanstalk environment involved several steps and a few gotchas:
Create a deployment descriptor on S3
To deploy an application to Beanstalk, you need a `Dockerrun.aws.json` file that specifies which Docker image to pull down, Docker Hub credentials if applicable, and the port to expose. For example:
```json
{
  "AWSEBDockerrunVersion": "1",
  "Image": {
    "Name": "ovotech/my-service:1.0.0",
    "Update": "true"
  },
  "Authentication": {
    "Bucket": "docker-bucket",
    "Key": "my-service/dockercfg.json"
  },
  "Ports": [
    {
      "ContainerPort": "8080"
    }
  ]
}
```
The `Authentication` section refers to a credentials file on S3 that can be used to access private Docker Hub repositories. To get this file we used the `dockercfg` file that is generated when executing `docker login` locally, e.g.:

```json
{"https://index.docker.io/v1/": {"auth": "xxxxxxxxxddltoken==","email": "dockeruser@domain.com"}}
```
The `Ports` section allows you to specify one (and only one) container port that will be exposed on port 80 of the Beanstalk environment. You can get around this restriction with some effort, but we decided to accept it and have all our endpoints on port 80.
With the deployment descriptor in hand, we could create our Elastic Beanstalk application.
Create the application and environment in Elastic Beanstalk
Beanstalk allows you to create a logical application and define one or more environments (e.g. test, production) for that application.
The environment defines the EC2, Load Balancer and Auto Scaling settings for each instance of your service. Setting it up through the web UI is straightforward enough, as long as you know your VPC ID, subnets, security groups, etc.
Note: AWS has soft caps on a number of resources. Depending on how many environments you create in Elastic Beanstalk, you might start running into some of these limits. They can be raised on request, but it’s worth being aware of them beforehand. The limits we’ve encountered so far have been on the number of Security Groups, Load Balancers and Auto Scaling groups.
Once you’ve created one environment, creating a second for the same application is easily done with the “Clone Environment” action. Creating new environments for new applications can be time-consuming, however. It’s worth looking into the AWS CLI or SDK, or at CloudFormation, for automation if you’re planning on setting up a lot of services.
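We haven’t automated this ourselves yet, but as a rough illustration of what it involves, here is a sketch using the AWS SDK for Java from Scala. The application and environment names, the solution stack string and the option values are placeholders, and the SDK dependency is assumed rather than part of our build.

```scala
import com.amazonaws.services.elasticbeanstalk.AWSElasticBeanstalkClientBuilder
import com.amazonaws.services.elasticbeanstalk.model.{ConfigurationOptionSetting, CreateEnvironmentRequest}

// Sketch: create a Beanstalk environment programmatically instead of via the web UI.
// All names, the solution stack string and option values below are placeholders.
object CreateTestEnvironment extends App {
  val eb = AWSElasticBeanstalkClientBuilder.defaultClient()

  eb.createEnvironment(new CreateEnvironmentRequest()
    .withApplicationName("my-service")
    .withEnvironmentName("my-service-test")
    .withSolutionStackName("64bit Amazon Linux 2015.03 v1.4.3 running Docker 1.6.2")
    .withOptionSettings(
      new ConfigurationOptionSetting("aws:autoscaling:launchconfiguration", "InstanceType", "t2.micro"),
      new ConfigurationOptionSetting("aws:ec2:vpc", "VPCId", "vpc-xxxxxxxx")))
}
```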
Creating a Pipeline with Go.CD
For Continuous Deployment we chose Go.CD for its first-class pipeline support and the fact that it can easily be self-hosted. Although several excellent hosted solutions exist, we preferred not to have the limits on concurrent builds that most of them impose.
Compared to Jenkins, Go.CD has a bit more initial complexity, but once you grasp the basics of Jobs, Tasks and Stages, the rest is pretty simple.
Our pipeline steps are defined with `sbt` commands for building, testing and publishing to Docker Hub, and AWS CLI commands to deploy to Beanstalk.
Each commit triggers a pipeline build, which bumps to the next fixed version and publishes a new image to Docker Hub.
Deployments to UAT and Production are triggered manually, with a single click.
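Concretely, a deploy boils down to two Beanstalk operations: register a new application version pointing at a `Dockerrun.aws.json` bundle on S3, then switch the environment to that version. We drive this through the AWS CLI from Go.CD, but the equivalent calls sketched via the AWS SDK for Java from Scala look roughly like this (application, environment and bucket names and the S3 key layout are illustrative):

```scala
import com.amazonaws.services.elasticbeanstalk.AWSElasticBeanstalkClientBuilder
import com.amazonaws.services.elasticbeanstalk.model.{CreateApplicationVersionRequest, S3Location, UpdateEnvironmentRequest}

// Sketch of the two-step deploy; names, bucket and key layout are illustrative.
object DeployToBeanstalk extends App {
  val eb      = AWSElasticBeanstalkClientBuilder.defaultClient()
  val version = "1.0.0"

  // 1. Register a new application version from the Dockerrun.aws.json on S3.
  eb.createApplicationVersion(new CreateApplicationVersionRequest()
    .withApplicationName("my-service")
    .withVersionLabel(version)
    .withSourceBundle(new S3Location("docker-bucket", s"my-service/$version/Dockerrun.aws.json")))

  // 2. Point the environment at the new version; Beanstalk handles the rollout.
  eb.updateEnvironment(new UpdateEnvironmentRequest()
    .withEnvironmentName("my-service-prod")
    .withVersionLabel(version))
}
```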
Conclusion
We’ve successfully migrated a number of services to Elastic Beanstalk and have seen a dramatic improvement in the ease of deployment.
There are still some improvements that could be made, mostly around getting more value from our Docker images. As a next step we’d like to start running full service tests against Docker images before deploying to Beanstalk. As a further step we might also explore Docker clustering to get better resource utilisation when running our services.
For now, however, Elastic Beanstalk gives us the simple deployments we were after. Deploying to production several times a day has become trivial.