May 6th, 2014
Implementing Blue-Green Deployments with AWS
An important technique for reducing the risk of deployments is known as Blue-Green Deployments. If we call the current live production environment “blue”, the technique consists of bringing up a parallel “green” environment with the new version of the software. Once everything is tested and ready to go live, you switch all user traffic to the “green” environment, leaving the “blue” environment idle. When deploying to the cloud, it is common to then discard the idle environment if there is no need for rollbacks, especially when using immutable servers.
If you are using Amazon Web Services (AWS) as your cloud provider, there are a few options to implement blue-green deployments depending on your system’s architecture. Since this technique relies on performing a single switch from “blue” to “green”, your choice will depend on how you are serving content in your infrastructure’s front-end.
Single EC2 instance with Elastic IP
In the simplest scenario, all your public traffic is being served from a single EC2 instance. Every instance in AWS is assigned two IP addresses at launch – a private IP that is not reachable from the Internet, and a public IP that is. However, if you terminate your instance or if any failure occurs, those IP addresses are released and you will not be able to get them back.
An Elastic IP is a static IP address allocated to your AWS account that you can assign as the public IP for any EC2 instance you own. You can also reassign it to another instance on demand, by making a simple API call.
In our case, Elastic IPs are the simplest way to implement the blue-green switch – launch a new EC2 instance, configure it, deploy the new version of your system, test it, and when it is ready for production, simply reassign the Elastic IP from the old instance to the new one. The switch will be transparent to your users and traffic will be redirected almost immediately to the new instance.
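As a rough illustration, the reassignment could look like the sketch below. It assumes a VPC Elastic IP identified by an allocation ID and a boto3 EC2 client passed in by the caller; the function name and the IDs in the usage note are hypothetical.

```python
def switch_elastic_ip(ec2, allocation_id, green_instance_id):
    """Point the Elastic IP at the green instance and return the association ID.

    `ec2` is expected to behave like boto3.client("ec2").
    """
    response = ec2.associate_address(
        AllocationId=allocation_id,      # assumption: a VPC Elastic IP
        InstanceId=green_instance_id,
        AllowReassociation=True,         # take the address over from the blue instance
    )
    return response["AssociationId"]
```

With boto3 installed, a call might look like `switch_elastic_ip(boto3.client("ec2"), "eipalloc-0123", "i-0green")`. Rolling back is the same call with the old instance ID.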
Multiple EC2 instances behind an ELB
If you are serving content through a load balancer, then the same technique would not work because you cannot associate Elastic IPs with ELBs. In this scenario, the current blue environment is a pool of EC2 instances and the load balancer will route requests to any healthy instance in the pool. To perform the blue-green switch behind the same load balancer you need to replace the entire pool with a new set of EC2 instances containing the new version of the software. There are two ways to do this – automating a series of API calls or using AutoScaling groups.
Every AWS service has an API and a command-line client that you can use to control your infrastructure. The ELB API allows you to register and de-register EC2 instances, which will either add or remove them from the pool. Performing the blue-green switch with API calls will require you to register the new “green” instances while de-registering the “blue” instances. You can even perform these calls in parallel to switch faster. However, the switch will not be immediate because there is a delay between registering an instance to an ELB and the ELB starting to route requests to it. This is because the ELB only routes requests to healthy instances and it has to perform a few health checks before considering the new instances as healthy.
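Here is a minimal sketch of the API-call approach, assuming a classic ELB and a client that behaves like `boto3.client("elb")`. Note that it is a more conservative variation on the parallel switch described above: it registers the green instances, polls until the ELB reports all of them as `InService`, and only then de-registers blue. The function name, polling interval and retry limit are illustrative choices.

```python
import time


def swap_elb_instances(elb, lb_name, blue_ids, green_ids,
                       poll_seconds=5, max_polls=60):
    """Register green, wait for ELB health checks to pass, then remove blue."""
    green = [{"InstanceId": i} for i in green_ids]
    blue = [{"InstanceId": i} for i in blue_ids]

    elb.register_instances_with_load_balancer(
        LoadBalancerName=lb_name, Instances=green)

    # The ELB only routes to instances that have passed its health checks.
    for _ in range(max_polls):
        states = elb.describe_instance_health(
            LoadBalancerName=lb_name, Instances=green)["InstanceStates"]
        if states and all(s["State"] == "InService" for s in states):
            break
        time.sleep(poll_seconds)
    else:
        # Blue is still registered, so users keep being served.
        raise TimeoutError("green instances did not become healthy")

    elb.deregister_instances_from_load_balancer(
        LoadBalancerName=lb_name, Instances=blue)
```

Waiting before de-registering trades a longer window of mixed blue and green traffic for never leaving the load balancer without healthy instances.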
The other option is to use the AWS service known as AutoScaling. This allows you to define automatic rules for triggering scaling events, either increasing or decreasing the number of EC2 instances in your fleet. To use it, you first need to define a launch configuration that specifies how to create new instances – which AMI to use, the instance type, security group, user data script, etc. Then you can use this launch configuration to create an auto-scaling group defining the number of instances you want to have in your group. AutoScaling will then launch the desired number of instances and continuously monitor the group. If an instance becomes unhealthy or if a threshold is crossed, it will add instances to the group to replace the unhealthy ones or scale the group up or down based on demand.
AutoScaling groups can also be associated with an ELB, in which case AutoScaling takes care of registering and de-registering EC2 instances with the load balancer any time an automatic scaling event occurs. However, at the time of writing, the association can only be made when the group is first created and not after it is running. We can use this feature to implement the blue-green switch, but it will require a few non-intuitive steps, detailed here:
1. Create the launch configuration for the new “green” version of your software.
2. Create a new “green” AutoScaling group using the launch configuration from step 1 and associate it with the same ELB that is serving the “blue” instances. Wait for the new instances to become healthy and get registered.
3. Update the “blue” group and set the desired number of instances to zero. Wait for the old instances to be terminated.
4. Delete the “blue” AutoScaling group and launch configuration.
This procedure will maintain the same ELB while replacing the EC2 instances and AutoScaling group behind it. The main drawback to this approach is the delay. You have to wait for the new instances to launch, for the AutoScaling group to consider them healthy, for the ELB to consider them healthy, and then for the old instances to terminate. While the switch is happening there is a period of time when the ELB is routing requests to both “green” and “blue” instances, which could have an undesirable effect for your users. For that reason, I would probably not use this approach when doing blue-green deployments with ELBs and instead consider the next option – DNS redirection.
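To make the order of operations concrete, the four steps could be scripted roughly as follows. This is only a sketch: it assumes a client that behaves like `boto3.client("autoscaling")`, it assumes each group shares its name with its launch configuration, and the two `wait_*` callables are hypothetical hooks where you would poll for instance health and termination.

```python
def replace_autoscaling_group(asg, elb_name, blue, green, launch_config,
                              group_size, zones,
                              wait_for_green, wait_for_blue_drained):
    """Swap the blue AutoScaling group for a green one behind the same ELB."""
    # Step 1: launch configuration for the green version.
    # `launch_config` holds keys such as ImageId and InstanceType.
    asg.create_launch_configuration(
        LaunchConfigurationName=green, **launch_config)

    # Step 2: green group, associated with the ELB at creation time.
    asg.create_auto_scaling_group(
        AutoScalingGroupName=green,
        LaunchConfigurationName=green,
        MinSize=group_size,
        MaxSize=group_size,
        DesiredCapacity=group_size,
        AvailabilityZones=zones,
        LoadBalancerNames=[elb_name],
    )
    wait_for_green()

    # Step 3: scale blue down to zero (MinSize must allow it).
    asg.update_auto_scaling_group(
        AutoScalingGroupName=blue, MinSize=0, MaxSize=0, DesiredCapacity=0)
    wait_for_blue_drained()

    # Step 4: clean up the blue group and its launch configuration.
    asg.delete_auto_scaling_group(AutoScalingGroupName=blue)
    asg.delete_launch_configuration(LaunchConfigurationName=blue)
```

The two waits are where most of the delay discussed above accumulates.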
DNS redirection using Route53
Instead of exposing Elastic IP addresses or long ELB hostnames to your users, you can have a domain name for all your public-facing URLs. Outside of AWS, you could perform the blue-green switch by changing CNAME records in DNS. In AWS, you can use Route53 to achieve the same result. With Route53, you create a hosted zone and define resource record sets to tell the Domain Name System how traffic is routed for that domain.
You can use Route53 to perform the blue-green switch by bringing up a new “green” environment – it could be a single EC2 instance, or an entire new ELB – then you simply update the resource record set to point the domain/subdomain to the new instance or the new ELB.
Even though Route53 supports this common DNS approach, there is a better alternative. Route53 has an AWS-specific extension to DNS that integrates better with other AWS services, and is cheaper too – alias resource record sets. They work pretty much the same way, but instead of pointing to any IP address or DNS record, they point to a specific AWS resource: a CloudFront distribution, an ELB, an S3 bucket serving a static website, or another Route53 resource record set in the same hosted zone.
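For illustration, switching an alias record to the green ELB might be expressed as below. The sketch assumes a client that behaves like `boto3.client("route53")`; the function names are hypothetical. Note that `elb_hosted_zone_id` is the load balancer's own hosted zone ID as reported by the ELB API, not the ID of your domain's hosted zone.

```python
def alias_change_batch(record_name, elb_dns_name, elb_hosted_zone_id):
    """Build a change batch that points an alias A record at an ELB."""
    return {
        "Comment": "blue-green switch",
        "Changes": [{
            "Action": "UPSERT",  # create the record or overwrite the existing one
            "ResourceRecordSet": {
                "Name": record_name,
                "Type": "A",
                "AliasTarget": {
                    "HostedZoneId": elb_hosted_zone_id,
                    "DNSName": elb_dns_name,
                    "EvaluateTargetHealth": False,
                },
            },
        }],
    }


def point_domain_at_elb(route53, hosted_zone_id, record_name,
                        elb_dns_name, elb_hosted_zone_id):
    """Apply the alias change in the domain's hosted zone."""
    return route53.change_resource_record_sets(
        HostedZoneId=hosted_zone_id,
        ChangeBatch=alias_change_batch(
            record_name, elb_dns_name, elb_hosted_zone_id),
    )
```

Keeping the payload builder separate from the API call makes it easy to review or log the change before applying it.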
Finally, another way to perform the blue-green switch with Route53 is using Weighted Round-Robin. This works for both regular resource record sets as well as alias resource record sets. You have to associate multiple answers for the same domain/sub-domain and assign a weight between 0 and 255 to each entry. When processing a DNS query, Route53 will select one answer using a probability calculated based on those weights. To perform the blue-green switch you need to have an existing entry for the current “blue” environment with weight 255 and a new entry for the “green” environment with weight 0. Then, simply swap those weights to redirect traffic from blue to green.
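The weight arithmetic is simple enough to sketch. The first helper below computes each entry's share of DNS answers as its weight divided by the sum of all weights, assuming at least one weight is non-zero. The second builds a Route53 change batch of weighted CNAME records. Both function names, the 60-second TTL and the hostnames in the usage note are illustrative assumptions.

```python
def traffic_share(weights):
    """Return the fraction of DNS answers each entry receives."""
    total = sum(weights.values())
    if total == 0:
        raise ValueError("at least one weight must be non-zero")
    return {name: weight / total for name, weight in weights.items()}


def weighted_changes(record_name, targets, weights, ttl=60):
    """Build UPSERT changes for weighted CNAME records, one per environment."""
    changes = []
    for name, target in targets.items():
        weight = weights[name]
        if not 0 <= weight <= 255:
            raise ValueError("weight must be between 0 and 255")
        changes.append({
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": record_name,
                "Type": "CNAME",
                "SetIdentifier": name,   # distinguishes blue from green
                "Weight": weight,
                "TTL": ttl,
                "ResourceRecords": [{"Value": target}],
            },
        })
    return {"Changes": changes}
```

For example, `weighted_changes("www.example.com.", {"blue": "blue.example.com", "green": "green.example.com"}, {"blue": 0, "green": 255})` describes the full switch. Weights of 230 and 25 would instead send roughly a tenth of lookups to green.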
The only disadvantage of this approach is that propagating DNS changes can take some time, so you would have no control over when the user will perceive it. The benefits are that you expose human-friendly URLs to your users, the switch happens with near zero-downtime, you can test the new “green” environment before promoting it, and with weighted round-robin you get the added flexibility of doing canary releases for free.
Environment swap with Elastic Beanstalk
The last scenario is when you are deploying your web application to Elastic Beanstalk, Amazon’s platform-as-a-service offering that supports Java, .NET, Python, Ruby, Node.js and PHP. Elastic Beanstalk has a built-in concept of an environment that allows you to run multiple versions of your application, as well as the ability to perform zero-downtime releases. Therefore, the blue-green switch simply consists of creating a new “green” environment and following the steps in the documentation to perform the swap.
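If you prefer to script the swap rather than click through the console, a minimal sketch could look like this. It assumes a client that behaves like `boto3.client("elasticbeanstalk")` and two environments that already exist; the function and environment names are hypothetical.

```python
def swap_beanstalk_environments(eb, blue_env_name, green_env_name):
    """Exchange the CNAMEs of the blue and green environments."""
    eb.swap_environment_cnames(
        SourceEnvironmentName=blue_env_name,
        DestinationEnvironmentName=green_env_name,
    )
```

Because the operation exchanges the two CNAMEs, running it a second time would swap them back, which gives you a rollback path.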
Conclusion
Blue-Green deployment is an important technique to enable Continuous Delivery. It reduces risk by allowing testing prior to the release of a new version to production, while at the same time enabling near zero-downtime deployments, and a fast rollback mechanism should something go wrong. It is a powerful technique to manage software releases especially when you are using cloud infrastructure. Cloud providers such as AWS enable you to easily create new environments on-demand and provide different options to implement Blue-Green deployments.
May 7th, 2014 at 9:15 pm
Nice writeup – but there is one point which I think is also worth mentioning:
When using this style of deployment, and as most apps deal with a single DB across any number of clustered app server instances, the DB will then be on v2 – and thus be used by both versions while both blue AND green are active (i.e. during the sanity-checking period when green is still internal). This “backwards compatibility” of the DB or non-breaking forward changes (whichever way you look at it) boils down to the team’s discipline in keeping this deployment style alive. I have used BG deployments before, and this particular point has to be actively/explicitly safe-guarded by the team.