Old School Blue/Green Deployment for IaaS Cloud Deployments: Part II Implementation

Tue Oct 10 2023

The theory behind the blue/green deployment pattern was discussed in a previous post. If this is not an area that you’re familiar with, I would recommend at least skimming Part I here: https://joelj.ca/blog/blue-green-1

As a quick refresher, we finished off Part I with a highly general physical architecture diagram that would enable blue/green deployments for a web application running on 2+ virtual hosts. A layer 7 load balancer (acting as a reverse proxy) was chosen to implement the cutover for application deployments in order to minimize latency. An additional instance of the web server and proxy server was added to the architecture to allow for patching / power cycling of a host without downtime of the application. DNS was chosen to switch traffic from one load balancer to the other. The additional latency was deemed acceptable since patching activities can be planned in advance, allowing time for cached DNS A records to expire.

My task now is to translate this architecture diagram into the web stack used by joelj.ca. I chose to host my application on Linode, but this implementation would work for any cloud or hybrid platform that supports provisioning Ubuntu VMs in a dedicated VLAN.

Some additional design goals I had in mind were fully automated configuration of new hosts and simple code-based management of the pool of VMs.

joelj.ca Components

joelj.ca is implemented as an Angular SPA that consumes my posts and pages from Contentful’s web API. Typically, the server for a SPA can be implemented as a simple virtual directory, since all page components are static. However, I’ve made the architecture more interesting by adding Server Side Rendering through Angular Universal. Angular Universal requires some more sophisticated server-side processing in order to generate the DOM elements for the initial page load. This is handled by an Express server written for the Node.js runtime. The pm2 process manager for Node was used in the original architecture and will be carried into the new one. It adds some resilience against things like a crashed process or an accidental restart.

One new component is needed: the load balancer. Nginx was chosen for its Linux support and elegant configuration syntax.

Automation Tooling

I chose to write all shell scripts in PowerShell. PowerShell 7’s Linux support has matured tremendously, and Unix-style shells like bash honestly feel like going back to the dark ages for me. This is a hot take, but I stand by it.

I chose Ansible as my IaC tool so that hosts can be configured and deployed to without any manual effort. The “push” model lends itself well to CI/CD deployments.

I still have some light Azure integrations for secrets management and telemetry. Azure Key Vault stores secrets (host credentials, API keys) used in the deployment pipeline. Azure Application Insights collects traces from both the client and server applications.

I’m using GitHub Actions for CI/CD, mostly for convenience and cost-saving reasons. A tool geared more towards build promotion would have suited the “slot” style deployment model better, but it wasn’t something I wanted to take on for this project.

Build

A build is generated by running the following command in Angular’s subdirectory (blog-app):

git rev-parse --short HEAD > ./src/assets/version.txt && ng build && ng run blog-app:server

The first command burns the current git revision into the app’s assets folder. This is consumed by the server to enrich trace data sent to Application Insights (more on this in the Observability section). The second command builds the client-side files, and the third builds the server-side files. The whole chain can also be run with the "npm run build:ssr" command.

Infrastructure

The following Linode Nanodes, running Ubuntu 22, were provisioned in the Canadian data center in a dedicated VLAN (nicknamed WORLD1).

Host   Public IPv4      Internal IPv4   Role
lb1    172.105.22.44    192.0.2.3/24    Load Balancer 1
lb2    172.105.18.168   192.0.2.4/24    Load Balancer 2
web1   45.79.116.212    192.0.2.5/24    Web Server 1
web2   45.79.116.213    192.0.2.6/24    Web Server 2

The “Nanode” service tier is a shared CPU VM with a modest 1 GB of RAM. This is sufficient for my needs, and will keep costs low even with four virtual hosts. Plus, I can easily scale horizontally if needed.

Deployment

All deployment automation is implemented in the cicd directory. I’ll be taking useful snippets from the relevant files, but it may be useful to reference the full code listings on GitHub: https://github.com/joelj1995/joeljdotca/tree/b1f46eb/cicd

Ansible

The first pillar of the deployment automation is the Ansible inventory file (Ansible-Inventory-WORLD1.yml), which specifies the list of hosts and their attributes.

Hosts are slotted into either the web or gateway group depending on their role (load balancer, reverse proxy, gateway -- geez, just choose a term and stick to it, Joel). Adding a host to the inventory includes it as a deployment target. A web host is automatically inserted into the load balancer’s backend pool unless its “inactive: true” property is set; this enables a new web server to be tested before it receives production traffic. internal_ip4 is a custom property holding the host’s IP address on the LAN. It is needed when generating the backend pool configuration, so that the load balancer doesn’t hop over the public internet. The remaining configuration deals with the SSH connection, with the root password parameterized. A newly provisioned host can be added to the inventory file immediately, with all necessary configuration taking place in the deployment pipeline.

To make testing easier, the deployment process has been decomposed into several playbooks.

Ansible-Playbook-Web-Configure.yml - Common configuration tasks for a web server. Installs the Node.js runtime and the pm2 process manager. Sets the hostname to the alias specified in the inventory file.

Ansible-Playbook-Web-Deploy.yml - Deploys code to the web servers. Build artifacts are copied to /var/www/joelj.ca-blue or /var/www/joelj.ca-green depending on the “slot” parameter. A pm2 process is created / updated for the target slot and bound to the appropriate port (1024 for blue and 1025 for green).

Ansible-Playbook-Gateway-Configure.yml - Common configuration tasks for a load balancer. Installs nginx and deletes the default site configuration. Sets the hostname to the alias specified in the inventory file.

Ansible-Playbook-Gateway-Deploy.yml - Deploys the load balancer / reverse proxy configuration. First, the TLS certificates are copied over. Then the configuration files are copied: the site configurations for both the blue and green slots go to an idle location (/etc/nginx/sites-available/), and the active configuration is symlinked into /etc/nginx/sites-enabled/ based on the “slot” parameter.
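
To give a feel for how the “slot” parameter flows through these playbooks, pushing a build to the green slot boils down to something like the following. This is an illustrative sketch rather than the exact invocation in the repo, and the password variable name is made up.

# Sketch: configure the web servers, then deploy a build to the green slot.
ansible-playbook -i ./Ansible-Inventory-WORLD1.yml ./Ansible-Playbook-Web-Configure.yml `
    --extra-vars "ansible_password=$env:WORLD1_ROOT_PASSWORD"
ansible-playbook -i ./Ansible-Inventory-WORLD1.yml ./Ansible-Playbook-Web-Deploy.yml `
    --extra-vars "slot=green ansible_password=$env:WORLD1_ROOT_PASSWORD"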

PowerShell Scripts

PowerShell scripts provide some "glue" to call the relevant Ansible playbooks with the correct parameters. They are also used to generate or download files needed for the deployment (TLS certificates, upstream configurations). Rather than include full listings of the key scripts, I’ll describe them in plain English and pick out the important parts.

PS-Connect-Az-Account.ps1 - Reads an Azure Service Principal credential from the environment variables. The credentials are used to authenticate an Azure session. The session is stored on the file system, making it available to other shell instances. The credential supplied by the pipeline has access to Azure Key Vault secrets. This enables other scripts to use Get-AzKeyVaultSecret for retrieving passwords, certificates and API keys.
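
For reference, the core of that flow looks roughly like this, assuming the Az module is available; the environment variable, vault and secret names are placeholders and the real script may differ.

# Sketch: authenticate with a service principal and persist the context for later scripts.
$securePassword = ConvertTo-SecureString $env:AZ_CLIENT_SECRET -AsPlainText -Force
$credential = New-Object System.Management.Automation.PSCredential($env:AZ_CLIENT_ID, $securePassword)

Enable-AzContextAutosave -Scope CurrentUser   # make the session available to other shell instances
Connect-AzAccount -ServicePrincipal -Credential $credential -Tenant $env:AZ_TENANT_ID

# Other scripts can then pull secrets, e.g. the root password used by the Ansible inventory:
$rootPassword = Get-AzKeyVaultSecret -VaultName 'joeljca-kv' -Name 'world1-root-password' -AsPlainText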

PS-Deploy-BLUE.ps1 / PS-Deploy-GREEN.ps1 - Make sure all software components are installed on the web servers. Then, copy build artifacts. Restart the pm2 instance to load up the new build.

PS-Activate-BLUE.ps1 / PS-Activate-GREEN.ps1 - Make sure nginx is installed on the load balancers. Get TLS certificates from Azure Key Vault and copy them over. Generate the upstream hosts configuration and activate the site configuration targeting the deployment slot.
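
The certificate handling could be as simple as the sketch below, assuming the certificate and key are stored as PEM-format secrets in Key Vault; the vault name, secret names and staging path here are made up for illustration.

# Sketch: fetch TLS material from Key Vault and stage it for the gateway playbook to copy.
$certPem = Get-AzKeyVaultSecret -VaultName 'joeljca-kv' -Name 'joeljca-2022-crt' -AsPlainText
$keyPem  = Get-AzKeyVaultSecret -VaultName 'joeljca-kv' -Name 'joeljca-2022-rsa' -AsPlainText
Set-Content -Path ./staging/joeljca-2022.crt -Value $certPem -NoNewline
Set-Content -Path ./staging/joeljca-2022.rsa -Value $keyPem -NoNewline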

PS-Generate-Nginx-Upstreams.ps1 - My favorite of all the scripts I wrote, so I’m including the full listing. It gets called by the activation scripts to generate the backend server pool (upstream) configuration. ansible-inventory is used to serialize the inventory file to JSON, which can be natively parsed by PowerShell. PowerShell then dynamically generates a configuration file for the server pool using each host’s internal IP address.

$ErrorActionPreference = "Stop"
$CICDPath = Split-Path -Parent $MyInvocation.MyCommand.Path
$RepoRootPath = Split-Path -Parent $CICDPath

$Inventory = ansible-inventory -i $CICDPath/Ansible-Inventory-WORLD1.yml --list | ConvertFrom-Json
$WebHosts = $Inventory.web.hosts

$ActiveWebHosts = $WebHosts |
    Select-Object -Property @{name='host';   e={$_}},
                            @{name='ipv4';   e={$Inventory._meta.hostvars.$_.internal_ip4}},
                            @{name='ignore'; e={$Inventory._meta.hostvars.$_.inactive}} |
    Where-Object { -not $_.ignore }

$Blue  = $ActiveWebHosts | ForEach { return "    server $($_.ipv4):1024; # $($_.Host)" } | Join-String -Separator "`n"
$Green = $ActiveWebHosts | ForEach { return "    server $($_.ipv4):1025; # $($_.Host)" } | Join-String -Separator "`n"

$ConfigString =@'
upstream joeljcablue {
__BLUE__
}

upstream joeljcagreen {
__GREEN__
}
'@

Write-Output $ConfigString.replace('__BLUE__', $Blue).replace('__GREEN__', $Green)

Running this against the current inventory produces the following configuration:

upstream joeljcablue {
    server 192.0.2.5:1024; # web1
    server 192.0.2.6:1024; # web2
}

upstream joeljcagreen {
    server 192.0.2.5:1025; # web1
    server 192.0.2.6:1025; # web2
}
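
In the activation scripts, this output just needs to be captured in a file that the gateway playbook can copy onto the load balancers. The staging path below is illustrative:

# Sketch: write the generated upstream configuration to a local file for the playbook to pick up.
./PS-Generate-Nginx-Upstreams.ps1 | Out-File -FilePath ./staging/upstreams.conf -Encoding ascii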

PS-Deploy-Main.ps1 - Orchestrates the deployment end to end. This is the top-level script called by the pipeline. It identifies the actively deployed slot, deploys to the inactive one and then activates it. A quick health check is done at the end of the deployment. This script is kind of a kludge. Ideally, the platform would allow me to manually trigger a deployment against a specific slot, and then I’d trigger the switchover manually after validation. GitHub Actions has poor support for manual deployment triggers, so this was the best I could do.
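
As a rough sketch of the orchestration logic: one way to identify the active slot is to inspect the X-Origin-Node header described in the Observability section below. The structure here mirrors the scripts above, but it is illustrative rather than the actual listing.

# Sketch: detect the live slot, deploy to the other one, flip over, then health check.
$response = Invoke-WebRequest -Uri 'https://joelj.ca' -Method Head
$activeSlot = if ("$($response.Headers['X-Origin-Node'])" -match 'green') { 'GREEN' } else { 'BLUE' }
$targetSlot = if ($activeSlot -eq 'BLUE') { 'GREEN' } else { 'BLUE' }

& "$PSScriptRoot/PS-Deploy-$targetSlot.ps1"
& "$PSScriptRoot/PS-Activate-$targetSlot.ps1"

# Basic health check: the site should respond with a 200 after the switchover.
$check = Invoke-WebRequest -Uri 'https://joelj.ca'
if ($check.StatusCode -ne 200) { throw "Deployment health check failed" }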

Observability

Observability is the ability to assess the internal state of a system. It is achieved through the collection of metrics, logs and traces produced by your application(s). I chose Azure Application Insights as my observability platform for its easy “drop-in” style integration with the Express backend and Angular front end.

As a starting point, I wanted to collect the following attributes in my traces when an end user loads the app in the browser:

  • The web server that processed the initial request (one of two load balanced servers).

  • The deployment slot that served the request (blue or green).

  • The version of the code (ie, the git revision hash).

  • The load balancer that served the request to the browser.

This helps me in a few ways:

  • I can validate in real time that traffic has switched over between the blue/green slots.

  • I can catch a broken web server if it drops off the traces.

  • I can confirm the rollout of a new build.

  • I can confirm that the correct load balancer is serving production traffic after a DNS update (useful for the patching scenario).

The check-in itself is triggered by the Angular application, so it must have a way to identify these attributes. This took some creative manipulation of HTTP headers. Here’s what it looks like end to end.

1 - Express (Web Server)

Recall that during the build step, the git hash (representing the build version) was stored in ‘assets/version.txt’. The Express app can consume this file to get the current code version. The below snippets are taken from blog-app/server.ts.

const version = fs
  .readFileSync(path.join(join(process.cwd(), 'dist/blog-app/browser/assets'), 'version.txt'))
  .toString().trim();

The hostname and working directory can be retrieved through native Node.js functions, which tells us the web server and the deployment slot. These three items are stored together in a space-separated string at app startup.

const resOrigin = `${os.hostname()} ${process.cwd()} ${version}`;

Middleware is added to inject this string into the ‘Set-Cookie’ header of each HTTP response (the string is also added to an ‘X-Origin-Node’ header for troubleshooting purposes):

server.use((req, res, next) => {
  res.setHeader('X-Origin-Node', resOrigin);
  res.cookie('originnode', resOrigin, { httpOnly: false });
  next();
});

The Set-Cookie will, surprise surprise, set a session cookie upon receipt by the browser. All we’re missing attribute-wise is the load balancer.

2 - Nginx (Load Balancer)

Here’s a sample of what the Nginx server configuration looks like for an active blue deployment (green is identical except for the target upstream). Note that joeljcablue is an upstream pointing to port 1024 on the web server hosts in the pool.

server {
    add_header X-Origin-LB $hostname;

    # SSL configuration
    #
    listen 443 ssl default_server;
    ssl_certificate /etc/ssl/certs/joeljca-2022.crt;
    ssl_certificate_key /etc/ssl/certs/joeljca-2022.rsa;

    server_name www.joelj.ca joelj.ca;

    location / {
        proxy_pass http://joeljcablue;
        add_header Set-Cookie "originlb=${hostname}; Path=/";
    }
}

Here, another Set-Cookie header is appended onto the upstream response. Nginx will substitute ${hostname} with the name of the underlying host (lb1 or lb2).

3 - index.html Inline Script (Browser)

In index.html, an inline script reads the cookies set by these headers and stores them in a global variable (serverInfo).

const originNode = ('; '+document.cookie).split(`; originnode=`).pop().split(';')[0];
const decodedOriginNode = decodeURIComponent(originNode);
const serverParts = decodedOriginNode.split(' ');
const originLb = ('; '+document.cookie).split(`; originlb=`).pop().split(';')[0];
window.serverInfo = { node: serverParts[0], slot: serverParts[1], version: serverParts[2], lb: originLb }

Storing this on the first page load ensures that we capture these before another web call (say for an image asset) wipes them out.

4 - Angular Application (Browser)

Finally, the Angular framework triggers a telemetry event including the server info when the app loads.

this.insights.trackEvent({name: 'CheckIn'}, (<any>window).serverInfo);

For fun, I also include this info in the page footer.

A screenshot of a Kusto query I wrote in Application Insights should help with the intuition for why this is useful.

Watching this timeline tells me how well the load is being distributed between nodes, which load balancer is receiving traffic, which slot is active and the version of the app being run.

Ending Thoughts

I did my best to describe the key points and learnings from this mini project, but there were many interesting details I wasn't able to cover. This architecture has worked very well for me and might be a useful template if your application has similar requirements. Of course, the site is completely open source, so the full code is available for you to reference.