Canary Deployments are a time-tested deployment strategy to reduce
risk. The fundamental idea is that deploying software into a
production cluster is different from releasing it to its users. With
canary deployments, you deploy both old and new versions into a
production environment, but send only a small percentage of traffic to
the newer version. That way, if the new version fails, only a few
users are affected rather than the application’s entire user base. If
the newer version works well, the traffic proportion being sent to it
is increased incrementally until the new version has been rolled out
to all live traffic.
With Automated Canary Deployments, the feedback loop is automated –
success causes the new version’s proportion of traffic to be
increased, and failure causes the new version’s traffic proportion to
be rolled back to zero. This saves ops teams from having to either
manually manage canary deployments, or build automation to do so
Fission is the first open source serverless function framework to have
built-in, easy-to-configure Automated Canary Deployments.
Users can configure the percentage of traffic that will receive the
new version of the function, the error rate that will constitute as
failure, and the release rate at which to roll out the new version.
Let’s go over a quick tutorial on using Automated Canary Deployments
Using Automated Canary Deployments in Fission
Automated Canaries are supported from Fission version 0.11. For the
next section we’ll assume you’re already set up with this version; if
not, checkout the install guide.
For this tutorial, we’ll start with two versions of a function
deployed on a cluster, and show how automated canary deployments
incrementally transfer load from one version to another.
First, download a couple of trivial sample functions:
$ curl -LO https://raw.githubusercontent.com/fission/fission/master/demos/canary-successful-scenario/func-v1.js
$ curl -LO https://raw.githubusercontent.com/fission/fission/master/demos/canary-successful-scenario/func-v2.js
Next, create the environment and functions on your Fission cluster:
$ fission env create --name nodejs --image fission/node-env
$ fission fn create --name func-v1 --env nodejs --code func-v1.js
$ fission fn create --name func-v2 --env nodejs --code func-v2.js
Next, create a route with two weighted functions. This route is set
up to send 100% of load to
func-v1, and none to
it’s important that it’s set up for two function versions.
$ fission route create --name route-canary \
--method GET --url /canary \
--function func-v1 --weight 100 \
--function func-v2 --weight 0
Fission’s canary automation can roll out the newer version by changing
these weights over time. Let’s create a canary configuration
resource, which configures the parameters of how exactly the automated
rollout will occur, and when it will rollback if necessary.
fission canary-config create --name canary-1 \
--funcN func-v2 --funcN-1 func-v1 \
--httptrigger route-canary \
--increment-step 10 --increment-interval 30s \
First, the configuration needs to know the old and new versions.
Next, it needs to know which HTTP route to canary. The next two
parameters, step and interval, specify how much the traffic shifts and
how often; in other words it’s the rate of progress for the canary.
Finally, the user specifies the maximum permissible percentage of
failures; failures beyond this percentage cause a roll back.
The rest is automatic. In this example we’ve set the roll out rate to
be pretty fast. In reality you’d probably want have it much slower,
so you have more real-world testing on the new version before
incrementing it to significant percentages.
With Prometheus, you can view the traffic to the old function drop as
the traffic to the new version rises:
An overview of how Automated Canary Deployments work
The user deploys two functions into Fission, and configures a trigger
to split traffic between them. The Fission router uses this traffic
split configuration to send an appropriate fraction of traffic to each
version of the function. A common traffic split would be something
like 90% to the old stable version and 10% to the new version.
Fission’s Prometheus integration means that the Fission Router reports
error rate metrics for all functions. Prometheus scrames these
metrics and stores them.
The CanaryController ties everything together. At each interval, it
samples the error rate metrics from Prometheus, and makes a decision
whether to roll forward or roll back. If it decides to roll forward,
it modifies the traffic split fraction in the Trigger specification to
increase the traffic to the new version, and the process repeats at
the next interval. If at any point a failure threshold is reached,
the controller immediately sets the new version’s traffic fraction to
zero, and declares the canary status to have failed.
Automated Canary Deployments in Fission give you the ability to move
changes into production with lower risk and increased confidence.
Check out the Fission installation guide to get started with
Fission. Join us on the Fission Slack to
chat, or follow us on Twitter at @fissionio.