As you may or may not be aware, the team at Red Badger has been hard at work crafting the new Fortnum & Mason e-commerce website. It has also recently been nominated for the best customer experience award at the BT tech & ecom awards (https://techecommawards.retail-week.com/shortlist-2015). We delivered the site from concept to live in just 8 months using agile and lean methods such as Kanban. One of the core concepts that complements our Kanban approach when delivering features on the project is the ability to deploy without friction, confidently and multiple times a day.
Let's break this down and take a high-level look at what the process looks like.
You might recall I blogged about how we used GitFlow (http://red-badger.com/blog/2013/08/15/sprint-efficiently-with-github/) within our development team. On Fortnum & Mason and other projects we have recently moved to GitHub Flow (https://guides.github.com/introduction/flow/), mainly due to the supporting features GitHub has recently implemented.
The core principle is that the team pair programs on a feature branch. A pull request is then created with relevant specs, and it is reviewed and collaborated on by the team. Once the feature branch passes all of the tests on CircleCI, it can be merged into master. Our master branch always reflects production code. CircleCI also executes our Ansible scripts for provisioning and deployment.
On Fortnum & Mason we have unit tests along with golden path journeys written with Capybara that run using ChromeDriver. These golden path specs cover the core journeys that show whether the site is transactional. Once all of the specs have passed, the master branch is immediately deployed to our staging environment, ready for our QA to test.
If our QA is happy with the build, the QA takes ownership and tags a release via GitHub releases (https://github.com/blog/1547-release-your-software), stating which issues have been fixed as well as any new features added. The last part is to fill in the release tag name with a semantic version (http://semver.org/) number. This gives us fantastic rolling documentation in which everyone can see at a glance what changes have taken place. We are as transparent with our clients as we can be, and our product owner has access, so they can also take a look and really get a feel for what work has been completed day to day or even minute by minute.
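To make the tagging step concrete, here is a minimal sketch of how the next release tag could be derived from the previous one under semantic versioning. It assumes tags look like v1.0.0 (as in the deploy command later in this post), that bug-fix-only releases bump the patch number, and that releases with new features bump the minor number. The names parse_tag and next_version are hypothetical and only for illustration; on the project the QA chooses the tag by hand.

```python
import re
from typing import NamedTuple


class Version(NamedTuple):
    major: int
    minor: int
    patch: int

    def __str__(self) -> str:
        # Render in the same "v1.0.0" style used for release tags.
        return f"v{self.major}.{self.minor}.{self.patch}"


# Accepts "v1.2.3" or "1.2.3"; pre-release suffixes are out of scope here.
TAG_PATTERN = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)$")


def parse_tag(tag: str) -> Version:
    """Turn a release tag such as 'v1.4.2' into a comparable Version."""
    match = TAG_PATTERN.match(tag)
    if match is None:
        raise ValueError(f"not a semantic version tag: {tag!r}")
    return Version(*(int(part) for part in match.groups()))


def next_version(current: Version, has_new_features: bool) -> Version:
    """Bump minor for new features, otherwise bump patch for fixes."""
    if has_new_features:
        return Version(current.major, current.minor + 1, 0)
    return Version(current.major, current.minor, current.patch + 1)


if __name__ == "__main__":
    print(next_version(parse_tag("v1.4.2"), has_new_features=False))
    print(next_version(parse_tag("v1.4.2"), has_new_features=True))
```

Because Version is a tuple of integers, tags also sort correctly by release order, which plain string sorting would get wrong once a number reaches double digits.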
Another tool widely used across Red Badger is Slack, for company-wide collaboration and communication. For this project we decided to set up hubot (https://hubot.github.com/), an automated bot that (mostly) obeys your commands. We added a couple of custom scripts that allow the QA or any of the team to deploy a release as and when necessary. It is as simple as sending the message @badgerbot fm list tags, which lists the 5 latest tags in our repository. Once you have the tag you want, you can deploy it using @badgerbot fm deploy v1.0.0. This triggers a parameterised build (https://circleci.com/docs/parameterized-builds) within CircleCI, which runs the relevant Ansible scripts using the specified tag and deploys it into the production environment.
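The deploy command can be sketched roughly as follows. Our real scripts are hubot scripts, so this Python version is only an illustration of the idea: parse the chat message, then build the request that starts a parameterised build. The build parameter name DEPLOY_TAG, the project slug example-org/example-repo and both function names are assumptions made up for this sketch, and the endpoint shape follows CircleCI's v1 parameterised build documentation as I understand it, so check the current docs before relying on it. The request is only constructed here, never sent.

```python
import json
import re
import urllib.request
from typing import Optional

# Matches "fm deploy v1.0.0", with or without the leading bot mention.
DEPLOY_COMMAND = re.compile(r"^(?:@badgerbot\s+)?fm deploy (v\d+\.\d+\.\d+)$")


def parse_deploy_command(message: str) -> Optional[str]:
    """Return the tag to deploy, or None if this is not a deploy command."""
    match = DEPLOY_COMMAND.match(message.strip())
    return match.group(1) if match else None


def build_deploy_request(
    tag: str,
    token: str,
    project: str = "example-org/example-repo",  # hypothetical project slug
) -> urllib.request.Request:
    """Build (but do not send) the request that would trigger the build."""
    url = (
        f"https://circleci.com/api/v1/project/{project}/tree/master"
        f"?circle-token={token}"
    )
    # DEPLOY_TAG is an assumed parameter name; the build configuration would
    # read it and pass the tag on to the deployment scripts.
    body = json.dumps({"build_parameters": {"DEPLOY_TAG": tag}}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    tag = parse_deploy_command("@badgerbot fm deploy v1.0.0")
    if tag is not None:
        request = build_deploy_request(tag, token="not-a-real-token")
        print(request.get_method(), request.full_url)
```

Restricting the command to a strict tag pattern means a typo in chat is rejected before anything reaches the build server.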
Our deployments already come with a high degree of confidence thanks to our development practices of pair programming, code review, specs and QA-tested features and issues. But if something does go wrong in production, we are safe in the knowledge that we will know about it immediately. How do we know? In comes New Relic, along with a quick Red Badger service we hacked together in a few hours. The Fortnum & Mason site is rigged up with New Relic alerts and events throughout its codebase. Every instance, moving part and third party call has its instrumentation and performance tracked. CircleCI even tracks each deployment, so we can quickly see any performance degradation for every deployment that goes out.
Another element we have on Fortnum & Mason is the ability to turn features on and off, a concept called feature flipping. This allows us to incrementally release larger features to select users, so we can be confident a feature works while its code runs side by side with the rest of the deployed production code. A good example is adding another payment provider such as PayPal: we can trial it in production with a few users to make sure everything integrates before switching it on for everyone. We have fine-grained control and can release a feature to the product owner, to groups of users or even to a random percentage of users.
This really helps the team's principle of always moving forward.
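As a rough illustration of feature flipping, the sketch below enables a feature for named users, for groups, or for a percentage of everyone else. It is not our production code: FeatureFlag, bucket_for and is_enabled are hypothetical names, and the percentage rollout hashes the user id rather than rolling a random number, an assumption made here so that the same user gets the same answer on every request.

```python
import hashlib
from dataclasses import dataclass
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class FeatureFlag:
    name: str
    allowed_users: FrozenSet[str] = frozenset()   # e.g. the product owner
    allowed_groups: FrozenSet[str] = frozenset()  # e.g. internal staff
    rollout_percentage: int = 0                   # 0 = off, 100 = everyone


def bucket_for(flag_name: str, user_id: str) -> int:
    """Map a user to a stable bucket in the range 0..99 for this flag."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_enabled(flag: FeatureFlag, user_id: str, groups: Iterable[str] = ()) -> bool:
    """Check explicit users first, then groups, then the percentage rollout."""
    if user_id in flag.allowed_users:
        return True
    if flag.allowed_groups & set(groups):
        return True
    return bucket_for(flag.name, user_id) < flag.rollout_percentage


if __name__ == "__main__":
    paypal = FeatureFlag(
        name="paypal_checkout",
        allowed_users=frozenset({"product-owner"}),
        rollout_percentage=10,
    )
    print(is_enabled(paypal, "product-owner"))
    print(is_enabled(paypal, "customer-123"))
```

Including the flag name in the hash means a user who lands in the first 10% for one feature is not automatically in the first 10% for every other feature.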
Here is a breakdown of the monitoring and alerting services we use and what for:
New Relic Application Performance Monitoring (APM)
Instrumentation, performance and error logging. The performance of every server and third party call is logged, and alerting is set up to inform us of any bottlenecks and errors.
New Relic Synthetics
Continuous golden path testing to ensure the site's transactional flows are always working. Selenium scripts using Chrome run every 15 minutes to check that core journeys on the site are operational.
New Relic Insights
Customer behaviour analytics as well as KPIs. We log everything from delivery methods and revenue to average basket sizes and much more, allowing us to analyse and test new assumptions to improve customer experience.
Red Badger Phone Alerting
Although not part of New Relic, we hacked together a service that accepts hooks from ZenDesk and New Relic. If any critical part of our monitoring raises a critical alert or email, the service uses the Twilio API to phone a badger who is on 24/7 support.
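The phone alerting idea can be sketched in a few lines. This is not the service we built: the incoming payload field severity, the function names and the phone numbers are all made up for illustration, and the real hooks from ZenDesk and New Relic have their own formats. The outgoing request follows the shape of Twilio's REST endpoint for creating a call (a form-encoded POST with To, From and Url fields, authenticated with the account credentials), and is only built here, never sent.

```python
from typing import Dict, Tuple
from urllib.parse import urlencode


def should_phone(alert: Dict[str, str]) -> bool:
    """Only critical alerts are worth waking somebody up for."""
    # "severity" is an assumed field name in the incoming hook payload.
    return str(alert.get("severity", "")).lower() == "critical"


def build_call_request(
    account_sid: str,
    to_number: str,
    from_number: str,
    twiml_url: str,
) -> Tuple[str, str]:
    """Return the URL and form-encoded body for a Twilio 'create call' POST."""
    url = f"https://api.twilio.com/2010-04-01/Accounts/{account_sid}/Calls.json"
    # twiml_url points at the instructions describing what the call should say.
    body = urlencode({"To": to_number, "From": from_number, "Url": twiml_url})
    return url, body


if __name__ == "__main__":
    alert = {"source": "monitoring", "severity": "critical"}
    if should_phone(alert):
        url, body = build_call_request(
            account_sid="ACXXXXXXXXXXXXXXXX",         # placeholder account id
            to_number="+447700900123",                # fictional on-call number
            from_number="+447700900456",              # fictional caller number
            twiml_url="https://example.com/alert.xml",
        )
        print(url)
        print(body)
```

Keeping the decision (should_phone) separate from the delivery (build_call_request) makes it easy to add other sources of alerts without touching the calling code.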
Early and often
With all this in place, fixing issues and deploying new features multiple times a day is second nature to the whole team. Deployments are not blocked by hefty deadlines, and big ‘release planning’ becomes a thing of the past. A quick review of the tagged release in GitHub by the team is all that is required. The other huge benefit of deploying early and often is that each deployment carries less risk, because the changes in it are smaller and incremental.
So next time you are ‘release planning’, ask yourself how confident, efficient and easy it is for you and your team to deploy multiple times a day.