API Talent is called upon to develop solutions to a wide variety of customer problems. Sometimes the solution involves building a system or product, and sometimes it involves creating robust underpinning infrastructure to support or deliver that system or product.
Earlier this year, we took on a project to build a two-stage continuous delivery pipeline for a small startup here in Wellington, NZ. The requirements were relatively simple: the client wanted to adopt a DevOps culture within the workplace, and wanted to harness the power of AWS to provide the infrastructure to support that mode of operation. Specifically, the pipeline was required to be low-touch, and should build, test and deploy updated Docker container images into a test environment, then subsequently into a production environment, on manual approval by the product owner.
This was a great opportunity. One of the key concepts of a DevOps world is collaboration between teams, and at API Talent, this is something we promote through all our engagements. We were given mostly free rein in the selection of technologies to use, but ensured we conferred with the development team and client business owners in order to find the sweet spot between excellence in tool functionality and user familiarity.
“One of the key concepts of a DevOps world is collaboration between teams.” (Dom Driver)
The development team was already using git for source control. This dovetailed nicely with our recommendation to use the AWS Code* services (CodeCommit, CodeBuild and CodePipeline) as the backbone of the solution, since the CodeCommit service has a native git interface. Existing tools used by the devs would also be compatible with the CodeCommit repo. Builds were straightforward: AWS CodeBuild projects could replicate the existing build steps and could be maintained directly by the development team, being referenced only at the appropriate point in the pipeline.
Unit testing was already part of the developers’ existing build steps, so existing JUnit tests could slot into post-build steps using the exact syntax the developers were used to. We also discussed the format of a release with the business owners as well as the developers: who could press the big red button, and who couldn’t? Key design decisions were made collaboratively, and the overall development of the structure and rules of the pipeline was a team effort.
Infrastructure as Code
CloudFormation is a great service. When infrastructure is defined in text-based template files, updates and upgrades are as simple as executing a change set against existing stacks, often with little or no downtime during the event. Additionally, disaster recovery is easy (assuming your infrastructure is stateless...) since entire environments and service configurations can be stood up from bare earth in a matter of minutes.
At API Talent, we aim to describe all infrastructure via CloudFormation templates where appropriate. In this case, the entire environment – VPC, load-balancers, ECS cluster instances etc. – was generated from templates. We even reused the same templates with different configurations for different environments, thereby ensuring consistency between them. We also created all auxiliary infrastructure, such as the CodePipeline itself, from templates, too.
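Reusing one template across environments usually comes down to supplying a different set of parameter values per environment. The following is a minimal sketch of that idea, not the project's actual configuration: the parameter names, instance types and the `app-<environment>` stack naming are all hypothetical, and would need to match the `Parameters` section of whatever template is shared.

```python
# Hypothetical per-environment settings. The keys are illustrative and must
# match the parameter names declared in the shared CloudFormation template.
ENVIRONMENTS = {
    "test": {"EnvironmentName": "test", "InstanceType": "t2.small", "DesiredCapacity": "1"},
    "prod": {"EnvironmentName": "prod", "InstanceType": "m4.large", "DesiredCapacity": "3"},
}


def stack_parameters(environment):
    """Return an environment's settings in the list-of-dicts shape that
    CloudFormation expects for its Parameters argument."""
    settings = ENVIRONMENTS[environment]
    return [
        {"ParameterKey": key, "ParameterValue": value}
        for key, value in sorted(settings.items())
    ]


def create_stack_request(environment, template_body):
    """Build keyword arguments for a CloudFormation CreateStack call.

    The same template_body is used for every environment; only the
    parameter values (and the assumed stack name) differ.
    """
    return {
        "StackName": f"app-{environment}",
        "TemplateBody": template_body,
        "Parameters": stack_parameters(environment),
    }
```

With boto3 available, the request could be submitted with something like `boto3.client("cloudformation").create_stack(**create_stack_request("test", template_text))`.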
When dealing with Docker containers deployed via ECS, the deployment mechanism for new versions is to perform a CloudFormation stack update against the currently-running service, passing a new container image ID to the Task Definition.
This rolls the new version of the container into service with minimal disruption. In this way, CloudFormation is integral to the deployment step. We even took things one step further, and added the CloudFormation templates to the same CodeCommit repository that contained the Docker build source. As a result, any changes to the templates that were committed to the repo would kick off the pipeline, eventually updating the running versions.
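As a rough illustration of such a stack update, the sketch below builds the arguments for CloudFormation's `UpdateStack` call so that only the image parameter changes. It assumes the template exposes the image as a parameter (the name `ImageUri` in the usage note is hypothetical) and that the previously deployed template is reused; in the pipeline described here the template itself could also change, in which case a template body would be passed instead.

```python
def image_update_request(stack_name, image_parameter, new_image, all_parameters):
    """Build keyword arguments for a CloudFormation UpdateStack call that
    changes only the container image parameter.

    all_parameters is the full list of parameter names declared in the
    template; image_parameter is the one that carries the image reference.
    """
    parameters = [{"ParameterKey": image_parameter, "ParameterValue": new_image}]
    # List every other parameter with UsePreviousValue so the deployed value
    # is kept; a parameter left out of the call would otherwise fall back to
    # the template default (or be rejected if it has none).
    parameters += [
        {"ParameterKey": name, "UsePreviousValue": True}
        for name in all_parameters
        if name != image_parameter
    ]
    return {
        "StackName": stack_name,
        "UsePreviousTemplate": True,  # assumption: template is unchanged
        "Parameters": parameters,
    }
```

A call such as `cloudformation.update_stack(**image_update_request("web-service", "ImageUri", new_image, ["ImageUri", "ClusterName"]))` would then register a new Task Definition revision and let ECS roll the service over to it.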
We hit a snag early on; the CodeCommit service wasn’t available in the Sydney region at the time of implementation (it has since been released). This meant that we couldn’t house the CodePipeline or CodeBuild stages in Sydney either if we wanted a seamless interface with the source code repo. If our entire CodePipeline was elsewhere, how could we deploy into Sydney?
The final build artifacts were an updated Docker image, stored in a private Amazon ECR repository, and the CloudFormation templates required to update the services running on the ECS cluster. There was no issue with the repository being in Sydney, since cross-region upload to it was straightforward. However, the actual deployment of the service into the environment in Sydney was less straightforward. Updating services in an ECS cluster requires a CloudFormation stack update. This is easy using CodePipeline’s native CloudFormation deployment action, but that standard action didn’t handle deployment to an ECS cluster in another region.
Lambda to the Rescue
A CodePipeline pipeline allows for a number of Stages, each of which can comprise a number of Actions. Thank goodness, then, for the Invoke Action, which lets a pipeline call a custom Lambda function. Rather than relying on the built-in CloudFormation deploy Action, we rolled our own Lambda function to perform the same task with one difference: the deployment region was passed as an execution parameter.
This allowed us to update the existing Task Definition of a Service running in Sydney from a CodePipeline running in Northern Virginia.
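A Lambda function of this kind might look roughly like the sketch below. It is not the function we deployed: it assumes the action's `UserParameters` field carries a JSON string, and the keys inside it (`region`, `stack_name`, `image_parameter`, `image`) are hypothetical. The client factory is passed in so the logic can be exercised without AWS; inside Lambda it is simply `boto3.client`.

```python
import json


def handle_job(event, make_client):
    """Update a CloudFormation stack in the region named in UserParameters,
    then report the outcome back to CodePipeline."""
    job = event["CodePipeline.job"]
    config = job["data"]["actionConfiguration"]["configuration"]
    # UserParameters arrives as a single string; JSON content is assumed.
    params = json.loads(config["UserParameters"])

    pipeline = make_client("codepipeline")
    try:
        # The target region comes from the parameters, not from the region
        # in which the pipeline and this function are running.
        cloudformation = make_client("cloudformation", region_name=params["region"])
        cloudformation.update_stack(
            StackName=params["stack_name"],
            UsePreviousTemplate=True,
            Parameters=[
                {
                    "ParameterKey": params["image_parameter"],
                    "ParameterValue": params["image"],
                }
            ],
        )
    except Exception as error:  # report any failure to the pipeline
        pipeline.put_job_failure_result(
            jobId=job["id"],
            failureDetails={"type": "JobFailed", "message": str(error)},
        )
        return "failed"
    pipeline.put_job_success_result(jobId=job["id"])
    return "succeeded"


def lambda_handler(event, context):
    import boto3  # provided by the Lambda Python runtime

    return handle_job(event, boto3.client)
```

Note that this sketch reports success as soon as the stack update has been submitted. A fuller version would wait for the update to finish before reporting the result back to CodePipeline.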
The Big Red Button
AWS CodePipeline makes use of the Simple Notification Service to deliver messages containing status information about the progress of current builds. We leveraged the service here to provide notifications that a new container image was ready for deployment, tailoring the distribution group to only those individuals with the permission to allow the deployment to proceed.
On receiving a notification, the user could sign in to the console and review the current status of the pipeline, as well as any historical status for comparison, and hit the button to deploy into test if happy.
The Bigger, Redder Button
Once the new container image had been successfully deployed into the test environment, the pipeline was configured to wait for approval for release to production. Following best practice, we created a totally separate AWS account to house the production environment. Access to production was only granted via cross-account roles, each of which could only be assumed by members of the team with the appropriate permissions.
On approving the deployment (the button for which is disappointingly neither big nor red in the AWS console!), a second Lambda deployment function was invoked to copy the build artifacts to an S3 bucket in prod.
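For readers unfamiliar with cross-account roles, the sketch below shows the general shape of assuming one programmatically through STS. The role ARN in the usage note and the default session name are hypothetical, and the STS client is passed in so that the function makes no assumption about how the caller is authenticated.

```python
def production_client_kwargs(sts_client, role_arn, session_name="pipeline-deploy"):
    """Assume a cross-account role and return its temporary credentials as
    keyword arguments that boto3.client and boto3.Session accept."""
    response = sts_client.assume_role(
        RoleArn=role_arn,
        RoleSessionName=session_name,
    )
    credentials = response["Credentials"]
    return {
        "aws_access_key_id": credentials["AccessKeyId"],
        "aws_secret_access_key": credentials["SecretAccessKey"],
        "aws_session_token": credentials["SessionToken"],
    }
```

A caller permitted to assume the role could then create production-scoped clients with, for example, `boto3.client("ecs", **production_client_kwargs(boto3.client("sts"), "arn:aws:iam::111111111111:role/prod-deployer"))`. The call fails unless the role's trust policy names the caller, which is what restricts access to the right team members.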
The delivery of artifacts remained secure through a combination of the Lambda execution role policy and a bucket policy on the target bucket. Arrival of the artifacts in the bucket triggered a second pipeline in prod (all based in Sydney), which updated the Task Definitions for services running on the production ECS cluster.
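The bucket-policy half of that arrangement can be sketched as follows. This is an illustrative policy rather than the one used on the project: the bucket name and role ARN are placeholders, and it grants only `s3:PutObject` to the single role that the deployment function runs as.

```python
import json


def artifact_bucket_policy(bucket_name, deployer_role_arn):
    """Return a bucket policy that lets one role from another account
    upload objects to the artifact bucket, and nothing else."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowPipelineArtifactUpload",
                "Effect": "Allow",
                "Principal": {"AWS": deployer_role_arn},
                "Action": "s3:PutObject",
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            }
        ],
    }


def artifact_bucket_policy_json(bucket_name, deployer_role_arn):
    """Serialise the policy to the JSON string that S3 expects."""
    return json.dumps(artifact_bucket_policy(bucket_name, deployer_role_arn))
```

The resulting string could be applied with `s3.put_bucket_policy(Bucket=bucket_name, Policy=policy_json)`. The execution role in the source account needs a matching allow for the same action and resource, since cross-account access requires both sides to grant it.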
The team now had an end-to-end build and deployment pipeline which required no manual intervention outside of business-level deployment approval.
Deployment frequency increased, and was now as fast as code could be committed and approved for roll-out. This resulted in a significantly reduced product iteration cycle, meaning greater responsiveness to customer feedback to address fixes or add features. The gated release into production – following a mandatory release and testing period in the test environment – meant that every single version of the application running on the production cluster was not only guaranteed to have been previously deployed into test, but had also been manually approved by a user with appropriate permissions.
Collaboration didn’t stop with delivery. We were very pleased that the client also signed up to a flavour of our Next-Generation Managed Services, meaning that the API Talent operations team would keep an eye on day-to-day operations and be on hand for questions and queries. This too worked well, and we’ve monitored the most recent production deployments remotely.
“Collaboration didn’t stop with delivery.” (Dom Driver)
Job done! With requirements met, the client took the solution and ran with it, extending the number of services managed by the pipeline beyond the original design. Overall, the project was a great success, and an excellent feather in the API Talent DevOps hat.