The Road To Infrastructure as Code: Part 1

Why we chose CloudFormation & Troposphere to manage our infrastructure

Avi Friedman
Innovid

I bet the robots knew how to do it. image from: giphy.com

For years now at Innovid, a leading video marketing platform where we serve 1.3 million hours of video per day, we’ve been using Chef to manage our AWS EC2 instances. Managing our machines this way is an essential part of being able to manage and configure the infrastructure for all of our different services, essentially describing our entire EC2 infrastructure as code. Writing your infrastructure as code introduces all the benefits of code into your infrastructure stack (for us: our servers), including:

  • Version control
  • A single-source-of-truth
  • Testing
  • Automating deployment pipelines
  • Scalability

There’s really no other way to do it in a company that handles a lot of resources. However, any infrastructure outside of our EC2 instances was created, tweaked and managed manually. This included things like load balancers, security groups, networking, etc. We managed to pull it off quite well because, although server configuration changes fairly often (instances created and stopped; packages, modules and plugins updated, added or removed), the underlying infrastructure hasn’t changed that often.

Lately, we started a big shift, migrating our entire classic-EC2 infrastructure into AWS VPC. This was a golden opportunity to create our entire infrastructure in a better way: the IAC way. The benefits are pretty self-evident, as we already knew from managing our instances this way, and handling everything else the same way would extend those benefits to our entire infrastructure. Most importantly, for a company that feels just the right size between a corporation and a garage startup, managing our infrastructure this way would give our developers more freedom and control over the infrastructure they need for their modules, and would alleviate the pressure on our DevOps ninjas.

When faced with the task of choosing an IAC tool, there are many solutions out there.

We looked for a tool that would be:

  • Powerful & versatile (to support as much as possible the available configuration on our cloud provider).
  • Easy to implement & manage.
  • Well-documented, supported & stable.

Chef was ruled out right off the bat. It is surely all of the above, and since we already use it extensively to manage our instances, we have the knowledge, the experience and an up-and-running Chef server cluster. Still, Chef isn’t that good yet for managing all of our underlying AWS infrastructure as-is; it works better as a tool for managing the infrastructure of our different roles. We weren’t keen on using it to define and manage all of the shared infrastructure between them.

Eventually it came down to Terraform vs. Amazon’s CloudFormation. That was a harder choice to make.

Terraform

Terraform is a tool created and maintained by HashiCorp. It’s an open source tool that allows you to write your infrastructure across cloud service providers in HCL (HashiCorp Configuration Language), for example:

provider "aws" {
  access_key = "${var.access_key}"
  secret_key = "${var.secret_key}"
  region     = "${var.aws_region}"
}

resource "aws_vpc" "default" {
  cidr_block           = "172.30.0.0/16"
  enable_dns_hostnames = true
  enable_dns_support   = true
  instance_tenancy     = "default"
}

Currently the latest stable version is 0.9.5, so it’s not even at 1.0.0 yet. Though there’s some documentation on the Terraform site, and some blogs describing people’s experience with Terraform, it’s still not that mature.

CloudFormation

In CloudFormation, the “code” is managed using JSON templates (YAML templates are also supported). It’s a proprietary tool created and managed by Amazon specifically for use in AWS. It’s as close as you can get to supporting the configuration of all AWS resources. It’s very versatile, documentation is aplenty, it’s stable (give or take some bugs, but we’ll dive into that in later posts) and reliable, and as clients of Amazon’s enterprise support plan we get all the support we need.

Now, I know what you’re thinking…

That’s where Troposphere comes into play. Troposphere is a Python library for describing CloudFormation templates as Python code, which can then be rendered into the appropriate JSON template. This means defining and managing all of our templates is as easy as writing Python code, with all the associated power that comes with it. The Terraform code above would translate into:

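from troposphere import Template
from troposphere.ec2 import VPC

# Build an empty template and add a single VPC resource to it
template = Template()
template.add_resource(
    VPC(
        'default',
        CidrBlock='172.30.0.0/16',
        EnableDnsHostnames=True,
        EnableDnsSupport=True
    )
)

# Render the template as CloudFormation JSON
print(template.to_json())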

One could argue this is uglier than HCL, but it’s a simple example. As you start to require more advanced stuff, you get really tired of HCL… It would have been much better if we could have used a combination of code and DSL, like in Chef with Ruby.

Terraform Vs. CloudFormation

Powerful & versatile:

Terraform is written in declarative code. While it does have some advanced abilities, like using interpolation syntax, variables and functions, it’s still limited. You can’t read complex configuration files, iterate over arrays or objects, or even do something as simple as using a variable to define a resource’s name dynamically. This can all change of course, but you’re still bound to HCL, as opposed to Python using Troposphere.
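To make the contrast concrete, here is a minimal sketch of the kind of generation that general-purpose code makes easy: looping over a configuration structure and computing resource names from data. It deliberately uses only plain dictionaries and the standard library so it runs anywhere; with Troposphere, the loop body would build Subnet objects and add them to a Template instead. The SUBNETS data, the build_template helper and the logical names are hypothetical, not taken from our setup.

```python
import json

# Hypothetical configuration; in practice this could be loaded from a file.
SUBNETS = [
    {"name": "PublicA", "cidr": "172.30.0.0/24", "az": "us-east-1a"},
    {"name": "PublicB", "cidr": "172.30.1.0/24", "az": "us-east-1b"},
]


def build_template(vpc_cidr, subnets):
    """Return a CloudFormation template as a plain dict."""
    resources = {
        "DefaultVpc": {
            "Type": "AWS::EC2::VPC",
            "Properties": {
                "CidrBlock": vpc_cidr,
                "EnableDnsHostnames": True,
                "EnableDnsSupport": True,
            },
        }
    }
    for subnet in subnets:
        # The logical resource name is computed from the data.
        resources["Subnet" + subnet["name"]] = {
            "Type": "AWS::EC2::Subnet",
            "Properties": {
                "VpcId": {"Ref": "DefaultVpc"},
                "CidrBlock": subnet["cidr"],
                "AvailabilityZone": subnet["az"],
            },
        }
    return {"AWSTemplateFormatVersion": "2010-09-09", "Resources": resources}


template = build_template("172.30.0.0/16", SUBNETS)
print(json.dumps(template, indent=2, sort_keys=True))
```

Adding a third subnet is then a one-line change to the data rather than another copy of the resource definition.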

Terraform seems to have adequate support for everything you’ll want to achieve in AWS, but it’s still at a pre-1.0.0 version (it started in 2014), while Troposphere is currently at v. 1.9.3 (it started in 2013). CloudFormation itself dates back to 2010, and is probably the most accurate and up-to-date representation of your AWS configuration. This of course comes at the price of using a proprietary AWS tool, meaning you’re increasing your dependency on one cloud provider. However, using Terraform still means writing the code again for every cloud provider (unless we’re talking about running on different cloud providers at the same time, in which case Terraform would score more points here). Plus, while there’s no easy way to directly convert CF templates for use with Terraform, or to export Terraform as CF templates, switching from CF to Terraform would actually be easier than the other way around. Tools like terraforming give you the ability to import all of your existing infrastructure as both state files and configuration files. When we tried using it, it worked pretty well. On the other hand, choosing to migrate from Terraform to CloudFormation? That’s a nightmare.

AWS gives you the ability to export your existing infrastructure as a CloudFormation JSON template, using a tool called CloudFormer. This tool is, why not, delivered as a CloudFormation template (I’m sure there’s a recursion joke hiding here somewhere, but recursion jokes are rarely worth the digging). You use this template to create a stack of resources for a web app that is the CloudFormer tool, which you can then use (so this whole thing costs you money). The tool is still in beta, and has been since it launched way back in 2010. It wasn’t yet supported in the Ohio region (a fairly new region at the time we tinkered with it), meaning you couldn’t launch it there or export any resources you had there (again, at least at the time). After scanning all your resources in one region and selecting the ones you want (a painstaking procedure), you’re left with a huge JSON CF template. Then, if you want to use Troposphere, you can use the Troposphere tool cfn2py to convert this into a huge chunk of Python code, which you’ll then have to tweak (since it’s not a perfect tool, to say the least)…

Easy to implement and manage:

Both Terraform and CloudSphere (CF+Troposphere) are basically client-based, meaning you don’t need to worry about or maintain any backend servers. In Terraform your state is saved in a .tfstate file. Though you can make use of a backend to save your state (be it an open source tool like Consul or the HashiCorp enterprise solution), you really don’t have to, and can just keep it anywhere you’d like (on S3, for example). In CloudFormation, AWS actually provides the backend for saving your state.

Managing your infrastructure changes is easy with both tools. In Terraform you use the terraform plan command to get an execution plan, review all the changes that will be made to the current state, and then run terraform apply to apply that plan. Some blogs out there do report a few issues though (for example, latency between regions that can cause the procedure to fail).

CloudFormation supports the use of change sets, so you can create a change set, review it and execute it (using the AWS CLI). This method has its issues, which we will address in a later post, but basically you can manage your infrastructure pretty conveniently with both tools.
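The create, review, execute cycle can be sketched as three AWS CLI calls. The snippet below only builds the command lines and prints them; it runs nothing against AWS. The stack name, change set name and template path are placeholders, and the change_set_commands helper is hypothetical rather than part of our pipeline.

```python
def change_set_commands(stack_name, change_set_name, template_path):
    """Build the AWS CLI calls for one create/review/execute cycle."""
    base = ["aws", "cloudformation"]
    ident = ["--stack-name", stack_name, "--change-set-name", change_set_name]
    return [
        # 1. Create the change set from a rendered template file.
        base + ["create-change-set"] + ident
        + ["--template-body", "file://" + template_path],
        # 2. Review the changes CloudFormation proposes to make.
        base + ["describe-change-set"] + ident,
        # 3. Apply the change set once the review looks right.
        base + ["execute-change-set"] + ident,
    ]


commands = change_set_commands("example-stack", "example-change", "template.json")
for command in commands:
    print(" ".join(command))
```

Each list can be passed to subprocess.run(command, check=True). In practice you would stop after describe-change-set to read the proposed changes before deciding whether to execute.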

Documentation & Support

As mentioned, CF documentation is great. Every feature is documented and explained, there are examples for everything, and even if you don’t have enterprise support, your issue is probably already somewhere out there in a forum. As far as Troposphere is concerned, though the documentation itself is a bit lacking, everything is pretty much out there in the Python code. If you want to know what features are supported for something, for example a V2 load balancer, just take a look at the code:

class LoadBalancer(AWSObject):
    resource_type = "AWS::ElasticLoadBalancingV2::LoadBalancer"

    props = {
        'LoadBalancerAttributes': ([LoadBalancerAttributes], False),
        'Name': (elb_name, False),
        'Scheme': (basestring, False),
        'SecurityGroups': (list, False),
        'Subnets': (list, True),
        'Tags': (list, False),
    }
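
Each entry in props maps a property name to a pair: the expected type (or validator) and whether the property is required. As a small illustration, the sketch below walks a stand-in dictionary shaped like the listing above, with the types simplified to built-ins so it runs without Troposphere installed, and splits the names into required and optional. The split_props helper is hypothetical and not part of Troposphere.

```python
# Stand-in for the props table above; types are simplified to built-ins.
PROPS = {
    "LoadBalancerAttributes": (list, False),
    "Name": (str, False),
    "Scheme": (str, False),
    "SecurityGroups": (list, False),
    "Subnets": (list, True),
    "Tags": (list, False),
}


def split_props(props):
    """Return (required, optional) property names, each sorted."""
    required = sorted(name for name, (_, needed) in props.items() if needed)
    optional = sorted(name for name, (_, needed) in props.items() if not needed)
    return required, optional


required, optional = split_props(PROPS)
print("required:", required)
print("optional:", optional)
```

For this resource the only required property is Subnets; everything else can be left out.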

Terraform’s documentation is not as good as CF’s, and since its use is not that widespread, you’ll have a harder time finding answers to the issues you’re dealing with.

Summary

Both frameworks have their limitations and caveats, which you should be fully aware of before choosing a side. While Terraform shows great promise, it still felt too early to commit our entire production to it. With time, we get to hear more and more companies speak about their experience with it, and its hiccups. One company we’ve heard about has written a Python wrapper for Terraform. Sounds familiar…

As Terraform grows and accumulates more experience, documentation and abilities, and as we start moving more toward immutable infrastructure, it definitely feels like something we might experiment with in the future. Right now, given that we use AWS as our cloud provider, and given the way our infrastructure is built and maintained, it made more sense to go with the more mature, well-documented and supported solution. That’s not to say it doesn’t have its annoying moments, and we’ll definitely talk about those in the posts ahead…

We’ll also point out that we still haven’t reached our goal of making our infrastructure immutable. We use CF basically as a configuration management tool, not an orchestration management tool.

In the next part of this series we’ll dive into how we’ve integrated Troposphere and CloudFormation into our integration and deployment pipeline, to manage our infrastructure in the same way we manage our modules and instances.
