Infrastructure-as-Code (IaC)
Infrastructure as Code is the idea that we define and maintain our infrastructure using source code committed to a source control system. Tooling then reads that source code to bring the real configuration in line with what is declared, and to verify that it stays that way.
Tools in this space include SaltStack, Chef, Ansible, Terraform, and Pulumi (among many others).
Configuration management vs Infrastructure management
I think it’s important to distinguish between configuration management and infrastructure management, because the tools have different strengths in each.
Configuration management concerns itself with configuring running virtual machines with the software they need to run. Infrastructure management concerns itself with things like the configuration of the network, network-level firewall/security-group rules, etc.
Infrastructure management provisions the virtual networks, subnets, domain names, and virtual machines; configuration management then configures the servers as needed.
Terraform
The current leader in the IaC space. It was basically created because CloudFormation was very complicated and, more obviously, only works on AWS, so all your other infrastructure would be left unmanaged.
Pros:
- Documentation is top notch.
- Tool is very simple to use
- Language is nice to write, and the syntax is very simple to learn
- Most of what you do is plan to a file and then apply that plan. You could also just apply right away if you want, but you normally want to check things first
- Very low resource usage
- Terraform Cloud is free now and solves the problems of using Terraform on a team: people applying changes on top of each other, and keeping state information secure.
- Native support for all Terraform Providers, of which there are MANY
  - Not just the ones maintained by HashiCorp
Cons:
- Language is not a full programming language. It forces you into weird constructs, and you need to use “local” resources in strange ways
- Hard to manage several “projects”/“stacks”. You want to limit blast radius but still reuse information from other Terraform files, and without a tool like Terragrunt this gets extremely difficult (or at least it used to).
- The module system is way too simple and restrictive. It’s best not to modularize at first, because moving things in and out of modules is a very painful process if they were written incorrectly.
Pulumi
The upstart, which has imported/converted many of the providers Terraform offers while also letting you use a real programming language. They also made managed state and locking for teams standard, along with access to a console.[1]
Pros:
- Can use a real language you’re familiar with
  - That means you can break up your infra into the functions/classes/modules/libraries you’re already used to for abstracting things.
- With TypeScript you get type information, which lets editors like VS Code intelligently help you write your infrastructure code
  - Also, with TS the literal types prevent things like specifying numbers for strings or typo’ing an instance type
- Lots of work has been done on Crosswalk and other Pulumi-specific packages to reduce the complexity of creating resources that have a specific way they should be configured.
  - e.g. A VPC should have n subnets splitting up the IP space, and some of these are public, private + NAT, and isolated. You just specify what you need and it’s created correctly for you.
- The console is extremely well written and very nice
- The CLI experience is much more refined. If you’ve used Terraform, everything you expect will be there, but named in a more friendly way.
- The integrations with GitHub Actions (both V1-Docker and V2-Javascript) are well done and make it super easy to set up CI/CD pipelines and code review previews.
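To make the literal-type point above concrete, here is a minimal sketch in plain TypeScript. It does not use Pulumi at all: InstanceType, ServerArgs, and describeServer are hypothetical names made up for illustration, and the three instance sizes are only a sample.

```typescript
// Hypothetical subset of instance sizes, for illustration only.
type InstanceType = "t3.micro" | "t3.small" | "t3.medium";

// Hypothetical argument shape for a server-like resource.
interface ServerArgs {
  instanceType: InstanceType; // only the literals above are accepted
  diskSizeGb: number;         // a string such as "20" is rejected
}

// An ordinary function is enough to package up a reusable piece of infra logic.
function describeServer(args: ServerArgs): string {
  return `${args.instanceType} with ${args.diskSizeGb} GB disk`;
}

const summary = describeServer({ instanceType: "t3.small", diskSizeGb: 20 });
console.log(summary);

// Both of the following are compile-time errors, caught before anything is deployed:
// describeServer({ instanceType: "t3.smal", diskSizeGb: 20 });    // typo in the literal
// describeServer({ instanceType: "t3.small", diskSizeGb: "20" }); // string where a number is expected
```

The editor flags the two commented-out calls as you type, which is the kind of help the bullets above are describing.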
Cons:
- The best-supported language has always been TypeScript. It’s nice to code in a single language for everything, but the experience with TypeScript is so great it’s really the language you should be using.
- Crosswalk helps a lot, but it also encourages you to not understand what it’s doing. When everything goes great it’s awesome. When you need to do something it doesn’t let you configure directly, you quickly start getting in trouble.
- The way Pulumi works (hidden from you) is by running the Pulumi engine (Go), which your code (probably TypeScript) talks to via some kind of RPC. It’s very resource intensive and hard to run in an under-resourced VM.
- It’s very easy to over-abstract things in Pulumi and make deployment very slow.
  - Also, because outputs from one resource are not available right away, new users will often try to await/apply everything. That messes up the previews, because until the code does new ResourceThing(...) Pulumi doesn’t know about the resource. I’ve seen extremely scary previews in some projects saying it’s going to delete the ECS Fargate service for an app because it was being created in an apply that pulled in the database credentials.
  - Pulumi already handles turning Outputs into values for you. Unless TypeScript yells at you to make it a string, try to structure things so apply or await are used as little as possible.
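The preview problem with apply is easier to see in a toy model. The sketch below is plain TypeScript and deliberately does not use Pulumi's real API: ToyResource, ToyOutput, ToyService, and preview are all made-up names. The one assumption carried over from Pulumi is that a callback on a value that is not yet known does not run during a preview.

```typescript
// Toy model only: none of these names are Pulumi's real API.
const registered: string[] = [];

class ToyResource {
  constructor(public readonly name: string) {
    // The "engine" only learns about a resource when it is constructed.
    registered.push(name);
  }
}

// A value that may stay unknown until the deployment actually runs.
class ToyOutput<T> {
  constructor(
    private readonly value: T | undefined,
    private readonly known: boolean,
  ) {}

  // Assumption: during a preview, callbacks on unknown values are skipped.
  apply<U>(fn: (v: T) => U): ToyOutput<U> {
    if (!this.known) return new ToyOutput<U>(undefined, false);
    return new ToyOutput<U>(fn(this.value as T), true);
  }
}

// A resource that accepts the not-yet-known value as an input.
class ToyService extends ToyResource {
  constructor(name: string, public readonly password: ToyOutput<string>) {
    super(name);
  }
}

// Runs a program with an unknown database password and reports
// which resources the "engine" saw.
function preview(program: (dbPassword: ToyOutput<string>) => void): string[] {
  registered.length = 0;
  program(new ToyOutput<string>(undefined, false));
  return [...registered];
}

// Anti-pattern: the service is only constructed inside apply.
const previewInsideApply = preview((pw) => {
  new ToyResource("database");
  pw.apply(() => new ToyResource("service"));
});

// Preferred: construct the service at the top level and pass the output in.
const previewTopLevel = preview((pw) => {
  new ToyResource("database");
  new ToyService("service", pw);
});

console.log(previewInsideApply); // ["database"]
console.log(previewTopLevel);    // ["database", "service"]
```

In the first program the service never shows up in the preview, which is how an existing service can end up listed for deletion. Passing the output straight into the resource keeps the resource visible.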
[1] Terraform always had some version of a console, but it was enterprise-only at first.