Terraform 0.11.x --> 0.12.x

Anyone in the SRE/DevOps world is likely at least aware of Terraform, if not actively using it. While Pulumi’s ability to write IaC in Python seems lovely to me, my company uses Terraform, so HCL it is. If you aren’t familiar, I’ll give a brief overview.

Terraform is an IaC (see link above) tool. You know how installing an OS from scratch requires human input? Everything from the partitioning scheme to what services to launch requires configuration. What if you only had to write that down once, and then it could be replicated? What I’m vaguely describing is actually more of a mix of (sticking with HashiCorp) Terraform and Vagrant, but the basic idea is the same: write - in some standardized language - what you want your end result to look like, and it can then be applied over and over to achieve that result. This is called declarative programming, and it’s utterly different from what most people are used to writing. Anyway, back to the topic.

Terraform 0.12 is a big change, and one that comes with some breaking changes. Chief among these is the reservation of count as a variable name. Here’s an example Terraform resource block utilizing count as both a meta-parameter (still allowed) and a variable (no longer allowed).

resource "aws_elasticache_subnet_group" "redis_subnet_group" {
  count       = "${var.count == 0 ? 0 : 1}"
  name        = "${var.name}-${var.cluster}"
  description = "redis cache subnet group generated by terraform"
  subnet_ids  = ["${split(",", var.elasticache_subnet_ids)}"]
}

This, inside a main.tf file, is creating a resource redis_subnet_group of type aws_elasticache_subnet_group that can be referenced in other Terraform files. Specifically, it’s creating either zero or one subnet groups (depending on var.count) for a Redis cluster, named ${var.name}-${var.cluster}. By referencing redis_subnet_group in another resource or module, its attributes can be utilized without having to recreate the code.

count is a perfectly cromulent term to describe, well, a count. How many subnet groups do you want? In the above example, we’re either getting a 0 or a 1. The ternary may seem a bit silly, as you could obviously just set count = ${var.count}, but if var.count happened to be a Boolean, you may have issues. To be fair, Terraform 0.11 would silently cast True/False to 1/0, but in 0.12, that’s no longer the case, so the above would be required if your variable assignment wasn’t guaranteed to be 0/1.

count, however, now has (or will soon have) other uses in Terraform. It already has use as a meta-parameter, in that you can do nifty things like creating a makeshift for loop out of it by referencing count.index, but it’s now a reserved word inside modules. Not resources, notably, but modules. Terraform has a handy upgrade tool that converts much of your HCL over to 0.12 compliance, but as HashiCorp points out, they can’t possibly know what you want var.count to be called, so they can’t upgrade that - the onus falls on humans. Your decision is of little importance; you can go with var.cnt if you aren’t afraid of being one letter away from an HR issue, var.foo if you think self-commenting code is for the weak, or anything else you’d like. I personally went with var.num.

How, though, do you change this? Any text editor will do, of course; you need to change any instances of var.count to var.num, as well as changing count inside modules - and only modules - to num (to be fair, you could also have this be different from your variable name). find $YOUR_DIR -name "*.tf" -exec sed -i 's/var\.count/var.num/g' {} \; takes care of the former, albeit with some edge cases (it will also rewrite var.counter, for instance), but how do you fix the latter? Search and replace won’t work; you’ll change resource count too. Manually? We have 460 main.tf files, so no thanks. Regexes to the rescue (more accurately, this). Well, regexes and Python.

count_regex = r"(^module.+\n+)([\S\s].+)?(\s+(?=count))(count)"

# Group 1: Lines beginning with "module", followed by 1+ of any character except
#          a linebreak, followed by 1+ linebreak
# Group 2: (Optional) Any one character, linebreaks included, followed by 1+ of
#          any character except a linebreak
# Group 3: 1+ of a whitespace character with a positive lookahead of "count"
# Group 4: The word "count"
# Example:
#
# module "security-groups" {                                      <-- Group 1
#     source = "./path"                                           <-- Group 2
#     literally_just_gibberish                                    <-- Group 2
#     count  = "${var.count}"                              <-- Groups 3 and 4
# }
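
If the regex above feels fragile, the same two renames can be sketched line by line instead. This is a rough illustration and not the script from the repository: rename_var_refs, rename_module_count, and the default new name num are names assumed here, and the brace counting assumes that any braces inside strings (such as "${...}" interpolations) are balanced on their own line.

```python
import re

# Word boundaries keep var.counter or var.count_max from being rewritten.
VAR_REF = re.compile(r"\bvar\.count\b")
# A top-level module block header, e.g.: module "security-groups" {
MODULE_START = re.compile(r'^module\s+"[^"]+"\s*\{')
# A count argument at the start of a line, e.g.:   count  = ...
COUNT_ARG = re.compile(r"^(\s*)count(\s*=)")


def rename_var_refs(text: str, new_name: str = "num") -> str:
    """Rewrite references to var.count, leaving longer names alone."""
    return VAR_REF.sub(f"var.{new_name}", text)


def rename_module_count(text: str, new_name: str = "num") -> str:
    """Rename the count argument inside module blocks only."""
    out = []
    depth = 0  # brace depth inside the current module block; 0 means outside
    for line in text.splitlines(keepends=True):
        if depth == 0:
            # Only module headers start tracking; resources are passed through.
            if MODULE_START.match(line):
                depth = line.count("{") - line.count("}")
            out.append(line)
            continue
        if depth == 1:
            # Direct arguments of the module live at depth 1.
            line = COUNT_ARG.sub(rf"\g<1>{new_name}\g<2>", line, count=1)
        depth += line.count("{") - line.count("}")
        out.append(line)
    return "".join(out)
```

Chaining the two, as in rename_module_count(rename_var_refs(text)), leaves the count meta-parameter on resources untouched while still updating the variable reference inside it. Note that the variable "count" declaration itself still needs renaming to match; neither function here touches it.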

Cool, that takes care of our module issue. This could have actually been the end of it, but I wanted more functionality. Some of our main.tf files had inline variables, and I wanted all of them in their own dedicated variables.tf file. Regex match any line beginning with “variable,” and yank it out to a new file, right? But wait, what if a variable has a default assignment, like so?

variable "count" {
  default = "1"
}

Or worse, what if it has some complicated logic spanning multiple lines? What then? Well, I created this monstrosity, which in my defense does work much of the time (for my company’s infrastructure):

var_regex = r"((?!^variable)[\s\S])*(^variable[\s\S]+)({}$|{[\n\s].+[\n\s]}$)([\s\S]*)"

# Group 1: 0+ of any character, with a negative lookahead of the word "variable" at a line's start
#          This is used to discard anything prior to a variable block
# Group 2: The word "variable" at a line's start, followed by 1+ of any character
# Group 3: Either the string "{}" or "{" followed by a linebreak or whitespace character, followed by
#          1+ of any character except linebreaks, followed by a linebreak or whitespace, followed by
#          "}" at a line's end
# Group 4: 0+ of any character

# Example:
# provider "aws" {                                                                                          <-- Group 1
#   alias = "accepter"                                                                                      <-- Group 1
# }                                                                                                         <-- Group 1
#                                                                                                           <-- Group 1
# variable "name" {}                                                                                        <-- Group 2
# variable "count" {}                                                                                       <-- Group 2
# variable "long_string" {                                                                           <-- Groups 2 and 3
#     default = "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod                    <-- Group 3
#                tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,               <-- Group 3
#                quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat..."      <-- Group 3
# }                                                                                                         <-- Group 3

# data "aws_route_tables" "requester" {                                                                     <-- Group 4
#    provider = "aws.requester"                                                                             <-- Group 4
#    vpc_id = "${var.requester_vpc_id}"                                                                     <-- Group 4
# }

I’m not sure if the comment explanation is correct, as I was going through many iterations of the regex, and probably didn’t always stop to document my changes. The whole thing is commented out for now anyway, as I hit upon a hackier but effective solution. Damn edge cases.

import typing

ignore = ["data", "locals", "module", "output", "provider", "resource"]

def strip_vars(f: typing.TextIO) -> list:
    tmp_lst = []
    i = 0
    f.seek(0)
    try:
        for line in f.readlines():
            tmp_lst.append(line)
            if line.startswith(tuple(ignore)):
                # This assumes your .tf files have a contiguous block of variables
                tmp_lst.pop()
                break
        # Count the lines that come before the first variable block
        for line in tmp_lst:
            if line.startswith("variable"):
                break
            i += 1
        del tmp_lst[:i]
        # Remove trailing newline if it got pulled in
        if "".join(tmp_lst[-1:]).isspace():
            del tmp_lst[-1:]
        # And add a newline to the head for any existing entries
        # (an empty list raises IndexError here, which means no variables)
        if not tmp_lst[0].isspace():
            tmp_lst.insert(0, "\n")
    except IndexError:
        print("INFO: No variables found in " + f.name)
    return tmp_lst

I’m not saying this is the best way to go about this, nor that it’s the most performant (it’s actually very fast; turns out text parsing doesn’t take much time), but it works. The function, given a file, goes through each line, starting at the beginning. It appends each line as a string in a list. As soon as a line starts with a word in the ignore list, it pops that line back out and stops reading; the break there is what ends the loop. When done, tmp_lst contains every line of the file up to, but not including, the first line starting with one of our ignored words. Next, similar logic is performed, with an index. If a line starts with the word “variable,” stop. Otherwise, increment the counter. After this for loop terminates, delete everything in the list that comes before the first variable. Finally, do some cleanup and return. Assuming your Terraform is written such that your variables are all in one contiguous chunk ahead of any resources, modules and the like, the end result is a list containing the variables, and only the variables. This is then used in two other functions - one to write them to a variables.tf file, and one to remove them from the file they were extracted from (usually main.tf). Oh, unless there weren’t any variables in the file, in which case print an INFO message out for the user and move on with life.
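
If the contiguous-block assumption doesn’t hold for your files, counting brace depth is one alternative. The following is a minimal sketch rather than the approach used above: split_variable_blocks is a hypothetical helper, and it assumes braces inside strings balance on the same line, so a default containing a lone "{" would confuse it.

```python
from typing import Tuple


def split_variable_blocks(text: str) -> Tuple[str, str]:
    """Split Terraform source into (variable blocks, everything else)."""
    variables, rest = [], []
    depth = 0            # current brace nesting depth
    in_variable = False  # True while inside a top-level variable block
    for line in text.splitlines(keepends=True):
        # A variable block can only begin at the top level.
        if depth == 0 and line.startswith("variable"):
            in_variable = True
        (variables if in_variable else rest).append(line)
        depth += line.count("{") - line.count("}")
        if depth == 0:
            # The block (if any) has closed; the next line decides again.
            in_variable = False
    return "".join(variables), "".join(rest)
```

Because each block is tracked by its own braces, multi-line defaults and variables scattered between resources both end up in the first string, in their original order.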

def remove_vars_from_main(f: typing.TextIO, var_lst: list) -> None:
    # With variables written to a new file, remove them from the main file
    var_block = "".join(var_lst)
    with open(f.name) as tmp_file:
        tf_file = tmp_file.read()
    if var_block in tf_file:
        tf_file = tf_file.replace(var_block, "")
        f.seek(0)
        # Trim a leading newline if it exists (safe on an empty string)
        if tf_file.startswith("\n"):
            tf_file = tf_file[1:]
        f.write(tf_file)
        # Drop whatever is left over from the old, longer contents
        f.truncate()
    else:
        print("ERROR: Unable to move variables out of " + f.name)
Here, we check if the vars string is contained within the file we extracted it from (I mean, it should be, but never assume anything), and if so, delete it, trim any leading whitespace, and overwrite the file. Give the user a friendly warning if this fails. I saw this error popping up when there was weird whitespace existing in the file already, but after adding the trim lines I haven’t seen it crop up. I suppose it could be in a while loop in case someone was extra-judicious with their whitespace additions. Note, all function calls are wrapped in a try/except catching IOError so any write permission errors are caught separately.
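
The while-loop idea for stubborn whitespace can be had more cheaply with str.lstrip. A minimal sketch, assuming the variables were found as one exact block of text; remove_block is a hypothetical stand-in for remove_vars_from_main that works on a path instead of an open file handle.

```python
from pathlib import Path


def remove_block(path: Path, block: str) -> bool:
    """Remove the first occurrence of block from the file at path.

    Returns True if the file was rewritten, False if block was not found.
    """
    text = path.read_text()
    if not block or block not in text:
        return False
    text = text.replace(block, "", 1)
    # lstrip removes every leading newline, however many there are.
    path.write_text(text.lstrip("\n"))
    return True
```

Returning a bool instead of printing leaves it to the caller to decide whether a miss deserves an ERROR message.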

That’s basically it. I also wrote some Bash scripts to run Terraform’s 0.12upgrade tool on everything, but they’re nothing special. The first iteration (and actually, current as of this writing - I intend to fix it tomorrow) didn’t even recurse into subdirectories, whoops. I’ve made everything available on GitHub, and I hope someone else can make use of it. Also included are the aforementioned shell scripts. One of them runs the upgrade tool, and touches a file named terraform_0.12_ready in each directory where it succeeds - this may be useful for you to find problematic directories, as the output may get hidden with everything else. Something like find $YOUR_DIR -type d '!' -exec test -e "{}/terraform_0.12_ready" ';' -print would suffice. The other simply removes those files. It checks for both the name and for the size to be 0, lest you for whatever reason have a bevy of files named terraform_0.12_ready.
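
For anyone who would rather stay in Python, the marker-file bookkeeping can be sketched with pathlib. This is an illustration under assumptions, not a port of the shell scripts: dirs_missing_marker and remove_markers are hypothetical names, and like the find command it reports every directory without a marker, whether or not it contains any .tf files.

```python
from pathlib import Path
from typing import List

MARKER = "terraform_0.12_ready"


def dirs_missing_marker(root: Path) -> List[Path]:
    """Return root and every subdirectory that has no marker file."""
    candidates = [root] + [p for p in root.rglob("*") if p.is_dir()]
    return sorted(d for d in candidates if not (d / MARKER).exists())


def remove_markers(root: Path) -> int:
    """Delete empty marker files under root; return how many were removed."""
    removed = 0
    # Materialize the matches first so deleting does not disturb the walk.
    for marker in list(root.rglob(MARKER)):
        # Only touch zero-byte regular files, in case a real file shares the name.
        if marker.is_file() and marker.stat().st_size == 0:
            marker.unlink()
            removed += 1
    return removed
```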