The scene: I was going back to a set of 18-month-old Packer files to add set -eux -o pipefail to each file in the build. (If you’re not familiar with this command and its uses, here’s where I learned about it. Highly recommended.) I’d recently had a two-day time sink, wherein I couldn’t get LDAP access to work on our CI/CD, and eventually I found that the shell script that adds our LDAP certs had coughed and died midway through without Packer erroring out and letting me know that something was wrong. LDAP failure is a pretty common sign that something is wrong with our CI/CD, but in the past it’s been due to more exotic problems than Packer petering out. Pipefail isn’t necessarily the right tool for every job, but I wanted to spare my future self these issues, where VERY SIGNIFICANT PROBLEMS might otherwise be buried in a billion screens of Packer output.
(Yes, I’ll still look at the credentials folder first next time.)
That was how I learned that Packer scripts can fail, but the build can still complete. This surprised me, considering how many failed builds I experienced when I was first working with Packer. So now I’m working through each script, finding quiet problems (such as unnecessary symbolic links being created during the installation of our version of Java) and other issues that perhaps aren’t problems today but may arise like the kraken later to take its accumulated revenge. Like I said, these scripts have been in use for about a year and a half, building AMIs at least once a month. Usually, only the base AMI changes, and the only other alterations have been additions – this version of Ruby that one dev team needs, this package for another group. Beyond that, it’s been pretty steady, which means a fair amount of time has passed since any kind of in-depth review of these files.
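For anyone who wants to see the quiet-failure effect outside of Packer, here’s a minimal sketch. It assumes, as my experience suggests, that the build only notices a script when the script exits non-zero. The two-line script is a hypothetical stand-in for a real provisioning script, with false playing the part of the step that coughs and dies.

```shell
#!/usr/bin/env bash
# Hypothetical provisioning steps, written to a temp file so they can run in a child shell.
script="$(mktemp)"
cat > "$script" <<'EOF'
false            # stands in for a step that fails, e.g. a cert install
echo "finished"  # still runs, and its success becomes the script's exit status
EOF

bash "$script";    lenient_status=$?   # no strict mode: the failure is swallowed
bash -e "$script"; strict_status=$?    # same steps under errexit: stops at `false`

echo "without -e: exit $lenient_status, with -e: exit $strict_status"
rm -f "$script"
```

Without -e, the script’s exit status is just that of its last command, so one failed step in the middle leaves no trace unless you read the output.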
Pipefail is a great and rather educational way to work through your scripts, but on a recent day of this little side project, I encountered a surprising problem. In one of the scripts, PATH is augmented, followed by source /etc/bashrc. This is when the file errored out, with a gasp of amazon-ebs: /etc/bashrc: line 12: PS1: unbound variable.
What in the what?
I did some googling for this unbound variable business, but the results didn’t apply to what I was doing. I wasn’t failing to create $youMessedUp. /etc/bashrc did indeed exist in the Packer build instance, which I confirmed at various points in my troubleshooting with touch /etc/bashrc, ls -a /etc | grep bashrc, and cat /etc/bashrc. The source command was being used correctly. And there were exactly no variables in that script.
Huh.
But /etc/bashrc was a robust file, quite lengthy compared to the most familiar file of its type in my life, the ~/.bashrc on my own machine. There was a lot going on in there… including variables. And because of the kinds of AMIs I use on this project – that is, AMIs built by a different team I have little contact with, issued every month without exhausting notes on what might have changed from the last version – any alteration I might have made that day might be useless or, worse, damaging when applied blindly next month.
Shit.
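For the curious, the error is easy to reproduce away from Packer. This is a sketch under assumptions: fake_bashrc is a hypothetical stand-in, and the PS1 test inside it is my guess at the kind of line a system bashrc contains, not a copy of the real file. It runs in a child shell so the failure doesn’t take the outer script down with it.

```shell
#!/usr/bin/env bash
# Stand-in for /etc/bashrc; the real file's contents are an assumption here.
fake_bashrc="$(mktemp)"
cat > "$fake_bashrc" <<'EOF'
if [ "$PS1" ]; then   # reading PS1 is fatal under `set -u` when PS1 is unset
  echo "interactive setup would go here"
fi
EOF

# Child shell with PS1 removed from the environment, so the failure stays contained.
output="$(env -u PS1 bash -c 'set -u; source "$1"' _ "$fake_bashrc" 2>&1)"
status=$?

echo "exit status: $status"
echo "$output"
rm -f "$fake_bashrc"
```

A provisioning shell isn’t interactive, so nothing has set a prompt, and a file that was written for login sessions trips over -u the moment it checks for one.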
Beyond that, there was the issue of scope. This pipefail project was supposed to be about controlling my end of things. Faulty machine images and limited control are just part of my job. I’ve dealt with said images, but the dealing is not typically done in shell scripts. Usually, if it’s something especially sticky, the job becomes one of communication, wherein I document what’s up and reach out to the agency in charge of regular base AMI creation so we can sort things out.
That realization is where set +u came in.
I have an ongoing concern that shortcuts that I think are efficient might be unhelpful cheating, especially in this particular phase of my career. I ran my error and my situation by a few more senior engineers at my job. The idea of set +u came up. And said seniors confirmed that this was just wise and not laziness.
So:
set +u
source /etc/bashrc
set -u
That is, +u reverses the initial -u flag from the command at the top of the script, the flag that ends the script when it hits an unbound variable, and the set -u afterward switches it back on. For that one line, only set -ex -o pipefail is in play, minus that situationally unfortunate -u.
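Here’s a small, self-contained sketch of that toggle. The temp file is a hypothetical stand-in for /etc/bashrc, DEMO_SOURCED and nounset_state are names made up for the demo, and -x is left out only to keep the output short.

```shell
#!/usr/bin/env bash
set -euo pipefail   # -x omitted here only to keep the demo output short

# Stand-in for /etc/bashrc; contents are hypothetical.
fake_bashrc="$(mktemp)"
printf '%s\n' 'if [ "$PS1" ]; then :; fi' 'DEMO_SOURCED=yes' > "$fake_bashrc"
unset PS1   # make sure PS1 really is unbound for the demo

set +u                  # tolerate unbound variables for the next line only
source "$fake_bashrc"
set -u                  # back to strict handling

echo "sourced: $DEMO_SOURCED"
# $- lists the single-letter options currently on; `u` means nounset is active.
case $- in *u*) nounset_state=on ;; *) nounset_state=off ;; esac
echo "nounset is $nounset_state"
rm -f "$fake_bashrc"
```

Note that -e and pipefail stay on throughout, so a genuine failure inside the sourced file would still stop the script.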
This is useful if you have a weird situation like mine, where you need to run bash strict mode most of the time but have a line or a section of a script that deals with a resource that’s out of your control (but which you can still trust). It’s also useful when you’re activating a virtualenv in Python. In that case, set -u may be best set aside for that particular endeavor. In short, if your script is opening a big bucket of things out of your control (/etc/bashrc, the contents of an /env/bin/activate script), and you want to go full set -eux -o pipefail otherwise, pop a little set +u in there.
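If the pattern shows up in more than one place, it could be wrapped in a function. This is a sketch, not something from my actual Packer scripts: source_relaxed is a hypothetical helper name, and the temp file stands in for /etc/bashrc or an activate script (some versions of which, as I understand it, read variables that may not be set).

```shell
#!/usr/bin/env bash
set -euo pipefail

# source_relaxed FILE: hypothetical helper. Sources FILE with nounset off,
# then turns nounset back on only if it was on beforehand.
source_relaxed() {
  local had_nounset=0
  case $- in *u*) had_nounset=1 ;; esac
  set +u
  source "$1"
  if [ "$had_nounset" -eq 1 ]; then set -u; fi
}

# Demo file that reads a variable nobody has set.
demo_file="$(mktemp)"
printf '%s\n' 'RELAXED_RESULT="got:${DEMO_UNSET_VAR}"' > "$demo_file"
unset DEMO_UNSET_VAR

source_relaxed "$demo_file"
echo "$RELAXED_RESULT"
rm -f "$demo_file"
```

In a real script the call would look like source_relaxed /etc/bashrc. Checking $- first means the helper doesn’t switch -u on in a script that never asked for it.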
But, this specific little situation aside, I’ve become a convert to set -eux -o pipefail on my Packer builds for sure and will probably keep the habit when I’m in a situation where I’m using AMIs not made by an outside team. The more you know, right?