Diana Initiative 2021: The System Call Is Coming from Inside the House

Like ghosts, security vulnerabilities are the result of us going about our lives, doing the best we can, and then experiencing things going awry just because that’s usually what happens. We plan systems, we build software, we try to create the best teams we can, and yet there will always be echoes in the world from our suboptimal choices. They can come from lots of places: one team member’s work not being double checked when a large decision is at stake, a committee losing the thread, or – worst of all – absolutely nothing out of the ordinary. Alas, it’s only true: security issues are a natural side effect of creating and using technology.

In this post (and the talk it accompanies; video to come when it’s up), we’re going to talk about the energy signatures, cryptids, orbs, poltergeists, strange sounds, code smells, and other things that go bump in the night. That’s right: we’re talking about security horror stories.

Before we get started, here’s a little disclaimer. Everything I’m about to talk about is something I’ve encountered, but I’ve put a light veil of fiction on everything. It’s just professionalism and good manners, a dash of fear of NDAs, and a little superstition too. The universe likes to punish arrogance, so I’m not pretending I’m immune to any of this. No one is – it’s why appsec exists!

Now: let’s get to the haunts.

Automatic Updates

I’m starting with one that’s probably fresh in everyone’s minds, and I’m mentioning it first because it’s such a dramatic betrayal. You think you’re doing the right thing. You’re doing the thing your security team TOLD you to do by keeping your software up to date. And then suddenly, you hear a familiar name in the news, and you have an incident on your hands. 

a white man in a blue shirt who has been stabbed by his own wooden stake. A dialogue bubble reads, "Mr. Pointy, no!"

This one is like getting stabbed with your own stake when you’re fending off a vampire; I thought we had a good thing going!

A drawing of a newspaper that says "OH NO" and then "YIKES YIKES YIKES"

Ideally, you’ll learn about it via responsible disclosure from the company. Realistically, it might be CNN. Such is the way of supply chain attacks.

You invite it in by doing the right thing. Using components, tools, and libraries with known vulnerabilities is in the most current version of the OWASP top ten for a reason, so there’s endless incentive to keep your stuff up to date. The problem comes from the tricky balance of updates vs. stability: the old ops vs. security argument.

A couple of jobs ago, we were suddenly having trouble after our hourly server update. Ansible was behaving itself, but a couple of our server types weren’t working right. After an intense afternoon, some of my coworkers narrowed it down to a dependency that had been updated after a long time with no new releases. It wasn’t even what you’d call a load-bearing library, but it was important enough that there’d been a breaking change. The problem was solved by pinning the requirement to an earlier version and reverting the update.

My own sad little contribution toward this not being the state of things forever was to make a ticket in our deep, frozen, iceboxy backlog saying to revisit it in six months. I was an SRE then, but I was leaning toward security, and I was a little perturbed by the resolution – though I couldn’t have suggested a better solution for keeping business going that day.

(It did get fixed later, by the way. My coworkers reached out to tell me, which was very kind of them.)

This has become one of those stories that stays with me when I review code or a web UI and find that an old version of server software or a dependency has a known issue. Even if it doesn’t have a documented vulnerability, not throwing the parking brake on until it’s updated to the most recent major version feels like doing badly by our future selves, even if it’s what makes sense that day. 

The best way to fix this is to pay attention to changes, check hashes on new packages, and generally pay close attention to what’s flowing into your environment. It isn’t easy, but it’s what we’ve got.
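
If you want to make “check hashes” concrete: pip, for example, has a hash-checking mode (pinned versions plus --hash entries in a requirements file), and the same idea works for anything you download. Here’s a minimal Python sketch of that check – the file name and digest are placeholders, not anything from the story above.

import hashlib

EXPECTED_SHA256 = "0123456789abcdef..."  # hypothetical pinned digest for the artifact

def sha256_of(path):
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()

if sha256_of("some-dependency-1.2.3.tar.gz") != EXPECTED_SHA256:
    raise SystemExit("Hash mismatch: refusing to use this package.")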

Chrome extensions and other free software

The horrors that lurk in free software are akin to the risks of bringing home a Ouija board you found on the street.

Behold: my favorite piece of art I made for this talk/post.

a Ouija board sits on top of a pile of trash bags in a brown puddle

You can identify it by listening for you or someone around you saying, “Look at this cool free thing! Thanks, Captain Howdy, I feel so much more efficient now!” Question everything that’s free, especially if it executes on your own computer. That’s how we invite it in: sometimes it’s really hard to resist something useful and free, even if you know better.

A particular problem with Chrome extensions is that you can’t publish them without automatic updates enabled, so someone has a direct line to change code in your browser, so long as you keep it installed.

Captain Howdy Efficiency Extension with picture of Regan from The Exorcist

Last fall, The Great Suspender, a popular extension for suspending tabs and reducing Chrome’s memory usage, was taken over by malicious maintainers. As a result, people who had, once upon a time, done their due diligence were still sometimes unpleasantly surprised.

That’s the tough thing about evaluating some of these: you have to consider both the current risks (permissions used, things lurking in code) and the possible future risks. What happens if this program installed on 500 computers with access to your company’s shared drive goes rogue? It makes it difficult to be something other than that stereotype of security, the endless purveyors of NO. But a small risk across a big enough attack surface ends up being a much larger risk.

In short: it’s the Facebook principle, where if you get a service for free, you might consider the possibility that you’re the product. Security isn’t necessarily handled as carefully as it should be for the product rather than paying customers. Pick your conveniences carefully and prune them now and then. (Or get fancy, if you control it within your company, and consider an allowlist model rather than a blocklist. Make conveniences prove themselves before you bring them in the house.)

unsafe-eval and unsafe-inline in CSP

watercolor of a brown book that says "Super Safe Spells, gonna be fine!"

Our next monster: an overly permissive content security policy. I’d compare it to a particular episode of Buffy the Vampire Slayer or any movie that uses that trope of people reading an old book out loud without understanding what they’re saying.

Fortunately, a content security policy is a lot easier to read than every old leather book someone might drag into your house.

watercolor of a brown book that says "script-src 'unsafe-eval'" and "Good CSP Ideas"

We invite it in because sometimes, the risky choice just seems easier. You just need to run a little code from a CDN that you need to be able to update easily. You know.

For fending it off, I would gently urge you not to put anything in your CSP that literally contains the word “unsafe.” I honestly understand that there are workarounds that can make sense in the moment, when you’re dealing with a tricky problem, when you just need a little flexibility to make the thing work.

In that case, I urge you to follow my quarantine problem-solving process, for moments where you need to figure something out or finish a task, but the brain won’t cooperate.

  1. Have you eaten recently? (If not, fix that.)
  2. Does this just feel impossible right now? Set a timer, work on it for a bit, then stop.
  3. Can you not think around this problem usefully? Write out your questions.

I would suggest starting with “why do I think unsafe-eval is the best option right now, and what might I google to persuade myself otherwise?”

Sometimes you want to keep things flexible: “I’ll just allow scripts from this wide-open CDN URL wildcard pattern.” But what can this enable? What if the CDN gets compromised? Use partners you trust, sure, but it’s also a good idea to have a failsafe in place, rather than having something in place that allows any old code from a certain subdomain.
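
If the thing tempting you toward unsafe-inline is a script or two you control, a per-request nonce usually buys the same flexibility without opening the door to everything else. A minimal sketch, assuming a Flask app (the framework choice and header contents are mine, not a prescription):

import secrets
from flask import Flask, g

app = Flask(__name__)

@app.before_request
def make_csp_nonce():
    # One fresh nonce per request; templates add nonce="{{ g.csp_nonce }}"
    # to the script tags you actually mean to run.
    g.csp_nonce = secrets.token_urlsafe(16)

@app.after_request
def set_csp(response):
    response.headers["Content-Security-Policy"] = (
        f"default-src 'self'; script-src 'self' 'nonce-{g.csp_nonce}'"
    )
    return response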

Look, it’s hard to make things work. You have to be a glutton for punishment to do this work, even if you love it. (And I usually do.) But you can’t just say yes to everything because of future-proofing or whatever feels like a good reason today, influenced by your last month or so of work pains. You can’t do it. I’m telling you, as an internet professional, not to do it. Because your work might go through my team, and I will say NOT TODAY, INTERNET SATAN, and then you’re back at square one anyway.

Stacktraces that tell on you

“The demon is a liar. He will lie to confuse us; but he will also mix lies with the truth to attack us.”

William Peter Blatty, eminent appsec engineer and author of The Exorcist

Logs and stacktraces contain the truth. Right? They’re there to collect context and provide you information to make things better. Logs: they’re great! Except…

Exception in thread "oh_no" java.lang.RuntimeException: AHHHHHH!!!
    at com.theentirecompany.module.MyProject.superProprietaryMethod(MyActualLivelihood.java:50)
    at com.theentirecompany.module.MyProject.catOutOfBagMethod(MyActualLivelihood.java:34)
    at com.theentirecompany.module.MyProject.underNDAMethod(MyActualLivelihood.java:27)
    at com.theentirecompany.module.MyProject.sensitiveSecretMethod(MyActualLivelihood.java:11)
    at com.theentirecompany.module.MyProject.oh_no(MyActualLivelihood.java:6)

…except for when they get out of their cage. Or when they contain information that can be used to hurt you or your users.

We invite it in by adding values to debug statements that don’t need to be there. Or maybe by writing endpoints so that errors might spill big old stacktraces that tell on you. Maybe you space out and leave debugging mode on somewhere once you’ve deployed. Or you just haven’t read up on OWASP’s cheat sheet on security misconfiguration.

How to fend it off: conjure a good culturally shared sense of safe log construction and remember what shouldn’t be in logs or errors:

  • Secrets
  • PII
  • PHI
  • Anything that could hurt your users
  • Anything that could get you sued.

Make a commit hook that checks for debug anything.
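
A rough sketch of what that hook could look like, dropped into .git/hooks/pre-commit – the patterns are examples I made up, not a complete list:

#!/usr/bin/env python3
# Refuse commits that contain obvious debug leftovers in staged files.
import re
import subprocess
import sys

PATTERNS = [r"console\.log\(", r"pdb\.set_trace\(", r"DEBUG\s*=\s*True", r"debugger;"]

staged = subprocess.run(
    ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
    capture_output=True, text=True, check=True,
).stdout.split()

findings = []
for path in staged:
    try:
        text = open(path, errors="ignore").read()
    except OSError:
        continue
    findings.extend((path, p) for p in PATTERNS if re.search(p, text))

if findings:
    for path, pattern in findings:
        print(f"{path}: matches {pattern}")
    sys.exit("Debug leftovers found; commit blocked.")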

Secrets in code

“To speak the name is to control the thing.”

Ursula K. Le Guin

The monster I’d compare this to is when the fae (or a wizard, depending on your taste in fantasy) has your true name and can control you.

API_KEY = "86fbd8bf-deadbeef-ae69-01b26ddb4b22"

How to identify it: read your code. Is there something in there you wouldn’t want any internet random to see? Yank it! Use less-sensitive identifiers like internal account numbers if you need to, just not the thing that lets someone pretend to be you or one of your users if they have it.

You know you’re summoning this one if you find yourself saying, “Oh, it’s a private repo, it’ll never matter.” Or maybe, “It’s a key that’s not a big deal, it’s fine if it’s out there.”

We fend this one off with safe secret storage and not depending on GitHub as a critical layer of security.

We all want to do it. Sometimes, it’s way easier than environment variables or figuring out a legit secret storage system. Who wants to spend time caring for and feeding a Vault cluster? Come on, it’s just on our servers, it’s not a big deal. It’s only deployed within our private network.

It is a big deal. It is my job to tell people not to do this. Today I will tell you for free: don’t do this!

I’ve argued with people about this one sometimes. It surprises me when it happens, but the thing is that people really just want to get their job done, which can mean the temptation of doing things in the way that initially seems simpler. As a security professional, it becomes your job to help them understand that saving time now can create so many time-consuming problems later.

It’s great to have a commit check that looks for secrets, but even better is never ever doing that. Best of all is both!
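
For contrast with that hardcoded API_KEY above, the boring, better version is a few lines – a minimal sketch with a made-up variable name; a real secrets manager (Vault, your cloud’s equivalent, whatever you already run) is better still:

import os

# Pulled from the environment at runtime instead of living in the repo.
API_KEY = os.environ.get("PAYMENTS_API_KEY")  # hypothetical variable name
if API_KEY is None:
    raise RuntimeError("PAYMENTS_API_KEY is not set; refusing to start without credentials.")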

A single layer of input validation

Our next monster: a single layer of input validation for your web UI. I quote Zombieland: always double-tap.

many snaky hydra heads baring pointy teeth

Our comparison for this one is anything tenacious. Let’s say zombies, vampires, hydras. Anything that travels in a pack. And we identify it by siccing Burp Suite on it. Maybe the form doesn’t let you put a li’l <script> tag in, but the HTTP request might be only too happy to oblige. We invite it in by getting a little complacent.

The best way to fend it off is to remember that regular users aren’t your real threat (and that shoddier validation mostly just irritates people whose names don’t fit its narrow definition of "normal," which can do some awful racist things). There’s a reason injection, especially SQL injection, is a perennial member of the OWASP top ten.

Say it with me: client-side and server-side validation and sanitization! And then add appropriate encoding too, depending on what you’re doing with this input.
Most people do only interact with your server via your front end, and bless them. But the world is filled with jerks like me, both professional jerks and people who are jerks for fun, and we will bypass your front end to send requests directly to your server. Never take only one measure when you can take two.
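
Here’s a minimal sketch of the server-side half, using a hypothetical order-ID field and an HTML output context – the regex and field are illustrative, not a universal rule:

import html
import re

ORDER_ID_RE = re.compile(r"^[A-Z0-9-]{1,32}$")  # example allowlist for one specific field

def validate_order_id(value):
    # Runs on the server no matter what the form said client-side.
    if not ORDER_ID_RE.fullmatch(value):
        raise ValueError("order ID failed server-side validation")
    return value

def render_comment(comment):
    # Encode for the HTML context it lands in, whatever the client already did.
    return f"<p>{html.escape(comment)}</p>"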

Your big-mouthed server

curl -I big-mouthed-server.com

How to identify this one? curl -I thyself

We invite this one in by not changing defaults, which aren’t always helpful. This also relates to security misconfiguration from the OWASP top ten. The early internet sometimes defaulted too much toward trust (read The Cuckoo’s Egg by Cliff Stoll for a great demonstration of the practices and perils of this mindset), and we can still see this in defaults for some software configs.

Protect yourself from this one by changing your server defaults so your server isn’t saying anything you wouldn’t want to broadcast. The primary concern here is about telling someone you’re using a vulnerable version of your server software, so keep your stuff updated and muffle your server.
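
If you’d rather script the check than remember the curl flag, here’s a small sketch using the requests package – the hostname is a placeholder:

import requests

response = requests.head("https://big-mouthed-server.example.com", timeout=5)
server = response.headers.get("Server", "(no Server header – nicely quiet)")
powered_by = response.headers.get("X-Powered-By")

print(f"Server: {server}")
if powered_by:
    print(f"X-Powered-By: {powered_by}  <- consider turning this off")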

Let’s go X-Files about it: trust no one, including your computers.

ESPECIALLY your computers.

Don’t trust computers! They just puke your details out everywhere if someone sends them a special combination of a few characters.

In conclusion, sometimes I wish I had been born in the neolithic.

Laissez-faire auth (or yolosec, if you prefer)

Our next monster: authentication that accepts any old key and authorization that assumes that, if you’re logged in, you’re good to go. Its horror movie counterpart is that member of your zombie-hunting group that cares more about being a chill dude than being alive.

You can identify it by doing a little token swapping. Can you complete operations for one user using another’s token? The monster is in your house.

Here we are again at the OWASP top ten: broken authentication is a common and very serious issue. The problem compounds because we need to do better than “Are you allowed in?” We also need to ask, “Are you allowed to do THIS operation?”

We invite this one in by assuming no one will check to see if this is an issue or by assuming the only people who interact with your software have a user mindset. Or, sometimes worst of all, just not giving your devs enough time and resources to do this right.

We can fend it off by using a trusted authentication solution (OAuth is popular for a reason) and ensuring there are permissions checks, especially on state-changing operations – ones that go beyond “if you’re not an admin, you don’t have certain links in your view of things.”
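
In code, that means an explicit object-level check on every state-changing path, not just a valid session. A minimal, hypothetical sketch:

from dataclasses import dataclass

@dataclass
class User:
    id: int
    is_admin: bool = False

@dataclass
class Invoice:
    id: int
    owner_id: int

def can_modify(user, invoice):
    # "Are you allowed to do THIS operation?" – not just "are you logged in?"
    return user.is_admin or invoice.owner_id == user.id

def update_invoice(user, invoice, new_amount):
    if not can_modify(user, invoice):
        raise PermissionError("authenticated, but not authorized for this invoice")
    # ...proceed with the state-changing operation...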

I’ve seen a fair amount of “any token will do” stuff: tokens that don’t invalidate on logout, tokens that work if you fudge a few characters, things like that. It’s like giving a red paper square as a movie ticket: anyone can come in anytime. Our systems and users need us to do better.

The elder gods of technology

Ah, yes, the technology that will never leave us. Think ancient WordPress, old versions of Windows running on an untold number of servers, a certain outdated version of Gunicorn, and other software with published CVEs.

“That is not dead which can eternal lie.”

Which monster would I compare this to? Well, I’m not going to say his name, am I?

You know you’re at risk of summoning it when you hear someone say “legacy software,” though newer projects are NOT immune to this. Saying “we’re really focusing our budget on new features right now” is another great way to find… you know… on your doorstep.

We can fend it off by allocating budgets to review and update dependencies and fix problems when they’re found. And they should be sought out on a regular schedule.
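
One low-effort way to make that schedule real, if you happen to be in Python land – a sketch that assumes pip is on the PATH; dedicated tools like pip-audit go further and check against published advisories:

import json
import subprocess

result = subprocess.run(
    ["pip", "list", "--outdated", "--format=json"],
    capture_output=True, text=True, check=True,
)
for package in json.loads(result.stdout):
    print(f"{package['name']}: {package['version']} -> {package['latest_version']}")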

No tool, framework, or language is bad. They all have potential issues that need to be considered when you choose and use them. For instance, there are still valid reasons to use XML, and PHP is a legitimate language that just needs to be lovingly tended.

Yep, we’re back at the risk of using components with known vulnerabilities from the OWASP top ten. No tool, framework, or language is bad, but some versions have known problems, and some tools have common issues if you don’t work around them. It’s not on them; it’s on you and how you use and tend your tools.

The real incantation to keep this one away is understanding why we keep these things around. No engineering team keeps their resident technical debt nightmare because they like it. They do it because it’s worked so far, and rewriting things is expensive and finicky, particularly if outside entities depend on your software or services.

I’ve never been on an engineering team that didn’t have lots of other things to do rather than address the thing that mostly hasn’t given them problems… even if none of the engineers without vivid memories of the 90s understand it very well.

Sometimes the risk of breaking changes is scarier than whatever might be waiting down the road, once your old dependencies cause problems… or someone on the outside realizes what your house of cards is built on.

Javascript :D

Speaking of no tool, framework, or language being bad: let’s talk about Javascript. Specifically Javascript used incorrectly.

Our horror comparison? It’s a little 12 Monkeys, a little Cabin Fever. Anything where they realize the contagion is in the water or has gone airborne. “Oh god, it’s everywhere, it’s unavoidable, it’s a… global… pandemic…” Oh no.

The way to summon this one is simple. Say it with me:

internet

Anything… internet. It’ll be there, whether you think you invited it or not.

To fend it off? Just… be very careful, please.

I’m not going to make fun of Javascript. Once I was a child and enjoyed childish things, like making fun of Javascript. Ok, that was like three years ago, but now I am an adult and have quit my childish ways. In fact, I made myself do it somewhere between when I realized that mocking other people’s tech is not actually an interesting contribution to a conversation and when I became an appsec engineer, so roughly between 2016 and the end of 2019.

The internet and thus life and commerce as we know it runs largely on Javascript and its various frameworks. It’s just there! And there have been lots of leaps forward to make it less nightmarish, thanks to the really hard work of a lot of people, but still, things like doctor’s appointments and vaccination slots and other giant matters of safety and wellbeing hang out on top of the peculiarities of a programming language that was created in ten days.

Unfortunately, new dragons will just continue to appear, because that’s the side effect of making new things: new features, new problems. The great thing for appsec that’s an unfortunate thing about humanity is that we make the same mistakes over and over, because if something worked once, we want to think we have it sorted. And unfortunately, the internet and technology insist on continuing to evolve.

How to fix it? Invest in appsec.

Sorry. It’s a great and important goal to develop security-conscious software engineers and to talk security as early as possible in the development process. But there’s a reason writers don’t proofread their work – or shouldn’t. Writing and reviewing the work are two different jobs. It’s why healthy places don’t let people approve their own PRs. If we created it, we can’t see it objectively. And most of us become sharper when honed by other perspectives and ideas.

The most permissive of permissions

And finally, the monster that’s nearest and dearest to my heart: overly permissive roles, most specifically AWS permissions. I’d compare this to sleeping unprotected in the woods when you know very well they’re full of threats.

You can identify it by checking out your various IAM role permissions. And yes, this is totally a broken authentication issue too, a la OWASP.

The best way I know to invite this one in is by starting out as a smaller shop and then moving from a team of, say, three people who maybe even all work in the same room, doing everything, to thirty or fifty or more people, who have greater specialization… yet your permissions never got the memo.

And the best way I know to fend it off is to make more refined roles as early as you can. I know, it’s hard! It isn’t a change that enables a feature or earns more money, so it’s easy to ignore. Worst of all, as you refine the roles, reduce access, and iterate, you’re probably going to piss off a bunch of people as you work to get it right. IAM: the monster that doesn’t need a clever analogy because getting it right sucks so bad on its own sometimes!

It’s also the monster that lurks around basically every corner, because IAM is ubiquitous. So it’s everywhere, and it’s endlessly difficult: awesome! Alas, we have to put the work in, because messing it up is the most efficient way I know to undermine so much other carefully done security work.

AWS permissions that allow access to all resources and actions
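
To make that concrete, here’s a sketch of the statement shape behind that kind of policy versus a scoped-down alternative (the bucket and actions are hypothetical), plus a crude check for the wildcards:

ALL_THE_THINGS = {"Effect": "Allow", "Action": "*", "Resource": "*"}

SCOPED_DOWN = {
    "Effect": "Allow",
    "Action": ["s3:GetObject", "s3:PutObject"],
    "Resource": "arn:aws:s3:::example-reports-bucket/*",
}

def looks_overly_permissive(statement):
    def as_list(value):
        return [value] if isinstance(value, str) else value
    return "*" in as_list(statement.get("Action", [])) or "*" in as_list(statement.get("Resource", []))

print(looks_overly_permissive(ALL_THE_THINGS))  # True
print(looks_overly_permissive(SCOPED_DOWN))     # False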

And yet I feel warmly toward it, because IAM was my gateway into security work. It’s the thing I tend to quietly recommend to the security-aspiring, because most people don’t seem to like doing it very much, yet the work always needs to be done. Just showing up and being like, “IAM? I’d love to!” is a highly distinctive professional posture. Get your crucifix ready and have some incantations at hand, and you’ll never run out of things to do. It’s never not useful. Sorry, it’s just like the rest of tech: if you’re willing to do the grunt stuff and get good at it, you’ll probably have work for as long as you want it.

Whew. That’s a lot of things that go bump in the night.

Let’s end with some positive things. I swear there are some. And if a security engineer tells you that there are still some beautiful, sparkly, pure things in the world, that’s probably worth listening to.

Strategies for vulnerability ghost hunting

Don’t be the security jerk

Be easy to work with. This doesn’t mean being in a perpetually good mood – I am NOT, and former coworkers will attest to this if you ask – but it means reliably not killing the messenger.

People outside of your security team – including and especially people who aren’t engineers – are your best resource for knowing what’s actually going on. Here’s the thing: if you’re a security engineer, people don’t act normally around you anymore. You probably don’t get to witness the full spectrum of people’s natural behavior. Unfortunately, it just comes with the territory. And it means we have to rely on people telling us the truth when they see something concerning.

Even people who are cooperative with security will hedge a little sometimes when dealing with us. I know this because I’ve done it. But you’ll reduce this problem if everyone knows that talking to security will be an easy, gracious thing where they’re thanked at the end. Make awards for people who are willing to bring you problems instead of creating a culture of fear and covering them up! Make people feel like they’re your ally and a valued resource, because they are!

Be an effective communicator

Being able to communicate in writing is so important in this work. Whether it’s vulnerability reports, responsible disclosure, blog posts warning others about haunted terrain, or corresponding with people affected by security poltergeists, being able to write clearly for a variety of audiences is one of our best tools.

If you think you’re losing your grip on how regular people talk about this stuff, bribe your best blunt, nontechnical friend to listen to you explain things. Then have that blunt friend tell you when what you said didn’t make a goddamn lick of sense… then revise and explain again. Do this until you’re able to explain things in plain language accessible to your users and spur them to action using motivations that make sense to them.

Now let’s leave this haunted house together and greet the coming dawn.

I hope you encounter none of these terrifying creatures and phenomena that I described, that your trails from server to server in our increasingly connected world are paved with good intentions and mortared together with only the best outcomes. But should you wander into dark woods full of glowing red eyes and skittering sounds belonging to creatures just out of sight… I hope you are better equipped to recognize and banish them than you were earlier. Thank you for reading.

Day of Shecurity 2021: Intro to Command Line Awesomeness!

Recently, I hugged a water buffalo calf. It was approximately as exciting as giving this talk is, which is to say: VERY.

Updated: now with video!

First, the important stuff: here’s the repo with a big old markdown file of commands and how to use them. It also includes my talk slides, which duplicate the markdown (just with a prettier theme, in the way of these things).

Second: I went to the Lookout Security Bootcamp in 2017, one of my first forays into security things (after some WISP events in San Francisco and DEF CON in 2016). That’s where I conceived of the idea of this talk. There was a session where we used Trufflehog and other command line tools, and we concluded with a tiny CTF. Two of the three flags involved being able to parse files (with grep, among other things), and I was familiar with that from my ops work, so I won two of the three gift cards up for grabs. I used the money to buy, as I remember, cat litter, the Golang book, and a pair of sparkly Doc Martens I’d seen on a cool woman in the office that day. I still wear the hell out of the boots, including at DEF CON, and I still refer to them as my security boots.

I spent the rest of that session teaching the rad women around me the stuff I knew that let me win those challenges. This had two important effects on me. The first was that I thought, “Wait, it might be that I have something to offer security after all.” The second was that I wanted to do a session someday and teach these exact skills.

I went to Day of Shecurity in 2018 and 2019 too. It’s a fabulous event. At the last one, just a handful of months before we all went and hid in our houses for more than a year, I went to a great session on AWS security (a favorite subject of mine) by Emily Gladstone Cole. And I thought: oh, there it is. I’m ready. I told my friends that day that I wanted to come back to DoS as a presenter. And I pitched the session, it got accepted, and after a fairly dreamless year, one of mine came true.

So if you’re reading this: hello! These events really do change lives. The things you do really can bring you closer to what you want. And, as I like to say in lots of my talks, there is a place for you here if you want to be here. We need you. Keep trying.

I wrote about my own journey into security here. Feel free to ask questions, if you have them! I love talking about this, and I would like to help you get to where you want to go.

You Can Put WHAT in DNS TXT records?! A blog post for !!con West 2020

Why use a phone book metaphor to explain DNS when you can get even more dated and weird? (This will make more sense when I link the video, but in the meantime, enjoy.)

It Is Known that DNS contains multitudes. Beyond its ten official record types are myriad RFC-described and wildly off-brand uses of them, and DNS TXT records contain the most possibility of weird and creative use. I quote Corey Quinn:

“I mean for God’s sake, my favorite database is Route 53, so I have other problems that I have to work through.”

(From this interview.)

He’s also described it as the only database that meets its SLAs 100 percent of the time. (Route 53 is AWS’s DNS product, encompassing TXT records and many other things, if you have not had the pleasure.)

What is this mystery that is woven through the internet? Let me introduce you to (or reacquaint you with, if you’ve met) the DNS TXT record.

DNS and its ten beautiful children

There are ten kinds of DNS records, each of which will include a reference to a specific domain or subdomain, which usually exists to enable access to that domain’s server or otherwise help with business connected to that domain (email settings, for instance).

The one you might’ve seen or made the most is an A record, or an address mapping record. This is the one that matches a hostname to an IP address – IPv4, in this case. AAAA does the same for IPv6. There are CNAMEs, or canonical name records, which alias a hostname to another hostname, often used for things like $marketingproject.business.com, when the marketing project site is being hosted on Heroku or somewhere other than your company’s primary business. You can read about them all here. This post, however, and its accompanying talk (link to come) are about my favorite of them all: TXT records.

TXT records are used for important, fairly official things, but it’s only by agreed-upon practice. While you’ll see very consistent formatting in them for things like SPF, DKIM, DMARC, or domain ownership verification (often in the form of a long random string value for a key that likely starts with _g), the truth is that you can put almost anything in there. My favorite off-brand but still computer-related one I heard about was a large university that put lat/long information in each server’s TXT records, for the sake of finding it more efficiently on a sprawling campus.

For the records’ contents, there are a few restrictions:

  • You cannot exceed 255 characters per string
  • You can include multiple strings in a single TXT record, but they must be enclosed in straight quotes and separated by commas. These can be concatenated into necessarily longer records, like DKIM with longer keys or very elaborate SPF records (there’s a small splitting sketch just after this list)
  • All characters must be from the 128-character printable ASCII set (no emoji allowed, and no curly quotes or apostrophes either)
  • At least on AWS, you can’t exceed 512 bytes per record, whether it’s a single string or several
  • They are not returned in the order they were added (which made the Emily Dickinson poem I added as three records come out a little funny in my terminal; it still kind of worked, though)
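
Here’s the splitting sketch mentioned above: a long value has to become several 255-character-or-shorter strings in one record, the way an oversized DKIM key does (the key below is fabricated).

def to_txt_strings(value, limit=255):
    # Split one long value into the 255-character strings a TXT record expects.
    return [value[i:i + limit] for i in range(0, len(value), limit)]

record_value = "v=DKIM1; k=rsa; p=" + "A" * 600  # fabricated oversized key
strings = to_txt_strings(record_value)
print(len(strings), [len(s) for s in strings])  # 3 [255, 255, 108]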

I cribbed that together from a mix of (provider-accented) experimentation and anecdotal information from others who have pondered this stuff. The official docs are often a little hand-wavy on this level of detail (appropriately, I’d say). RFC 1035, for instance, states: “The semantics of the text depends on the domain where it is found.” For its RDATA packet, it offers this:

3.3.14. TXT RDATA format

    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    /                   TXT-DATA                    /
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

where:

TXT-DATA        One or more <character-string>s.

I mean, fair. (Data goes in data place; I cannot argue with this.) Future compatibility and planning for unexpected needs to come are a part of every RFC I’ve dug into. Meanwhile, RFC 1464 more directly portends some of the weirdness that’s possible, while also explaining the most common format of TXT records I’ve seen:

        host.widgets.com   IN   TXT   "printer=lpr5"
        sam.widgets.com    IN   TXT   "favorite drink=orange juice"

   The general syntax is:

        <owner> <class> <ttl> TXT "<attribute name>=<attribute value>"

I am accustomed, when dealing with infrastructure tags, to having the key-value format be required, either through a web UI that has two associated fields to complete or through a CLI tool that is very prepared to tell you when you’re doing it wrong.

I have not found this to be the case with TXT records. Whether you’re in a web UI, a CLI, or Terraform, you can just put anything – no keys or values required. Like many standards, it’s actually an optional format that’s just become normal. But you can do what you want, really.

And there are peculiarities. When doing my own DNS TXT poking for this presentation and post, I worked with Dreamhost and AWS, and they acted very differently. AWS wanted only one TXT record per domain and subdomain (so you could have one on example.com and another on wow.example.com), while Dreamhost let me make dozens – but it made DNS propagation really sluggish, sometimes getting records I’d deleted an hour ago, even after clearing the cache. (Dreamhost, meanwhile, has a hardcoded four-minute TTL for its DNS records, which you have to talk to an administrator to change, specially, on your account. It’s always interesting in DNS.) In short, the system is not prepared for that much creativity.

Too bad for the system, though. :)

DNS, ARPANET, hostnames, and the internet as we know it™️

DNS did not always exist in its current scalable, decentralized state, of course. Prior to around 1984, the connection of hostname to IP was done in a file called hosts.txt, which was maintained by the Stanford Research Institute for the ARPANET membership. The oldest example I found online is from 1974, and you can see other examples here and chart the growth of the protointernet. It went from a physical directory to more machine-readable formats, telling you and/or your computer how to reach the host identified as, say, EGLIN or HAWAII-ALOHA. These were static, updated and replaced as needed, and distributed weeklyish.

hosts.txt began its saunter out of our lives when DNS was described in 1983 and implemented in 1984, allowing the internet to scale more gracefully and its users to avoid the risk of stale files. Instead, independent queries began flying around a decentralized infrastructure, with local caches, recursive resolvers, root servers that pointed to top-level domain servers, and nameservers that kept the up-to-date IP addresses and other details for the domains in their care. (You can find a less breezy, more detailed description of this technology you used today here.)

The joys and sorrows of successful adoption

The great and terrible thing about DNS is that so many things rely on it. So if DNS is having a bad day (a much-used DNS server is down, for instance), it can interrupt a lot of processes.

That means, though, that it can also be used to do all sorts of interesting stuff. For instance, a DNS amplification attack involves sending out a ton of DNS queries from lots of sources and spoofing the source address in the packets so they all appear to come from one place, so the responses all go to one targeted server, possibly taking it down. 

TXT records figure into some of this weirdness. Let’s get into some of the interesting backflips folks have done with DNS and its most flexible of record types.

Shenanigans, various

DNS tunneling

This is, so far as I’ve been able to tell, the OG of DNS weirdness. It’s been going on for about 20 years and was first officially described at Black Hat in 2004 by Dan Kaminsky (who stays busy finding weird shit in DNS; if you like this stuff, you’ll enjoy digging into his work).

There are a few different ways to do this, but the central part is always smuggling some sort of information in a DNS query packet that isn’t supposed to be there. 

DNS packets are often not monitored in the same way as regular web traffic (but my Google ads, in the wake of researching this piece, will tell you that there are plenty of companies out there who’d love to help you with that). The permissiveness of DNS query packet movement makes a great vector for exfiltrating data or getting malware into places it would otherwise be hard to reach.

Data is sometimes smuggled via nonexistent subdomains in the URLs the packet seems to be querying for (c2VyaW91cyBldmlsIGJ1c2luZXNzIGlzIGFmb290IGhlcmUgb2ggbm8.evil.com, for instance), but if your packet is designed to return, say, a nice chunk of TXT records? You can really stuff some information or code in there. DNS queries: they smuggle stuff AND evade lots of firewalls. Awesome!

Thwarting internet censorship

The more common DNS association with censorship is avoiding government DNS poisoning by manually setting a DNS server to 8.8.8.8. This isn’t a perfect solution and is getting less useful as more sophisticated technology is put to monitoring and controlling tech we all rely on. However, there’s another way, like David Leadbeater’s 2008 project, which put truncated Wikipedia articles in TXT records. They aren’t live anymore, but there are so many possible uses for this! Mayhem, genuine helpfulness… why not both?

DNSFS OMG OMG

Ben Cox, a British programmer, found DNS resolvers that were open to the internet and used TXT records to cache a blog post on servers all around the world. He used 250-character base64 strings of about 187 bytes each to accomplish this, and he worked out that the caches would be viable for at least a day.

I love all of this stuff, but this is probably my favorite off-brand TXT use. I honestly screamed in my apartment when I saw his animated capture of the sending and reassembly of the blog post cache proof of concept. 

Contributing to the shenanigans corpus

So naturally, I wanted in on this. One does not spend weeks reading about DNS TXT record peculiarities without wanting to play too. And naturally, as a creator of occasionally inspirational zines, I wanted to leave a little trove of glitter and encouragement in an unlikely part of the internet that was designed for no such thing.

Pick a number between 1 and 50. (It was going to be 0 and 49, but Dreamhost will allow 00 as a subdomain but not 0. Go figure!) Use that number as the subdomain of maybethiscould.work. And look up the TXT record. For example:

dig txt 3.maybethiscould.work

will yield

3.maybethiscould.work. 14400 IN TXT "Tilt your head (or the thing) and look at it 90 or 180 degrees off true."

Do a dig txt on only maybethiscould.work, and you’ll see them all, if DNS doesn’t choke on this. My own results have varied. If you like spoilers, or not having to run dig 50 times to see the whole of one thing you’re curious about, you can also see the entire list here.
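
If dig isn’t handy, the same lookup is a few lines from Python – a sketch assuming the dnspython package:

import dns.resolver

answers = dns.resolver.resolve("3.maybethiscould.work", "TXT")
for record in answers:
    # Each rdata can hold several 255-byte strings; join them back together.
    print(b"".join(record.strings).decode())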

In the meantime, now you know a little bit more about a thread of the internet that you’ve been close to and benefited from for some time. And next time: always dig the TXT record too. Start with dns.google if you want a strong beginning.

/etc/services Is Made of Ports (and People!)

Hospital switchboard, 1970s
I’ve been using a switchboard/phone extension metaphor to explain ports to my non-engineer friends who have been asking about my talk progress. Feels fitting to start the blog version of it with this switchboard professional.

You can view the video version of this talk here.

I learned about /etc/services in my first year of engineering. I was working with another engineer to figure out why a Jenkins master wasn’t connecting to a worker. Peering through the logs and netstat output, my coworker spied that a service was already listening on port 8080. “That’s Jenkins,” he said.

“But how did you know that?”

“Oh, /etc/services,” he replied. “It has all the service-port pairings for stuff like this.”

Jenkins is not, in fact, in /etc/services, but http-alt is listed at port 8080. The more immediately relevant answer was probably “through experience, because I’ve seen this before, o junior engineer,” but his broader answer got me curious. I spent some time that afternoon scrolling through the 13,000-plus-line file, interested in the ports but especially curious about the signatures attached to so many of them: commented lines with names, email addresses, and sometimes dates, attribution for a need as yet unknown to me.

I got on with the business of learning my job, but /etc/services stayed in my head as a mystery of one of my favorite kinds: everyone was using it, but many well-qualified engineers of my acquaintance had only partial information about it. They knew what used it, or they at least knew that it existed, but not where it came from. The names in particular seemed to surprise folks, when I asked colleagues for their knowledge about this as I was doing this research.

This post, a longer counterpart to my !!con West talk on the same subject, digs into a process and a file that were once commonplace knowledge for a certain kind of back-end and network engineer and have fallen out of more regular use and interaction. I’ll take you through some familiar services, faces old and new, correspondence with contributors, and how you – yes, you – can make your mark in the /etc/services file.

What is it, where does it live

In *nix systems, including Mac OS, /etc/services lives exactly where you think it does. Windows also has a version of this file, which lives at C:\Windows\System32\drivers\etc\services. Even if you’ve never opened it, it’s been there, providing port name and number information as you go about your life.

The file is set up like this: name, port/protocol, aliases, and then usually a separate line for any comments, which is where you’ll often find names, email addresses, and sometimes dates. Like so:

ssh      22/udp  # SSH Remote Login Protocol
ssh      22/tcp  # SSH Remote Login Protocol
#                   Tatu Ylonen <ylo@cs.hut.fi>

The most common protocols are UDP and TCP, as those were the only ones you could reserve for a long time. However, as of RFC 6335, published in August 2011 (more on that later), you can now snag a port to use with SCTP and/or DCCP as well. That RFC added the newer protocols, and it also changed the old practice of assigning a port for both UDP and TCP: a port is now allocated only for the protocol requested and merely reserved for the others, which will only be assigned if port availability dwindles significantly.

Incidentally, the presence of a service in /etc/services does not mean the service is running on your computer. The file is a list of possibilities, not attendance on your machine (which is why your computer is probably not currently on fire).
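
If you want to poke at your local copy programmatically, the columns are regular enough to pull apart in a few lines – a rough sketch that skips the comment-only lines where the contributor names live:

def parse_services(path="/etc/services"):
    entries = []
    with open(path) as f:
        for line in f:
            line = line.split("#", 1)[0].strip()  # drop trailing comments
            if not line:
                continue
            name, portproto, *aliases = line.split()
            port, proto = portproto.split("/")
            entries.append((name, int(port), proto, aliases))
    return entries

print(len(parse_services()), "entries")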

Going through the first 500-odd lines of the file will show you some familiar friends. ssh is assigned port 22. However, ssh also has an author: Tatu Ylonen. His bio includes a lot of pretty typical information for someone who’s listed this far up in this file: he designed the protocol, but he has also authored several RFCs, plus the IETF standards on ssh.

Jon Postel is another common author here, with 23 entries. His representation in this file just hints at the depth of his contributions – he was the editor of the Request for Comment document series, he created SMTP (Simple Mail Transfer Protocol), and he ran IANA until he died in 1998.  A high /etc/services count is more a side effect of the enormity of his work rather than an accomplishment unto itself.

It’s cool to see this grand, ongoing repository of common service information, with bonus attribution. However, that first time I scrolled (and scrolled, and scrolled) through the entirety of /etc/services, what stayed with me was how many services and names I wasn’t familiar with – all this other work, separate from my path in tech thus far, with contact information and a little indicator of what that person was up to in, say, August 2006.

For instance: what’s chipper on port 17219? (It’s a research rabbit hole that took me about 25 minutes and led me across Google Translate, the Wayback Machine, LinkedIn, Wikipedia, and a 2004 paper from The European Money and Finance Forum, AMONG OTHER RESOURCES. Cough.) chipper, by Ronald Jimmink, is one of two competing e-purse schemes that once existed in the Netherlands; the longer-lasting competitor, Chipknip, concluded business in 2015. The allure of these cards, over a more traditional debit card, was that the value was stored in the chip, so merchants could conduct transactions without connectivity for their card readers. This was a common race across Europe, in the time before the standardization of the euro and banking protocols, and chipper is an artifact of the Netherlands’s own efforts to find an easier way to pay for things in a time before wifi was largely assumed.

Then there’s octopus on port 10008 (a port which apparently also earned some notoriety for being used for a worm once upon a time). Octopus is a professional Multi-Program Transport Stream (MPTS) software multiplexer, and you can learn more about it, including diagrams, here.

There are, of course, more than 49,000 others; if you have some time to kill, I recommend scrolling through and researching one whose service name, author, or clever port number sparks your imagination. Better still, run some of the service names by the longer-tenured engineers in your life for a time capsule opening they won’t expect.

Port numbers and types

Ports are divided into three ranges, splitting up the full range of 0-65535 (the range created by 16-bit numbers).

  • 0-1023 are system ports (also called well-known ports or privileged ports)
  • 1024-49151 are user ports (or registered ports)
  • And 49152-65535 are private ports (or dynamic ports)

Any services run on the system ports must be run by the root user, not a less-privileged user. The idea behind this (per W3) is that you’re less likely to get a spoofed server process on a typically trusted port with this restriction in place.

Ok, but what actually uses this file? Why is it still there?

Most commonly, five C library routines use this file. They are, per (YES) the services man page:

  • getservent(), which reads the next entry from the services database (see services(5)) and returns a servent structure containing the broken-out fields from the entry
  • getservbyname(), which returns a servent structure for the entry from the database that matches the service name using protocol proto
  • getservbyport(), which returns a servent structure for the entry from the database that matches the port port (given in network byte order) using protocol proto
  • setservent(), which opens a connection to the database and sets the next entry to the first entry
  • endservent(), which closes the connection to the database

The overlapping use of these routines makes service name available by port number and vice versa. Thus, these two commands are equivalent:

  • telnet localhost 25
  • telnet localhost smtp

And it’s because of information pulled from /etc/services.
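
Higher-level languages wrap the same routines; for instance, Python’s socket module will answer the same questions from the services database:

import socket

print(socket.getservbyname("smtp", "tcp"))  # 25
print(socket.getservbyport(22, "tcp"))      # 'ssh'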

The use you’ve most likely encountered is netstat, if you give it flags to show service names. The names it shows are taken directly from /etc/services (meaning that you can futz with netstat’s output, if you have a little time).

In short: /etc/services used to match service to port to give some order and identity to things, and it’s used to tell developers when a certain port is off limits so that confusion isn’t introduced. Human readable, machine usable.

Enough about ports; let’s talk about the people

First, let’s talk about the method. Shortly after getting the acceptance for my !!con talk, I went through the entire /etc/services file, looking for people to write to. I scanned for a few things:

  • Email addresses with domains that looked personal and thus might still exist
  • Interesting service names
  • Email addresses from employers whose employees tended to have long tenures
  • Anything that sparked my interest

I have a lot of interest, and so I ended up with a list of 288 people to try to contact. The latest date in my local copy of /etc/services is in 2006, so I figured I’d be lucky to get responses from three people. And while more than half of the emails certainly bounced (and the varieties of bounce messages and protocols had enough detail to support their own fairly involved blog post), I got a number of replies to my questions about how their work came to involve requesting a port assignment, how it was that they knew what to do, and how the port assignment experience went for them.

I will say that the process revealed an interesting difference between how I’d write to folks as a writer and researcher (my old career) vs. how one writes questions as an engineer. As a writer working on a story that would eventually involve talking to people about a subject, I would research independently and only later approach the people involved; my questions would start a few steps back from what I already knew to be true from my research. This allows room for people to feel like experts, to provide color and detail, and to offer nuance that doesn’t come out if the person asking questions charges in declaring what they know and asking smaller, more closed questions.

This is… not the case in computer science, where questions are typically prefaced with a roundup of all 47 things you’ve googled, attempted, and wondered about, in the interest of expediency. This meant that my very second-person questions, in the vein of "how did you do this" and "what was the nature of your process," sometimes were taken as some email rando not being able to, how you say, navigate the internet in a most basic way.

The more you know.

Happily, I got more than three responses, and people were incredibly generous in sharing their experiences, details of their work, and sometimes relevant messages from their astonishingly deep email archives.

bb, port 1984: Sean MacGuire

The first response I got, delightfully, was for the service named after my initials: bb, on port 1984. More delightfully, this turned out to be the port reserved for software called Big Brother, “the first Web-based Systems and Network Monitor.” Its author, Sean MacGuire, figured out the process for reserving a port after researching IANA’s role in it. At the time (January 1999), it was approved in 12 days. He described it as “totally painless.” Fun fact: Sean also registered the 65th registered domain in Canada, which required faxing proof of the company and making a phone call to the registrar.

The thing I started to learn with Sean’s response was how this was, at one point, pretty ordinary. Most web-based services restrict themselves to ports 80 and 443 now, in large part because a lot of enterprise security products clamp down on access by closing less-commonly used ports, so reserving a port for your new service isn’t always a necessary step now.

In which I am written to by one of the chief architects of the internet as we know it

The next response I got was from a little further back in computer science history. For context: I’d built my list of people to contact across about ten days back in December, shortly after my talk was accepted for !!con West, which meant that, by the time I got to the end, I couldn’t have recounted to you the first folks I’d selected.

This is all to say that I was a little startled to read this response:

> How did you come to be the person in charge of reserving the port

I designed the HTTP protocol

Which reminded me that I had indeed selected this entry as one worth diving into, when I was first getting a handle on this research:

http          80/udp www www-http # World Wide Web HTTP
http          80/tcp www www-http # World Wide Web HTTP
#                       Tim Berners-Lee <timbl@W3.org>

He was, I am pleased to say, impeccably polite in his brief response, and he recommended his book, Weaving the Web, which is such a wonderful look at finding compromise between competing standards and design decisions across dozens of institutions, countries, and strong-willed people. As he said, more information on his place within this work and that file can be found there, and if you’re at all curious, I so recommend it.

More contributors

I also liked that some people had fun with this or considered it, as Christian Treczoks of Digivote, port 3223, put it, a real "YEAH!" moment. Barney Wolff, of LUPA, worked a few layers of meaning into his assignment of port 1212: "I picked 1212 because the telephone info number was <area>-555-1212. And LUPA (an acronym for Look Up Phone Access) was a pun on my last name. I don’t know if my bosses at ATT or anyone at IANA ever noticed that."

Christian Catchpole claimed port 1185, appropriately named catchpole. He requested a low port number in the interest of claiming something memorable. He explained: “The original project back in 2002 involved a compact wire protocol for streaming objects, data structures and commands etc. While the original software written is no longer in the picture, the current purpose of the port number involves the same object streaming principal.  I am currently using the port for async communications for my autonomous marine robotics project.” The original uses of many ports have shifted into computer science history, but Christian’s projects live on.

Alan Clifford (mice, port 5022) claimed his space for a personal project; approval took 21 days. (He, like several people I contacted, keeps a deep and immaculate email archive.) Mark Valence (keysrvr at 19283 and keyshadow at 19315) recounted his involvement thusly: "I was writing the code, and part of that process is choosing a port to use." He ended up in /etc/services around 1990 or 1991; his team had added TCP/IP as an option for their network service a year or so earlier, enabling Macs, PCs, and various Unix systems to communicate with each other.

Ulrich Kortenkamp (port 3770, cindycollab) was one of two developers of Cinderella, and he claimed a port in /etc/services to codify their use of a private port for data exchange. He added: “And I am still proud to be in that file :)”

Greg Hudson’s contributions date to his time as a staff engineer at MIT, when he became a contributor to and then a maintainer of the school’s Zephyr IM protocol (zephyr-hm in the file) and then similarly with Subversion, the open-source version control system now distributed by Apache. His name is connected to ports 2102-2104 for Zephyr and port 3690 for Subversion.

Jeroen Massar has his name connected to four ports.

He noted that AURORA also has an SCTP allocation, which is still fairly rare, despite that protocol being available since 2011. He remarked, "[This] is actually likely the 'cool' thing about having 'your own port number': there is only 65536 of them, and my name is on 4 of them ;)"

I asked people how they knew what to do; some were basically like :shrug: “The RFC?” But others explained their context at the time. Mostly, folks seemed to have general industry awareness of this process and file just because of the work they did. (“I was the ‘Network Guy’ in the company,” said Christian Treczoks.)  Some knew the RFC or knew to look for it; others had been involved with the IETF and were around for the formation of these standards. My anecdotal impression was that it was, at that point, just part of the work. If you were on a project that was likely to need a port, it was known how you’d go about getting it.

Who controls this file? Where does the official version come from?

Like so many things in the big world of the internet, /etc/services and its contents are controlled by IANA. Local copies vary, though; what’s on IANA’s official and most up-to-date version deviates some from what you might find on your own machine. The version of /etc/services on my Mac, as I’ve mentioned, is about 13 years out of date. However, people are still claiming ports, and you can see the most current port assignments at IANA’s online version.

On most Unixes, the version of /etc/services you see is a snapshot of the file taken from IANA’s official version at the time that version of the OS was released. When installing new services, often you’ll want to tweak your local copy of /etc/services to reflect the new service, if it’s not already there, even if only as a reminder.

However, updates vary between OSes; the version included with the Mac OS is not the most current, and how updates are added and communicated can vary widely. Debian, for instance, furnishes /etc/services as part of the netbase package, which includes “the necessary infrastructure for basic TCP/IP based networking.” If the included version of /etc/services got out of date, one could file a bug to get it updated.

To learn how /etc/services is managed and how to contribute, the gold standard is:

RFC 6335

No, seriously. The Procedures for the Management of the Service Name and Transport Protocol Port Number Registry has most of what you would need to figure out how to request your own port. While the process is less common now, it’s still regimented and robustly maintained. There’s a whole page just about statistics for port request and assignment operations. While this isn’t as commonly used as it once was, it’s still carefully governed.

RFC 7605, Recommendations on Using Assigned Transport Port Numbers, includes guidance on when to request a port. 7605 and 6335 are bundled together as BCP 165, though 6335 is still referred to frequently and is the most commonly sought and cited resource.

How can you get your piece of /etc/services glory?

There’s real estate left to claim; as of this writing, more than 400 ports are still available. Others have notes that the service claims are due to be removed, with timestamps from a few years ago, and just haven’t yet.

If you have a service in need of a port, I am pleased to tell you that there is a handy form to submit your port assignment request. You’ll need a service name, transport protocol, and your contact information, as you might guess. You can also provide comments and some other information to bolster your claim. The folks managing this are pretty efficient, so if your request is valid, you could have your own port in a couple of weeks, and then you could be hiding inside of your computer many years from now, waiting to startle nosy nerds like me.

Despite the shift to leaning more heavily on ports 80 and 443, people are absolutely still claiming ports for their services. While the last date in my computer’s /etc/services file is from about a decade ago, the master IANA list already has a number of dates from this year.

So: make your service, complete the form, and wait approximately two weeks. Then cat /etc/services | grep $YourName, and your immortality is assured (until IANA does an audit, anyway).  

And if you do, please let me know. Enabling people to do great, weird, and interesting things is one of my favorite uses of time, and it’d make my day. Because there are no computers without people (or not yet, anyway), and a piece of that 16-bit range is waiting to give your work one more piece of legitimacy and identity.

Thanks to everyone who wrote back to me for being so generous with their experiences, to Christine Spang for the details about Debian /etc/services updates, to the patient overseers of RFC 6335, and the authors of all the RFCs and other standards that have managed to keep this weird world running.