Adam Zerner

Bad names make you open the box

Think of a function as a black box. It takes an input, and spits back out an output.

For getPromotedPosts, you can feed it a list of blog posts and it will spit back out the ones that have been promoted.

But I probably didn't need to explain that to you, did I? Why not? Well, because getPromotedPosts is self-explanatory. Because getPromotedPosts is named well.

Now what if instead of getPromotedPosts, it was named something like getThePosts? Well, that name isn't self-explanatory. You know it's getting some posts, but it's not clear which ones. The most recent posts? Posts from a certain author? Posts from this week?

As a programmer, what do you do in this situation? You scroll to the function definition and start reading the code.

function getThePosts(posts) {
  ...
}

In other words, you open the box.

What does that look like? Something like this:

  • You're reading through the code in some file. Line one, line two, line three.
  • You reach line 30 and see getThePosts.
  • You realize that getThePosts must be getting some posts, but you don't know which ones. So you have to scroll to line 174 where getThePosts is defined.
  • On line 174 you start reading through getThePosts. Once you reach line 210, you realize that it is getting promoted posts. Cool!
  • Now you scroll back up to line 30. You realize that getThePosts is giving you promoted posts, but you forgot what was happening before line 30. Damn. So now you have to go back to line 10 or 15 to remind yourself what was going on in the first place.

Complexity and zoom level

Maybe I was being dramatic. Is it really such a big deal to have to scroll to the definition of getThePosts on line 174? Will it really take that much effort to read lines 174-210 and figure out that it's returning promoted posts? It's only 36 lines of code, including whitespace + brackets, and you could probably glance over parts of it. And then what about returning to line 30? Are you really going to have forgotten what was going on so quickly? Are you really going to have to scroll back up to line 10 or 15 to remind yourself?

In this example, perhaps not. I'm not sure. Maybe having to open the box gets in the way, maybe it doesn't. What I am sure of is that when the code becomes more complicated, having to open the box becomes more of an issue. I think Eric Dietrich's post on how harmful interruptions can be to a programmer gives us a great intuition for this:

For a programmer, an interruption is oh-so different. There you sit, 12 calls into the call stack. On one monitor is a carefully picked set of inputs to a complex form that was responsible for generating the issue and on the other monitor is the comforting dark theme of your IDE, with the current line in the debugger glowing an angry yellow. You’ve been building to this moment for 50 minutes — you finally typed in the right inputs, understood the sequence in which the events had been fired, and got past the exact right number of foreach and while loops that took a few minutes each to process, and set your breakpoint before the exception was triggered, whipping you into some handler on the complete other end of the code base. Right now, at this exact moment, you understand why there are 22 items in the Orders collection, you know what the exact value of _underbilledCustomerCount is and you’ve hastily scribbled down the string “8xZ204330Kd” because that was the auto-generated confirmation code resulting from some combination of random numbers and GUIDs that you don’t understand and don’t want to understand because you just need to know what it is. This is the moment where you’re completely amped up because you’re about to unlock the mysteries of what on earth could be triggering a null reference exception in this third party library call that you’re pretty sure —

“HI!!! How’s it going? So, listen, you know that customer order crashing thing is, like, bad, right? Any chance I can get an ETA on having that fixed?”

This example is on the opposite end of the spectrum. Here you're dealing with a ton of complexity, whereas in getThePosts the amount of complexity was probably pretty low. So maybe this means that as programmers, we can just use our judgement? For complicated code, take the time to come up with good names. For simple code, fuggedaboutit.

In theory, I think this makes sense. But in practice, I think it often leads to a lot of issues.

Imagine yourself zooming in on a piece of code. You ask yourself whether it's really such a big deal that the name you used is a little confusing. Your answer is usually going to be something like:

Nah, I think it's fine. It's not that complicated. They'll be able to figure it out.

Now, imagine yourself zooming out and thinking about the entirety of the codebase. Or even just a particular module. You ask yourself whether it's really such a big deal that the code is a little sloppy and confusing. Your answer is usually going to be something like:

Yes! It is a big deal! I'd be able to move so much faster if the code wasn't such a mess!

At least that's what I argue for in Taking The Outside View On Code Quality. In theory, your zoomed in answers would always match your zoomed out answers, but in practice, the answers will depend on the scale you're looking at. I think that this is a really important thing to keep in mind when you ask yourself whether you need to come up with a better name. And I think that the zoomed out perspective is usually the wiser choice.

Taking the zoomed out perspective will usually lead you to the conclusion that naming is important. The big exception is when you know you are going to throw the code away, for example when you are building a prototype. If the prototype is unsuccessful, you'll ditch it. If it is successful, you'll probably rewrite it. Either way the prototype code gets ditched. So in that situation, you probably don't need to spend time naming things well. But that is the exception, not the rule. If the code you're writing is "business as usual", investing in good names will pay dividends.

Not just software

It's not just software. Names and black boxes apply to many other domains, including everyday life. For example, the other day I was reading a post about covid, and it kept referring to B.1.617. And B.1.2, and B.1.1.7, and P.1. Huh? I knew that these were all different covid strains, but I couldn't keep track of which was which. I had to pause my reading, google "B.1.617 covid strain", see that it is the Indian strain, and then pick back up where I left off. In other words, I had to open the box.

Honestly, this happens all of the time. It happens at work when people refer to a JIRA ticket as "7967" instead of "the stashboard epic". And when people use weird acronyms like BDM (business development manager). And when things in science are named after people rather than some sort of affordance. Wouldn't it suck if prediction markets were called Hansonian markets?

Misleading

What sucks even more is when names are actively misleading. For example, the concept of regression to the mean had confused me for a while. The term "regress" sounds like it means "move down", but instead it just means "move closer to". So if covid cases have been unusually low over the past few days and we expected them to tick back up, we would still call it regressing to the mean.

Let's look at an actively misleading name in the context of software. Think back to our getPromotedPosts example. The idea is that we have a blog and we want to place promoted posts at the top. But imagine that at some point, management stormed in and demanded that Tom Fahrahs' posts be given that prime real estate, because Tom is one of our investors and he has a new book coming out that he wants to promote: The Four Second Sex Life.

So the dev team comments out the body of getPromotedPosts and has it instead just get the five most recent Fahrahs posts. It works.

function getPromotedPosts() {
  /*

  old
  code
  here

  */

  return fiveMostRecentFahrahsPosts;
}

cue Jaws music

Then after the book launch is a big hit and the team is ready to move back to the old logic, someone new to the codebase winds up writing a new function called topPosts. The codebase is a mess so they thought it'd be easier to just write their own function, and "top" seemed like it'd make sense because, after all, it's getting posts that will be placed at the top of the page.

But they never delete the now deprecated code for getPromotedPosts.

dah dan

Fast forward six months. The product team wants a redesign of the page, and it's your job to code it up. The existing code is a bit of a mess, so you tear it apart. Not completely though.

As you're working on the section for promoted posts, you notice topPosts and getPromotedPosts in the old code. topPosts sounds like it's referring to the best posts, so that probably isn't what you want. On the other hand, getPromotedPosts sounds like it's exactly what you want, so you use it.

dah dan

Since you're lazy, you don't really QA it.

dah dan dah dan

It passes code review because getPromotedPosts sounds reasonable to the other team members too.

dah dan dah dan

And it also passes QA because they don't find it odd that Fahrahs posts were at the top, given how popular he is.

dah dan dah dan dah dan dah dan

It actually even takes a while for the bug to get discovered in production, for the same reason. It isn't until Fahrahs starts writing about the testimonials he's received from readers of his previous book that someone notices a quirk in the algorithm.

screams!!!

Hey, remember when we were talking about how getPromotedPosts is low complexity and how maybe we can just forget about naming things well?

Pot brownies

Maybe a better way to make this point is with a pot brownies analogy.

Imagine that you open the fridge. You see something labeled "brownie". You eat it.

Then you hop in the car and start heading over to your friend's house. But right as you merge onto the highway, you start feeling funny.

Turns out that the "brownie" label was a little misleading. It wasn't a regular brownie. It was a pot brownie. There was something dangerous inside the brownie, but the label didn't reflect that.

This is similar to poorly named functions with dangerous side effects. In both cases, if the thing in question can have dangerous side effects, you really want to make sure that it is reflected in the label. You can't trust that people will read beyond the label. And even if you could, you wouldn't want people to have to do that. You'd rather they be able to get the information they need from the label.
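Here is a rough sketch of what a "pot brownie" function might look like. Everything in it is made up for illustration: the `blog` object, its `draft` flag, and both function names are assumptions, not code from a real codebase.

```javascript
// Hypothetical store shape for this sketch: { posts: [{ title, draft }] }.

// The "brownie" label: the name says "get", but calling it also deletes the drafts.
function getDrafts(store) {
  const drafts = store.posts.filter((post) => post.draft);
  store.posts = store.posts.filter((post) => !post.draft); // hidden side effect
  return drafts;
}

// The "pot brownie" label: identical behavior, but the name warns the caller.
function removeAndReturnDrafts(store) {
  const drafts = store.posts.filter((post) => post.draft);
  store.posts = store.posts.filter((post) => !post.draft);
  return drafts;
}

// Made-up data so the sketch runs.
const blog = {
  posts: [
    { title: 'Half-finished rant', draft: true },
    { title: 'Published essay', draft: false },
  ],
};

const removedDrafts = removeAndReturnDrafts(blog);
// blog.posts no longer contains the draft, and the name told you it wouldn't.
```

Both functions do the same thing. The only difference is whether the caller finds out about the deletion from the label or from opening the box (or from production).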

Trust

Wow, those sections were pretty scary, huh? Well, it gets worse.

It sucks that no one changed the name of getPromotedPosts to getFahrahsPosts or something and it led to the bug of Fahrahs posts being promoted for so long. But consider what happens in the aftermath of that bug.

Imagine that after going through that nightmare, you see a new function called getTodaysPosts. It seems simple enough. It probably just gets all of the posts that were written today. Right?

Nope! You're not gonna fall for that again! Last week you thought that getPromotedPosts was going to just get you the promoted posts, but instead it only got you Fahrahs posts, and your boss gave you a stern talking to. So why would you trust that getTodaysPosts is going to do what it implies?

You're not. Your trust has been violated, so you're going to open the box. You're going to scroll to getTodaysPosts and read through it just to make sure it does what you think it does. Same for getTopTechPosts and getMostRecentEconPosts. You have to open those boxes too when you come across them, just to make sure.

But as we talked about in the "Complexity and zoom level" section, this is a really bad situation to be in. To some extent, software is all about managing complexity. Closed boxes really help us manage complexity. But now, due to the violation of trust, that tool has been taken away from us.

Compression of complexity

There is something really, really powerful going on here, and I'm worried that I'm not doing it justice. I'm worried that I'm not hitting the nail on the head regarding why this all is so important. Let me try explaining it differently.

Consider that original example of getThePosts.

  • You're reading through the code in some file. Line one, line two, line three.
  • You reach line 30 and see getThePosts.
  • You realize that getThePosts must be getting some posts, but you don't know which ones. So you have to scroll to line 174 where getThePosts is defined.
  • On line 174 you start reading through getThePosts. Once you reach line 210, you realize that it is getting promoted posts.

There, you have to read 36 lines of code to understand what is going on. But now imagine that you take all of that complexity, and compress it.

That's the power of good names. They allow you to take a bunch of complexity and package it up into a dense little box. Now instead of dealing with this:

You just have to deal with this:

Much nicer, right?
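As a rough sketch of that compression in code: the function body below is invented for illustration and stands in for whatever is really inside the box. The fields it uses (`promoted`, `promotedUntil`, `publishedAt`) and the `now` parameter are assumptions on my part.

```javascript
// A stand-in for what's inside the box. The fields used here
// (`promoted`, `promotedUntil`, `publishedAt`) are assumptions for this sketch.
function getPromotedPosts(posts, now = new Date()) {
  const promotedPosts = [];
  for (const post of posts) {
    if (!post.promoted) continue; // never promoted
    if (post.promotedUntil && post.promotedUntil < now) continue; // promotion expired
    promotedPosts.push(post);
  }
  // Newest first.
  promotedPosts.sort((a, b) => b.publishedAt - a.publishedAt);
  return promotedPosts;
}

// Made-up data so the sketch runs.
const allPosts = [
  { title: 'Old promo', promoted: true, promotedUntil: new Date(2020, 0, 1), publishedAt: new Date(2019, 11, 1) },
  { title: 'New promo', promoted: true, promotedUntil: null, publishedAt: new Date(2024, 5, 1) },
  { title: 'Regular post', promoted: false, promotedUntil: null, publishedAt: new Date(2024, 6, 1) },
];

// At the call site, the name is the whole story. No scrolling required.
const postsForTopOfPage = getPromotedPosts(allPosts, new Date(2024, 6, 15));
```

Someone reading the last line only has to hold "promoted posts" in their head, not the loop, the expiry check, and the sort.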

Not just functions

Initially, I started with the analogy of functions as a black box, and I talked about how a good name makes it clear what output the inputs will get mapped to. Then in the "Not just software" section I talked about how this analogy doesn't just apply to functions in software, it applies to everyday life. I think this softly alludes to the fact that within the domain of software, the analogy applies to things like variable names and class names too, not just function names. In this section, I want to make that point more explicitly.

Consider a variable name:

// birthday
var d = '11/03/1992';

On this line of code, because of the code comment, it's clear that it's referring to a birthday. But later on when we reference d, it will no longer be clear what it is referring to. And because of this, we'll have to scroll up to the point where d is declared.

I see this as a version of opening the box. There is a box that contains '11/03/1992'. We named that box d. If it was named currentUsersBirthday or something, you wouldn't have to open the box, but with a poor name like d, you do.
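Here's a small sketch of what that looks like at a later point of use. The `isBirthdayToday` helper is hypothetical, and I'm assuming the string is in MM/DD/YYYY format, which the snippet above doesn't actually specify.

```javascript
// Hypothetical helper. Assumes the date string is MM/DD/YYYY.
function isBirthdayToday(birthday, today) {
  const [month, day] = birthday.split('/').map(Number);
  // getMonth() is zero-based, so add 1 to compare with the string's month.
  return today.getMonth() + 1 === month && today.getDate() === day;
}

const d = '11/03/1992';
const currentUsersBirthday = '11/03/1992';
const today = new Date(2024, 10, 3); // November 3, 2024

// Far from the declaration, this line makes you scroll up and open the box:
const resultFromPoorName = isBirthdayToday(d, today);

// This one doesn't:
const resultFromGoodName = isBirthdayToday(currentUsersBirthday, today);
```

The two calls compute the same thing. Only the second tells you what is being computed without a trip back to the declaration.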

A similar point can be made for module names, table names, column names, folder names, and file names. For classes, I'm not sure how well the analogy holds, but names are important there too.

Binary

I'm the type of person who likes to sit for a few minutes and brainstorm the right name for something. I feel strongly about this point that names are incredibly powerful things and are usually worth investing in. On the other hand, I find that many other people don't even want to invest a few seconds in this.

So then, naming has been on my mind recently. And I've been searching for the right analogy to explain why I think it is so important. I took a stab at this a few weeks ago in Naming and Pointer Thickness. There I argue that some names do a better job than others at pointing to the underlying substance. For example, "start" does a better job than "commence". Maybe "start" is a 9/10 and "commence" is a 3/10.

What I'm arguing in that post is that there is some spectrum. On the other hand, in this post, I'm talking about it as if it's binary: either you have to open the box, or you don't.

Calling it a spectrum is more accurate than calling it binary. However, accuracy isn't really the goal here. Usefulness is. And I sense that treating it as if it's binary is more useful.

I'm not sure how to explain why I think this. Maybe it's because it draws a hard line between failure and success, and having a hard line like that makes everything more salient.

I'm sorry. I just screwed up. "Salient" might require you to open the box. Let me try again.

What I mean is that with the box analogy, when you have a name that requires someone to open the box, it sticks out and is very clear. I can visualize some readers being frustrated and having to google the word "salient". On the other hand, with the pointer thickness analogy being a spectrum, you might sense that the pointer is sorta thin, but it's easier to dismiss that. "It's fine. It's good enough."

Another perspective that is related to this saliency point is that "open the box" is action oriented. It conveys that you have to go out of your way and do something. Take some extra step that you otherwise wouldn't have to take.

It's hard to articulate these sorts of things though. The real reason I like the analogy isn't that I can think of some clever explanation for why it makes sense. It's that, empirically, it feels right when I use it. I'm just one person though, and have only recently started using it. The real test of whether it is a good analogy will be how people respond to this post.

Postscript

Googling things well is the inverse of naming things well. 🤯


If you have any thoughts, I'd love to discuss them over email: adamzerner@protonmail.com.

#code
