# Big O, little n


I remember when I was first learning about hash tables. I thought I stumbled across an ingenious optimization for dealing with hash collisions. Before explaining my optimization, let me first briefly explain how hash tables work and what hash collisions are. I also made a video if you're interested.

Say that we have a hash that maps someone's name to their age. `"adam"` is `27`, so we want to enter that into our hash.

Here's what happens:

- `"adam"` gets put through a hash function.
- The hash function spits out `7`.
- `7` is the index of the array where we are going to store `"adam"`'s value.
- So we store `27` into `array[7]`.

If we run `hash.get("adam")` in the future and need to look up the value for `"adam"`, we:

- Run `"adam"` through the hash function.
- Get `7` as the output.
- Look at `array[7]` to get our value.

Hopefully if we have a key of `"alice"` or `"bob"` it'll hash to a different index. But... what happens if it hashes to the same index? That's called a hash collision.

A common approach is that if there's a collision, you store the values in a linked list.

Fair enough. That makes sense.
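To make that concrete, here's a toy sketch of chaining. Everything here is my own invention for illustration (the `ToyHash` class, the deliberately bad hash function, and the use of plain arrays as buckets); real implementations use better hash functions and often actual linked lists.

```
// A toy hash table that handles collisions by chaining: each array slot
// holds a "bucket" of [key, value] pairs, and colliding keys pile up in
// the same bucket.
class ToyHash {
  constructor(size = 8) {
    this.buckets = Array.from({ length: size }, () => []);
  }

  // A deliberately bad hash function (sum of character codes), so
  // collisions are easy to trigger.
  hash(key) {
    let sum = 0;
    for (const ch of key) sum += ch.charCodeAt(0);
    return sum % this.buckets.length;
  }

  set(key, value) {
    const bucket = this.buckets[this.hash(key)];
    const entry = bucket.find(([k]) => k === key);
    if (entry) entry[1] = value;
    else bucket.push([key, value]);
  }

  get(key) {
    // Walking the bucket is the O(n) lookup the post talks about -- but
    // n here is just the number of keys that collided into this slot.
    const bucket = this.buckets[this.hash(key)];
    const entry = bucket.find(([k]) => k === key);
    return entry ? entry[1] : undefined;
  }
}

const h = new ToyHash();
h.set("adam", 27);
h.get("adam"); // 27
```

Note that `get` has to scan its bucket linearly, which is exactly where the "should this be a BST or a hash instead?" question comes from.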

But wait!!! It takes `O(n)` time to look up a value in a linked list. Can't we use a binary search tree to speed that up to `O(log n)`???

Or... can't we take it a step further and use a *hash inside of a hash* to get the lookup time all the way down to `O(1)`?!

I hate to say it, but however many years ago when I had these thoughts, there were a few moments where I thought I was a genius. Now when I look back at that former self, I facepalm.

It is true that going from a linked list to a BST to a hash takes you from `O(n)` to `O(log n)` to `O(1)` lookup time. However, since collisions are rare, `n` is going to be small. And when `n` is small, `O(n)` might be faster than `O(1)`.

To understand why that is, think back to what Big-O really means. I think it helps to think about it as a "math thing" instead of a "programming thing". Consider two functions:

```
f(n) = 4n + 1000
g(n) = 2n^2 + 5
```

Big-O of `f` is `O(n)` and Big-O of `g` is `O(n^2)`. We just focus on the part that "really matters". When `n` gets really big, the fact that it's `4n` doesn't really matter. Nor does the `+ 1000`.

But what about when `n` is small? Well, let's look at what happens when `n` is `5`:

```
f(5) = 4(5) + 1000 = 20 + 1000 = 1020
g(5) = 2(5^2) + 5 = 2(25) + 5 = 50 + 5 = 55
```

Look at that! The `O(n^2)` function is almost 20 times faster than the `O(n)` function!

And *that* is basically why hash table implementations use a linked list instead of a BST or hash to handle collisions. Since `n` is small, Big-O isn't the right question to ask.

Here's another example. I just wrote the following code while prepping for an interview:

```
const isVowel = (char) => ["a", "e", "i", "o", "u"].includes(char);
```

Normally I wouldn't think twice about it, but since interviewers care so much about time complexity, I stopped to think about whether it could be improved.

And it hit me that it's actually `O(n)` (caveat), because we've got an array and have to iterate over every element to see if the element matches `char`. It's easy to overlook this because we're using `includes` and not writing the code ourselves.

So then I thought that maybe we could use a hash instead to get `O(1)` lookup. Something like this:

```
const isVowel = (char) => {
  const vowels = {
    a: true,
    e: true,
    i: true,
    o: true,
    u: true,
  };
  return !!vowels[char];
};
```

But then I realized that this is the same mistake I made with the hash collision stuff however many years ago. *`n` is small, so Big-O isn't the question we should be asking.* Here `n` is `5`.

I think that the array approach would actually be faster than the hash approach, even though it's `O(n)` instead of `O(1)`. The reason for this is called locality.

If you dive deep under the hood, when you look up an element in an array, the CPU actually grabs a bunch of adjacent elements as well as the one you wanted, and it stores the adjacent elements in a cache. So here when we look up `"a"` in the array, it'll probably grab `"a"`, `"e"`, `"i"`, `"o"`, and `"u"`. And so next time when we want to grab `"e"`, it can take it from the cache, which is a lot faster. This works because the elements in the array are close to each other in physical memory. But with a hash, my understanding is that they wouldn't be so close, and thus we wouldn't benefit from this spatial locality effect.

I'm no low-level programming wiz so I'm not sure about any of this. That's ok, I think it's beside the point of this post. The real point of this post is that when you have a little `n`, Big-O doesn't matter.

If you have any thoughts, I'd love to discuss them over email: adamzerner@protonmail.com.

If you'd like to subscribe, you can do so via email or RSS feed.