Big O Notation

Greetings and welcome to the new year and the new blog design! On New Year's Day I started another round of the #100DaysOfCode Challenge. I'll be writing more about that later, but at the moment I am working through the JavaScript Algorithms and Data Structures Masterclass on Udemy. I don't know if I'll finish the course by April, but I'm trying to do a little bit at a time.

The course is pretty good and clear so far. I'm intimidated by Data Structures and Algorithms since the subject seems super conceptual and math-intensive. So far though, the little bit of math that's been introduced has been digestible (even for someone as math-averse as I am) and the concepts have been broken down well. The course starts with an introduction to Big O Notation, and that's what I want to write about here.

First I'll list out some definitions (for easy access when I want to revisit this topic) and I'll walk through the details after that.

Definitions

  • Big O Notation: a way to formalize fuzzy counting. It allows us to talk formally about how the time or space complexity of an algorithm grows as the size of input increases
  • Time Complexity: a way to analyze the runtime of an algorithm as the size of the input increases
  • Space Complexity: a way to analyze how much additional memory an algorithm needs to allocate in order to run as the size of the input increases
  • Logarithm: the inverse of exponentiation (just as division is the inverse of multiplication)

Measuring Code Performance

Say that you're asked to reverse a string using code. You're given "hello" and you want "olleh". There are a ton of ways to do this in JavaScript (or any other coding language). In fact, here is a Stack Overflow answer that details ten different solutions to this problem. So how do you know which of those solutions is the best bit of code?
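
For instance, two common approaches might look something like this (a quick sketch of my own, not taken from the linked answer):

function reverseWithBuiltins(str) {
  // Split the string into an array of characters, reverse the
  // array, then join it back into a string.
  return str.split("").reverse().join("");
}

function reverseWithLoop(str) {
  // Walk the string from the end and build the result manually.
  let reversed = "";
  for (let i = str.length - 1; i >= 0; i--) {
    reversed += str[i];
  }
  return reversed;
}

console.log(reverseWithBuiltins("hello")); // "olleh"
console.log(reverseWithLoop("hello"));     // "olleh"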

Or, let's pivot to a different problem, with two example solutions, and use that as a common starting place:

Example: Sum

Problem: Write a function that calculates the sum of all numbers from 1 up to and including some number n.

Solution 1:

function addUpTo(n) {
  let total = 0;
  for (let i = 1; i <= n; i++) {
    total += i;
  }
  return total;
}

Solution 2:

function addUpTo(n) {
  return n * (n + 1) / 2;
}

Which Solution is Better?

To begin, it's worth pointing out that there are a number of ways one code solution can be "better" than another. One solution might be faster, another less memory-intensive, and a third more readable. Personally I find Solution 1 far more readable and understandable than Solution 2, and I would probably reach for a loop or something similar. But both of these solutions work.

However, while speed, memory usage, and readability are all valid concerns, people generally prioritize the fastest or least memory-intensive solutions.

Let's focus on speed first. Would it be reasonable to use a timer? We could write something that takes the times when an operation starts and finishes, determines the difference between those two times, repeats the timing for an alternative operation, and goes with the one that takes less time:

Solution 1 (With Timer):

function addUpTo(n) {
  let total = 0;
  for (let i = 1; i <= n; i++) {
    total += i;
  }
  return total;
}

let t1 = performance.now();
addUpTo(1000000000);
let t2 = performance.now();
console.log(`Time Elapsed: ${(t2 - t1) / 1000} seconds.`);

Solution 2 (With Timer):

function addUpTo(n) {
  return n * (n + 1) / 2;
}

let t1 = performance.now();
addUpTo(1000000000);
let t2 = performance.now();
console.log(`Time Elapsed: ${(t2 - t1) / 1000} seconds.`);

If you run this code (the Google Chrome developer console works well), it would seem like Solution 2 is the faster solution (by a lot). However, if you run the code repeatedly, you'll see that you get different timing results for the same operations. That's because timing code in this manner isn't reliable.
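
To see that variability for yourself, you could wrap the measurement in a loop. This is a rough sketch; the exact numbers will differ on every machine and every run, which is precisely the point:

function addUpTo(n) {
  let total = 0;
  for (let i = 1; i <= n; i++) {
    total += i;
  }
  return total;
}

// Time the same call five times in a row. Each run typically
// reports a slightly different duration, even on the same machine.
for (let run = 1; run <= 5; run++) {
  const t1 = performance.now();
  addUpTo(100000000);
  const t2 = performance.now();
  console.log(`Run ${run}: ${(t2 - t1) / 1000} seconds`);
}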

The Problem with Timing Our Code

  • Different machines will record different times
  • The same machine will record different times
  • For fast algorithms, speed measurements may not be precise enough; even the smallest unit of time measurement can be too coarse to tell the fastest algorithms apart

So then what is the solution?

Counting Operations

Rather than counting seconds, which are so variable, let's count the number of simple operations the computer has to perform.

Let's take a look at Solution 2:

function addUpTo(n) {
  return n * (n + 1) / 2;
}

There are 3 operations in this function.

  • *: 1 multiplication
  • +: 1 addition
  • /: 1 division

The values 1 and 2 don't need any work done to them, and the count stays at 3 operations no matter what n is.

Compare this to Solution 1:

function addUpTo(n) {
  let total = 0;
  for (let i = 1; i <= n; i++) {
    total += i;
  }
  return total;
}

  • +=: n addition/assignment operations
    • In the last solution we counted the addition as one operation. But here the += is inside a loop: if we loop 5 times, we do 5 additions; if n is 20, there are 20 additions; and so on. And it's actually more than that, since the equals sign is also an assignment operation.
  • ++: n addition/assignment operations
  • let total = 0: 1 assignment operation, happens once
  • let i = 1: 1 assignment operation that happens once at the beginning of the loop
  • <=: n comparison operations

This demonstrates that counting is hard and this method results in fuzzy numbers. Depending on what we count, the number of operations can be as low as 2n or as high as 5n + 2. But what's important with Big O is the trend. The number of operations grows roughly proportionally with n.
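
To make the fuzziness concrete, you could instrument the loop with a counter. This is a quick sketch, and what gets counted as one "operation" is exactly the judgment call described above:

// A version of Solution 1 that counts += executions as a stand-in
// for "simple operations". The count grows in step with n.
function addUpToCounted(n) {
  let total = 0;
  let operations = 0;
  for (let i = 1; i <= n; i++) {
    total += i;
    operations++;
  }
  console.log(`n = ${n}: roughly ${operations} additions`);
  return total;
}

addUpToCounted(5);  // n = 5: roughly 5 additions
addUpToCounted(20); // n = 20: roughly 20 additions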

Big O Notation is a way to formalize fuzzy counting. It allows us to talk formally about how the runtime of an algorithm grows as the inputs grow. It's a way of describing the relationship between the input size of a function, and how that changes the behavior of a function's runtime and execution. What we are doing with these last two example solutions is an analysis of Time Complexity.

Time Complexity

Time Complexity is how we analyze the runtime of an algorithm as the size of the input increases.

So let's review the example solutions above through the lens of calculating each algorithm's Time Complexity.

Revisiting Example: Sum

Problem: Write a function that calculates the sum of all numbers from 1 up to and including some number n.

Solution 1:

function addUpTo(n) {
  let total = 0;
  for (let i = 1; i <= n; i++) {
    total += i;
  }
  return total;
}

Solution 2:

function addUpTo(n) {
  return n * (n + 1) / 2;
}

Solution 2: as n grows, the runtime is not dramatically affected.

Solution 1: as n grows, the runtime grows proportionally.
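
One way to see these trends for yourself (keeping in mind that raw timings are unreliable, as covered earlier) is to time both solutions at increasing values of n. A rough sketch, with the functions renamed here just so both can be defined at once:

function addUpToLoop(n) {
  let total = 0;
  for (let i = 1; i <= n; i++) {
    total += i;
  }
  return total;
}

function addUpToFormula(n) {
  return n * (n + 1) / 2;
}

// Time each solution at growing input sizes. The loop's duration
// should grow roughly in proportion to n; the formula's should not.
for (const n of [1000000, 10000000, 100000000]) {
  let t1 = performance.now();
  addUpToLoop(n);
  let t2 = performance.now();

  let t3 = performance.now();
  addUpToFormula(n);
  let t4 = performance.now();

  console.log(`n = ${n}: loop ${(t2 - t1).toFixed(2)}ms, formula ${(t4 - t3).toFixed(2)}ms`);
}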

We say that an algorithm is O(f(n)) if the number of simple operations the computer has to do is eventually less than a constant times f(n), as n increases.

  • f(n) could be linear (f(n) = n)
  • f(n) could be quadratic (f(n) = n²)
  • f(n) could be constant (f(n) = 1)
  • f(n) could be something entirely different!

And we can shorten these using a big O:

  • f(n) of Solution 1 is linear, so we say Solution 1 is O(n).
  • f(n) of Solution 2 is constant, so we say Solution 2 is O(1).

Let's look at a different example.

Example: Count Up and Down

Problem: Given a positive number n, write a function that announces when it starts counting, then counts up to n starting from 0, then announces it is at the top, then counts back down to 0, and finally announces it has stopped counting down.

Solution:

function countUpAndDown(n) {
  console.log("Going up!");
  for (let i = 0; i <= n; i++) {
    console.log(i);
  }
  console.log("At the top!\nGoing down...");
  for (let j = n - 1; j >= 0; j--) {
    console.log(j);
  }
  console.log("Back down. Bye!");
}

The first loop is O(n). The second loop is also O(n). You might think the Big O is O(2n), but we don't care about constants; we care about the big picture. In the big picture, the number of operations is (eventually) bounded by a multiple of n, so countUpAndDown() is O(n).

Example: Print All Pairs

Problem: Given a number, n, print every sequential pair of numbers from 0 up to (but not including) n. For example, if n = 2, the output should be [0, 0], [0, 1], [1, 0], [1, 1].

Solution:

function printAllPairs(n) {
  for (var i = 0; i < n; i++) {
    for (var j = 0; j < n; j++) {
      console.log(i, j);
    }
  }
}

The outer loop is O(n), and the inner loop is also O(n). But because the loops are nested, the work multiplies: O(n * n), or O(n²). As n grows larger, the runtime grows roughly at the rate of n * n. printAllPairs() is O(n²).
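
You can check the quadratic trend by counting how many times the inner loop body runs. A quick sketch with the console.log swapped out for a counter:

// Count the iterations of the nested loops instead of printing them.
// The count is exactly n * n, so doubling n quadruples the work.
function countAllPairs(n) {
  let count = 0;
  for (let i = 0; i < n; i++) {
    for (let j = 0; j < n; j++) {
      count++;
    }
  }
  return count;
}

console.log(countAllPairs(2)); // 4
console.log(countAllPairs(4)); // 16
console.log(countAllPairs(8)); // 64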

Space Complexity

So far we've been focusing on time complexity, but we can also use Big O Notation to analyze space complexity.

Space Complexity is how much additional memory we need to allocate in order to run the code in our algorithm.

Space Complexity in JS

  • Most primitives (booleans, numbers, undefined, null) are constant space
  • Strings require O(n) space, where n is the string length
    • If the string is 50 characters, it takes up 50 times the space of a single-character string
  • Reference types are generally O(n), where n is the length (for arrays) or the number of keys (for objects)
    • If one array has a length of 4, and another array has a length of 2, the larger array takes up twice as much space

Example: Sum an Array

function sum(arr) {
  let total = 0;
  for (let i = 0; i < arr.length; i++) {
    total += arr[i];
  }
  return total;
}

Let's break this down:

  • No matter what the array length is, we have one variable (total).
  • There's a second declaration in the for-loop (let i = 0).

No matter how large the input array grows, the space taken up stays the same: we only ever have those two variables.

That means we have constant O(1) space.

Example: Double

function double(arr) {
  let newArr = [];
  for (let i = 0; i < arr.length; i++) {
    newArr.push(2 * arr[i]);
  }
  return newArr;
}

The new array gets longer in proportion to the length of the input array. The space it takes up is directly proportional to n. It takes up O(n) space.

A Quick Word About Logarithms

So far we've encountered some of the most common complexities: O(1), O(n), O(n²). Sometimes Big O expressions involve more complex mathematical expressions, such as the logarithm.

What's a Log Again?

A Logarithm is the inverse of exponentiation, just as division is the inverse of multiplication.

log₂8 = 3

The above is asking us to figure out what power we raise 2 to in order to get 8. Since 2^3 = 8, the answer is 3. In general:

log₂(value) = exponent means that 2^exponent = value

For the work we're doing, we're going to omit the 2. So when we say log we mean log₂.

This is confusing, so here is a guiding principle: The logarithm of a number roughly measures the number of times you can divide that number by 2 before you get a value that's less than or equal to one.
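
That guiding principle translates directly into code. A toy sketch for building intuition (not a replacement for Math.log2):

// Roughly compute log base 2 by repeatedly halving until the value
// drops to 1 or below, counting the divisions along the way.
function roughLog2(n) {
  let count = 0;
  while (n > 1) {
    n = n / 2;
    count++;
  }
  return count;
}

console.log(roughLog2(8));   // 3, because 8 -> 4 -> 2 -> 1
console.log(roughLog2(100)); // 7, since log₂(100) ≈ 6.64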

Recap

There's a lot here and I appreciate you reading this far. Here's a review of the big important points:

  • To analyze the performance of an algorithm, we use Big O Notation
  • Big O Notation can give us a high level understanding of the time or space complexity of an algorithm
  • Big O Notation doesn't care about precision, only about general trends. As an input gets larger, does the time or space required stay constant, or does it grow? Is that growth logarithmic, linear, or (heaven forbid) quadratic?
  • The time or space complexity (as measured by Big O) depends only on the algorithm, not on the hardware used to run the algorithm
  • Big O Notation is everywhere, so get lots of practice!