I help my clients acquire new users and make more money with their web businesses. I have ten years of experience with SaaS projects. If that’s something you need help with, we should get in touch!

< Back to article list

Preparing Adventofcode in Python 3/3, algorithms and data structures

November 29, 2021

This is the last part of the “Getting ready for Adventofcode in Python” series.

We’ll look at which algorithms and data structures we need to get into competitive programming. AoC is not easy, but it’s way easier than a lot of other challenges like TopCoder so it’s a good starting point.

This article itself has 3 parts:

Things you should already know
Things that pop up every year
Things that you may need

Things you should already know

You will almost certainly use the basic data structures:

Arrays
Strings
Hashmap (dict in Python)
Stacks (LIFO, last in first out) and Queues (FIFO)
sets and set operations: union, intersection, difference

The odds are you already use most of those, but maybe you have to lookup a couple things.

You should know, for each of these data structures:

what they can do
when to use them
how to use them in Python

The last part is critical ; you don’t want to spend too much time looking up basic things about the trivial part of the implementation when 10mn of research ahead of time could solve it for the duration of the contest.

As an example, here is quick recap about python’s dict:

d = {'user_id': 123, 'email': "bidou@yopmail.com"}
d.items()  # [('user_id', 123), ('email', 'bidou@yopmail.com')]
d.keys()  # ['user_id', 'email']
d.values()  # [123, 'bidou@yopmail.com']
d.get("birth_year", 1970) # returns the value for key birth_year if any, or 1970. 1970 here
d.clear()  # d = {}

# defaultdict have a default value when a key is missing 
from collections import defaultdict

As you can see it’s not complicated, you probably already do that often for the frequent data structures. As part of a preparation, better do it beforehand for the less frequent data structures, like set.

As for arrays and string, it can be useful to know how search/insert/erase an item. Here are strings:

# find returns the index of the string, or -1 if not found
>>> "plop".find("n")
-1
>>> "plop".find("o")
2
# appending to a string is done with the + operator
>>> s2 = s+"pouet"
>>> s2
'ploppouet'
# You can replace/delete with replace. It does not modify the string in place
s = "plop"
>>> s.replace("p", "g")
'glog'
>>> s
'plop'

Things that come up every year

Every year there are problems about:

Bruteforcing
Recursion
Backtracking
Graphs and Trees representation and traversal
Shortest path (Dijkstra)
Parsing and running a small interpreted language
Dynamic Programming

Let’s dig a bit.

Bruteforcing

Some problems lend themselves to bruteforce. It may not be super efficient: 2020 day 15 involved the Van Ecke sequence that my bruteforce solution solved in 20s, but… it gets the job done and a more ad hoc solution would involve more brain time spent on the matter.

In order to bruteforce efficiently:

learn to estimate time complexity of your algorithm with “Big O notation”. Is it O(n)? O(n^2) ? O(n^3)? O(n*m)? O(log(n))? You don’t need to be very accurate, just to get a feel for their relative speed.
Sometimes you can estimate how many iterations will be needed, but sometimes you can also try and sample, say, 1% of the problem and see if it takes a reasonable time to run. You don’t care about a general solution, all you care about is solving this instance.
learn how to generate many possibilities (see itertools: permutations, combinations, product…)

Recursion

Getting familiar with recursion is non-trivial.

As a starting point, try to write a few popular algorithms with recursion (eg the fibonacci sequence), then try to work out:

an iterative solution
a visualisation of the recursion tree
counting how often you repeat some steps. Yeah, that’s the right moment for some memoization ;)

Recursion has a lot of usage with number-theoretical problems and graphs. No need to read a lot of theory, just get a feel for it.

Graphs and Trees, and graph and tree traversal

There often are some kind of trees and graphs and knowing how to run some of the classic algorithms on them is a must.

Breadth-first search

Breadth-first search is a tree traversal algorithm. It was hard for me a few years ago, but having implemented it many times since my student years I finally got the logic. Here is a sample implementation heavily inspired from wikipedia:

from collections import deque

edges = {
    "a": ["b", "c"], # from edge a, it is possible to go to b or c
    "b": ["d", "e"],
    "c": ["f", "g"],
    "d": [],
    "e": ["h"],
    "f": [],
    "g": []
}

# Sample code from
# https://en.wikipedia.org/wiki/Breadth-first_search#Pseudocode
# Is it possible to go from root to goal ?
def bfs(G, root, goal):
    Q = deque()
    SEEN = [root]
    Q.appendleft(root)
    while not len(Q) == 0:
        v = Q.popleft()
        if v == goal:
            return True
        if v in G:
            for w in G[v]: # for all neighbor vertices w of v
                if w not in SEEN:
                    SEEN.append(w)
                    Q.appendleft(w)
    return False

print(bfs(edges, "a", "h"))
print(bfs(edges, "a", "c"))
print(bfs(edges, "b", "c"))

Depth-first search

DFS is similar, the difference is that we navigate differently inside the tree.

# iterative DFS:
# https://en.wikipedia.org/wiki/Depth-first_search#Pseudocode
def dfs(G, root, goal):
    S = [root]
    SEEN = []
    while len(S) > 0:
        v = S.pop()
        if v == goal:
            return True
        if v in G and v not in SEEN:
            SEEN.append(v)
            for w in G[v]: # for all neighbor vertices w of v
                S.append(w)
    return False

print(dfs(edges, "a", "h"))
print(dfs(edges, "a", "c"))
print(dfs(edges, "b", "c"))

Backtracking

Backtracking is a popular algorithm for bruteforcing the search in a tree, while discarding huge parts of the tree when some candidates will not yield a valid solution. There are a lot of variations for pruning the search tree. Give it a try at some point to get a feel for it:

https://en.wikipedia.org/wiki/Backtracking#Pseudocode

Shortest Path in a graph - Dijkstra

Sometimes tree traversal is not enough, you need to search for a shortest path, and Dijkstra’s algorithm comes to mind! I love this one, it’s short and elegant.

Parsing an running a small interpreted language

There are at least one small programming language to parse and run, so learn what it takes to do that: what is a virtual machine and its memory ? What is an instruction, how could you represent it, how do you store its parameters?

You may also want to have a look at the Shunting-yard algorithm

Dynamic programming

Dynamic programming is really not trivial to me.

Some problems are way too huge to be bruteforced, and many paths will recompute data that have already been computed.

A popular example is the recursive implementation of the fibonacci sequence, that involves many calls to the same values. With memoization, you can get rid of entire call trees: https://medium.com/geekculture/how-to-solve-fibonacci-sequence-using-dynamic-programming-b7cd784ee10d#f98d (turns out there are even better solutions, but that’s not the topic )

If you want to go deeper with more complex examples, here is Jonathan Paulson who:

explains it as part of part 2 of Advent of Code 2020 - Day 10
has an entire series of videos on the subject for competitive programming

Things you may need

OK, here we are in uncharted territory. You may need this, or you may not. Study at your own risk! The adventure starts here, you’re on your own.

Integer Maths. Some problems are math focused and a math property will help to have a fast solution, project euler style.
Bit manipulation. Do you understand AND, OR, XOR… on single bits, on entire bytes? Can you code them in Python?
Greedy Algorithms (eg Huffman coding, Dijkstra)
Popular algorithms: there can be problems where you’ll have to hack a classic algorithm to suit it to a different pattern. There are so many possible things it’s hard to know what to study, here are a few nice ones:
- KMP: https://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm
- Levenstein distance: https://en.wikipedia.org/wiki/Levenshtein_distance
in fact, at some point it may be worth it to read all of cp-algorithms