This is the last part of the “Getting ready for Adventofcode in Python” series.
We’ll look at which algorithms and data structures we need to get into competitive programming. AoC is not easy, but it’s way easier than a lot of other challenges like TopCoder, so it’s a good starting point.
This article itself has 3 parts:
You will almost certainly use the basic data structures:
- dict
- arrays (list in Python)
- sets and set operations: union, intersection, difference
The odds are you already use most of those, but maybe you have to look up a couple of things.
You should know, for each of these data structures:
The last part is critical; you don’t want to spend too much time looking up basic things about the trivial part of the implementation when 10 minutes of research ahead of time could solve it for the duration of the contest.
As an example, here is a quick recap of Python’s dict:
d = {'user_id': 123, 'email': "bidou@yopmail.com"}
d.items()   # dict_items([('user_id', 123), ('email', 'bidou@yopmail.com')])
d.keys()    # dict_keys(['user_id', 'email'])
d.values()  # dict_values([123, 'bidou@yopmail.com'])
d.get("birth_year", 1970)  # returns the value for key birth_year if any, or 1970. 1970 here
d.clear()   # d is now {}

# defaultdicts provide a default value when a key is missing
from collections import defaultdict
counts = defaultdict(int)
counts["plop"] += 1  # no KeyError: a missing key defaults to int(), i.e. 0
As you can see it’s not complicated; you probably already do that often for the frequent data structures. As part of your preparation, better do it beforehand for the less frequent data structures, like set.
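For instance, a set cheat sheet might look like this (my own sketch, not from the original article):
s = {1, 2, 3}
s.add(4)           # s is now {1, 2, 3, 4}
s.discard(10)      # no error if the element is missing
2 in s             # True, membership test is O(1) on average
{1, 2} | {2, 3}    # union: {1, 2, 3}
{1, 2} & {2, 3}    # intersection: {2}
{1, 2} - {2, 3}    # difference: {1}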
As for arrays and strings, it can be useful to know how to search/insert/erase an item. Here are strings:
# find returns the index of the substring, or -1 if not found
>>> "plop".find("n")
-1
>>> "plop".find("o")
2
# appending to a string is done with the + operator
>>> s = "plop"
>>> s2 = s + "pouet"
>>> s2
'ploppouet'
# You can replace/delete with replace. It does not modify the string in place
>>> s.replace("p", "g")
'glog'
>>> s
'plop'
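Lists deserve the same treatment; here is a quick sketch of my own, not from the article:
l = [1, 2, 3]
l.append(4)      # l is now [1, 2, 3, 4]
l.insert(0, 0)   # insert at index 0: [0, 1, 2, 3, 4]
l.index(3)       # 3 (raises ValueError when the value is absent)
l.remove(2)      # removes the first occurrence: [0, 1, 3, 4]
l.pop()          # removes and returns the last element, 4
l[1:3]           # slicing: [1, 3]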
Every year there are problems about:
Let’s dig a bit.
Some problems lend themselves to bruteforce. It may not be super efficient: 2020 day 15 involved the Van Eck sequence, which my bruteforce solution solved in 20s, but… it gets the job done, and a more ad hoc solution would involve more brain time spent on the matter.
In order to bruteforce efficiently:
- Know roughly how complexities compare: O(n)? O(n^2)? O(n^3)? O(n*m)? O(log(n))? You don’t need to be very accurate, just to get a feel for their relative speed.
- Get comfortable with the itertools helpers that generate candidates for you (permutations, combinations, product…); see the sketch below.
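A quick sketch of my own showing what those itertools helpers produce:
from itertools import permutations, combinations, product

list(permutations("abc", 2))      # [('a', 'b'), ('a', 'c'), ('b', 'a'), ('b', 'c'), ('c', 'a'), ('c', 'b')]
list(combinations([1, 2, 3], 2))  # [(1, 2), (1, 3), (2, 3)]
list(product([0, 1], repeat=2))   # [(0, 0), (0, 1), (1, 0), (1, 1)]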
Getting familiar with recursion is non-trivial.
As a starting point, try to write a few popular algorithms with recursion (e.g. the Fibonacci sequence), then try to work out:
Recursion comes up a lot in number-theoretical problems and graphs. No need to read a lot of theory, just get a feel for it.
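For instance, the naive recursive Fibonacci is a classic warm-up (a quick sketch of my own):
def fib(n):
    # base cases: fib(0) = 0, fib(1) = 1
    if n < 2:
        return n
    # each call spawns two smaller sub-problems
    return fib(n - 1) + fib(n - 2)

print(fib(10))  # 55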
There often are some kinds of trees and graphs, and knowing how to run some of the classic algorithms on them is a must.
Breadth-first search is a tree/graph traversal algorithm. It was hard for me a few years ago, but having implemented it many times since my student years, I finally got the logic. Here is a sample implementation heavily inspired by Wikipedia:
from collections import deque
edges = {
    "a": ["b", "c"],  # from node a, it is possible to go to b or c
    "b": ["d", "e"],
    "c": ["f", "g"],
    "d": [],
    "e": ["h"],
    "f": [],
    "g": []
}
# Sample code from
# https://en.wikipedia.org/wiki/Breadth-first_search#Pseudocode
# Is it possible to go from root to goal ?
def bfs(G, root, goal):
    Q = deque()
    SEEN = [root]
    Q.append(root)
    while Q:
        v = Q.popleft()  # FIFO: oldest node first, hence breadth-first
        if v == goal:
            return True
        if v in G:
            for w in G[v]:  # for all neighbor vertices w of v
                if w not in SEEN:
                    SEEN.append(w)
                    Q.append(w)
    return False
print(bfs(edges, "a", "h"))
print(bfs(edges, "a", "c"))
print(bfs(edges, "b", "c"))
DFS is similar; the difference is the order in which we walk the tree: it goes as deep as possible before backtracking.
# iterative DFS:
# https://en.wikipedia.org/wiki/Depth-first_search#Pseudocode
def dfs(G, root, goal):
    S = [root]
    SEEN = []
    while len(S) > 0:
        v = S.pop()  # LIFO: newest node first, hence depth-first
        if v == goal:
            return True
        if v in G and v not in SEEN:
            SEEN.append(v)
            for w in G[v]:  # for all neighbor vertices w of v
                S.append(w)
    return False
print(dfs(edges, "a", "h"))
print(dfs(edges, "a", "c"))
print(dfs(edges, "b", "c"))
Backtracking is a popular algorithm for bruteforcing the search in a tree while discarding huge parts of it when some candidates cannot yield a valid solution. There are a lot of variations for pruning the search tree. Give it a try at some point to get a feel for it:
https://en.wikipedia.org/wiki/Backtracking#Pseudocode
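To give it a concrete (if simplistic) flavor, here is a small sketch of my own, not from the article: enumerate the subsets of non-negative numbers that sum to a target, pruning any branch whose partial sum already overshoots.
def subsets_summing_to(numbers, target):
    solutions = []

    def backtrack(start, chosen, total):
        if total == target:
            solutions.append(list(chosen))
            return
        if total > target:  # prune: this branch cannot lead to a solution
            return
        for i in range(start, len(numbers)):
            chosen.append(numbers[i])
            backtrack(i + 1, chosen, total + numbers[i])
            chosen.pop()  # undo the choice and try the next candidate

    backtrack(0, [], 0)
    return solutions

print(subsets_summing_to([1, 2, 3, 4, 5], 7))  # [[1, 2, 4], [2, 5], [3, 4]]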
Sometimes tree traversal is not enough, you need to search for a shortest path, and Dijkstra’s algorithm comes to mind! I love this one, it’s short and elegant.
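It fits in a few lines with heapq. Here is a sketch of my own (the weighted-graph format below is just an assumption for the example, not something from the article):
import heapq

def dijkstra(graph, start, goal):
    # graph maps a node to a list of (neighbor, edge_cost) pairs
    heap = [(0, start)]  # (cost so far, node)
    best = {start: 0}    # best known cost to reach each node
    while heap:
        cost, node = heapq.heappop(heap)
        if node == goal:
            return cost
        if cost > best.get(node, float("inf")):
            continue  # stale entry: a shorter path was found since
        for neighbor, edge_cost in graph.get(node, []):
            new_cost = cost + edge_cost
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                heapq.heappush(heap, (new_cost, neighbor))
    return None  # goal is unreachable

weighted = {"a": [("b", 1), ("c", 4)], "b": [("c", 2), ("d", 5)], "c": [("d", 1)]}
print(dijkstra(weighted, "a", "d"))  # 4, going a -> b -> c -> d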
There is at least one small programming language to parse and run, so learn what it takes to do that: what are a virtual machine and its memory? What is an instruction, how could you represent it, how do you store its parameters?
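To make this concrete, here is a toy sketch of my own: a made-up three-instruction language (the opcodes and registers are purely hypothetical, not an actual AoC machine), where an instruction is a tuple (opcode, *parameters), memory is a dict of registers, and a program counter walks the instruction list.
def run(program):
    registers = {"a": 0, "b": 0}
    pc = 0
    while 0 <= pc < len(program):
        op, *args = program[pc]
        if op == "set":      # set reg value
            registers[args[0]] = args[1]
        elif op == "add":    # add reg value
            registers[args[0]] += args[1]
        elif op == "jnz":    # jump by offset if reg is not zero
            if registers[args[0]] != 0:
                pc += args[1]
                continue
        pc += 1
    return registers

program = [("set", "a", 3), ("add", "b", 2), ("add", "a", -1), ("jnz", "a", -2)]
print(run(program))  # {'a': 0, 'b': 6}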
You may also want to have a look at the Shunting-yard algorithm.
Dynamic programming is really not trivial to me.
Some problems are way too huge to be bruteforced, and many paths will recompute data that has already been computed.
A popular example is the recursive implementation of the Fibonacci sequence, which involves many calls with the same values. With memoization, you can get rid of entire call trees:
https://medium.com/geekculture/how-to-solve-fibonacci-sequence-using-dynamic-programming-b7cd784ee10d#f98d
(turns out there are even better solutions, but that’s not the topic)
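In Python, functools gives you memoization almost for free; a short sketch of my own applied to the naive recursive Fibonacci:
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    # each value is computed once, then served from the cache
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

print(fib(80))  # 23416728348467685, instant instead of astronomically slow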
If you want to go deeper with more complex examples, here is Jonathan Paulson who:
OK, here we are in uncharted territory. You may need this, or you may not. Study at your own risk! The adventure starts here, you’re on your own.