KeiruaProd

I help my clients acquire new users and make more money with their web businesses. I have ten years of experience with SaaS projects. If that’s something you need help with, we should get in touch!
< Back to article list

Preparing Adventofcode in Python 1/3, parsing

Adventofcode is about to get started in a few days, on december 1st. It’s an advent calendar for programmers with one programming challenge every day. Each challeng has 2 parts, and you need to solve the first part in order to unlock the other.

It started in 2015, and it is the most accessible competitive programming challenge out there, with a growing popularity, and a very active subreddit when it is on.

Simply solving each puzzles is already quite an achievement. If you never tried, you’ll learn a ton of things ! There are A LOT of other ways to approach this challenge:

I’ve been “playing” since 2016. Last year I published a writeup of the things I learnt during the 2020 edition.

In this series of articles, I want to show you how to take some shortcuts. It will help you with a couple things:

  1. Parsing the puzzles efficiently
  2. Cool tools from Python’s standard library
  3. Useful algorithms and data structures for competitive programming

Parsing the input data

Many problems have a similar structure, so it is nice to know some patterns in order to quickly deal with the input data and focus on solving the puzzle.

Write your own helpers

Most challenges come as an input file that contain:

Let’s write some helpers so that we solve this part once and we can reuse it everywhere:

# aoc.py
from typing import List

def input_as_string(filename:str) -> str:
    """returns the content of the input file as a string"""
    with open(filename) as f:
        return f.read().rstrip("\n")

def input_as_lines(filename:str) -> List[str]:
    """Return a list where each line in the input file is an element of the list"""
    return input_as_string(filename).split("\n")

def input_as_ints(filename:str) -> List[int]:
    """Return a list where each line in the input file is an element of the list, converted into an integer"""
    lines = input_as_lines(filename)
    line_as_int = lambda l: int(l.rstrip('\n'))
    return list(map(line_as_int, lines))

There are a few other frequent patterns (data is a grid, a set of coordinates, and so on), but I do not want to spoil all the fun nor to provide a solution to all the problems. Explore past puzzles to get a grasp on the frequent patterns, write your own library − this is useful to learn the language and to master your tools. Many times, you’ll be able to reuse something you wrote for another challenge with a few modifications.

Learn to use regexes

Speaking of patterns ! Most of the time, you don’t need to go through all the characters in order to extract the relevant data. The regex module (re) of the standard library will do wonder.

Let’s say the idea is to extract all the directions (NSEW) and integers in a comma-separated list that looks like this (2017, day 7). That’s a list of direction and distances:

s = "N12, S1, W4"

One option is to:

That’s fine, but a faster and more versatile approach is to use regexes:

import re
matches = re.findall(r'([NSEW])(\d+)', s)
print(matches) # [('N', '12'), ('S', '1'), ('W', '4')]

Boom, done. You can convert the distance with a list comprehension, and it’s still a one liner:

import re
matches = [(dir, int(dist)) for (dir, dist) in re.findall(r'([NSEW])(\d+)', s)]
# matches = [('N', 12), ('S', 1), ('W', 4)]`

You can do that with splitting, but in my experience having regexes is a brutal but effecient way to approach seemingly complex parsing problems.

Same goes for a list: sometimes, you don’t have to split. You can look for the useful part with a regex right away:

# Finding numbers
s = "123, 4, 56, 789"
list_as_str = re.findall(r'(\d+)', s)
print(list_as_str) # ['123', '4', '56', '789']
array_of_ints = list(map(int, list_as_str)) # [123, 4, 56, 789]
# or if you prefer list comprehensions:
array_of_ints = [int(n) for n in list_as_str]
print(array_of_ints) # [123, 4, 56, 789]

Sometimes, str.split is not enough when you need to split a string according to a somewhat complex pattern (2016, day 7: you need to split by [ or ]). re.split comes to the rescue:

re.split(r'\[|\]', "abc[def]ghi") # returns ['abc', 'def', 'ghi']

Same for 2018, day 7: the input data looks like this.

Step A must be finished before step N can begin.
Step P must be finished before step R can begin.
Step O must be finished before step T can begin.
…

You can split by lines, find the indices of the necessary information, but… a little findall and you are done!

s = """Step A must be finished before step N can begin.
Step P must be finished before step R can begin.
Step O must be finished before step T can begin."""
data = re.findall("Step (.*) must be finished before step (.*) can begin.", s)
print(data) # [('A', 'N'), ('P', 'R'), ('O', 'T')]

That’s it for today!