I help my clients acquire new users and make more money with their web businesses. I have ten years of experience with SaaS projects. If that’s something you need help with, we should get in touch!

< Back to article list

The multiway number partitioning problem

March 31, 2023

I faced a fun problem this week at work: organizing tasks in threads.

My situation:

I know the (average) length of many tasks I have to send to a known amount of threads.
What task should I send to which thread, in order to minimize the global execution time?
said differently: how do I organize the tasks so that the maximum execution time among all threads is minimized?

A known problem

After some research, this is a known scheduling problem. In fact, it’s a variant of the bin-packing problem called the multiway number partitioning problem. It’s a classic optimization problem.

Here is Wikipedia’s description:

Minimize the largest sum. There are k identical processors, and each number in S represents the time required to complete a single-processor job. The goal is to partition the jobs among the processors such that the makespan (the finish time of the last job) is minimized.

This is in fact a NP-hard problem, so forget the idea of finding the optimal solution in polynomial time.

It turns out there are multiple algorithms to find nice solutions with a reasonable execution time. We’ll talk about 2 I implemented.

The Greedy Approach

The greedy approach is a simple and widely used algorithm for solving the bin packing problem. It involves sorting the objects in decreasing or increasing order of their size and then packing them into containers one by one, starting with the largest or smallest object.

Here is a sample python implementation of this approach I wrote:

import random
from data import timings
NB_BUCKETS = 6
from heapq import heapify, heappop
buckets = [0 for _ in range(NB_BUCKETS)]
buckets_names = [[] for _ in range(NB_BUCKETS)]
queue = [(v, t) for (t, v) in timings.items()]
heapify(queue)

while len(queue) > 0:
    v, t = heappop(queue)
    min_cost = min(buckets)
    bucket_id = buckets.index(min_cost)
    buckets[bucket_id] += v
    buckets_names[bucket_id].append(t)

import pprint as pp

for s, n in zip(buckets, buckets_names):
    print(f"{s}\t: {', '.join(n)}")

This is easy to implement, sorting (here using a heap) an O(n log n) complexity. The problem of this approach is that the quality of the result relies a lot on the quality of the data.

The sorting heuristic can greatly affect the efficiency of the algorithm, and there are two common variants:

lower to bigger sorting heuristic: the algorithm packs the objects from the smallest to the largest until there is no more room in the current container.
bigger to lower sorting heuristic: the algorithm packs the objects from the largest to the smallest until there is no more room in the current container.

In order to best understand the various consequences, there are some diagrams in another article.

The Multifit Algorithm

The multifit algorithm is another popular approach for solving the bin packing problem.

This idea is to:

use binary search to fix a size limit between two bounds
use the first fit decreasing approach with said size limit in order to update the bounds of the binary search

For more details, I implemented what Wikipedia describes:

import random
from data import timings
import pprint as pp
from heapq import heapify, heappop
NB_BUCKETS = 6
buckets = [0 for _ in range(NB_BUCKETS)]
buckets_names = [[] for _ in range(NB_BUCKETS)]


def first_fit_decreasing(items, bin_capacity):
    # Sort the items in decreasing order of size
    sorted_items = sorted(items, reverse=True)
    
    # Initialize the list of bins with the first item
    bins = [[sorted_items[0]]]
    
    # Iterate over the remaining items
    for item in sorted_items[1:]:
        # Try to fit the item into an existing bin
        for bin in bins:
            if sum(bin) + item <= bin_capacity:
                bin.append(item)
                break
        else:
            # If the item doesn't fit in any existing bin, create a new bin
            bins.append([item])
    
    return bins


def multifit(items, nb_buckets, k=3):
    """
    Implementation of the Multifit algorithm described here:
    https://en.wikipedia.org/wiki/Multifit_algorithm#The_algorithm
    """
    L = max(sum(items) / nb_buckets, max(items))
    U = max(2*sum(items) / nb_buckets, max(items))
    for i in range(k):
        c = (L+U)//2
        # print("LUC", L, U, c)
        bins = first_fit_decreasing(items, c)
        if len(bins) > nb_buckets:
            L = c
        else:
            U = c
    return first_fit_decreasing(items, U)

bins = multifit(timings.values(), NB_BUCKETS, 10)
pp.pprint(bins)
for b in bins:
    s = sum(b)
    print(s)

Conclusion

That was fun to implement ! The multiway number partitioning problem is a challenging optimization problem that has numerous applications in real-world scenarios.

The greedy approach, largest difference algorithm, and multifit algorithm are three widely used algorithms to solve this problem.

Each algorithm has its strengths and weaknesses, and the choice of algorithm depends on the specific constraints and objectives of the problem at hand. However, with these algorithms, we can achieve efficient partitioning solutions and optimize resource utilization.

In the end, we did not use my implementation and left the data randomly sorted -_-