I faced a fun problem this week at work: organizing tasks in threads.
My situation:
After some research, this is a known scheduling problem. In fact, it’s a variant of the bin-packing problem called the multiway number partitioning problem
. It’s a classic optimization problem.
Here is Wikipedia’s description:
Minimize the largest sum. There are k identical processors, and each number in S represents the time required to complete a single-processor job. The goal is to partition the jobs among the processors such that the makespan (the finish time of the last job) is minimized.
This is in fact a NP-hard problem, so forget the idea of finding the optimal solution in polynomial time.
It turns out there are multiple algorithms to find nice solutions with a reasonable execution time. We’ll talk about 2 I implemented.
The greedy approach is a simple and widely used algorithm for solving the bin packing problem. It involves sorting the objects in decreasing or increasing order of their size and then packing them into containers one by one, starting with the largest or smallest object.
Here is a sample python implementation of this approach I wrote:
import random
from data import timings
NB_BUCKETS = 6
from heapq import heapify, heappop
buckets = [0 for _ in range(NB_BUCKETS)]
buckets_names = [[] for _ in range(NB_BUCKETS)]
queue = [(v, t) for (t, v) in timings.items()]
heapify(queue)
while len(queue) > 0:
v, t = heappop(queue)
min_cost = min(buckets)
bucket_id = buckets.index(min_cost)
buckets[bucket_id] += v
buckets_names[bucket_id].append(t)
import pprint as pp
for s, n in zip(buckets, buckets_names):
print(f"{s}\t: {', '.join(n)}")
This is easy to implement, sorting (here using a heap) an O(n log n) complexity. The problem of this approach is that the quality of the result relies a lot on the quality of the data.
The sorting heuristic can greatly affect the efficiency of the algorithm, and there are two common variants:
In order to best understand the various consequences, there are some diagrams in another article.
The multifit algorithm is another popular approach for solving the bin packing problem.
This idea is to:
For more details, I implemented what Wikipedia describes:
import random
from data import timings
import pprint as pp
from heapq import heapify, heappop
NB_BUCKETS = 6
buckets = [0 for _ in range(NB_BUCKETS)]
buckets_names = [[] for _ in range(NB_BUCKETS)]
def first_fit_decreasing(items, bin_capacity):
# Sort the items in decreasing order of size
sorted_items = sorted(items, reverse=True)
# Initialize the list of bins with the first item
bins = [[sorted_items[0]]]
# Iterate over the remaining items
for item in sorted_items[1:]:
# Try to fit the item into an existing bin
for bin in bins:
if sum(bin) + item <= bin_capacity:
bin.append(item)
break
else:
# If the item doesn't fit in any existing bin, create a new bin
bins.append([item])
return bins
def multifit(items, nb_buckets, k=3):
"""
Implementation of the Multifit algorithm described here:
https://en.wikipedia.org/wiki/Multifit_algorithm#The_algorithm
"""
L = max(sum(items) / nb_buckets, max(items))
U = max(2*sum(items) / nb_buckets, max(items))
for i in range(k):
c = (L+U)//2
# print("LUC", L, U, c)
bins = first_fit_decreasing(items, c)
if len(bins) > nb_buckets:
L = c
else:
U = c
return first_fit_decreasing(items, U)
bins = multifit(timings.values(), NB_BUCKETS, 10)
pp.pprint(bins)
for b in bins:
s = sum(b)
print(s)
That was fun to implement ! The multiway number partitioning problem is a challenging optimization problem that has numerous applications in real-world scenarios.
The greedy approach, largest difference algorithm, and multifit algorithm are three widely used algorithms to solve this problem.
Each algorithm has its strengths and weaknesses, and the choice of algorithm depends on the specific constraints and objectives of the problem at hand. However, with these algorithms, we can achieve efficient partitioning solutions and optimize resource utilization.
In the end, we did not use my implementation and left the data randomly sorted -_-