I fell into a rabbit hole recently. Efficient hashmaps, bloom filters… they all need an efficient hashing algorithm. Often you don’t need the hash to be cryptographically secure, but it has to be super fast. What can you use ? One of the option is MurmurHash
.
The initial implementation can be found in SMHasher, whose purpose is to analyze the properties of various hashing functions.
The MurmurHash algorithm is a fast and efficient hash function that is commonly used for fast hash-based lookups, like in bloom filters, HyperLogLog, and some others.
The name comes from the primitives it uses: MUltiply, Rotate, MUltiply, Rotate. Most of the time, it operates of 4 bytes chunk and perform those operations:
# wikipedia
k ← fourByteChunk
k ← k × c1
k ← k ROL r1
k ← k × c2
hash ← hash XOR k
hash ← hash ROL r2
hash ← (hash × m) + n
It’s not cryptographically secure, and it has DoS attacks so be careful.
The hashing algorithm has 3 main steps:
I found this video and the diagrams here helpful to understand what is happening.
Here’s an implementation of the 32-bit MurmurHash algorithm in Rust:
fn murmur32(key: &[u8], seed: u32) -> u32 {
const C1: u32 = 0xcc9e2d51;
const C2: u32 = 0x1b873593;
const R1: u32 = 15;
const R2: u32 = 13;
const M: u32 = 5;
const N: u32 = 0xe6546b64;
let mut hash: u32 = seed ^ (key.len() as u32);
let mut i = 0;
// first step: take care of data in groups of 4 bytes
while i + 4 <= key.len() {
let mut k: u32 = u32::from_le_bytes([
key[i],
key[i + 1],
key[i + 2],
key[i + 3],
]);
k = k.wrapping_mul(C1);
k = k.rotate_left(R1);
k = k.wrapping_mul(C2);
hash ^= k;
hash = hash.rotate_left(R2);
hash = hash.wrapping_mul(M).wrapping_add(N);
i += 4;
}
// second step: take care of the remaining 1 to 3 bytes, if any
if i < key.len() {
let mut k: u32 = 0;
let mut j = 0;
while i < key.len() {
k ^= u32::from(key[i]) << j;
i += 1;
j += 8;
}
k = k.wrapping_mul(C1);
k = k.rotate_left(R1);
k = k.wrapping_mul(C2);
hash ^= k;
}
// as a last step, a final scramble
hash ^= key.len() as u32;
hash ^= hash >> 16;
hash = hash.wrapping_mul(0x85ebca6b);
hash ^= hash >> 13;
hash = hash.wrapping_mul(0xc2b2ae35);
hash ^= hash >> 16;
hash
}
To use this function, you simply need to pass in a byte slice containing the data you want to hash, and a seed value (which can be any 32-bit value you choose). Here’s an example of how to call the function:
fn main() {
let data = b"Hello, world!";
let seed = 123456789;
let hash = murmur32(data, seed);
println!("Hash: {:08x}", hash);
}
The MurmurHash algorithm processes the input key in 4-byte chunks. For each chunk, it multiplies the chunk by two constants (C1 and C2), rotates it left by a certain number of bits (R1), and XORs the result with the hash value (initially set to the seed value). After processing all the chunks, it performs a few final operations on the hash value to produce the final hash.
The MurmurHash algorithm is a well-designed hash function that strikes a good balance between speed, efficiency, and distribution properties and it was fun to investigate how it works. I hope you learned a thing or two here!