What is Partitioning by Hash of Key?

Software Engineering

What is Partitioning by Hash of Key?

Published by

Sean Mayer

15th May 2022

As discussed in the previous blogs:

What is Partitioning By Key Range?

What is Partitioning of Key-Value Data?

These posts identify the risk of skew and hotspots. ⚠️

Many distributed data stores use a hash function to determine the partition for a given key.

A good hash function takes skeweddata and makes it uniformlydistributed ✅
- Example
  - Say you have a 32 bit hash function that takes a string
  - Whenever you give it a new string, it returns a seemingly random number between 0 and 2 to the 32nd power of minus 1
  - Even if the input strings are similar, there hashes are evenly distributed across that range of numbers

What does it take for a hash function?

For partitioning purposes the hash functions need not to be cryptographically strong.

For example
- Cassandra and MongoDB use MD5

Many programming languages have simple hash functions built in because they are used for hash tables. But.. they may not be suitable for partitioning.

For example
- In Java’s Object.hash code and Ruby’s object hash this same key may have a different hash value in different processes

Once you have a suitable hash function for keys you can assign each partition a range of hashes rather than a range of keys 🤔

Every key that falls within a partition range will be stored in that partition 👍
- This technique is good at distributing keys fairly across the partitions ✅
- The partition boundaries can be evenly spaced ✅

Or…

They can be chosen sudo randomly in which case the technique is sometime called consistent hashing…

📚 Further Reading & Related Topics

If you’re exploring partitioning by hash of key in distributed systems, these related articles will provide deeper insights:

• Understanding Partitioning Proportional to Nodes – Learn how partitioning strategies based on hash keys differ from proportional partitioning and their impact on data distribution and retrieval.

• How Does Partitioning Work When Requests Are Being Routed? – Explore how request routing interacts with partitioning strategies like hash-based partitioning, ensuring efficient data access in distributed systems.

4 responses to “What is Partitioning by Hash of Key?”

Partitioning Secondary Indexes by Term – What is it? – Scalable Human

26th June 2022

[…] Where as partitioning on the hash of the term this gives a more even distribution of load (as explain in earlier blog on hashing) […]

LikeLike

Reply
What is Dynamic Partitioning? – Scalable Human

24th July 2022

[…] equally be used with hash partitioned data. MongoDB (since v2.4) supports both key range and hash partitioning and it split partitions […]

LikeLike

Reply
Understanding Partitioning Proportional to Nodes – Scalable Human

6th August 2022

[…] Requires the use of hash based partitioning […]

LikeLike

Reply
What is Partitioning and Why is it Important? – Scalable Human

4th September 2022

[…] Hash partitioning […]

LikeLike

Reply

I’m Sean

Welcome to the Scalable Human blog. Just a software engineer writing about algo trading, AI, and books. I learn in public, use AI tools extensively, and share what works. Educational purposes only – not financial advice.

Let’s connect

Join the mailing list

Stay updated with latest posts.

Scalable Human Blog