What is Partitioning by Hash of Key?

As discussed in the previous blogs:

These posts identify the risk of skew and hotspots. ⚠️

Many distributed data stores use a hash function to determine the partition for a given key.

  • A good hash function takes skeweddata and makes it uniformlydistributed
    • Example
      • Say you have a 32 bit hash function that takes a string
      • Whenever you give it a new string, it returns a seemingly random number between 0 and 2 to the 32nd power of minus 1
      • Even if the input strings are similar, there hashes are evenly distributed across that range of numbers

What does it take for a hash function?

For partitioning purposes the hash functions need not to be cryptographically strong.

  • For example
    • Cassandra and MongoDB use MD5

Many programming languages have simple hash functions built in because they are used for hash tables. But.. they may not be suitable for partitioning. 

  • For example
    • In Java’s Object.hash code and Ruby’s object hash this same key may have a different hash value in different processes 

Once you have a suitable hash function for keys you can assign each  partition a range of hashes rather than a range of keys 🤔

  • Every key that falls within a partition range will be stored in that partition 👍
    • This technique is good at distributing keys fairly across the partitions ✅
    • The partition boundaries can be evenly spaced ✅

Or…

  • They can be chosen sudo randomly in which case the technique is sometime called consistent hashing…

📚 Further Reading & Related Topics

If you’re exploring partitioning by hash of key in distributed systems, these related articles will provide deeper insights:

• Understanding Partitioning Proportional to Nodes – Learn how partitioning strategies based on hash keys differ from proportional partitioning and their impact on data distribution and retrieval.

• How Does Partitioning Work When Requests Are Being Routed? – Explore how request routing interacts with partitioning strategies like hash-based partitioning, ensuring efficient data access in distributed systems.

4 responses to “What is Partitioning by Hash of Key?”

  1. Partitioning Secondary Indexes by Term – What is it? – Scalable Human Avatar

    […] Where as partitioning on the hash of the term this gives a more even distribution of load (as explain in earlier blog on hashing) […]

    Like

  2. What is Dynamic Partitioning? – Scalable Human Avatar

    […] equally be used with hash partitioned data. MongoDB (since v2.4) supports both key range and hash partitioning and it split partitions […]

    Like

  3. Understanding Partitioning Proportional to Nodes – Scalable Human Avatar

    […] Requires the use of hash based partitioning […]

    Like

Leave a reply to What is Dynamic Partitioning? – Scalable Human Cancel reply

I’m Sean

Welcome to the Scalable Human blog. Just a software engineer writing about algo trading, AI, and books. I learn in public, use AI tools extensively, and share what works. Educational purposes only – not financial advice.

Let’s connect