What is Consistent Hashing?

As defined by the author of Consistent Hashing and Random Trees, Distributed hashing protocols for Relieving Hotspots on the World Wide Web, consistent hashing is a way of evenly distributing load across an internet wide system of caches.

Example of this?

👉 Content delivery network (CDN)

A content delivery network (CDN) is a globally distributed network of proxy servers, serving content from locations closer to the user. Generally, static files such as HTML/CSS/JS, photos, and videos are served from CDN, although some CDNs such as Amazon’s CloudFront support dynamic content. The site’s DNS resolution will tell clients which server to contact.”

https://github.com/donnemartin/system-design-primer
  • It uses randomly chosen partition boundaries to avoid central control

Or..

  • Distributed consensus!

Is this like Replication and ACID consistency?

⚠️ No – consistent hashing has nothing to do with replica consistency or ACID consistency.

What is it then?

✅ This rather describes the approach of rebalancing.

❌ This does not work well with databases so it is rarely used in practice…

🧐 Documentation of some databases still refer to consistent hashing… (but this is often inaccurate)

Better to avoid the term consistent hashing, call it hash partitioning instead.

Martin Kleppmann – Designing-Data Intensive Applications

Unfortunately however by using the hash of key for partitioning purposes, subsequently we lose the useful property of key range partitioning. 👎 (The ability to do efficient range queries)

Keys that were once adjacent are now scattered across all the partitions.

  • So there sort order is lost…
  • In MongoDB if you have enabled hash based shardingmode
  • Range keys on the primary are not supported by Riak, Couchbase or Voldemort
  • Cassandra achieves a compromise between two partitioning strategies
    • Table in Cassandra can be declared with a compound primary key consisting of several columns
    • Only the first part of that key is hashed to determine the partition
      • But other columns are used as a concatenated index for sorting the data in Cassandra‘s SSTables
      • A query cannot search for a range of values within the first column of a compound key
      • But if it specifies a fixed value for the first column it can perform an efficient range scan over the other columns of the key 👍

Why to consider the Concatenated Index Approach?

This enables an elegant data model for one-to-many relationships. ✅

  • For example
    • On a social media site, one user may post may updates
    • If a primary key for the updates is chosen to be a:
      • User ID
      • Update timestamp
    • Then you can efficiently retrieve all updates made by particular user within some time interval (sorted by timestamp)
    • Different users maybe stored on different partitions 🤔
      • But.. within each user updates are stored ordered by timestamp on a single partition 😊

📚 Further Reading & Related Topics

If you’re exploring consistent hashing and its role in distributed systems, these related articles will provide deeper insights:

• Understanding Partitioning Proportional to Nodes – Learn how partitioning strategies work in distributed systems and how consistent hashing improves scalability.

• How Does Partitioning Work When Requests Are Being Routed? – Explore how different partitioning mechanisms impact load distribution and system efficiency.

5 responses to “What is Consistent Hashing?”

  1. How to Relieve Hotspots with Skewed Workloads – Scalable Human Avatar

    […] discussed in the previous post What is Consistent Hashing?, hashing a key to determine its partition can aid in reducing […]

    Like

  2. Understanding Partitioning Proportional to Nodes – Scalable Human Avatar

    […] this approach corresponds most closely with the original definition of consistent hashing (mentioned earlier in the blog). New hash functions can achieve a similar affect with lower […]

    Like

  3. What to Consider with Secondary Indexes and Partitioning – Scalable Human Blog Avatar

    […] • What Is Consistent Hashing? – Explore how consistent hashing is used in partitioning and indexing strategies to ensure efficient data distribution and retrieval in large-scale systems. […]

    Like

  4. Load shedding In Software Systems – Scalable Human Blog Avatar

    […] uneven load distribution across systems, a key challenge addressed by load shedding strategies. • What Is Consistent Hashing? – Learn how consistent hashing helps distribute requests more evenly across nodes, minimizing […]

    Like

Leave a reply to Load shedding In Software Systems – Scalable Human Blog Cancel reply

I’m Sean

Welcome to the Scalable Human blog. Just a software engineer writing about algo trading, AI, and books. I learn in public, use AI tools extensively, and share what works. Educational purposes only – not financial advice.

Let’s connect