As defined by the author of Consistent Hashing and Random Trees, Distributed hashing protocols for Relieving Hotspots on the World Wide Web, consistent hashing is a way of evenly distributing load across an internet wide system of caches.
Example of this?
👉 Content delivery network (CDN)
“A content delivery network (CDN) is a globally distributed network of proxy servers, serving content from locations closer to the user. Generally, static files such as HTML/CSS/JS, photos, and videos are served from CDN, although some CDNs such as Amazon’s CloudFront support dynamic content. The site’s DNS resolution will tell clients which server to contact.”
https://github.com/donnemartin/system-design-primer
- It uses randomly chosen partition boundaries to avoid central control
Or..
- Distributed consensus!
Is this like Replication and ACID consistency?
⚠️ No – consistent hashing has nothing to do with replica consistency or ACID consistency.
What is it then?
✅ This rather describes the approach of rebalancing.
❌ This does not work well with databases so it is rarely used in practice…
🧐 Documentation of some databases still refer to consistent hashing… (but this is often inaccurate)
Better to avoid the term consistent hashing, call it hash partitioning instead.
Martin Kleppmann – Designing-Data Intensive Applications
Unfortunately however by using the hash of key for partitioning purposes, subsequently we lose the useful property of key range partitioning. 👎 (The ability to do efficient range queries)
Keys that were once adjacent are now scattered across all the partitions.
- So there sort order is lost…
- In MongoDB if you have enabled hash based shardingmode
- Any range query has to be sent to all partitions
- Range keys on the primary are not supported by Riak, Couchbase or Voldemort
- Cassandra achieves a compromise between two partitioning strategies
- Table in Cassandra can be declared with a compound primary key consisting of several columns
- Only the first part of that key is hashed to determine the partition
- But other columns are used as a concatenated index for sorting the data in Cassandra‘s SSTables
- A query cannot search for a range of values within the first column of a compound key
- But if it specifies a fixed value for the first column it can perform an efficient range scan over the other columns of the key 👍
Why to consider the Concatenated Index Approach?
This enables an elegant data model for one-to-many relationships. ✅
- For example
- On a social media site, one user may post may updates
- If a primary key for the updates is chosen to be a:
- User ID
- Update timestamp
- Then you can efficiently retrieve all updates made by particular user within some time interval (sorted by timestamp)
- Different users maybe stored on different partitions 🤔
- But.. within each user updates are stored ordered by timestamp on a single partition 😊
📚 Further Reading & Related Topics
If you’re exploring consistent hashing and its role in distributed systems, these related articles will provide deeper insights:
• Understanding Partitioning Proportional to Nodes – Learn how partitioning strategies work in distributed systems and how consistent hashing improves scalability.
• How Does Partitioning Work When Requests Are Being Routed? – Explore how different partitioning mechanisms impact load distribution and system efficiency.









Leave a reply to How to Relieve Hotspots with Skewed Workloads – Scalable Human Cancel reply