With dynamic partitioning the number of partitions is proportional to the size of the dataset.
Since splitting and merging processes, this keeps the size of each partition between some fixed number of partitions
- The size of each partition is proportional to the size of the dataset ✅
- For both of these cases the number of partitions is independent to the number of nodes.
But there is another option
A third option is used by Cassandra…
- Is to make the number of partitions proportional to the number of nodes
- In this case the size of each partition grows proportionally to the dataset size, whilst the number of nodes remain the unchanged
What happens when you increase the number of nodes?
- 👉 When you increase the number of nodes, the partitions become smaller again!
- Since a larger data volume generally requires a larger number of nodes to store it on ✅
- This approach also keeps the size fairly stable ✅
What happens when a new node joins the cluster?
- When a new node joins the cluster…
- It randomly chooses a fixed number of existing partitions to split
- And then it takes ownership of one half of the split partitions
- While leaving the other half of each partition in place
Randomly chooses?!
The randomisation can produce an unfair split 👎
But when this is averaged over larger number of partitions (in Cassandra 256 partitions per nodes by default), the new nodes end up taking a fair share of the load of the existing nodes ✅
Cassandra 3.0 used an alternative rebalancing algorithm that avoided the unfair splits…
Picking the partition boundaries randomly
- Requires the use of hash based partitioning
- So the boundaries can be picked by the range of number produced by the hash function
Indeed this approach corresponds most closely with the original definition of consistent hashing (mentioned earlier in the blog). New hash functions can achieve a similar affect with lower metadata overhead.
📚 Further Reading & Related Topics
If you’re exploring partitioning strategies and scalability in distributed systems, these related articles will provide deeper insights:
• What is Dynamic Partitioning? – Learn how dynamic partitioning adapts to changing workloads and improves performance across distributed nodes.
• How Does Partitioning Work When Requests Are Being Routed? – Explore how requests are distributed across partitions and the role of node-aware partitioning in efficient system scaling.









Leave a reply to What is Partitioning by Hash of Key? – Scalable Human Blog Cancel reply