Over time, things change in a database…
For example ↘️
- Query throughput increases! 📈
  - So you want to add more CPUs to handle the load ⚡️
- The dataset size increases!
  - So you want to add more disks and RAM to store it ⚡️
- Or a machine fails!
  - Other machines need to take over the failed machine's responsibilities ⚡️
All of these changes call for data and requests to be moved from one node to another…
🟢 ←→ 🟢 ←→ 🟢 ←→ 🟢 ←→ 🟢 ←→ 🟢
The process of moving load from one node in a cluster to another is called rebalancing.
No matter which partitioning scheme is used…
Rebalancing is usually expected to meet some minimum requirements.
After rebalancing the load…
The load (data storage and read/write requests) should be shared fairly between the nodes in the cluster.
While rebalancing is happening…
The database should continue accepting reads and writes.
- No more data than necessary should be moved between nodes
  - This makes rebalancing fast!
  - It also minimises the network and disk I/O load
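To make the "fair share" and "move as little as possible" requirements concrete, here is a minimal Python sketch. It assumes a cluster with a fixed set of partitions that are handed out to nodes, which is only one of several possible schemes. The node names (`node-0` … `node-3`) and the helpers `load_per_node`, `moved_partitions` and `add_node` are hypothetical and exist only for illustration. A new node takes partitions from the busiest nodes until it holds its fair share, and we count how many partitions had to move.

```python
from collections import Counter


def load_per_node(assignment):
    """Count how many partitions each node owns."""
    return Counter(assignment.values())


def moved_partitions(before, after):
    """List the partitions whose owner changed between two assignments."""
    return sorted(p for p in before if before[p] != after[p])


def add_node(assignment, new_node):
    """Return a new partition -> node mapping that includes new_node.

    Illustrative strategy (an assumption, not the only option): the new
    node takes one partition at a time from whichever node currently owns
    the most, and stops as soon as it holds its fair share.
    """
    result = dict(assignment)
    load = load_per_node(result)
    load.setdefault(new_node, 0)
    fair_share = len(result) // len(load)

    while load[new_node] < fair_share:
        # Pick the busiest existing node (ties broken by name).
        donor = max(sorted(n for n in load if n != new_node),
                    key=load.__getitem__)
        # Hand over one of the donor's partitions.
        partition = min(p for p, owner in result.items() if owner == donor)
        result[partition] = new_node
        load[donor] -= 1
        load[new_node] += 1
    return result


# 12 partitions spread evenly over 3 hypothetical nodes.
before = {p: f"node-{p % 3}" for p in range(12)}

# Rebalance by letting the new node take only what it needs.
after = add_node(before, "node-3")

# For contrast: naively reassigning every partition with "p mod 4".
naive = {p: f"node-{p % 4}" for p in range(12)}

print(load_per_node(after))                  # every node owns 3 partitions
print(len(moved_partitions(before, after)))  # 3 partitions moved
print(len(moved_partitions(before, naive)))  # 9 partitions moved
```

Both results are equally fair (3 partitions per node), but the naive reassignment moves three times as much data to get there. That extra movement is exactly the network and disk I/O load the last requirement asks you to avoid.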
📚 Further Reading & Related Topics
If you’re exploring rebalancing partitions in distributed systems, these related articles will provide deeper insights:
• Understanding Partitioning Proportional to Nodes – Learn how partition rebalancing works in distributed systems and how proportional partitioning ensures balanced load and data access.
• Distributed Data-Intensive Systems: Replication vs. Partitioning vs. Clustering vs. Sharding – Explore how partition rebalancing fits into the broader context of replication, clustering, and sharding to ensure efficient data distribution in distributed systems.








