Rebalancing Partitions – What is it?

Over time, things change in a database…

For example ↘️

Query throughput increases! 📈
- So you want to add more CPUs to handle the load ⚡️
The dataset size increases!
- So you want to add more disks and RAM to store it ⚡️
Or a machine fails!
- Other machines need to take over the failed machine's responsibilities ⚡️

All of these changes call for data and requests to be moved from one node to another…

🟢 ←→ 🟢 ←→ 🟢 ←→ 🟢 ←→ 🟢 ←→ 🟢

The process of moving load from one node in a cluster to another is called rebalancing.

Rebalancing is usually expected to meet some minimum requirements.

After rebalancing, the load (data storage and read/write requests) should be shared fairly between the nodes in the cluster.

While rebalancing is happening, the database should continue accepting reads and writes.

No more data than necessary should be moved between nodes
- This makes rebalancing fast!
- It also minimises the network and disk I/O load
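
To make the "move no more data than necessary" requirement concrete, here is a minimal Python sketch that compares two ways of assigning data to nodes when a cluster grows from 4 to 5 nodes. Everything in it is an assumption made for illustration, not something a particular database does: the helper names (`stable_hash`, `assign_mod_n`, `add_node_fixed_partitions`, `moved_fraction`), the 120 partitions and the made-up `user-…` keys.

```python
import hashlib
from collections import Counter


def stable_hash(key: str) -> int:
    """Deterministic hash (Python's built-in hash() of str is salted per process)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def assign_mod_n(keys, node_count):
    """Naive scheme (illustrative): each key lives on node hash(key) mod N."""
    return {k: stable_hash(k) % node_count for k in keys}


def add_node_fixed_partitions(partition_owner, new_node):
    """Fixed-partition scheme (illustrative): the new node takes whole
    partitions from existing nodes until it holds a fair share."""
    owner = dict(partition_owner)
    load = Counter(owner.values())
    target = len(owner) // (len(load) + 1)  # fair share per node after the join
    load[new_node] = 0
    for partition in sorted(owner):
        if load[new_node] >= target:
            break
        donor = owner[partition]
        if load[donor] > target:  # only take from nodes above the fair share
            owner[partition] = new_node
            load[donor] -= 1
            load[new_node] += 1
    return owner


def moved_fraction(before, after):
    """Fraction of items whose owning node changed."""
    moved = sum(1 for item in before if before[item] != after[item])
    return moved / len(before)


# Hypothetical workload: 10,000 keys, cluster grows from 4 to 5 nodes.
keys = [f"user-{i}" for i in range(10_000)]
naive = moved_fraction(assign_mod_n(keys, 4), assign_mod_n(keys, 5))

# Hypothetical layout: 120 partitions spread round-robin over 4 nodes.
partitions = {p: p % 4 for p in range(120)}
rebalanced = add_node_fixed_partitions(partitions, new_node=4)
fixed = moved_fraction(partitions, rebalanced)
loads = Counter(rebalanced.values())

print(f"hash mod N moved {naive:.0%} of keys")
print(f"fixed partitions moved {fixed:.0%} of partitions")
```

With these assumptions, the `hash mod N` scheme reshuffles most of the keys (around 80% for a well-mixed hash), while the fixed-partition scheme moves only the 24 partitions the new node needs (20%) and still leaves every node with the same share.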

Scalable Human Blog