There are a few ways of assigning partitions to nodes; in this post we will discuss fixed partitioning.
How not to rebalance partitions!
When partitioning by the hash of a key… it is best to divide the possible hashes into ranges and assign each range to a partition…
Why don’t we just use modulus?
(The % operator in many programming languages)
Taking hash(key) mod 10 would return a number between 0 and 9….
- If you write the hash as a decimal number, the hash mod 10 would be its last digit
- If we have 10 nodes numbered 0 to 9
That seems like an easy way of assigning each key to a node?
- The problem with the mod N approach is that if the number of nodes N changes, so does the value by which we take hashes modulo
- Most of the keys would then need to be moved from one node to another, as the sketch below illustrates
- Such frequent moves make rebalancing excessively expensive
We need an approach that does not move data around more than necessary!
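To make that cost concrete, here is a minimal Python sketch (the `node_for` helper and the key names are made up for illustration, not any database's API) that counts how many keys would land on a different node if a cluster grows from 10 to 11 nodes under the mod N scheme:

```python
# Minimal sketch: how many keys change nodes when N grows from 10 to 11
# under the "hash mod N" scheme? (Illustrative only.)
import hashlib

def node_for(key: str, num_nodes: int) -> int:
    # Hash the key to a big integer, then mod N picks the node
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return h % num_nodes

keys = [f"user:{i}" for i in range(10_000)]
moved = sum(node_for(k, 10) != node_for(k, 11) for k in keys)
print(f"{moved / len(keys):.0%} of keys change nodes")  # roughly 90%
```

Roughly nine out of every ten keys end up on a different node after adding just one node, which is exactly the kind of wholesale data movement we want to avoid.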
Fixed Number of Partitions!
Fortunately there is a fairly simple solution!
Create many more partitions than there are nodes, and assign several partitions to each node.
- For example
- A database running on a cluster of 10 nodes
- May be split into 1,000 partitions from the outset
- So approximately 100 partitions are assigned to each node
- Now if a node is added to a cluster
- The new node can steal a few partitions from every existing node
- Until partitions are fairly distributed once again
If a node is removed from the cluster?
The same happens in reverse!
- Only entire partitions are moved between nodes
- The number of partitions does not change
- Nor does the assignment of keys to partitions
- The only thing that changes is the assignment of partitions to nodes
- This change of assignment is not immediate; it takes some time to transfer large amounts of data over the network
- The old assignment of partitions is used for any reads and writes that happen while the transfer is in progress (see the sketch after this list)
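Putting the pieces together, here is a minimal Python sketch of the idea, assuming a made-up round-robin initial assignment and a simple "steal from the most-loaded node" rebalance; it is an illustration of fixed partitioning, not any particular database's implementation:

```python
# Sketch of fixed partitioning: the partition count never changes, keys never
# change partition, and a new node steals whole partitions from existing nodes.
import hashlib
from collections import Counter

NUM_PARTITIONS = 1000  # chosen at setup time and never changed afterward

def partition_for(key: str) -> int:
    # The key -> partition mapping is permanent, so data is never re-hashed
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return h % NUM_PARTITIONS

def initial_assignment(nodes: list[str]) -> dict[int, str]:
    # Deal partitions out round-robin: ~NUM_PARTITIONS / len(nodes) each
    return {p: nodes[p % len(nodes)] for p in range(NUM_PARTITIONS)}

def add_node(assignment: dict[int, str], new_node: str) -> dict[int, str]:
    # The new node steals partitions from the most-loaded existing nodes
    # until it holds roughly its fair share; nothing else moves.
    result = dict(assignment)
    fair_share = NUM_PARTITIONS // (len(set(assignment.values())) + 1)
    for _ in range(fair_share):
        donor = Counter(result.values()).most_common(1)[0][0]
        stolen = next(p for p, n in result.items() if n == donor)
        result[stolen] = new_node
    return result

before = initial_assignment([f"node{i}" for i in range(10)])
after = add_node(before, "node10")
moved = sum(before[p] != after[p] for p in range(NUM_PARTITIONS))
print(f"{moved} of {NUM_PARTITIONS} partitions moved")  # ~90, not ~900
```

Because the key-to-partition mapping never changes, only the new node's fair share of whole partitions (about 90 out of 1,000 here) has to move, rather than most of the keys in the cluster.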
Rebalancing hardware resources?
In principle, you can even account for mismatched hardware in your cluster by assigning more partitions to the nodes that are more powerful.
- You can force those nodes to take a greater share of the load.
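As a rough sketch of what that could look like (the node names and weights here are invented for illustration, not a real database's configuration), you could deal out partitions in proportion to a per-node weight:

```python
# Weighted partition assignment sketch: a node with weight 2 is dealt twice
# as many partitions as a weight-1 node. Weights are made up for illustration.
import itertools

NUM_PARTITIONS = 1000
node_weights = {"big-node": 2, "small-node-1": 1, "small-node-2": 1}

def weighted_assignment(weights: dict[str, int]) -> dict[int, str]:
    # Repeat each node name according to its weight, then deal partitions
    # out round-robin over that expanded list.
    slots = list(itertools.chain.from_iterable([n] * w for n, w in weights.items()))
    return {p: slots[p % len(slots)] for p in range(NUM_PARTITIONS)}

assignment = weighted_assignment(node_weights)
# big-node ends up with ~500 partitions; each small node with ~250
```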
Who uses a fixed number of partitions for rebalancing?
This approach to rebalancing is used in:
Typical configuration
In this configuration:
- The number of partitions is usually fixed when the database is first set up and not changed afterward
- Although in principle it is possible to split and merge partitions
- A fixed number of partitions is operationally simpler, and so many fixed-partition databases choose not to implement partition splitting
- Thus, the number of partitions configured at the outset is the maximum number of nodes you can have
- So you need to choose it high enough to accommodate future growth
What to consider…
- ⚠️ However, each partition also has management overhead
- 👉 So it is counterproductive to choose too high a number
- ⚠️ Choosing the right number of partitions is difficult
- Especially if the total size of the dataset is highly variable!
- For example, if the dataset starts small but may grow much larger over time
- Since each partition contains a fixed fraction of the total data, each partition grows in proportion to the total amount of data in the cluster
- If partitions are very large…
- Rebalancing and recovery from node failures become expensive
- But if partitions are too small…
- They incur too much overhead
- Best performance?
- This is achieved when the size of the partitions is just right
- Neither too big nor too small…
- Which can be hard to achieve if the number of partitions is fixed but the dataset size varies!
Final note
In the next post we will discuss the alternative strategy to fixed partitioning: dynamic partitioning, where we will delve into its advantages and caveats.
📚 Further Reading & Related Topics
If you’re exploring fixed partitioning in distributed systems, these related articles will provide deeper insights:
• Understanding Partitioning Proportional to Nodes – Learn how partitioning scales dynamically based on nodes and how it differs from fixed partitioning.
• How Does Partitioning Work When Requests Are Being Routed? – Explore how requests are routed in partitioned systems and how different strategies impact performance and scalability.








