There are a few ways of assigning partitions to nodes; in this post we will discuss fixed partitioning.
How not to rebalance partitions!
When partitioning by the hash of a key… it is best to divide the possible hashes into ranges and assign each range to a partition…
Why don’t we just use modulus?
(The % operator in many programming languages)
Taking hash(key) mod 10 would return a number between 0 and 9….
- If you write the hash as a decimal number, the hash mod 10 would be its last digit
- If we have 10 nodes numbered 0 to 9
That seems like an easy way of assigning each key to a node?
- The problem with the mod N approach is that if the number of nodes N changes, so does the value by which we take hashes modulo
- Most of the keys would then need to be moved from one node to another, as the sketch below illustrates
- Such frequent moves make rebalancing excessively expensive
We need an approach that does not move data around more than necessary!
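To make that cost concrete, here is a minimal Python sketch (the `node_for` helper and the key names are made up for illustration, not any database's API) that counts how many keys would land on a different node if a cluster grows from 10 to 11 nodes under the mod N scheme:

```python
# Minimal sketch: how many keys change nodes when N grows from 10 to 11
# under the "hash mod N" scheme? (Illustrative only.)
import hashlib

def node_for(key: str, num_nodes: int) -> int:
    # Hash the key to a big integer, then mod N picks the node
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return h % num_nodes

keys = [f"user:{i}" for i in range(10_000)]
moved = sum(node_for(k, 10) != node_for(k, 11) for k in keys)
print(f"{moved / len(keys):.0%} of keys change nodes")  # roughly 90%
```

Roughly nine out of every ten keys end up on a different node after adding just one node, which is exactly the kind of wholesale data movement we want to avoid.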
Fixed Number of Partitions!
Fortunately there is a fairly simple solution!
Create many more partitions than there are nodes, and assign several partitions to each node.
- For example
- A database running on a cluster of 10 nodes
- May be split into 1,000 partitions from the outset
- So approximately 100 partitions are assigned to each node
- Now if a node is added to a cluster
- The new node can steal a few partitions from every existing node
- Until partitions are fairly distributed once again
If a node is removed from the cluster?
The same happens in reverse!
- Only entire partitions are moved between nodes
- The number of partitions does not change
- Nor does the assignment of keys to partitions
- The only thing that changes is the assignment of partitions to nodes
- This change of assignment is not immediate; it takes some time to transfer large amounts of data over the network
- The old assignment of partitions is used for any reads and writes that happen while the transfer is in progress (see the sketch after this list)
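Putting the pieces together, here is a minimal Python sketch of the idea, assuming a made-up round-robin initial assignment and a simple "steal from the most-loaded node" rebalance; it is an illustration of fixed partitioning, not any particular database's implementation:

```python
# Sketch of fixed partitioning: the partition count never changes, keys never
# change partition, and a new node steals whole partitions from existing nodes.
import hashlib
from collections import Counter

NUM_PARTITIONS = 1000  # chosen at setup time and never changed afterward

def partition_for(key: str) -> int:
    # The key -> partition mapping is permanent, so data is never re-hashed
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return h % NUM_PARTITIONS

def initial_assignment(nodes: list[str]) -> dict[int, str]:
    # Deal partitions out round-robin: ~NUM_PARTITIONS / len(nodes) each
    return {p: nodes[p % len(nodes)] for p in range(NUM_PARTITIONS)}

def add_node(assignment: dict[int, str], new_node: str) -> dict[int, str]:
    # The new node steals partitions from the most-loaded existing nodes
    # until it holds roughly its fair share; nothing else moves.
    result = dict(assignment)
    fair_share = NUM_PARTITIONS // (len(set(assignment.values())) + 1)
    for _ in range(fair_share):
        donor = Counter(result.values()).most_common(1)[0][0]
        stolen = next(p for p, n in result.items() if n == donor)
        result[stolen] = new_node
    return result

before = initial_assignment([f"node{i}" for i in range(10)])
after = add_node(before, "node10")
moved = sum(before[p] != after[p] for p in range(NUM_PARTITIONS))
print(f"{moved} of {NUM_PARTITIONS} partitions moved")  # ~90, not ~900
```

Because the key-to-partition mapping never changes, only the new node's fair share of whole partitions (about 90 out of 1,000 here) has to move, rather than most of the keys in the cluster.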
Rebalancing hardware resources?
In principle, you can even account for mismatched hardware in your cluster by assigning more partitions to the nodes that are more powerful.
- You can force those nodes to take a greater share of the load.
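As a rough sketch of what that could look like (the node names and weights here are invented for illustration, not a real database's configuration), you could deal out partitions in proportion to a per-node weight:

```python
# Weighted partition assignment sketch: a node with weight 2 is dealt twice
# as many partitions as a weight-1 node. Weights are made up for illustration.
import itertools

NUM_PARTITIONS = 1000
node_weights = {"big-node": 2, "small-node-1": 1, "small-node-2": 1}

def weighted_assignment(weights: dict[str, int]) -> dict[int, str]:
    # Repeat each node name according to its weight, then deal partitions
    # out round-robin over that expanded list.
    slots = list(itertools.chain.from_iterable([n] * w for n, w in weights.items()))
    return {p: slots[p % len(slots)] for p in range(NUM_PARTITIONS)}

assignment = weighted_assignment(node_weights)
# big-node ends up with ~500 partitions; each small node with ~250
```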
Who uses a fixed number of partitions for rebalancing?
This approach to rebalancing is used in:
Typical configuration
In this configuration:
- The number of partitions is usually fixed when the database is first set up and not changed afterward
- Although in principle it is possible to split and merge partitions
- A fixed number of partitions is operationally simpler, and so many fixed-partition databases choose not to implement partition splitting
- Thus, the number of partitions configured at the outset is the maximum number of nodes you can have
- So you need to choose it high enough to accommodate future growth
What to consider…
- ⚠️ However, each partition also has management overhead
- 👉 So it is counterproductive to choose too high a number
- ⚠️ Choosing the right number of partitions is difficult
- Especially if the total size of the dataset is highly variable!
- For example, if the dataset starts small but may grow much larger over time
- Since each partition contains a fixed fraction of the total data, each partition grows in proportion to the total amount of data in the cluster
- If partitions are very large…
- Rebalancing and recovery from node failures become expensive
- But if partitions are too small…
- They incur too much overhead
- Best performance?
- This is achieved when the size of the partitions is just right
- Neither too big nor too small…
- Which can be hard to achieve if the number of partitions is fixed but the dataset size varies!
Final note
In the next post we will discuss the alternative strategy to fixed partitioning: dynamic partitioning, where we will delve into its advantages and caveats.
📚 Further Reading & Related Topics
If you’re exploring fixed partitioning in distributed systems, these related articles will provide deeper insights:
• Understanding Partitioning Proportional to Nodes – Learn how partitioning scales dynamically based on nodes and how it differs from fixed partitioning.
• How Does Partitioning Work When Requests Are Being Routed? – Explore how requests are routed in partitioned systems and how different strategies impact performance and scalability.








