What is Dynamic Partitioning?

For databases that use key range partitioning, a fixed number of partitions with fixed boundaries would be very inconvenient!

If you got the boundaries wrong ❌
- You could end up with all the data in one partition and all the other partitions empty 👎
- Reconfiguring the partition boundaries manually would be very tedious 👎

For that reason… key range partitioned databases such as HBase and RethinkDB create partitions dynamically! 👍

When a partition grows to exceed a configured size (in HBase, the default is 10 GB)
- It is split into two partitions
- So approximately half the data ends up on each side of the split
- Conversely, if lots of data is deleted and a partition shrinks below some threshold, it can be merged with an adjacent partition
This process is similar to what happens at the top level of a B-Tree

Each partition is assigned to one node
- And each node can handle multiple partitions, as in the case of a fixed number of partitions
- After a large partition has been split, one of its two halves can be transferred to another node in order to balance the load
In the case of HBase
- The transfer of partition files happens through the Hadoop Distributed File System (HDFS), its underlying distributed filesystem
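
To make the split and merge rules concrete, here is a minimal sketch in Python. It is a toy, not how HBase or RethinkDB implement it: the class name `RangePartitionedStore` and the `max_items` / `min_items` thresholds are hypothetical, and item counts stand in for partition size in bytes.

```python
import bisect

class RangePartitionedStore:
    """Toy key-range store (hypothetical) that splits and merges by item count."""

    def __init__(self, max_items=4, min_items=2):
        self.max_items = max_items   # split threshold, a stand-in for a byte size
        self.min_items = min_items   # merge threshold
        self.boundaries = []         # sorted split keys between adjacent partitions
        self.partitions = [{}]       # an empty store starts with a single partition

    def _index(self, key):
        # A key equal to a boundary belongs to the partition on its right.
        return bisect.bisect_right(self.boundaries, key)

    def put(self, key, value):
        i = self._index(key)
        self.partitions[i][key] = value
        if len(self.partitions[i]) > self.max_items:
            self._split(i)

    def delete(self, key):
        i = self._index(key)
        self.partitions[i].pop(key, None)
        if len(self.partitions) > 1 and len(self.partitions[i]) < self.min_items:
            self._merge(i)

    def _split(self, i):
        keys = sorted(self.partitions[i])
        mid = keys[len(keys) // 2]   # the median key becomes the new boundary
        left = {k: v for k, v in self.partitions[i].items() if k < mid}
        right = {k: v for k, v in self.partitions[i].items() if k >= mid}
        self.partitions[i:i + 1] = [left, right]
        self.boundaries.insert(i, mid)

    def _merge(self, i):
        j = i - 1 if i > 0 else i + 1          # pick an adjacent partition
        lo, hi = min(i, j), max(i, j)
        self.partitions[lo].update(self.partitions[hi])
        del self.partitions[hi]
        del self.boundaries[lo]                # drop the boundary between the two


store = RangePartitionedStore(max_items=4, min_items=2)
for k in ["a", "b", "c", "d", "e"]:
    store.put(k, k.upper())
print(store.boundaries)   # the fifth write triggers a split at the median key
```

A real system would also decide which node each half lives on after a split; this sketch only tracks the boundaries.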

Advantage of Dynamic Partitioning

The number of partitions adapts to the total data volume ✅

If there is only a small amount of data, a small number of partitions is sufficient, so overheads are small.
If there is a huge amount of data, the size of each individual partition is limited to a configurable maximum.
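
As a rough back-of-the-envelope check, the smallest number of partitions that keeps every partition under the maximum can be computed like this. The function name `partition_count` is hypothetical, and the 10 GB figure is just the HBase default mentioned earlier; the real count will usually be higher, since freshly split partitions are only about half full.

```python
GB = 1024 ** 3

def partition_count(total_bytes, max_partition_bytes):
    """Lower bound on the partition count so that none exceeds the maximum.

    Hypothetical helper for illustration; there is always at least one partition.
    """
    needed = (total_bytes + max_partition_bytes - 1) // max_partition_bytes  # integer ceiling
    return max(1, needed)


print(partition_count(1 * GB, 10 * GB))      # small dataset: a single partition is enough
print(partition_count(1024 * GB, 10 * GB))   # about 1 TB: at least 103 partitions
```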

However, a caveat is…

An empty database starts off with a single partition
Since there is no prior information about where to draw the partition boundaries
While the dataset is small, until it reaches the point at which the first partition is split
All writes have to be processed by a single node, while the other nodes sit idle…

👉 To mitigate this issue, HBase and MongoDB allow an initial set of partitions to be configured on an empty database. This is called pre-splitting.

Pre-Splitting

In the case of key range partitioning, pre-splitting requires that you already know what the key distribution is going to look like.
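
One way to picture this: if you have a representative sample of the keys you expect, you can pick boundaries at evenly spaced ranks of the sorted sample. This is only a sketch under that assumption; `presplit_points` is a hypothetical helper, not an HBase or MongoDB API.

```python
def presplit_points(sample_keys, num_partitions):
    """Pick up to num_partitions - 1 boundary keys from a sorted sample of expected keys."""
    keys = sorted(sample_keys)
    if num_partitions < 2 or not keys:
        return []
    points = []
    for p in range(1, num_partitions):
        k = keys[p * len(keys) // num_partitions]
        # Skip repeated keys so the boundaries stay strictly increasing.
        if not points or k > points[-1]:
            points.append(k)
    return points


sample = ["user%03d" % i for i in range(100)]
print(presplit_points(sample, 4))   # ['user025', 'user050', 'user075']
```

If the sample does not match the real keys, the boundaries end up in the wrong places and some partitions receive far more data than others, which is exactly why the key distribution needs to be known up front.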

Dynamic partitioning can equally be used with hash partitioned data. MongoDB (since v2.4) supports both key range and hash partitioning, and it splits partitions dynamically in either case.
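
With hash partitioning, the partitions are ranges of hash values rather than ranges of keys. A good hash spreads keys roughly evenly, so initial boundaries can simply be spaced evenly across the hash space. The sketch below assumes a 32-bit hash space derived from MD5; the names `hash_key`, `even_boundaries` and `partition_for` are hypothetical, and this is not a description of MongoDB's actual hashing scheme.

```python
import bisect
import hashlib

HASH_SPACE = 2 ** 32   # assumed size of the hash space for this sketch

def hash_key(key):
    """Map a string key to an integer in [0, HASH_SPACE)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")

def even_boundaries(num_partitions):
    """Evenly spaced boundaries over the hash space."""
    return [p * HASH_SPACE // num_partitions for p in range(1, num_partitions)]

def partition_for(key, boundaries):
    """Index of the hash range that the key falls into."""
    return bisect.bisect_right(boundaries, hash_key(key))


boundaries = even_boundaries(4)
counts = [0, 0, 0, 0]
for i in range(1000):
    counts[partition_for("user%d" % i, boundaries)] += 1
print(counts)   # each of the four hash ranges should get a roughly similar share
```

Each hash range can still be split dynamically once it grows too large, in the same way as a key range.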
