What is Partitioning?

In the previous blogs I have been discussing replication, where we have multiple copies of the same data on different nodes… Although, for very large datasets, or very high query throughput this is not sufficient! 👎

We require something that can divide the data up into partitions… aka sharding (previously highlighted in previous blog post)

Yup there are many creative names 🤣

As partitions are the most established term we will use that.

How does partitioning work?

“Clearly, we must break away from the sequential and not limit the computers. We must state definitions and provide for priorities and descriptions of data. We must state relationships, not procedures.”

Grace Brewster Murray Hopper

Normally, partitions are defined in such a way that each piece of data, each record or document belong to exactly one partition… There are various ways to achieve this, this will be discussed further on through these blog posts.

Simply putting it, each partition is a small database of its own.

  • Although the database may support operations that touch multiple partitions at the same time ⚙️
  • Different partitions can be placed on different nodes ⚙️
    • Which is the name for each machine or virtual machine running the database software in a shared nothing cluster
  • For queries that operate on a single partition each node can independently execute the queries for its own partition
    • So query throughput can be scaled by adding more nodes
  • Large complex queries can potentially be parallelised across many nodes
    • Although this gets significantly hard 🤯

Key benefit of partitioning

The main reason for wanting partitions is scalability.

  • Large datasets can be distributed across many discs ✅
  • Query load can be distributed across many processors ✅

History of partitioning

Partitioned databases were pioneered in the 1980s…

By products such as:

Final note

Some systems are designed for transactional workloads and others for analytics:

  • This difference affects how the system is tuned
  • But the fundamentals of partitioning apply to both kinds of workloads

In the blog series coming up we will first look at the different approached for partitioning..

  • We will talk about rebalancing.. ⚖️
    • Which is necessary if you wan tot add remove nodes from your cluster 🤔
  • Finally we get an overview of how databases route requests to the right partitions and execute queries 💫.

Leave a comment