Say you have a large amount of data, and you want to partition it…
How do you decide which records to store on which nodes? 🤔
- Our goal with partitioning is to spread the data and query load evenly across nodes
- If every node takes an equal burden…
- In theory, 10 nodes should be able to handle 10x as much data and 10x the load (read/write throughput) of a single node (ignoring replication for now) ✅
Unfair partitioning (Skew) 👎
If the partitioning is unfair, so that some partitions have more data or queries than others…
- We call it skew
- The presence of skew makes partitioning much less effective 👎
- In an extreme case all the load could end up on 1 partition! ❌
- So 9 out of 10 nodes are idle
- The single busy node becomes your bottleneck
- A partition with disproportionately high load is called a hotspot (sketched below)
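As a rough illustration, here's a minimal sketch of what a hotspot looks like: one partition carrying far more than its fair share of the load. The partition names and request counts are made up, and the 2x-average threshold is arbitrary.

```python
# Hypothetical example: requests served by each of 10 partitions.
# Partition "p3" is the hotspot — it takes almost all of the load.
requests_per_partition = {
    "p0": 120, "p1": 95, "p2": 110, "p3": 9800, "p4": 105,
    "p5": 98, "p6": 130, "p7": 90, "p8": 115, "p9": 102,
}

average = sum(requests_per_partition.values()) / len(requests_per_partition)

# Flag any partition carrying far more than its fair share (threshold is arbitrary).
hotspots = {
    partition: load
    for partition, load in requests_per_partition.items()
    if load > 2 * average
}

print(f"average load per partition: {average:.0f}")
print(f"hotspots: {hotspots}")  # -> {'p3': 9800}
```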
Avoiding hot spots
The simplest approach to avoiding hotspots would be to:
- Assign records to nodes randomly, which distributes the data fairly evenly across nodes ✅
Big downside:
- When you are trying to read a particular item
- You have no way of knowing which node it is on
- So you have to query all of the nodes in parallel!! 🤯 (sketched below)
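To make that trade-off concrete, here's a minimal sketch assuming a toy "cluster" of 10 in-memory dicts (the node count and function names are just for illustration): writes land on a random node, so a read has no choice but to ask every node.

```python
import random

NUM_NODES = 10
# Toy "cluster": each node is just an in-memory dict.
nodes = [dict() for _ in range(NUM_NODES)]

def write(key, value):
    # Random placement spreads the data evenly across nodes...
    nodes[random.randrange(NUM_NODES)][key] = value

def read(key):
    # ...but now we have no idea which node holds the key,
    # so every node has to be queried (sequentially here;
    # in parallel in a real system).
    for node in nodes:
        if key in node:
            return node[key]
    return None

write("user:42", {"name": "Ada"})
print(read("user:42"))  # -> {'name': 'Ada'}
```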
We can do better….
Let’s assume we have a key-value data model, in which you always access a record via its primary key.
In the next blog post I will cover partitioning by key range, which builds on this assumption.
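As a teaser for why that assumption matters: if record placement is a deterministic function of the primary key instead of being random, a client can compute exactly which node to ask. The sketch below uses an arbitrary hash-based placeholder for that function; how key-range partitioning actually chooses it is the topic of the next post.

```python
import hashlib

NUM_NODES = 10
nodes = [dict() for _ in range(NUM_NODES)]

def partition_for(key):
    # Any deterministic function of the primary key works here;
    # this hash-based choice is just a placeholder for illustration.
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % NUM_NODES

def write(key, value):
    nodes[partition_for(key)][key] = value

def read(key):
    # Because placement is derived from the key, a read touches
    # exactly one node instead of all of them.
    return nodes[partition_for(key)].get(key)

write("user:42", {"name": "Ada"})
print(read("user:42"))  # -> {'name': 'Ada'}
```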
📚 Further Reading & Related Topics
If you’re exploring partitioning strategies for key-value stores in distributed systems, these related articles will provide deeper insights:
• How Does Partitioning Work When Requests Are Being Routed? – Learn how partitioning affects request distribution and system scalability in distributed environments.
• Distributed Data-Intensive Systems: Logical Log Replication – Understand how replication strategies ensure data consistency across partitions in key-value storage architectures.