Let’s connect on Github →

Scalable Human Blog

Software Engineering

What is Partitioning By Key Range?

Published by

Sean Mayer

on

What is Partitioning By Key Range?

One method of partitioning is to assign a continuous range of keys, for some minimum to some maximum to each partition.

Like volumes inside an encyclopaedia 📚
- If you know the boundaries between the ranges…
  - You can easily determine which partition contains a given key
- If you also know which partition is assigned to each node
  - Then you can make your request to the appropriate node
  - Or in the case of the encyclopaedia Pick the correct book off the shelf 📚📖

How are keys arranged?

The arrangement of keys are not necessary evenly spaced…

Because you data may not be evenly distributed
For example
- In an encyclopaedia volume 1 may contain words starting with A and B
- But volume 12 contains words starting with T U V X Y
- Simply having 1 volume per 2 letter of the alphabet will eventually lead you to some volumes being much bigger than others 👎

How to distribute the data evenly?

The partition boundaries need to adapt to the data

The partition boundaries might be chosen manually be an administrator 🥸

OR…

The database can choose them automatically 🪄
- This partitioning strategy is used by
  - BigTable (open source equivalent to HBase)
  - RethinkDB
  - MongoDB before version 2.4
- With each partition we can keep the keys in a sorted order
  - This has the advantage that range scans are easy
  - And you can treat the key as an concatenated index
    - For example
      - In order to fetch several records in one query…
      - Consider an application that stored data from a network of sensors 📱
        
        Where the key is the timestamp of the measurement (year, month, hour, day, second)
        
        Range scans are useful in this case as they could let fetch all the reading for a particular month

Downside of key range partitioning

Certain access patterns can lead to hotspots.

If the key is a timestamp, then the keys correspond to ranges of time.
- For example 1 partition per day
Unfortunately, as we write data to the database as measurements happen…
- All the writes end up going to the same partition for the one for today!
So the partition can be overburdened with writes where others sit idle 👎

Avoiding hotspots ⭐️

To avoid this problem in a sensor database.

You need something other an the timestamps as the first element of the key
- For example
  - You can prefix each timestamp with the sensor name and then by time ✅
  - 👉 Assuming you have many sensors active at the same time
    - The write load will end up more evenly spread across the partitions
Now when you want to collect multiple sensors within a given time range…
- You need to perform a range query for each sensor name ✅

📚 Further Reading & Related Topics

If you’re exploring partitioning by key range, these related articles will provide deeper insights:

• Understanding Partitioning: Proportional to Nodes – Learn about different partitioning strategies and how key-range partitioning helps manage load distribution and data access in distributed systems.

• Distributed Data-Intensive Systems: Sharding, Clustering, and Replication – Explore how partitioning by key range interacts with other data distribution strategies like sharding, replication, and clustering to optimize data consistency, availability, and performance.

3 responses to “What is Partitioning By Key Range?”

What is Partitioning by Hash of Key? – Scalable Human

16th May 2022

[…] What is Partitioning By Key Range? […]

LikeLike

Reply
What is Dynamic Partitioning? – Scalable Human

24th July 2022

[…] database that use key range partitioning, a fixed number of partitions with fixed boundaries would be very […]

LikeLike

Reply
What is Partitioning and Why is it Important? – Scalable Human

4th September 2022

[…] Key-range partitioning […]

LikeLike

Reply

Leave a reply to What is Partitioning and Why is it Important? – Scalable Human Cancel reply

Hello,

I’m Sean

Welcome to the Scalable Human blog. Just a software engineer writing about algo trading, AI, and books. I learn in public, use AI tools extensively, and share what works. Educational purposes only – not financial advice.

Let’s connect

Join the mailing list

Stay updated with latest posts.

Recent posts