What is Partitioning By Key Range?

One method of partitioning is to assign a continuous range of keys, for some minimum to some maximum to each partition.

  • Like volumes inside an encyclopaedia 📚
    • If you know the boundaries between the ranges…
      • You can easily determine which partition contains a given key
    • If you also know which partition is assigned to each node
      • Then you can make your request to the appropriate node
      • Or in the case of the encyclopaedia Pick the correct book off the shelf 📚📖

How are keys arranged?

The arrangement of keys are not necessary evenly spaced…

  • Because you data may not be evenly distributed
  • For example
    • In an encyclopaedia volume 1 may contain words starting with A and B
    • But volume 12 contains words starting with T U V X Y
    • Simply having 1 volume per 2 letter of the alphabet will eventually lead you to some volumes being much bigger than others 👎

How to distribute the data evenly?

The partition boundaries need to adapt to the data

  • The partition boundaries might be chosen manually be an administrator 🥸

OR…

  • The database can choose them automatically 🪄
    • This partitioning strategy is used by
    • With each partition we can keep the keys in a sorted order
      • This has the advantage that range scans are easy
      • And you can treat the key as an concatenated index
        • For example
          • In order to fetch several records in one query…
          • Consider an application that stored data from a network of sensors 📱
            • Where the key is the timestamp of the measurement (year, month, hour, day, second)
            • Range scans are useful in this case as they could let fetch all the reading for a particular month

Downside of key range partitioning

Certain access patterns can lead to hotspots.

  • If the key is a timestamp, then the keys correspond to ranges of time.
    • For example 1 partition per day
  • Unfortunately, as we write data to the database as measurements happen…
    • All the writes end up going to the same partition for the one for today!
  • So the partition can be overburdened with writes where others sit idle 👎

Avoiding hotspots ⭐️

To avoid this problem in a sensor database.

  • You need something other an the timestamps as the first element of the key
    • For example
      • You can prefix each timestamp with the sensor name and then by time
      • 👉 Assuming you have many sensors active at the same time
        • The write load will end up more evenly spread across the partitions
  • Now when you want to collect multiple sensors within a given time range…
    • You need to perform a range query for each sensor name ✅

📚 Further Reading & Related Topics

If you’re exploring partitioning by key range, these related articles will provide deeper insights:

• Understanding Partitioning: Proportional to Nodes – Learn about different partitioning strategies and how key-range partitioning helps manage load distribution and data access in distributed systems.

• Distributed Data-Intensive Systems: Sharding, Clustering, and Replication – Explore how partitioning by key range interacts with other data distribution strategies like sharding, replication, and clustering to optimize data consistency, availability, and performance.

3 responses to “What is Partitioning By Key Range?”

  1. What is Partitioning by Hash of Key? – Scalable Human Avatar

    […] What is Partitioning By Key Range? […]

    Like

  2. What is Dynamic Partitioning? – Scalable Human Avatar

    […] database that use key range partitioning, a fixed number of partitions with fixed boundaries would be very […]

    Like

Leave a comment

I’m Sean

Welcome to the Scalable Human blog. Just a software engineer writing about algo trading, AI, and books. I learn in public, use AI tools extensively, and share what works. Educational purposes only – not financial advice.

Let’s connect