One method of partitioning is to assign a continuous range of keys, for some minimum to some maximum to each partition.
- Like volumes inside an encyclopaedia 📚
- If you know the boundaries between the ranges…
- You can easily determine which partition contains a given key
- If you also know which partition is assigned to each node
- Then you can make your request to the appropriate node
- Or in the case of the encyclopaedia Pick the correct book off the shelf 📚📖
- If you know the boundaries between the ranges…
How are keys arranged?
The arrangement of keys are not necessary evenly spaced…
- Because you data may not be evenly distributed
- For example
- In an encyclopaedia volume 1 may contain words starting with A and B
- But volume 12 contains words starting with T U V X Y
- Simply having 1 volume per 2 letter of the alphabet will eventually lead you to some volumes being much bigger than others 👎
How to distribute the data evenly?
The partition boundaries need to adapt to the data
- The partition boundaries might be chosen manually be an administrator 🥸
OR…
- The database can choose them automatically 🪄
- This partitioning strategy is used by
- With each partition we can keep the keys in a sorted order
- This has the advantage that range scans are easy
- And you can treat the key as an concatenated index
- For example
- In order to fetch several records in one query…
- Consider an application that stored data from a network of sensors 📱
- Where the key is the timestamp of the measurement (year, month, hour, day, second)
- Range scans are useful in this case as they could let fetch all the reading for a particular month
- For example
Downside of key range partitioning
Certain access patterns can lead to hotspots.
- If the key is a timestamp, then the keys correspond to ranges of time.
- For example 1 partition per day
- Unfortunately, as we write data to the database as measurements happen…
- All the writes end up going to the same partition for the one for today!
- So the partition can be overburdened with writes where others sit idle 👎
Avoiding hotspots ⭐️
To avoid this problem in a sensor database.
- You need something other an the timestamps as the first element of the key
- For example
- You can prefix each timestamp with the sensor name and then by time ✅
- 👉 Assuming you have many sensors active at the same time
- The write load will end up more evenly spread across the partitions
- For example
- Now when you want to collect multiple sensors within a given time range…
- You need to perform a range query for each sensor name ✅
📚 Further Reading & Related Topics
If you’re exploring partitioning by key range, these related articles will provide deeper insights:
• Understanding Partitioning: Proportional to Nodes – Learn about different partitioning strategies and how key-range partitioning helps manage load distribution and data access in distributed systems.
• Distributed Data-Intensive Systems: Sharding, Clustering, and Replication – Explore how partitioning by key range interacts with other data distribution strategies like sharding, replication, and clustering to optimize data consistency, availability, and performance.









Leave a reply to What is Partitioning and Why is it Important? – Scalable Human Cancel reply