-
Continue reading →: Understanding Partitioning Proportional to Nodes
With dynamic partitioning, the number of partitions is proportional to the size of the dataset, since the splitting and merging processes keep the size of each partition between some fixed minimum and maximum. But there is another option. A third option is used by Cassandra… What happens when you increase…
-
Continue reading →: What is Dynamic Partitioning?
For databases that use key range partitioning, a fixed number of partitions with fixed boundaries would be very inconvenient! For that reason… Key range partitioned databases such as HBase and RethinkDB create partitions dynamically! 👍 The advantage of dynamic partitioning is that the number of partitions adapts to the total data…
-
Continue reading →: What is Fixed Partitioning?
There are a few ways of assigning partitions to nodes; in this post we will discuss fixed partitioning. How not to rebalance partitions! When partitioning by the hash of a key… it is best to divide the possible hashes into ranges and assign each range to a partition… Why don’t we…
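The teaser above mentions dividing the possible hashes into ranges and assigning each range to a partition. A minimal Python sketch of that idea follows. It assumes a hypothetical fixed count of 8 partitions and uses MD5 only as a stable hash; the names `key_hash` and `partition_for` are illustrative and are not taken from the post.

```python
import hashlib

NUM_PARTITIONS = 8      # hypothetical fixed partition count
HASH_SPACE = 2 ** 32    # hashes are treated as 32-bit unsigned integers


def key_hash(key: str) -> int:
    """Map a key to a stable 32-bit integer (MD5 used only for stability)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Return the index of the hash range (partition) that owns this key."""
    range_size = HASH_SPACE // num_partitions
    # Clamp so that leftover hashes at the top of the space (when the hash
    # space is not evenly divisible) fall into the last partition.
    return min(key_hash(key) // range_size, num_partitions - 1)


if __name__ == "__main__":
    for k in ["user:1", "user:2", "order:42"]:
        print(k, "-> partition", partition_for(k))
```

Because each key maps to a contiguous hash range rather than to `hash % number_of_nodes`, whole partitions can be moved between nodes without changing which partition a key belongs to.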
-
Continue reading →: Rebalancing Partitions – What is it?
Over time, things change in a database… For example ↘️ All of these changes call for data and requests to be moved from one node to another… 🟢 ←→ 🟢 ←→ 🟢 ←→ 🟢 ←→ 🟢 ←→ 🟢 The process of moving load from one node in a cluster to another…
-
Continue reading →: Partitioning Secondary Indexes by Term – What is it?
Rather than each partition having its own secondary index (local index), we can construct a global index that covers data in all partitions. Global index and term partitioning? A global index must also be partitioned, but it can be partitioned differently from the primary key index. Here is how this…
-
Continue reading →: How to Partition with Secondary Indexes by Document
Imagine a situation where you are operating a website for used cars. 🚗 🏎 🚕 🚓 🚘 🚖 🚔 🚙 The need for the secondary index… You want to let users search for cars, allowing them to filter by colour and by make. Secondary index relationships with partitions In the…
-
Continue reading →: What to Consider with Secondary Indexes and Partitioning
The partitioning schemes covered in the previous blogs rely on a key-value data model. If records are only ever accessed via a primary key… What are secondary indexes? The situation becomes more complicated when secondary indexes are involved… 🤯 “Secondary indexes are the bread and butter…
-
Continue reading →: How to Relieve Hotspots with Skewed Workloads
As discussed in the previous post, What is Consistent Hashing?, hashing a key to determine its partition can help reduce hotspots… However, they cannot be avoided entirely… In some extreme cases where all reads and writes are for the same key This kind of workload is perhaps unusual, although this is…
-
Continue reading →: What is Consistent Hashing?
As defined by the authors of Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web, consistent hashing is a way of evenly distributing load across an internet-wide system of caches. Example of this? 👉 Content delivery network (CDN) “A content delivery network…
-
Continue reading →: What is Partitioning by Hash of Key?
As discussed in the previous blogs: These posts identify the risk of skew and hotspots. ⚠️ Many distributed data stores use a hash function to determine the partition for a given key. What does it take for a hash function? For partitioning purposes, the hash function need not be…
-
Continue reading →: What is Partitioning By Key Range?
One method of partitioning is to assign a continuous range of keys, from some minimum to some maximum, to each partition. How are keys arranged? The arrangement of keys is not necessarily evenly spaced… How to distribute the data evenly? The partition boundaries need to adapt to the data OR……
-
Continue reading →: What is Partitioning of Key-Value Data?
Say you have a large amount of data, and you want to partition it… How do you decide which records to store on which nodes? 🤔 Unfair partitioning (skew) 👎 If the partitioning is unfair… So that some partitions have more data than others… Avoiding hot spots The simplest approach…
-
Continue reading →: What is Partitioning?
In the previous blogs I have been discussing replication, where we have multiple copies of the same data on different nodes… However, for very large datasets or very high query throughput, this is not sufficient! 👎 We require something that can divide the data up into partitions… aka sharding (previously…
-
Continue reading →: What is Replication and Why is it Important?
Let's first break down the goals of replication. The humbling reality…🙏 Despite the simple objective of keeping copies on several machines, replication is a remarkably tricky problem! It requires careful attention to: At a minimum, we will generally need to deal with the following: And that is not even considering the…
-
Continue reading →: What Are Consistent Prefix Reads?
First of all, the problem case… Replication lag… imagine two people talking to each other. 🗣🗣 🤔 There can be times where this may appear in a different order from an observer’s point of view, or too fast to make much sense… Consistent Prefix Reads to the rescue? This…
-
Continue reading →: What are Monotonic Reads?
First of all, the problem case… The problem: anomalies can occur when reading from asynchronous followers! Monotonic Reads to the rescue? Well, this method is a weaker guarantee than strong consistency, but a stronger guarantee than eventual consistency. For more reading on strong consistency vs eventual consistency, please read my previous…
-
Continue reading →: Eventual Consistency vs Strong Consistency
Eventual consistency: what is it? Theoretical guarantee: if no new updates to an entity are made, all reads of the entity will eventually return the last updated value. Example of an eventual consistency model? Internet Domain Name System (DNS)! DNS servers are cached and replicated across directories over the internet…
-
Continue reading →: What is Replication Lag?
Replication lag… why bother with replication? Reasons why we use this: How can replication lag occur with a read-scaling architecture? Let's walk through a common replication pattern with leader-based replication: When does this work? In the example above, it would require a higher percentage of reads and a…
-
Continue reading →: What is Trigger-Based Replication?
The replication process is typically implemented by the database system. No application code required. ✅ (In many cases that is what we want.) However… in some circumstances replication may be needed to perform: You may require moving replication to the application layer, with a toolset such as: Alternatively, features that are available…








