-
How to Relieve Hotspots with Skewed Workloads
As discussed in the previous post, What is Consistent Hashing?, hashing a key to determine its partition can help reduce hotspots… However, they cannot be avoided entirely… In some extreme cases, all reads and writes are for the same key. This kind of workload is perhaps unusual, although this is…
-
What is Consistent Hashing?
As defined by the authors of Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web, consistent hashing is a way of evenly distributing load across an internet-wide system of caches. Example of this? 👉 Content delivery network (CDN) “A content delivery network…
-
What is Partitioning by Hash of Key?
As discussed in the previous blogs: These posts identify the risk of skew and hotspots. ⚠️ Many distributed data stores use a hash function to determine the partition for a given key. What makes a suitable hash function? For partitioning purposes, the hash function need not be…
-
What is Partitioning By Key Range?
One method of partitioning is to assign a continuous range of keys, from some minimum to some maximum, to each partition. How are keys arranged? The key ranges are not necessarily evenly spaced… How to distribute the data evenly? The partition boundaries need to adapt to the data OR……
-
What is Partitioning of Key-Value Data?
Say you have a large amount of data, and you want to partition it… How do you decide which records to store on which nodes? 🤔 Unfair partitioning (Skew) 👎 If the partitioning is unfair… So that some partitions have more data than others… Avoiding hot spots: The simplest approach…
-
What is Partitioning?
In the previous blogs I have been discussing replication, where we have multiple copies of the same data on different nodes… However, for very large datasets or very high query throughput, this is not sufficient! 👎 We require something that can divide the data up into partitions… aka sharding (previously…
-
What is Replication and Why is it Important?
Let’s first break down the goals of replication. The humbling reality…🙏 Despite the simple objective of keeping copies on several machines, replication is a remarkably tricky problem! It requires careful attention to: At a minimum, we will generally need to deal with the following: And that is not even considering the…
-
What Are Consistent Prefix Reads?
First of all, the problem case… Replication lag… imagine two people talking to each other. 🗣🗣 🤔 There can be times when the conversation appears in a different order from an observer’s point of view, or too fast to make much sense… Consistent Prefix Reads to the rescue? This…
-
What are Monotonic Reads?
First of all, the problem case… The problem: anomalies can occur when reading from asynchronous followers! Monotonic Reads to the rescue? Well, this method is a weaker guarantee than strong consistency, but a stronger guarantee than eventual consistency. For more reading on strong consistency vs eventual consistency, please read my previous…
-
Eventual Consistency vs Strong Consistency
Eventual Consistency: What is it? Theoretical guarantee: if no new updates to an entity are made, all reads of the entity will eventually return the last updated value. Example of an eventual consistency model? Internet Domain Name System (DNS)! DNS servers are cached and replicated across directories over the internet…
-
What is Replication Lag?
Replication lag… why bother with replication? Reasons why we use this: How can replication lag occur with a read-scaling architecture? Let’s walk through a common replication pattern with leader-based replication: When does this work? In the example above, it would require a higher percentage of reads and a…
-
What is Trigger-Based Replication?
The replication process is typically implemented by the database system. No application code required. ✅ (In many cases that is what we want.) Although… in some circumstances replication may be needed to perform: You may need to move replication to the application layer, with tools such as: Alternatively, features that are available…
-
What is Logical Log Replication?
Logical replication is the process of replicating data objects and their changes. Key characteristics are as follows: “Logical replication sends row-by-row changes, physical replication sends disk block changes. Logical replication is better for some tasks, physical replication for others.” https://stackoverflow.com/questions/33621906/difference-between-stream-replication-and-logical-replication How does logical replication work? Logical replication uses a publish…
-
What is Write-ahead Logging (WAL)?
A key-value store is a fundamental component, which is in rapidly growing demand across a multitude of horizontally scaling environments, including: Example of features of a key-value storage engine: Write-ahead Logging (WAL) is used in storage engines to provide transactions with: Storage engines and WAL: Log statements are compacted appended and…
-
Replication Logs – What is Statement-Based Replication?
In this blog post I will be covering one approach to implementing replication logs… statement-based replication. How does leader-based replication work under the hood? 🤔 Several different replication methods are used in practice (due to the multitude of edge cases!) Final note: However, as there are so many edge…
-
What are Version Vectors? – Algorithm
Expanding upon the previous blog post regarding capturing the happens-before relationship, the scenario described there only uses a single replica… The question is, how does the algorithm change when there are multiple replicas but no leader (leaderless replication)? A collection of version numbers from all the replicas is…
-
Understanding the “Happens-Before” Relationship in Distributed Systems
In this post, we will delve into an algorithm that can tell whether two operations are concurrent or whether one happened before another. Scenario: Let’s begin with a database with only one replica (so we can simplify this and thereafter generalise the approach to a leaderless database with multiple replicas)…
-
How to Define a Concurrent Operation?
Let’s begin by understanding how we decide whether two operations are concurrent or not. Consider this scenario: The two writes are not concurrent… On the other hand: How to define concurrency? Whether one operation happens before another operation is the key to defining what concurrency means… In fact…
-
What to consider with Replication and Multi-Data Centre Operations?
Leaderless replication is suitable for multi-data centre operations, due to its ability to tolerate conflicts, such as: For instance, both Cassandra and Voldemort implement their multi-data centre support within the normal leaderless model. Detecting concurrent writes: Dynamo-style databases allow several clients to concurrently write to the same key…








