What is Replication and Why is it Important?

Let's first break down the goals of replication

High availability
- Keeping the system running at an agreed level of operation
- Even when one machine, several machines, or an entire data centre goes down!
Disconnected operation – clients that can work offline
- Enabling the application to continue working when there is a network interruption
Latency
- Placing data geographically close to users so that the users can interact with it faster

Scalability
- The ability to handle a higher volume of reads than a single machine could handle, by performing those reads on replicas

The humbling reality…🙏

Despite the simple objective of keeping copies of the same data on several machines, replication is a remarkably tricky problem! It requires careful attention to:

📝Concurrency 👈
📝Things that can go wrong 👈
📝Dealing with the consequences of the faults 👈

At a minimum, we generally need to deal with the following:

📝Unavailable nodes 🔴❌
📝Network interruptions 📶 ❌

And that is not even considering the more insidious faults 🧟‍♂️, such as:

📝Silent data corruption due to software bugs 🪲

What approaches can we take with replication?

Single leader replication
- Clients send all writes to a single node (leader)
  - The leader sends a stream of data change events to its followers
- Reads can be performed by any replica
  - But followers may return stale reads
Multi-leader replication
- Where clients send each write to one of several leader nodes
  - Any of which can accept writes
- Streams of data change events are sent between leaders and to any follower nodes
- Related, on choosing the best multi-leader topology: The Multi-Leader Replication Topologies
Leaderless replication
- Clients send each write to several nodes
- Clients can read from several nodes in parallel ⬇️
  - In order to detect and correct nodes with stale data
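
To make the leaderless approach a little more concrete, here is a minimal sketch in Python. It is not taken from any particular database: the `Replica` class, the `quorum_write` and `quorum_read` helpers, and the integer version numbers are all assumptions made for illustration. A write goes to every node and succeeds once `w` nodes acknowledge it; a read asks the reachable nodes, requires at least `r` answers, returns the newest version, and repairs any stale node it notices.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One node holding a versioned copy of each key (hypothetical model)."""
    data: dict = field(default_factory=dict)  # key -> (version, value)
    up: bool = True

    def write(self, key, version, value):
        if not self.up:
            return False  # an unavailable node cannot acknowledge
        current_version, _ = self.data.get(key, (0, None))
        if version > current_version:  # never overwrite newer data with older
            self.data[key] = (version, value)
        return True

    def read(self, key):
        if not self.up:
            return None
        return self.data.get(key, (0, None))


def quorum_write(replicas, key, version, value, w):
    """Send the write to every replica; succeed if at least w acknowledge."""
    acks = sum(replica.write(key, version, value) for replica in replicas)
    return acks >= w


def quorum_read(replicas, key, r):
    """Read from reachable replicas, return the newest value, repair stale ones."""
    responses = [(rep, rep.read(key)) for rep in replicas]
    responses = [(rep, resp) for rep, resp in responses if resp is not None]
    if len(responses) < r:
        raise RuntimeError("not enough replicas reachable for a quorum read")
    newest_version, newest_value = max(
        (resp for _, resp in responses), key=lambda resp: resp[0]
    )
    for rep, (version, _) in responses:
        if version < newest_version:  # stale node detected: read repair
            rep.write(key, newest_version, newest_value)
    return newest_value


nodes = [Replica(), Replica(), Replica()]
nodes[2].up = False                                   # one node is down
write_ok = quorum_write(nodes, "cart", 1, ["book"], w=2)
nodes[2].up = True                                    # it returns, missing the write
latest = quorum_read(nodes, "cart", r=2)              # also repairs nodes[2]
```

With three nodes, choosing `w=2` and `r=2` means the set of nodes written to and the set read from overlap in at least one node, which is why the read can spot the stale copy.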

Advantages and disadvantages of replication

Single leader replication is the most popular because:

Easier to understand ✅ ☺️
No conflict resolution to worry about ✅

Multi-leader replication and Leaderless replication can be more robust in handling:

Faulty nodes ✅
Network interruptions ✅
Latency spikes ✅

At the cost of being:

Harder to reason about ❌
Providing only very weak consistency guarantees to end users ❌

Asynchronous and synchronous replication

The choice between the two can have a profound effect on system behaviour when there is a fault.

Asynchronous replication:

Can be faster when the system is running as expected ✅
It is important to figure out what happens when replication lag increases or servers fail 👈📝
If a leader fails and you promote an asynchronously updated follower to be the new leader 🤔
- There is a risk… ⚠️
  - Recently committed data may be lost ❌
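
The failover risk can be shown with a small Python sketch. The `Node` and `AsyncLeader` classes are hypothetical and heavily simplified (a real system ships changes in the background over a network); the only point is that the leader acknowledges a write before the follower has received it.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.log = []  # ordered list of (key, value) writes applied so far


class AsyncLeader(Node):
    """Leader that acknowledges writes before followers have received them."""

    def __init__(self, name, followers):
        super().__init__(name)
        self.followers = followers
        self.pending = []  # changes not yet shipped to followers

    def write(self, key, value):
        self.log.append((key, value))
        self.pending.append((key, value))
        return "ok"  # acknowledged immediately: asynchronous replication

    def replicate(self):
        """Ship pending changes to followers (a background task in reality)."""
        for follower in self.followers:
            follower.log.extend(self.pending)
        self.pending.clear()


follower = Node("follower")
leader = AsyncLeader("leader", [follower])

leader.write("balance", 100)
leader.replicate()            # the follower catches up
leader.write("balance", 80)   # acknowledged to the client, not yet replicated

# The leader crashes here; the follower is promoted with whatever it has.
new_leader_log = follower.log
lost_writes = leader.log[len(new_leader_log):]
print(lost_writes)  # [('balance', 80)]
```

A synchronous variant of this sketch would call `replicate()` before returning `"ok"`, trading higher write latency for not losing acknowledged writes on failover.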

Replication Lag

Consistency models help us decide how an application should behave when replication lag occurs.

The consistency models:

Read after write consistency
- Users should always see data that they have submitted themselves
Monotonic reads
- After the users have seen data at one point in time
  - They should not see data from an earlier point in time
Consistent prefix reads
- Users should see the data in a state that makes causal sense
  - For example, seeing a question and its reply in the correct order
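
As a rough illustration of the first two models, the Python sketch below tracks, per user session, the newest log position that user has written or read, and refuses to read from any follower that is behind it. The `Replica` and `Session` classes and the integer log positions are assumptions for this example, not the API of a real database, and consistent prefix reads are not covered here.

```python
class Replica:
    def __init__(self):
        self.position = 0   # number of log entries applied so far
        self.data = {}

    def apply(self, position, key, value):
        self.data[key] = value
        self.position = position


class Session:
    """Tracks how far this user has written and read (hypothetical helper)."""

    def __init__(self, leader, followers):
        self.leader = leader
        self.followers = followers
        self.min_position = 0  # never read from a replica older than this

    def write(self, key, value):
        position = self.leader.position + 1
        self.leader.apply(position, key, value)
        # Read-after-write: remember the position of our own write.
        self.min_position = max(self.min_position, position)

    def read(self, key):
        # Prefer a follower that is fresh enough; otherwise use the leader.
        candidates = [f for f in self.followers if f.position >= self.min_position]
        replica = candidates[0] if candidates else self.leader
        # Monotonic reads: never go back behind what we have already seen.
        self.min_position = max(self.min_position, replica.position)
        return replica.data.get(key)


leader, lagging = Replica(), Replica()
session = Session(leader, [lagging])
session.write("name", "Ada")
value = session.read("name")  # the follower is behind, so the leader answers
```

Falling back to the leader is only one option; a system could also wait for a follower to catch up, at the cost of a slower read.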

Concurrency issues:

These are inherent in multi-leader and leaderless replication… 🤷‍♂️

Because they allow multiple writes to happen concurrently, conflicts may occur ❌
🧐 There are numerous algorithms that allow databases to determine whether:
- One operation happened before another ✅
- Or the two happened concurrently ✅
  - Capturing the happens-before relationship
  - How to Define a Concurrent Operation?
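
One well-known way to capture the happens-before relationship is a version vector: each node keeps a counter, and every write carries the counters it has seen. The `compare` function below is a minimal sketch of that idea; representing a vector as a plain dict of node id to counter is an assumption made for this example.

```python
def compare(a, b):
    """Compare two version vectors (dicts of node id -> counter).

    Returns "before", "after", "equal" or "concurrent".
    """
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"      # a happened before b
    if b_le_a:
        return "after"       # b happened before a
    return "concurrent"      # neither saw the other: a conflict to resolve


print(compare({"A": 1}, {"A": 2}))                  # before
print(compare({"A": 2, "B": 0}, {"A": 1, "B": 1}))  # concurrent
```

Two writes are concurrent exactly when neither vector is less than or equal to the other in every position, which means neither writer knew about the other's write.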
There are also algorithms for resolving conflicts ➡️💥⬅️
- This can be done by merging concurrent updates together, or with more subtle techniques ✅
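
To contrast two simple resolution strategies, here is a small Python sketch. Both function names and the shopping-cart data are hypothetical. `merge_by_union` keeps everything from all concurrent versions, while `last_write_wins` keeps only the version with the highest timestamp and silently drops the rest.

```python
def merge_by_union(siblings):
    """Merge concurrent versions of a set-valued record by taking the union."""
    merged = set()
    for items in siblings:
        merged |= items
    return merged


def last_write_wins(siblings_with_timestamps):
    """Keep only the value with the highest timestamp; other writes are dropped."""
    return max(siblings_with_timestamps, key=lambda pair: pair[0])[1]


cart_a = {"milk", "eggs"}    # written on one node
cart_b = {"milk", "flour"}   # written concurrently on another node

merged = merge_by_union([cart_a, cart_b])
winner = last_write_wins([(1, cart_a), (2, cart_b)])
```

Here `merged` contains milk, eggs and flour, whereas `winner` has lost the eggs. A plain union has its own limitation: it cannot express removals, so an item deleted in one version reappears unless deletions are recorded explicitly (for example with tombstones).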

Final author recommendations

This concludes the blog series on replication in data-intensive systems. Most of this information can be found in “Designing Data-Intensive Applications” by Martin Kleppmann, which is, in my opinion, a highly recommended and extensive book on this subject.

Scalable Human Blog

Software Engineering
