What is Replication and Why is it Important?

Let's first break down the goals of replication:

  • High availability
    • Keeping the system running at an agreed level of operation.
    • Even when one machine, several machines, or an entire data centre goes down!
  • Disconnected operation: clients with offline operations
    • Enabling the application to continue working when there is a network interruption
  • Latency
    • Placing data geographically close to users so that the users can interact with it faster
  • Scalability
    • The ability to handle a higher volume of reads than a single machine could handle, by performing these reads on replicas

The humbling reality…🙏

Despite the simple objective of keeping a copy of the same data on several machines, replication is a remarkably tricky problem! It requires careful attention to:

  • 📝Concurrency 👈
  • 📝Things that can go wrong 👈
  • 📝Dealing with the consequences of the faults 👈

At a minimum, we generally need to deal with the following:

  • 📝Unavailable nodes 🔴❌
  • 📝Network interruptions 📶 ❌

And that is not even considering the more insidious faults 🧟‍♂️, such as:

  • 📝Silent data corruption due to software bugs 🪲

What approaches can we take with replication?

  • Single leader replication
    • Clients send all writes to a single node (leader)
      • The leader sends a stream of data change events to the followers
    • Reads can be performed by any replica
      • But followers may return stale reads
  • Multi-leader replication
    • Where clients send each write to one of several leader nodes
      • Any of which can accept writes
    • Streams of data change events are sent between leaders and to any follower nodes
    • Related, on choosing the best multi-leader topology: The Multi-Leader Replication Topologies
  • Leaderless replication
    • Clients send each write to several nodes
    • Clients read from several nodes in parallel ⬇️
      • In order to detect and correct nodes with stale data
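
To make the leaderless approach more concrete, here is a minimal sketch of a quorum write followed by a quorum read that detects and repairs a stale node. Everything in it is an assumption made for illustration: the `Replica` class, the `quorum_write` and `quorum_read` helpers, and the use of a simple integer version number per key are hypothetical, and the reads happen one after another rather than truly in parallel.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical in-memory replica storing key -> (version, value)."""
    store: dict = field(default_factory=dict)
    up: bool = True

    def write(self, key, version, value):
        if not self.up:
            return False  # an unavailable node cannot acknowledge the write
        current = self.store.get(key)
        if current is None or version > current[0]:
            self.store[key] = (version, value)
        return True

    def read(self, key):
        return self.store.get(key) if self.up else None


def quorum_write(replicas, key, version, value, w):
    """Send the write to every replica; succeed if at least w acknowledge."""
    acks = sum(1 for rep in replicas if rep.write(key, version, value))
    return acks >= w


def quorum_read(replicas, key, r):
    """Read from all reachable replicas, return the newest value,
    and write it back to any replica that returned stale data."""
    responses = [(rep, rep.read(key)) for rep in replicas if rep.up]
    if len(responses) < r:
        raise RuntimeError("not enough replicas reachable for a quorum read")
    found = [item for _, item in responses if item is not None]
    if not found:
        return None
    newest = max(found, key=lambda item: item[0])
    for rep, item in responses:
        if item is None or item[0] < newest[0]:
            rep.write(key, newest[0], newest[1])  # repair the stale replica
    return newest[1]
```

With three replicas, one of which is down during the write, a later `quorum_read` still returns the written value and brings the recovered replica up to date.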

Advantages and disadvantages of replication

Single leader replication is the most popular because:

  • Easier to understand ✅ ☺️
  • No conflict resolution to worry about ✅

Multi-leader replication and Leaderless replication can be more robust in handling:

  • Faulty nodes ✅
  • Network interruptions ✅
  • Latency spikes ✅

At the cost of:

  • Being harder to reason about ❌
  • Providing only very weak consistency guarantees to end users ❌

Asynchronous and synchronous replication

Whether replication is asynchronous or synchronous can have a profound effect on the system behaviour when there is a fault.

Asynchronous replication:

  • Can be faster when the system is running as expected ✅
  • It is important to figure out what happens when replication lag increases or servers fail 👈📝
  • If a leader fails and you promote an asynchronously updated follower to be the new leader 🤔
    • There is a risk… ⚠️
      • Recently committed data may be lost ❌
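
The failover risk can be illustrated with a small simulation. This is only a sketch under stated assumptions: `Node`, `AsyncLeader`, and `simulate_failover` are hypothetical names, the "log" is a plain Python list, and replication is triggered manually by calling `replicate()` instead of running in the background.

```python
class Node:
    """Hypothetical node holding an ordered log of written values."""

    def __init__(self, name):
        self.name = name
        self.log = []


class AsyncLeader(Node):
    """Leader that acknowledges writes before followers have received them."""

    def __init__(self, name, followers):
        super().__init__(name)
        self.followers = followers
        self.pending = []  # changes not yet shipped to followers

    def write(self, value):
        self.log.append(value)
        self.pending.append(value)
        return "ok"  # the client is told the write succeeded immediately

    def replicate(self):
        # In a real system this would run continuously in the background.
        for value in self.pending:
            for follower in self.followers:
                follower.log.append(value)
        self.pending.clear()


def simulate_failover():
    """Return the acknowledged writes that are missing after failover."""
    follower = Node("follower")
    leader = AsyncLeader("leader", [follower])
    leader.write("a")
    leader.replicate()
    leader.write("b")  # acknowledged, but the leader fails before replicating
    new_leader = follower  # the follower is promoted
    return [v for v in leader.log if v not in new_leader.log]
```

Calling `simulate_failover()` returns the writes the old leader had acknowledged but the promoted follower never received, which is exactly the data a client believed was committed.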

Replication Lag

Consistency models help combat the effects of replication lag by defining how the application should behave when it occurs.

The consistency models:

  • Read-after-write consistency
    • Users should always see data that they have submitted themselves
  • Monotonic reads
    • After the users have seen data at one point in time
      • They should not see data from an earlier point in time
  • Consistent prefix reads
    • Users should see the data in a state that makes causal sense
      • For example, seeing a question and its reply in the correct order
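
One possible way to get read-after-write consistency and monotonic reads is for each user session to remember the newest replication position it has written or seen, and to only read from replicas that have caught up to that point. The sketch below assumes this approach; the `Replica` and `Session` classes and the integer `position` are hypothetical, and consistent prefix reads are not covered.

```python
class Replica:
    """Hypothetical replica that tracks how far it has applied the write log."""

    def __init__(self):
        self.position = 0  # position of the last write applied
        self.data = {}

    def apply(self, position, key, value):
        self.data[key] = value
        self.position = position


class Session:
    """Tracks the minimum log position this user is allowed to read from."""

    def __init__(self):
        self.min_position = 0

    def note_write(self, position):
        # Read-after-write: later reads must include the user's own write.
        self.min_position = max(self.min_position, position)

    def read(self, replicas, key):
        for replica in replicas:
            if replica.position >= self.min_position:
                # Monotonic reads: never go back to an earlier point in time.
                self.min_position = replica.position
                return replica.data.get(key)
        raise RuntimeError("no replica is caught up enough for this session")
```

A session that has just written at position 1 will skip a follower still at position 0 and read from a caught-up replica instead. If no such replica is reachable, the sketch raises an error; a real system might wait or fall back to the leader.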

Concurrency issues:

These are inherent in multi-leader and leaderless replication, because both allow multiple writes to happen concurrently, so conflicts may occur… 🤷‍♂️

Final author recommendations

This concludes the blog series on replication in data-intensive systems. Most of this information can be found in “Designing Data-Intensive Applications” by Martin Kleppmann, which is, in my opinion, a highly recommended and extensive book on this subject.
