Replication lag: why bother with replication?
We replicate data for a few reasons:
- Fault tolerance: the system keeps working when a node fails
- Scalability: more nodes can serve a growing number of requests
- Latency: nodes can be placed geographically closer to users
How can replication lag occur with a read scaling architecture?
Let's walk through a common replication pattern, leader-based replication:
- Requirements
- Writes go through a single node 🔵
- Read only queries go to any replica 🟢 🟢 🟢
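The two requirements above can be sketched as a small router. This is only an illustration: `Node` and `LeaderBasedRouter` are hypothetical names, the nodes are plain in-memory dictionaries, and replication from the leader to the followers is deliberately left out. Reads are spread over the followers round-robin, which is one possible choice among many.

```python
import itertools


class Node:
    """A toy in-memory database node (hypothetical, for illustration only)."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class LeaderBasedRouter:
    """Sends every write to the leader and spreads reads over the followers."""

    def __init__(self, leader, followers):
        self.leader = leader
        self.followers = followers
        # Round-robin over the followers so the read load is shared evenly.
        self._next_follower = itertools.cycle(followers)

    def write(self, key, value):
        # All writes go through the single leader node.
        self.leader.data[key] = value
        return self.leader.name

    def read(self, key):
        # Read-only queries can be served by any follower.
        node = next(self._next_follower)
        return node.name, node.data.get(key)


leader = Node("leader")
followers = [Node("follower-1"), Node("follower-2"), Node("follower-3")]
router = LeaderBasedRouter(leader, followers)

router.write("user:1", "Alice")
# Three consecutive reads land on three different followers.
served_by = [router.read("user:1")[0] for _ in range(3)]
```

Because nothing copies the write to the followers in this sketch, the reads come back empty, which is exactly the gap that replication (and its lag) has to fill.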
When does this work?
In the example above, the pattern is attractive when the workload has a high percentage of reads and only a small share of writes. With few writes, the replicas change rarely, so reads are served more consistently. ✅
Asynchronous vs Synchronous replication:
- Scaling reads by adding more follower nodes only realistically works with asynchronous replication ✅
- With synchronous replication, a single node outage can make the entire system lag or cause downtime ❌
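A rough sketch of the difference, under simplifying assumptions: the `Leader` and `Follower` classes are hypothetical, and an unreachable follower is modelled as an exception rather than as the indefinite wait a real system would more likely see.

```python
class Follower:
    def __init__(self, name, online=True):
        self.name = name
        self.online = online
        self.data = {}

    def apply(self, key, value):
        if not self.online:
            raise ConnectionError(f"{self.name} is unreachable")
        self.data[key] = value


class Leader:
    def __init__(self, followers):
        self.followers = followers
        self.data = {}
        self.backlog = []  # changes that could not be delivered yet

    def write_sync(self, key, value):
        # Synchronous: every follower must confirm before the write succeeds.
        for follower in self.followers:
            follower.apply(key, value)  # raises if any follower is down
        self.data[key] = value
        return "ok"

    def write_async(self, key, value):
        # Asynchronous: commit locally, then replicate on a best-effort basis.
        self.data[key] = value
        for follower in self.followers:
            try:
                follower.apply(key, value)
            except ConnectionError:
                self.backlog.append((follower.name, key, value))
        return "ok"


followers = [Follower("f1"), Follower("f2", online=False)]
leader = Leader(followers)

try:
    leader.write_sync("x", 1)
    sync_succeeded = True
except ConnectionError:
    sync_succeeded = False  # one offline follower blocks the whole write

async_result = leader.write_async("x", 1)  # succeeds; f2 is simply behind
```

With one follower offline, the synchronous write fails while the asynchronous write succeeds and leaves a backlog entry for the follower that fell behind.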
Followers can fall behind the leader, although this is a temporary state, as they will catch up eventually… This is called eventual consistency.
Eventual consistency
"Eventually" is deliberately vague, as there is no limit to how far a node can fall behind:
- Maybe a fraction of a second (unnoticeable) 🤷♂️
- If the entire system is lagging, this can easily become several seconds or even several minutes 🕙
- When the lag is that large, it is not just a theoretical issue; it can cause real problems for applications ❌
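One way to picture lag is as the number of changes a follower has not applied yet. The sketch below assumes a simple append-only replication log on the leader; the class names and the `max_entries` parameter are made up for illustration and do not correspond to any particular database.

```python
class Leader:
    def __init__(self):
        self.log = []  # ordered list of (key, value) changes

    def write(self, key, value):
        self.log.append((key, value))


class Follower:
    def __init__(self):
        self.data = {}
        self.applied = 0  # number of log entries applied so far

    def lag(self, leader):
        # Lag measured in log entries the follower has not applied yet.
        return len(leader.log) - self.applied

    def catch_up(self, leader, max_entries):
        # Apply at most `max_entries` pending changes, in log order.
        pending = leader.log[self.applied:self.applied + max_entries]
        for key, value in pending:
            self.data[key] = value
            self.applied += 1


leader = Leader()
follower = Follower()

for i in range(5):
    leader.write(f"key-{i}", i)

lag_before = follower.lag(leader)         # nothing applied yet
follower.catch_up(leader, max_entries=2)
lag_partial = follower.lag(leader)        # still behind
follower.catch_up(leader, max_entries=10)
lag_after = follower.lag(leader)          # fully caught up
```

Nothing in this model bounds how large the lag can get: if writes keep arriving faster than `catch_up` is called, the follower just falls further behind.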
What problems does replication lag cause?
Reading your own writes…
- Many applications let you submit data and let other users view it
- This may be a record in a customer database, a comment in a forum, etc.
- With asynchronous replication, some nodes may not be up to date 🤔
- So if a user submits changes to the leader
- They may not see those changes on the follower they are reading from… this can be distressing for the user ❌
- 👉 Especially if they deposit money into another local account and do not see the transfer reflected immediately 💸
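The deposit scenario can be reproduced with two dictionaries standing in for the leader and a follower. The account name, the amounts and the `pending` queue are all invented for this illustration.

```python
leader = {"account:42": 100}
follower = dict(leader)  # the follower starts in sync with the leader
pending = []             # changes sent but not yet applied on the follower


def deposit(account, amount):
    # The write goes to the leader and is queued for replication.
    leader[account] += amount
    pending.append((account, leader[account]))


def read_from_follower(account):
    return follower[account]


def replicate():
    # Apply every queued change on the follower, oldest first.
    while pending:
        account, balance = pending.pop(0)
        follower[account] = balance


deposit("account:42", 50)
stale = read_from_follower("account:42")  # read before replication: old balance
replicate()
fresh = read_from_follower("account:42")  # read after replication: new balance
```

The user who just deposited 50 sees the old balance on the first read, even though the write succeeded on the leader.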
How to handle replication lag?
When working with an eventually consistent system, it is worth considering the application behaviour if there are replication lags of several minutes or hours:
- If the answer is no problem… then great!
- But if the result is a poor experience for users 👎
- It is important to provide a stronger guarantee 💪
- Such as read-after-write consistency (I will go deeper into this in the next blog post)
❌ Pretending 🎭 that replication is synchronous when it is actually asynchronous is a recipe for problems later down the line. 🪲🦟
Potential solution
There are ways for an application to provide a stronger guarantee than the underlying database…
- By performing certain types of reads on a leader 🤔
- However, this is complex to do in the application layer 👎
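As a minimal sketch of what "certain types of reads on a leader" might look like, the router below sends a user's reads to the leader for a short window after that user's last write. The class name, the 60-second window and the injected `clock` are assumptions made for this example, not something prescribed by any database.

```python
import time


class ReadAfterWriteRouter:
    """Routes a user's reads to the leader shortly after that user writes."""

    def __init__(self, window_seconds=60.0, clock=time.monotonic):
        self.window_seconds = window_seconds  # assumed upper bound on lag
        self.clock = clock
        self.last_write_at = {}  # user_id -> time of that user's latest write

    def record_write(self, user_id):
        self.last_write_at[user_id] = self.clock()

    def choose_node(self, user_id):
        last = self.last_write_at.get(user_id)
        if last is not None and self.clock() - last < self.window_seconds:
            # The user wrote recently, so a follower might still be stale.
            return "leader"
        return "follower"


# Usage with a fake clock so the behaviour is easy to follow.
now = [0.0]
router = ReadAfterWriteRouter(window_seconds=60.0, clock=lambda: now[0])

router.record_write("alice")
now[0] = 10.0
alice_soon_after = router.choose_node("alice")  # within the window
bob = router.choose_node("bob")                 # bob never wrote
now[0] = 61.0
alice_later = router.choose_node("alice")       # window has passed
```

Even this small version hints at the complexity: the application has to track writes per user, pick a window that is long enough, and keep that state consistent across devices and servers.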
It would be better if developers did not need to worry about replication issues and could instead just trust that the database is doing the right thing. 🤷♂️
- This is why transactions exist
- They are a way for a database to provide strong guarantees so that the application can be simpler ✅
- Single node transactions have existed for a long time
- However, in the move to distributed, replicated and partitioned databases, many systems have abandoned them 🤷♂️
- Claiming that transactions are too expensive ❌
- 👉 And asserting that eventual consistency is inevitable in a scalable system
- There is some truth in that statement, but it is overly simplistic; there are many more nuances out there
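To make the "simpler application" point concrete, here is a small single-node example using Python's built-in `sqlite3` module. The `accounts` table, the `transfer` helper and the balances are invented for illustration; the point is only that the database, not the application, undoes a half-finished transfer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "id TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()


def transfer(conn, source, target, amount):
    """Move money between two accounts atomically; returns True on success."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, target),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft; the credit was rolled back too.
        return False


def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))


ok = transfer(conn, "alice", "bob", 30)
after_ok = balances(conn)
rejected = transfer(conn, "alice", "bob", 500)
after_rejected = balances(conn)
```

The failed transfer leaves both balances untouched, and the application code never has to write its own "undo the credit" logic.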








