Leaderless replication is well suited to multi-data-centre operation, due to its ability to tolerate conflicts and faults such as:
- Concurrent writes
- Network interruptions
- Latency spikes
For instance, both Cassandra and Voldemort implement their multi-data-centre support within the normal leaderless model.
- The number of replicas N includes nodes in all data centres
- In the configuration, you can specify how many of the N replicas you want in each data centre
- Each write from a client is sent to all replicas, regardless of data centre
- But usually the client only waits for acknowledgment from a quorum of nodes within its local data centre
- Therefore, it is unaffected by delays and interruptions on the cross-data-centre link
- Now, on configuration…
- Writes to other data centres are often configured to happen asynchronously, given their higher latency (although there is some flexibility in the configuration)
- For example, Riak keeps all communication between clients and database nodes local to one data centre.
- So for this, N describes the number of replicas within one data centre
- Cross-data-centre replication between database clusters happens asynchronously in the background, in a style similar to multi-leader replication
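The local-quorum behaviour described above can be sketched in a few lines. This is a toy illustration, not a real client API: the `Replica` class, its simulated delays, and the `write_with_local_quorum` function are all assumptions standing in for real nodes and cross-data-centre latency.

```python
import time
import concurrent.futures

class Replica:
    """Toy replica: a delay simulates network/cross-data-centre latency."""
    def __init__(self, delay=0.0):
        self.delay = delay
        self.store = {}

    def write(self, key, value):
        time.sleep(self.delay)
        self.store[key] = value
        return True

def write_with_local_quorum(key, value, local_replicas, remote_replicas):
    """Send the write to ALL replicas, but only wait for a local quorum."""
    all_replicas = local_replicas + remote_replicas
    local_quorum = len(local_replicas) // 2 + 1
    acked_locally = 0

    pool = concurrent.futures.ThreadPoolExecutor(max_workers=len(all_replicas))
    futures = {pool.submit(r.write, key, value): r for r in all_replicas}
    for fut in concurrent.futures.as_completed(futures):
        if fut.exception() is None and futures[fut] in local_replicas:
            acked_locally += 1
            if acked_locally >= local_quorum:
                # Remote acknowledgments keep arriving in the background;
                # the client is not blocked on the cross-data-centre link.
                pool.shutdown(wait=False)
                return True
    pool.shutdown(wait=False)
    return False
```

With three fast local replicas and three slow remote ones, the call returns as soon as two local replicas acknowledge, even though the remote replicas are still in flight.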
Detecting concurrent writes
Dynamo-style databases allow several clients to concurrently write to the same key.
- Which means conflicts can occur even if strict quorums are used
- The situation is similar to multi-leader replication
- Although with Dynamo-style databases, conflicts can additionally occur during read repair or hinted handoff
- The problem is that events can arrive in a different order at different nodes; this could be due to:
- Network delays
- Partial failures
- For example:
- We have two clients A and B
- Simultaneously writing to a key X
- In a three-node datastore
- Node 1 receives a write from A
- But never receives the write from B, due to a transient outage 😦
- Node 2 first receives the write from A, then the write from B
- Node 3 first receives the write from B, then the write from A
- If each node simply overwrote the value for a key whenever it received a write request from a client…
- The nodes would become permanently inconsistent
- Node 2 thinks the final value of X is B 🤷♂️
- While the other nodes think that the value is A
- In order to become eventually consistent, the replicas should converge towards the same value
- How do they do that?
- One might hope that replicated databases handle this automatically… (unfortunately, most implementations are quite poor)
- If you want to avoid data loss
- You, the application developer, need to know a lot about how your database handles conflicts internally
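The divergence described above is easy to reproduce in a toy simulation. The `Node` class and its overwrite-on-receive behaviour are illustrative assumptions, not a real datastore:

```python
class Node:
    """Toy node that naively overwrites on every write it receives."""
    def __init__(self, name):
        self.name = name
        self.data = {}

    def receive_write(self, key, value):
        self.data[key] = value  # last write received simply wins

node1, node2, node3 = Node("node1"), Node("node2"), Node("node3")

# Client A's write reaches all three nodes; client B's write never
# reaches node 1 (transient outage) and arrives in a different order
# at nodes 2 and 3.
node1.receive_write("x", "A")                          # B's write is lost
node2.receive_write("x", "A"); node2.receive_write("x", "B")
node3.receive_write("x", "B"); node3.receive_write("x", "A")

values = {n.name: n.data["x"] for n in (node1, node2, node3)}
print(values)  # {'node1': 'A', 'node2': 'B', 'node3': 'A'} — permanently inconsistent
```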
In previous blog posts I have mentioned some techniques for conflict resolution.
Last write wins (discarding concurrent writes)
One approach to achieving eventual convergence is to declare that each replica need only store the most recent value, and allow older values to be overwritten and discarded.
- Then, as long as we have some way of unambiguously determining which write is more recent, and every write is eventually copied to every replica
- The replicas will eventually converge to the same value
- ⚠️ This idea is quite misleading
- In the earlier example, where the two clients simultaneously wrote to the three-node datastore
- Neither client knew about the other one when it sent its writes
- So it is not clear which one happened first
- In fact it really does not make sense to say either happened “first”
- We say the writes are concurrent so the order is undefined
- Even though the writes do not have a natural ordering
- We can force an arbitrary order on them
- For example we can attach a timestamp to each write
- Pick the write with the biggest timestamp as the most recent, and discard any writes with earlier timestamps
- This conflict resolution algorithm is called last write wins (LWW)
- This is the only conflict resolution handler supported in Cassandra
- And an optional feature in Riak
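A minimal sketch of last write wins, assuming each write carries a timestamp and a replica keeps only the value with the largest timestamp seen so far. The `LWWRegister` class is a hypothetical illustration, not Cassandra's or Riak's actual implementation:

```python
import time

class LWWRegister:
    """Keeps only the (timestamp, value) pair with the largest timestamp."""
    def __init__(self):
        self.timestamp = 0.0
        self.value = None

    def write(self, value, timestamp=None):
        ts = timestamp if timestamp is not None else time.time()
        if ts > self.timestamp:          # bigger timestamp wins
            self.timestamp, self.value = ts, value
        # writes with earlier timestamps are silently discarded

replica = LWWRegister()
replica.write("B", timestamp=2.0)
replica.write("A", timestamp=1.0)  # concurrent write, smaller timestamp: dropped
print(replica.value)  # B
```

Note that the write of "A" was silently discarded even though, from the client's point of view, it appeared to succeed.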
Last Write Wins (LWW)
LWW achieves the goal of eventual convergence but at the cost of durability 👎
- If there are several concurrent writes to the same key
- Even if they were all reported as successful to the client
- Because they were successfully written to w replicas
- Only one of the writes will survive
- The others will be silently discarded
- Moreover, LWW may even drop writes that are not concurrent
- There are some situations such as caching where lost writes can be acceptable
- If losing data is not acceptable, LWW is a poor choice for conflict resolution
- The only safe way to use a database with LWW
- Is to ensure a key is only written once and thereafter treated as immutable
- Thus, avoiding any concurrent updates to the same key
- For example, a recommended way of using Cassandra is to use a UUID as the key
- Thus giving each write operation a unique key
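The immutable-key pattern can be sketched as follows. The `store` dict and the `insert_immutable` helper are hypothetical stand-ins for a real database client:

```python
import uuid

store = {}  # stands in for the database

def insert_immutable(value):
    """Each write gets a fresh UUID key, so no two writes ever race on
    the same key and LWW has nothing to discard."""
    key = str(uuid.uuid4())
    assert key not in store   # never overwrites an existing key
    store[key] = value        # the key is treated as immutable from now on
    return key

k1 = insert_immutable("first event")
k2 = insert_immutable("second event")
assert k1 != k2 and store[k1] == "first event"
```

Because every write lands under a unique key, there are no concurrent updates to the same key, and the safety problems of LWW never arise.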