What is the Capturing the happens before relationship? – Algorithm

In this post, we will delve into an algorithm that can tell whether two operations are concurrent or whether one happened before another.

Scenario

Let’s begin with a database with only one replica. (so we can simplify this and thereafter generalise the approach to a leaderless database with multiple replicas)

Say we have two clients concurrently adding items to the same shopping cart
- (If this example is too mundane, let’s imagine two air traffic controllers) ✈️
  1. Initially the cart is empty between them
  2. The clients make 5 writes to the database
  3. Client 1 adds milk to the cart
    - This is the first write to that key so the server successfully stores and assigns it version 1
    - The server also echos the value back to the client the version number
  4. Client 2 adds eggs to the cart
    - Not knowing that client 1 concurrently added milk
    - And so client 2 only thinks there is eggs in the cart
    - The server assigns version 2 to this write
      - And assigns eggs and milk as two separate values
      - It then returns both values to the client along with the version number of 2
  5. Client 1 is oblivious to client 2 write, and want to add flour to the cart
    - So it thinks its current cart contents should be milk and flour
    - It then sends this value to the server along with the version number 1
      - That the server gave client 1 previously
  6. The server can tell from the version number that the write of milk and flour, super-seeds the prior value of milk
    - But this is concurrent with eggs
  7. Thus the server assigns version 3 to milk and flour
    - Overrides the version 1 value milk
    - But keeps the version 2 value
      - And returns both remaining values to the client
  8. Meanwhile client 2 wants to add ham to the cart
    - Unaware client 1 has added flour
  9. Client 2 received two separate values milk and eggs from the server in the last response
  10. So the client now merges those values and adds ham to form a new value
    - New value: (eggs, milk and ham)
    - It sends that value to the server along with the previous version number 2
  11. The server detects that version 2 overrides eggs
    - But it is concurrent with milk and flour
  12. So the two remaining values are:
    - Milk and flour with version 3
    - Eggs, milk and ham version 4
  13. Finally client 1 wants to add bacon
    - It previously received two values, milk and flour and separately eggs from the server at version 3
    - So it merges those, adds bacon and sends the final value of milk, flour eggs and bacon to the server
    - Along with the version number 3
    - This overrides milk and flour
    - Eggs was already overwritten in the last step
    - But it is concurrent with eggs milk and ham
    - So the server keeps those two concurrent values

What just happend?

In this example above, the clients are never fully up to date with the data on the server.

This is because there is always another operation going on concurrently
But old value versions of the value do get overwritten eventually and no writes are lost
⚠️ Note – that the server can determine whether two operations are concurrent or not by looking at the version numbers
- It does not need to interpret the value itself (so the value can be any data structure)

What does the algorithm look like?

The server maintains the version number for every key
Increments the version number each time that the key is written
And stores the new version number along with the new value written
When a client reads a key:
- The client returns all values that have not been overwritten, also the latest version number
Client must read a key before a write
When a client writes a key it must include a version number of the prior read
- And it must merge together all values that were received in the prior read
The response for a write request can be like a read returning all current values
- Which allows us to chain several writes like in a shopping cart example
When a server receives a write for a particular version number
- It can overwrite all values with that version number or below…
  - This is because it knows it has been merged into the new value
  - But it must keep all version numbers with higher value, because those values are concurrent with the incoming write
When a write includes a version number from the prior read, this tells us which previous state a write is based on
If you make a write without including a version number it is concurrent with all other writes
- So it will not overwrite anything so it will just be returned as the values from subsequent reads

4 thoughts on “What is the Capturing the happens before relationship? – Algorithm”

Designing Data Intensive Systems: Version Vectors – Algorithm – Scalable Human says:

7th January 2022 at 10:25 pm

[…] upon the previous blog post regarding Capturing the happens before relationship, the scenario that is described only uses a single […]

LikeLike

What is the Happened-Before Relationship and Concurrency? – Scalable Human says:

4th March 2022 at 2:19 pm

[…] my next blog post, I will be detailing the capturing of happen-before relationship into algorithm as the author Martin Kleppman in Designing Data-Intensive Applications desribes, […]

LikeLike

What is Replication? – Scalable Human says:

4th April 2022 at 5:29 pm

[…] Capturing the happens before relationship […]

LikeLike

What is Replication and Why is it Important? – Scalable Human says:

4th April 2022 at 10:27 pm

[…] Capturing the happens before relationship […]

LikeLike

Scalable Human Blog

Software Engineering

What is the Capturing the happens before relationship? – Algorithm

Scenario

What just happend?

What does the algorithm look like?

4 thoughts on “What is the Capturing the happens before relationship? – Algorithm”

Leave a comment Cancel reply

Scenario

What just happend?

What does the algorithm look like?

Share this:

Related

4 thoughts on “What is the Capturing the happens before relationship? – Algorithm”

Leave a comment Cancel reply