What is the Capturing the happens before relationship? – Algorithm

In this post, we will delve into an algorithm that can tell whether two operations are concurrent or whether one happened before another.

Scenario

Let’s begin with a database with only one replica. (so we can simplify this and thereafter generalise the approach to a leaderless database with multiple replicas)

  • Say we have two clients concurrently adding items to the same shopping cart
    • (If this example is too mundane, let’s imagine two air traffic controllers) ✈️
      1. Initially the cart is empty between them
      2. The clients make 5 writes to the database
      3. Client 1 adds milk to the cart
        • This is the first write to that key so the server successfully stores and assigns it version 1
        • The server also echos the value back to the client the version number
      4. Client 2 adds eggs to the cart
        • Not knowing that client 1 concurrently added milk
        • And so client 2 only thinks there is eggs in the cart
        • The server assigns version 2 to this write
          • And assigns eggs and milk as two separate values
          • It then returns both values to the client along with the version number of 2
      5. Client 1 is oblivious to client 2 write, and want to add flour to the cart
        • So it thinks its current cart contents should be milk and flour
        • It then sends this value to the server along with the version number 1
          • That the server gave client 1 previously
      6. The server can tell from the version number that the write of milk and flour, super-seeds the prior value of milk
        • But this is concurrent with eggs
      7. Thus the server assigns version 3 to milk and flour
        • Overrides the version 1 value milk
        • But keeps the version 2 value
          • And returns both remaining values to the client
      8. Meanwhile client 2 wants to add ham to the cart
        • Unaware client 1 has added flour
      9. Client 2 received two separate values milk and eggs from the server in the last response
      10. So the client now merges those values and adds ham to form a new value
        • New value: (eggs, milk and ham)
        • It sends that value to the server along with the previous version number 2
      11. The server detects that version 2 overrides eggs
        • But it is concurrent with milk and flour
      12. So the two remaining values are:
        • Milk and flour with version 3
        • Eggs, milk and ham version 4
      13. Finally client 1 wants to add bacon
        • It previously received two values, milk and flour and separately eggs from the server at version 3
        • So it merges those, adds bacon and sends the final value of milk, flour eggs and bacon to the server
        • Along with the version number 3
        • This overrides milk and flour
        • Eggs was already overwritten in the last step
        • But it is concurrent with eggs milk and ham
        • So the server keeps those two concurrent values

What just happend?

In this example above, the clients are never fully up to date with the data on the server.

  • This is because there is always another operation going on concurrently
  • But old value versions of the value do get overwritten eventually and no writes are lost
  • ⚠️ Note – that the server can determine whether two operations are concurrent or not by looking at the version numbers
    • It does not need to interpret the value itself (so the value can be any data structure)

What does the algorithm look like?

  • The server maintains the version number for every key
  • Increments the version number each time that the key is written
  • And stores the new version number along with the new value written
  • When a client reads a key:
    • The client returns all values that have not been overwritten, also the latest version number
  • Client must read a key before a write
  • When a client writes a key it must include a version number of the prior read
    • And it must merge together all values that were received in the prior read
  • The response for a write request can be like a read returning all current values
    • Which allows us to chain several writes like in a shopping cart example
  • When a server receives a write for a particular version number
    • It can overwrite all values with that version number or below…
      • This is because it knows it has been merged into the new value
      • But it must keep all version numbers with higher value, because those values are concurrent with the incoming write
  • When a write includes a version number from the prior read, this tells us which previous state a write is based on
  • If you make a write without including a version number it is concurrent with all other writes
    • So it will not overwrite anything so it will just be returned as the values from subsequent reads

4 thoughts on “What is the Capturing the happens before relationship? – Algorithm

Leave a comment