What are Version Vectors? – Algorithm

This post expands on the previous blog post, Capturing the happens before relationship, where the scenario described uses only a single replica.

The question is: how does the algorithm change when there are multiple replicas but no leader (leaderless replication)?

  • Using a single version number is not enough
    • Why?
      • There are multiple replicas accepting writes concurrently!
    • What do we do?
      • Instead, we need a version number per replica as well as per key
    • How?
      • Each replica increments its own version number when processing a write.
      • It also keeps track of the version number it has seen from each of the other replicas
      • This information indicates:
        • which value to overwrite
        • which value to keep as siblings
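
The bookkeeping above can be sketched in a few lines of Python. This is a minimal illustration and not any particular database's implementation: the function names `increment`, `dominates`, and `compare`, and the replica IDs "A" and "B", are assumptions made up for the example.

```python
from typing import Dict

# Hypothetical representation: replica id -> highest version number seen from it.
VersionVector = Dict[str, int]


def increment(vv: VersionVector, replica_id: str) -> VersionVector:
    """Return a copy of vv with the given replica's own counter bumped by one."""
    new_vv = dict(vv)
    new_vv[replica_id] = new_vv.get(replica_id, 0) + 1
    return new_vv


def dominates(a: VersionVector, b: VersionVector) -> bool:
    """True if a has seen at least everything b has seen (missing entries count as 0)."""
    return all(a.get(replica, 0) >= count for replica, count in b.items())


def compare(a: VersionVector, b: VersionVector) -> str:
    """Classify the relationship between two version vectors."""
    a_ge_b = dominates(a, b)
    b_ge_a = dominates(b, a)
    if a_ge_b and b_ge_a:
        return "equal"
    if a_ge_b:
        return "after"       # a supersedes b, so b's value can be overwritten
    if b_ge_a:
        return "before"      # b supersedes a
    return "concurrent"      # neither has seen the other: keep both as siblings


# Two replicas accept a write for the same key without knowing about each other.
written_at_a = increment({}, "A")            # {"A": 1}
written_at_b = increment({}, "B")            # {"B": 1}

# A later write at B that has already seen A's write.
follow_up = increment(written_at_a, "B")     # {"A": 1, "B": 1}
```

Here `written_at_a` and `written_at_b` compare as concurrent, so both values would be kept as siblings, whereas `follow_up` comes after `written_at_a` and can safely overwrite it.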

A collection of version numbers from all the replicas is called the version vector

Martin Kleppmann

To note, there are many variations of version vectors, although as Martin Kleppmann highlights, the dotted version vector is potentially one of the most interesting.

Dotted Version Vectors

Riak 2.0, for instance, uses dotted version vectors. As in the shopping cart example from the previous post:

  • Version vectors are sent from the database replicas to clients when values are read
  • They need to be sent back to the database when a value is subsequently written
  • Riak encodes the version vector as a string, which it calls the causal context
  • The version vector allows the database to distinguish between overwrites and concurrent writes
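
To make the idea of an opaque context string concrete, here is a small sketch of how a version vector could be turned into a token that a client simply stores and passes back. The encoding (JSON wrapped in URL-safe base64) and the names `encode_context` and `decode_context` are assumptions chosen for illustration; this is not Riak's actual wire format.

```python
import base64
import json
from typing import Dict

VersionVector = Dict[str, int]


def encode_context(vv: VersionVector) -> str:
    """Serialize a version vector into an opaque string for the client to hold."""
    payload = json.dumps(vv, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return base64.urlsafe_b64encode(payload).decode("ascii")


def decode_context(token: str) -> VersionVector:
    """Recover the version vector from a token the client sent back with a write."""
    payload = base64.urlsafe_b64decode(token.encode("ascii"))
    return json.loads(payload)


# Returned alongside the value on a read; the client does not need to interpret it.
token = encode_context({"A": 2, "B": 1})

# Sent back on the next write, where the database decodes it again.
round_tripped = decode_context(token)
```

The point of keeping the token opaque is that the client never has to understand version vectors; it only has to return whatever it was given on its last read.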

Moreover, like in the single replica example:

  • The application may need to merge siblings
  • The version vector structure ensures that it is safe to read from one replica
    • And then subsequently write back to another replica
    • Doing so may result in siblings being created, but no data is lost as long as siblings are merged correctly
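
The read-from-one-replica, write-to-another scenario can be walked through with a toy model. Everything here is a simplified sketch under stated assumptions: the `Replica` class, the `covers` helper, and the single-key shopping cart are hypothetical, replication between replicas is faked by copying state, and siblings are only ever compared against the client's context, never against each other.

```python
from typing import Dict, List, Set, Tuple

VersionVector = Dict[str, int]
Sibling = Tuple[VersionVector, Set[str]]  # (version vector, cart contents)


def covers(context: VersionVector, vv: VersionVector) -> bool:
    """True if the client's context has already seen everything recorded in vv."""
    return all(context.get(r, 0) >= n for r, n in vv.items())


class Replica:
    """Hypothetical in-memory replica holding the siblings of a single key."""

    def __init__(self, replica_id: str) -> None:
        self.replica_id = replica_id
        self.siblings: List[Sibling] = []

    def read(self) -> Tuple[List[Set[str]], VersionVector]:
        """Return every sibling value plus the element-wise max of their vectors."""
        context: VersionVector = {}
        for vv, _ in self.siblings:
            for r, n in vv.items():
                context[r] = max(context.get(r, 0), n)
        return [set(v) for _, v in self.siblings], context

    def write(self, value: Set[str], context: VersionVector) -> None:
        """Drop siblings the client has seen, keep the rest, then add the new value."""
        _, seen = self.read()
        own = max(seen.get(self.replica_id, 0), context.get(self.replica_id, 0))
        new_vv = dict(context)
        new_vv[self.replica_id] = own + 1  # bump only this replica's counter
        survivors = [(vv, v) for vv, v in self.siblings if not covers(context, vv)]
        self.siblings = survivors + [(new_vv, set(value))]


a, b = Replica("A"), Replica("B")
a.write({"milk"}, {})
b.siblings = [(dict(vv), set(v)) for vv, v in a.siblings]  # stand-in for replication

_, ctx_from_a = a.read()                 # client 1 reads from replica A
b.write({"milk", "eggs"}, {"A": 1})      # client 2 writes to replica B meanwhile
b.write({"milk", "flour"}, ctx_from_a)   # client 1 writes to B with A's context
sibling_count = len(b.siblings)          # both carts are kept as siblings

values, ctx = b.read()
merged = set().union(*values)            # application-level merge: union of carts
b.write(merged, ctx)                     # the merged write replaces both siblings
```

Client 1's write does not silently overwrite client 2's cart, because the context it read from replica A does not cover what replica B has since accepted. Taking the union is only one possible merge strategy and suits a cart where items are only ever added.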

To Note…

A version vector is sometimes referred to as a vector clock (even though they are not quite the same). The difference is subtle, but in brief: when comparing the state of replicas, version vectors are the right data structure to use.
