Vector Clocks and Conflict Resolution: Tracking Causality in Distributed Systems
How vector clocks, version vectors, and CRDTs track causality and resolve conflicts in distributed systems where there is no global clock.
Terminology
- Logical clock: a mechanism for ordering events in a distributed system without relying on synchronized physical clocks; assigns a number to each event that respects causality
- Lamport timestamp: a single integer counter that increments on each event; if event $A$ happens before event $B$, then $L(A) < L(B)$, but the converse is not necessarily true
- Vector clock: an array of $n$ counters (one per node) that precisely captures the happens-before relationship; if $VC(A) < VC(B)$ component-wise, then $A$ causally precedes $B$
- Happens-before ($\rightarrow$): a partial order on events; $A \rightarrow B$ if $A$ and $B$ are on the same node and $A$ occurred first, or if $A$ is a message send and $B$ is the corresponding receive
- Concurrent events: two events $A$ and $B$ are concurrent ($A \| B$) if neither $A \rightarrow B$ nor $B \rightarrow A$; the system cannot determine which happened first
- Version vector: similar to a vector clock but tracks versions of a data item rather than events; each entry records the latest version number from each node that has written to the item
- Conflict: a situation where two or more concurrent writes modify the same data item, producing divergent versions that must be reconciled
- Last-writer-wins (LWW): a conflict resolution strategy that keeps the write with the highest timestamp and discards others; simple but can lose data
- CRDT (Conflict-free Replicated Data Type): a data structure designed so that concurrent updates can always be merged automatically without conflicts, guaranteeing eventual convergence
- State-based CRDT (CvRDT): a CRDT where replicas periodically exchange their full state and merge using a join operation on a semilattice
- Operation-based CRDT (CmRDT): a CRDT where replicas exchange operations (not state) and apply them; operations must be commutative
- Semilattice: a mathematical structure with a join (least upper bound) operation that is commutative, associative, and idempotent; the foundation of state-based CRDTs
- Sibling: in systems like Riak, when concurrent writes create conflicting versions, all versions are kept as siblings until the application or an automatic resolver picks one
- Dotted version vector: an optimized version vector that uses "dots" (node, counter pairs) to precisely track individual write events, reducing false conflicts
What & Why
In a single-node system, ordering events is trivial: the CPU has one clock, and operations execute in sequence. In a distributed system, there is no global clock. Each node has its own clock, and those clocks drift. You cannot reliably determine which of two events on different nodes happened first by comparing wall-clock timestamps.
This matters because many operations depend on ordering. If two clients update the same record, you need to know which update came last (or whether they were concurrent and need merging). If a message is sent before another, the receiver should process them in the correct order.
Lamport timestamps provide a partial solution: a single counter that respects causality. But they cannot distinguish between causally related and concurrent events. If $L(A) < L(B)$, you know $A$ might have happened before $B$, but you cannot be sure.
Vector clocks solve this completely. By maintaining a counter per node, they capture the full causal history of each event. Two events can be compared precisely: one happened before the other, or they are concurrent. This enables accurate conflict detection in replicated systems.
Once conflicts are detected, they must be resolved. Last-writer-wins is simple but lossy. Application-level merge functions are flexible but complex. CRDTs are data structures that sidestep the problem entirely by ensuring that concurrent operations can always be merged automatically, with mathematical guarantees of convergence.
How It Works
Lamport Timestamps
Each node maintains a counter $C$. On every local event, increment $C$. When sending a message, attach $C$. When receiving a message with timestamp $T$, set $C = \max(C, T) + 1$.
This guarantees: if $A \rightarrow B$, then $L(A) < L(B)$. But the converse does not hold: two concurrent events may have $L(A) < L(B)$ without $A$ happening before $B$.
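The update rules above can be sketched as a small Python class (an illustrative sketch, not the pseudocode given later in this document). The final assertion demonstrates the converse failing: two events on nodes that never communicated are causally concurrent, yet their timestamps still impose an arbitrary order.

```python
# Minimal Lamport clock sketch.
class LamportClock:
    def __init__(self):
        self.counter = 0

    def local_event(self):
        self.counter += 1
        return self.counter

    def send(self):
        # Increment, then attach the counter to the outgoing message.
        self.counter += 1
        return self.counter

    def receive(self, sender_ts):
        # C = max(C, T) + 1
        self.counter = max(self.counter, sender_ts) + 1
        return self.counter

# Two nodes that never exchange messages: their events are concurrent,
# yet the timestamps still order them.
a, b = LamportClock(), LamportClock()
ta = a.local_event()   # 1
b.local_event()        # 1
tb = b.local_event()   # 2
assert ta < tb         # ordered by timestamp, but causally concurrent
```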
Vector Clocks
Each node $i$ maintains a vector $VC[0..n-1]$ where $VC[j]$ represents the latest event count node $i$ knows about from node $j$.
Rules:
- On a local event at node $i$: increment $VC[i]$
- When sending a message from node $i$: increment $VC[i]$, attach $VC$ to the message
- When receiving a message with vector $VC_{msg}$ at node $i$: for each $j$, set $VC[j] = \max(VC[j], VC_{msg}[j])$, then increment $VC[i]$
Comparison:
- $VC(A) \leq VC(B)$ if every component of $A$ is $\leq$ the corresponding component of $B$
- $A \rightarrow B$ if and only if $VC(A) < VC(B)$, i.e. $VC(A) \leq VC(B)$ and at least one component is strictly less
- $A \| B$ (concurrent) if neither $VC(A) \leq VC(B)$ nor $VC(B) \leq VC(A)$
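These comparison rules translate directly into a few lines of Python (a sketch; vector clocks are represented as plain lists indexed by node):

```python
# Compare two vector clocks per the rules above.
def compare(vc_a, vc_b):
    """Return 'BEFORE', 'AFTER', 'CONCURRENT', or 'EQUAL'."""
    a_le = all(x <= y for x, y in zip(vc_a, vc_b))  # VC(A) <= VC(B)?
    b_le = all(y <= x for x, y in zip(vc_a, vc_b))  # VC(B) <= VC(A)?
    if a_le and b_le:
        return "EQUAL"
    if a_le:
        return "BEFORE"       # A -> B
    if b_le:
        return "AFTER"        # B -> A
    return "CONCURRENT"       # A || B

assert compare([1, 0, 0], [2, 1, 0]) == "BEFORE"
assert compare([2, 1, 0], [1, 2, 0]) == "CONCURRENT"
```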
Conflict Detection with Version Vectors
When a client writes to a replica, the replica increments its entry in the version vector and stores the new value. When replicas sync, they compare version vectors:
- If one vector dominates the other (every component is $\geq$ the corresponding component), the dominated version is older and can be discarded.
- If neither dominates, the writes are concurrent and a conflict exists. The system must either keep both versions (siblings) or merge them.
Conflict Resolution Strategies
Last-Writer-Wins (LWW): Attach a wall-clock timestamp to each write. When conflicts arise, keep the write with the highest timestamp. This is simple and always converges, but it silently discards concurrent writes. Cassandra uses LWW by default.
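A one-function sketch makes the lossiness concrete. The tuples and tie-breaking rule here are illustrative, not any particular database's wire format; note how the concurrent write from node "a" is silently discarded.

```python
# LWW sketch: keep the write with the highest (timestamp, node_id) pair.
def lww_resolve(versions):
    # versions: list of (value, timestamp, node_id) tuples.
    # Ties on timestamp are broken deterministically by node_id.
    return max(versions, key=lambda v: (v[1], v[2]))[0]

# Two writes with the same wall-clock timestamp: one simply wins.
concurrent = [("cart=[book]", 100, "a"), ("cart=[pen]", 100, "b")]
assert lww_resolve(concurrent) == "cart=[pen]"   # the book write is lost
```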
Application-level merge: The system presents all conflicting versions to the application, which decides how to merge them. Amazon's shopping cart (Dynamo) used this approach: the application takes the union of items from all conflicting cart versions.
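In the style of Dynamo's shopping cart, the application-level merge can be as simple as a set union over all conflicting siblings (a sketch; carts are modeled as plain sets of item names):

```python
# Application-level merge: union the items from all conflicting cart versions.
def merge_carts(siblings):
    merged = set()
    for cart in siblings:
        merged |= cart   # set union
    return merged

assert merge_carts([{"book", "pen"}, {"book", "mug"}]) == {"book", "pen", "mug"}
```

One drawback the Dynamo paper itself noted: a pure union can resurrect items that one sibling had deliberately removed.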
CRDTs: Data structures where the merge operation is built into the type itself. A G-Counter (grow-only counter) maintains a counter per node; the merged value is the component-wise maximum, and the total is the sum. A G-Set (grow-only set) merges via set union. More complex CRDTs like OR-Set (observed-remove set) handle both additions and removals.
CRDT Example: G-Counter
Each node $i$ maintains its own counter $c_i$. To increment, node $i$ increments $c_i$. The total count is $\sum_{i} c_i$. To merge two replicas, take the component-wise maximum: $\text{merged}[i] = \max(A[i], B[i])$.
This merge is commutative ($A \sqcup B = B \sqcup A$), associative ($(A \sqcup B) \sqcup C = A \sqcup (B \sqcup C)$), and idempotent ($A \sqcup A = A$). These three properties guarantee convergence regardless of message ordering or duplication.
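A compact Python sketch of the G-Counter shows the merge properties in action (illustrative; node indices are assigned statically):

```python
# G-Counter sketch: one slot per node; merge = component-wise maximum.
class GCounter:
    def __init__(self, node_index, num_nodes):
        self.i = node_index
        self.counts = [0] * num_nodes

    def increment(self):
        self.counts[self.i] += 1

    def value(self):
        return sum(self.counts)

    def merge(self, other):
        self.counts = [max(x, y) for x, y in zip(self.counts, other.counts)]

# Two replicas increment independently, then merge in either order.
a, b = GCounter(0, 2), GCounter(1, 2)
a.increment(); a.increment()    # a.counts == [2, 0]
b.increment()                   # b.counts == [0, 1]
a.merge(b); b.merge(a)
assert a.counts == b.counts == [2, 1]
assert a.value() == 3
a.merge(b)                      # idempotent: merging again changes nothing
assert a.value() == 3
```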
Complexity Analysis
| Mechanism | Space per Event | Comparison | Detects Concurrency? |
|---|---|---|---|
| Lamport timestamp | $O(1)$ | $O(1)$ | No |
| Vector clock | $O(n)$ | $O(n)$ | Yes |
| Dotted version vector | $O(n)$ worst case | $O(n)$ | Yes (fewer false conflicts) |
| G-Counter CRDT | $O(n)$ | $O(n)$ merge | N/A (conflict-free) |
For a system with $n$ nodes, the vector clock size grows linearly with the number of nodes. With 64-bit counters and 100 nodes, each vector clock is 800 bytes. This is attached to every message and stored with every data version, so it becomes a significant overhead at scale.
Comparing two vector clocks requires checking all $n$ components, so each comparison costs $O(n)$.
For CRDTs, the merge cost depends on the data structure. Merging two G-Counters over $n$ nodes is a component-wise maximum, i.e. $O(n)$.
For an OR-Set with $m$ elements, merging must reconcile every element and its tags, so the cost grows with $m$, plus per-element tag metadata that depends on how many of the $n$ nodes have written.
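To make the OR-Set's per-element tag metadata concrete, here is a deliberately simplified sketch. Each add generates a unique tag; a remove drops only the tags it has observed, which is what lets a concurrent re-add survive ("add wins"). The merge here is tag union only; a full OR-Set also needs tombstones or a causal context so that removes propagate between replicas.

```python
import uuid

# Simplified OR-Set sketch: element -> set of unique add-tags.
class ORSet:
    def __init__(self):
        self.entries = {}

    def add(self, element):
        # Every add gets a fresh, globally unique tag.
        self.entries.setdefault(element, set()).add(uuid.uuid4().hex)

    def remove(self, element):
        # Remove only the tags observed locally.
        self.entries.pop(element, None)

    def contains(self, element):
        return bool(self.entries.get(element))

    def merge(self, other):
        # Simplified: union of tags. A complete OR-Set also tracks
        # tombstones (or a causal context) so removes propagate.
        for elem, tags in other.entries.items():
            self.entries.setdefault(elem, set()).update(tags)

a, b = ORSet(), ORSet()
a.add("x")
b.merge(a)           # b observes a's tag for "x"
b.remove("x")        # b removes the tag it observed
a.add("x")           # concurrent re-add with a fresh tag
a.merge(b)
assert a.contains("x")   # the concurrent add wins over the remove
```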
Implementation
ALGORITHM LamportTimestamp(node)
STATE: counter: integer, initialized to 0
OPERATION LocalEvent()
BEGIN
counter <- counter + 1
RETURN counter
END
OPERATION SendMessage(destination)
BEGIN
counter <- counter + 1
SEND(destination, message, counter)
END
OPERATION ReceiveMessage(senderTimestamp)
BEGIN
counter <- MAX(counter, senderTimestamp) + 1
RETURN counter
END
ALGORITHM VectorClockUpdate(nodeIndex, vectorClock, numNodes)
STATE: vectorClock: array of integers, size numNodes, initialized to all zeros
OPERATION LocalEvent()
BEGIN
vectorClock[nodeIndex] <- vectorClock[nodeIndex] + 1
RETURN COPY(vectorClock)
END
OPERATION SendMessage(destination)
BEGIN
vectorClock[nodeIndex] <- vectorClock[nodeIndex] + 1
SEND(destination, message, COPY(vectorClock))
END
OPERATION ReceiveMessage(senderVC)
BEGIN
FOR i <- 0 TO numNodes - 1 DO
vectorClock[i] <- MAX(vectorClock[i], senderVC[i])
END FOR
vectorClock[nodeIndex] <- vectorClock[nodeIndex] + 1
RETURN COPY(vectorClock)
END
ALGORITHM CompareVectorClocks(vcA, vcB, n)
INPUT: vcA, vcB: vector clocks of size n
OUTPUT: BEFORE, AFTER, CONCURRENT, or EQUAL
BEGIN
aLessOrEqual <- TRUE
bLessOrEqual <- TRUE
equal <- TRUE
FOR i <- 0 TO n - 1 DO
IF vcA[i] < vcB[i] THEN
bLessOrEqual <- FALSE
equal <- FALSE
ELSE IF vcA[i] > vcB[i] THEN
aLessOrEqual <- FALSE
equal <- FALSE
END IF
END FOR
IF equal THEN
RETURN EQUAL
ELSE IF aLessOrEqual THEN
RETURN BEFORE // A happened before B
ELSE IF bLessOrEqual THEN
RETURN AFTER // A happened after B
ELSE
RETURN CONCURRENT
END IF
END
ALGORITHM GCounterCRDT(nodeIndex, numNodes)
STATE: counts: array of integers, size numNodes, initialized to all zeros
OPERATION Increment()
BEGIN
counts[nodeIndex] <- counts[nodeIndex] + 1
END
OPERATION Value()
BEGIN
total <- 0
FOR i <- 0 TO numNodes - 1 DO
total <- total + counts[i]
END FOR
RETURN total
END
OPERATION Merge(otherCounts)
BEGIN
FOR i <- 0 TO numNodes - 1 DO
counts[i] <- MAX(counts[i], otherCounts[i])
END FOR
END
ALGORITHM ConflictResolutionLWW(versions)
INPUT: versions: list of (value, timestamp, nodeID) tuples
OUTPUT: winning value
BEGIN
best <- versions[0]
FOR i <- 1 TO LENGTH(versions) - 1 DO
IF versions[i].timestamp > best.timestamp THEN
best <- versions[i]
ELSE IF versions[i].timestamp = best.timestamp THEN
// Break ties deterministically by nodeID
IF versions[i].nodeID > best.nodeID THEN
best <- versions[i]
END IF
END IF
END FOR
RETURN best.value
END
Real-World Applications
- Amazon Dynamo / DynamoDB: the original Dynamo paper described vector clocks for detecting conflicting writes across replicas, with all conflicting versions returned to the application for resolution; DynamoDB, its successor, later moved to server-side timestamps
- Riak: uses dotted version vectors to track causality and detect concurrent writes; conflicting versions are stored as siblings and can be resolved by the application or by built-in resolvers
- Redis CRDTs: Redis Enterprise supports CRDT-based data types (counters, sets, registers) for active-active geo-replication, allowing writes at any datacenter with automatic conflict-free merging
- Figma: uses CRDT-inspired techniques for real-time collaborative design editing, allowing multiple users to edit the same document simultaneously with automatic conflict resolution
- Apple Notes and iCloud: use operational transformation and CRDT-like techniques to sync notes across devices, handling concurrent edits made while devices are offline
- Cassandra: uses last-writer-wins with wall-clock timestamps as the default conflict resolution strategy; this is simple and performant but can lose concurrent writes
Key Takeaways
- Distributed systems cannot rely on physical clocks for ordering; logical clocks (Lamport timestamps, vector clocks) provide ordering guarantees based on causality
- Lamport timestamps are compact ($O(1)$) but cannot detect concurrent events; vector clocks are larger ($O(n)$) but precisely capture the happens-before relationship
- Two events are concurrent if neither vector clock dominates the other; concurrent writes to the same data create conflicts that must be resolved
- Last-writer-wins is the simplest resolution strategy but silently discards data; application-level merge gives full control but adds complexity
- CRDTs guarantee automatic convergence by designing data structures whose merge operations are commutative, associative, and idempotent
- The choice between vector clocks, LWW, and CRDTs depends on the application: LWW for simplicity, vector clocks for accurate conflict detection, CRDTs for conflict-free collaboration