
Consistency Models: From Strong to Eventual

Understanding the spectrum of consistency models in distributed systems, from linearizability and sequential consistency to causal and eventual consistency, and the trade-offs each model makes.

2021-02-23
Distributed Systems · Consistency

Terminology

  • Consistency model: a contract between a distributed system and its clients that defines which values a read operation may return after a write, and in what order operations appear to execute
  • Linearizability: the strongest consistency model; every operation appears to take effect atomically at some point between its invocation and its response, and all operations are ordered consistently with real time
  • Sequential consistency: all operations appear to execute in some sequential order, and each process's operations appear in the order they were issued, but the global order need not respect real-time ordering
  • Causal consistency: operations that are causally related (one depends on the result of another) are seen in the same order by all nodes; concurrent (causally unrelated) operations may be seen in different orders
  • Eventual consistency: if no new writes occur, all replicas will eventually converge to the same value; there is no bound on how long convergence takes
  • Strong consistency: an informal term often used interchangeably with linearizability; every read returns the most recent write
  • Read-your-writes consistency: after a client writes a value, that same client's subsequent reads will always see the written value or something newer
  • Monotonic reads: if a client reads a value at time $t_1$, any subsequent read at $t_2 > t_1$ will return the same value or a newer one, never an older one
  • Monotonic writes: writes by a single client are applied in the order they were issued; a later write is never applied before an earlier write from the same client
  • Happens-before: a partial ordering of events in a distributed system; event $A$ happens before event $B$ if $A$ could have influenced $B$ (same process order, or message send before receive)
  • Concurrent events: two events are concurrent if neither happens before the other; the system cannot determine which occurred first
  • Stale read: a read that returns an outdated value because the replica has not yet received the latest write
  • Total order: an ordering where every pair of operations is comparable; one definitively comes before the other
  • Partial order: an ordering where some pairs of operations are comparable (causally related) and others are not (concurrent)
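The happens-before relation, concurrency, and the partial order they form can be made concrete with vector clocks. A minimal Python sketch (illustrative names; clocks are plain dicts mapping node IDs to counters, with missing entries counting as zero):

```python
def compare(a: dict, b: dict) -> str:
    """Classify two vector clocks: 'before', 'after', 'equal', or 'concurrent'."""
    keys = set(a) | set(b)
    a_le_b = all(a.get(k, 0) <= b.get(k, 0) for k in keys)
    b_le_a = all(b.get(k, 0) <= a.get(k, 0) for k in keys)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"       # a happens-before b
    if b_le_a:
        return "after"        # b happens-before a
    return "concurrent"       # neither dominates: a partial, not total, order

print(compare({"n1": 1}, {"n1": 2, "n2": 1}))  # before
print(compare({"n1": 2}, {"n2": 1}))           # concurrent
```

The last case is exactly why causal consistency yields only a partial order: when neither clock dominates, the system cannot say which event came first.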

What & Why

When data is replicated across multiple nodes, a fundamental question arises: what does a client see when it reads? If a write just happened on node A, does a read on node B see it immediately, eventually, or never until some condition is met?

Consistency models answer this question by defining the rules of the game. They are a contract: the system promises certain behaviors, and the application can rely on those promises. Stronger models (like linearizability) are easier to program against because they behave like a single-node system. Weaker models (like eventual consistency) are harder to reason about but allow the system to be faster and more available.

The trade-off is real. Linearizability requires coordination between nodes on every operation, which adds latency and reduces availability during network partitions. Eventual consistency requires no coordination, so it is fast and always available, but the application must handle the possibility of reading stale or conflicting data.

Most real systems do not pick one extreme. They offer tunable consistency or use different models for different operations. Understanding the full spectrum lets you choose the right model for each part of your application.

How It Works

The Consistency Spectrum

[Figure: the consistency spectrum, from strongest to weakest — Linearizable (Spanner, CockroachDB, ZooKeeper), Sequential, Causal (MongoDB causal sessions), Read-your-writes, Eventual (Cassandra, DynamoDB). Moving toward weaker consistency trades high latency and low availability for low latency and high availability.]

Linearizability

The gold standard. Every operation appears to happen at a single, atomic point in time between when the client sends the request and receives the response. If client A's write completes before client B's read begins (in real time), client B is guaranteed to see client A's write.

Linearizability requires coordination. Typically, a consensus protocol (Raft, Paxos) or synchronized clocks (Google's TrueTime) enforce the ordering. This adds at least one round-trip of latency to every operation and makes the system unavailable during network partitions (per the CAP theorem).

Sequential Consistency

Slightly weaker than linearizability. All operations appear to execute in some total order, and each client's operations appear in the order they were issued. However, the global order does not need to match real-time ordering. Two clients might disagree about which of two concurrent writes happened first, as long as they both see the same order.

The practical difference: with linearizability, if you call your friend on the phone and say "I just updated the record," your friend is guaranteed to see the update. With sequential consistency, there is a window where your friend might not see it yet, even though your write has completed.

Causal Consistency

Only causally related operations must be seen in the same order by all nodes. If operation A causes operation B (e.g., A writes a value that B reads), then every node sees A before B. But if A and B are concurrent (neither caused the other), different nodes may see them in different orders.

Causal consistency is attractive because it is the strongest model that does not require synchronous coordination. It can be implemented with vector clocks or dependency tracking, and it remains available during network partitions.

Eventual Consistency

The weakest useful model. The system only guarantees that if writes stop, all replicas will eventually converge to the same state. There is no bound on how long this takes, and during convergence, different clients may see different values.

Eventual consistency is the default for many distributed databases (Cassandra, DynamoDB with default settings) because it allows maximum availability and minimum latency. The application must be designed to tolerate stale reads and potential conflicts.
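Convergence under eventual consistency can be illustrated with a last-writer-wins register, one common (though lossy) conflict-resolution rule. A minimal sketch with hypothetical names; replicas hold (timestamp, value) pairs, and ties between concurrent writes would need an extra tiebreaker in practice:

```python
def merge(local, remote):
    """Last-writer-wins: keep the (timestamp, value) pair with the
    higher timestamp; tuples compare by timestamp first."""
    return max(local, remote)

# Three replicas diverged during a partition
replicas = [(1, "a"), (3, "c"), (2, "b")]

# Simulate anti-entropy: fold every replica's state into one merged value,
# then let all replicas adopt it
merged = replicas[0]
for state in replicas[1:]:
    merged = merge(merged, state)
replicas = [merged] * len(replicas)

print(replicas)  # all hold (3, 'c'): the last write won
```

Note that last-writer-wins silently discards the losing writes; applications that cannot tolerate this need richer merge logic (e.g. CRDTs or application-level conflict resolution).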

Session Guarantees

Between causal and eventual consistency, several session-level guarantees help applications cope with weak consistency:

Read-your-writes: After you write a value, your subsequent reads always see it. Other clients may not see it yet, but you always see your own writes.

Monotonic reads: Once you read a value, you never see an older value in a subsequent read. Your view of the data only moves forward.

Monotonic writes: Your writes are applied in the order you issued them. If you write A then B, no replica applies B before A.

Complexity Analysis

Model Coordination Cost Availability Read Latency
Linearizable $O(n)$ messages per op Unavailable during partition $O(\text{RTT})$ per quorum round (two rounds: read + write-back)
Sequential $O(n)$ messages per op Unavailable during partition $O(\text{RTT})$
Causal $O(n)$ clock metadata per op Available during partition $O(1)$ local
Read-your-writes $O(1)$ session tracking Available during partition $O(1)$ with sticky sessions
Eventual None Always available $O(1)$ local

For linearizability with a quorum of $n$ nodes, the minimum number of messages per read or write:

$\text{Messages per operation} \geq 2 \times (\lfloor n/2 \rfloor + 1)$

This accounts for the request to a quorum and the quorum's responses. In practice, Raft-based systems require the leader to contact a majority for every write.
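Plugging in small cluster sizes makes the cost concrete. A majority quorum of $n$ nodes has $\lfloor n/2 \rfloor + 1$ members, and each quorum round costs one request out and one response back per member:

```python
def majority(n: int) -> int:
    """Smallest quorum size such that any two quorums must intersect."""
    return n // 2 + 1

def messages_per_op(n: int) -> int:
    """One request and one response per quorum member."""
    return 2 * majority(n)

for n in (3, 5, 7):
    print(n, majority(n), messages_per_op(n))  # quorums 2, 3, 4 -> 4, 6, 8 messages
```

This is also why growing a cluster from 3 to 7 nodes improves fault tolerance but makes every linearizable operation strictly more expensive.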

For causal consistency, each operation carries a vector clock of size $n$ (number of nodes):

$\text{Metadata overhead per operation} = O(n)$

This can be compressed using techniques like dotted version vectors, reducing the practical overhead to $O(k)$ where $k$ is the number of concurrent writers.

Convergence time for eventual consistency under continuous writes with replication lag $\delta$:

$\text{Staleness bound} \leq \delta + \text{propagation delay}$

Implementation

ALGORITHM LinearizableRead(key, replicas, quorum)
INPUT: key: data key, replicas: list of N nodes, quorum: majority count
OUTPUT: value
BEGIN
  // Phase 1: Read from quorum
  responses <- empty list
  FOR EACH replica IN replicas DO
    response <- SEND_READ(replica, key)
    IF response is successful THEN
      APPEND response TO responses
    END IF
    IF LENGTH(responses) >= quorum THEN
      BREAK
    END IF
  END FOR

  IF LENGTH(responses) < quorum THEN
    RETURN error("quorum not reached")
  END IF

  latest <- response IN responses WITH MAX(timestamp)

  // Phase 2: Write-back to ensure linearizability
  // (ensures a subsequent read from any quorum sees this value)
  writeBackCount <- 0
  FOR EACH replica IN replicas DO
    response <- SEND_WRITE(replica, key, latest.value, latest.timestamp)
    IF response is successful THEN
      writeBackCount <- writeBackCount + 1
    END IF
    IF writeBackCount >= quorum THEN
      BREAK
    END IF
  END FOR

  RETURN latest.value
END
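The quorum read above can be sketched in runnable Python. This is a minimal single-process simulation, not a real client: replicas are plain dicts mapping a key to a (timestamp, value) pair, and "network calls" are direct dictionary accesses, so every name here is illustrative.

```python
def linearizable_read(key, replicas):
    """ABD-style quorum read: read from a majority, then write the
    freshest value back to a majority before returning it."""
    quorum = len(replicas) // 2 + 1

    # Phase 1: collect (timestamp, value) pairs until a quorum responds
    responses = []
    for replica in replicas:
        if key in replica:
            responses.append(replica[key])
        if len(responses) >= quorum:
            break
    if len(responses) < quorum:
        raise RuntimeError("quorum not reached")

    latest = max(responses)  # tuples compare by timestamp first

    # Phase 2: write-back, so any later quorum read must observe this value
    acked = 0
    for replica in replicas:
        replica[key] = latest
        acked += 1
        if acked >= quorum:
            break

    return latest[1]

# One replica is behind; the read still returns the freshest value
replicas = [{"x": (2, "new")}, {"x": (1, "old")}, {"x": (2, "new")}]
print(linearizable_read("x", replicas))  # new
```

The write-back phase is what distinguishes a linearizable read from a plain quorum read: it repairs stale replicas, so two quorums can never disagree about the latest committed value.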

ALGORITHM CausalConsistencyWrite(node, key, value, causalContext)
INPUT: node: this node, key: data key, value: data to write, causalContext: vector clock from client
OUTPUT: updated vector clock
BEGIN
  // Merge client's causal context with node's clock
  FOR EACH nodeID IN causalContext DO
    node.vectorClock[nodeID] <- MAX(node.vectorClock[nodeID], causalContext[nodeID])
  END FOR

  // Increment this node's entry
  node.vectorClock[node.id] <- node.vectorClock[node.id] + 1

  // Store the value with the vector clock
  node.store.put(key, value, COPY(node.vectorClock))

  // Replicate asynchronously to peers
  FOR EACH peer IN node.peers DO
    ASYNC_SEND_REPLICATE(peer, key, value, COPY(node.vectorClock))
  END FOR

  RETURN COPY(node.vectorClock)
END

ALGORITHM CausalConsistencyRead(node, key, causalContext)
INPUT: node: this node, key: data key, causalContext: client's vector clock
OUTPUT: value and updated context, or wait
BEGIN
  (storedValue, storedClock) <- node.store.get(key)

  // Check if this node has seen all writes the client depends on
  FOR EACH nodeID IN causalContext DO
    IF node.vectorClock[nodeID] < causalContext[nodeID] THEN
      // This node is behind; wait for replication or redirect
      WAIT_UNTIL node.vectorClock[nodeID] >= causalContext[nodeID]
    END IF
  END FOR

  RETURN (storedValue, storedClock)
END
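The causal write/read pair above can be sketched in Python, with the WAIT replaced by an explicit "not ready" signal so the example stays single-threaded. All names are illustrative, not a real client API:

```python
class CausalNode:
    def __init__(self, node_id):
        self.id = node_id
        self.clock = {}   # vector clock: node_id -> counter
        self.store = {}   # key -> (value, clock snapshot)

    def write(self, key, value, ctx):
        # Merge the client's causal context, then advance our own entry
        for nid, c in ctx.items():
            self.clock[nid] = max(self.clock.get(nid, 0), c)
        self.clock[self.id] = self.clock.get(self.id, 0) + 1
        self.store[key] = (value, dict(self.clock))
        return dict(self.clock)   # new causal context for the client

    def read(self, key, ctx):
        # Refuse to serve until we've seen every write the client depends on
        if any(self.clock.get(nid, 0) < c for nid, c in ctx.items()):
            return None           # caller should retry or redirect
        return self.store.get(key)

a, b = CausalNode("a"), CausalNode("b")
ctx = a.write("k", "v1", {})
print(b.read("k", ctx))   # None: b has not replicated a's write yet
print(a.read("k", ctx))   # ('v1', {'a': 1})
```

Returning `None` instead of blocking makes the dependency check visible: node `b` is causally behind the client, so serving the read would violate the model.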

ALGORITHM ReadYourWritesSession(client, replicas, key)
INPUT: client: client with session state, replicas: list of nodes, key: data key
OUTPUT: value
BEGIN
  lastWriteTimestamp <- client.session.lastWriteTimestamp[key]

  FOR EACH replica IN replicas DO
    response <- SEND_READ(replica, key)
    IF response.timestamp >= lastWriteTimestamp THEN
      RETURN response.value
    END IF
  END FOR

  // Fallback: read from the node that accepted the write
  response <- SEND_READ(client.session.lastWriteNode, key)
  RETURN response.value
END
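The session-guarantee read above reduces to a timestamp comparison. A hedged Python sketch, with replicas again simulated as dicts mapping a key to a (timestamp, value) pair:

```python
def read_your_writes(key, replicas, session):
    """Skip replicas that have not yet applied this client's last write."""
    last_ts = session.get(key, 0)   # timestamp of this client's last write
    for replica in replicas:
        ts, value = replica[key]
        if ts >= last_ts:           # replica has caught up to our write
            return value
    raise RuntimeError("no replica has caught up; retry or pin to the writer")

session = {"x": 2}                  # we wrote "x" at timestamp 2
replicas = [{"x": (1, "old")}, {"x": (2, "new")}]
print(read_your_writes("x", replicas, session))  # new
```

Note the guarantee is per-client: other clients carrying no session state could still read `old` from the lagging replica.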

ALGORITHM EventualConvergenceCheck(replicas, key)
INPUT: replicas: list of all replica nodes, key: data key
OUTPUT: converged (boolean)
BEGIN
  values <- empty set

  FOR EACH replica IN replicas DO
    response <- SEND_READ(replica, key)
    ADD response.value TO values
  END FOR

  IF SIZE(values) = 1 THEN
    RETURN TRUE   // all replicas agree
  ELSE
    RETURN FALSE  // still divergent
  END IF
END
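The convergence probe above is simple enough to state in two lines of Python (replicas again simulated as dicts):

```python
def converged(replicas, key):
    """True once every replica currently holds the same value for key."""
    return len({replica[key] for replica in replicas}) == 1

print(converged([{"x": "a"}, {"x": "a"}], "x"))   # True
print(converged([{"x": "a"}, {"x": "b"}], "x"))   # False
```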

Real-World Applications

  • Google Spanner: provides linearizability using TrueTime (GPS and atomic clocks) to assign globally consistent timestamps, enabling strong consistency across datacenters with bounded clock uncertainty
  • Apache Cassandra: defaults to eventual consistency but supports tunable consistency levels per query (ONE, QUORUM, ALL), letting applications choose the trade-off per operation
  • Amazon DynamoDB: offers eventual consistency by default for reads, with an option for strongly consistent reads that route to the leader replica at higher latency
  • MongoDB causal sessions: provides causal consistency within a client session, ensuring read-your-writes, monotonic reads, and monotonic writes without requiring global coordination
  • CockroachDB: implements serializable isolation with single-key linearizability (serializability alone does not imply real-time ordering across transactions) using a combination of hybrid logical clocks and Raft-based replication
  • Redis: single-node Redis is linearizable; Redis Cluster with asynchronous replication provides eventual consistency, with the risk of losing acknowledged writes if the primary fails before replication

Key Takeaways

  • Consistency models define what values a read may return after a write; stronger models are easier to program against but require more coordination
  • Linearizability is the strongest model: operations appear atomic and respect real-time ordering, but it requires consensus and is unavailable during partitions
  • Sequential consistency relaxes real-time ordering but still provides a total order; causal consistency only orders causally related operations
  • Eventual consistency provides no ordering guarantees during operation but promises convergence when writes stop; it enables maximum availability and minimum latency
  • Session guarantees (read-your-writes, monotonic reads) provide useful middle ground between causal and eventual consistency
  • Most production systems offer tunable consistency, letting applications choose the right model per operation based on their specific requirements