Notes/Distributed Transactions: Coordinating Writes Across Services

Distributed Transactions: Coordinating Writes Across Services

How distributed systems coordinate multi-node writes using two-phase commit, three-phase commit, and the Saga pattern with compensating transactions.

2021-02-24AI-Synthesized from Personal Notes

Distributed SystemsTransactionsSagas

Terminology

Distributed transaction: a transaction that spans multiple nodes or services, requiring all participants to either commit or abort together to maintain data consistency
Atomicity: the guarantee that a transaction either completes entirely (all operations succeed) or has no effect at all (all operations are rolled back)
Coordinator: the node that orchestrates a distributed transaction by collecting votes from participants and deciding whether to commit or abort
Participant: a node that performs part of a distributed transaction and votes on whether it can commit its portion
Two-phase commit (2PC): a protocol where the coordinator first asks all participants to prepare (vote), then instructs them all to commit or abort based on the votes
Three-phase commit (3PC): an extension of 2PC that adds a pre-commit phase to reduce the window where participants are blocked waiting for the coordinator
Prepare (vote) phase: the first phase of 2PC where each participant writes its changes to a durable log and votes "yes" (can commit) or "no" (must abort)
Commit phase: the second phase of 2PC where the coordinator tells all participants to finalize their changes (if all voted yes) or roll back (if any voted no)
Blocking protocol: a protocol where participants may be stuck waiting indefinitely for the coordinator's decision if the coordinator crashes; 2PC is a blocking protocol
Saga: a sequence of local transactions where each step has a corresponding compensating transaction that undoes its effect if a later step fails
Compensating transaction: a transaction that semantically reverses the effect of a previously committed local transaction; it does not literally undo the operation but applies a corrective action
Choreography: a Saga coordination style where each service listens for events and decides independently what to do next, with no central coordinator
Orchestration: a Saga coordination style where a central orchestrator directs each service to execute its step and handles failures by triggering compensations
Idempotency: the property that executing an operation multiple times produces the same result as executing it once; critical for retry safety in distributed transactions
In-doubt transaction: a transaction that has voted "yes" in the prepare phase but has not yet received the coordinator's commit or abort decision

What & Why

A single database transaction is straightforward: the database guarantees atomicity using its local write-ahead log. But when a business operation spans multiple databases or services, no single database can guarantee atomicity for the whole operation.

Consider an e-commerce order: the order service creates the order, the payment service charges the card, and the inventory service reserves the items. If the payment succeeds but the inventory reservation fails, you need to refund the payment. Without coordination, you risk inconsistent state: money charged but no items reserved, or items reserved but no payment.

Distributed transactions solve this coordination problem. Two-phase commit (2PC) is the classic approach: a coordinator ensures all participants agree before any of them commit. It provides strong atomicity but has significant drawbacks: it blocks if the coordinator crashes, it holds locks across all participants for the duration of the protocol, and it does not scale well across services with different availability requirements.

The Saga pattern is the modern alternative for microservices. Instead of locking everything and committing atomically, a Saga breaks the operation into a sequence of local transactions. Each step commits independently. If a later step fails, the Saga runs compensating transactions to undo the effects of earlier steps. This trades strong atomicity for availability and loose coupling.

How It Works

Two-Phase Commit (2PC)

Phase 1 (Prepare): The coordinator sends a "prepare" message to all participants. Each participant executes the transaction locally (without committing), writes the changes to a durable log, and responds with "yes" (ready to commit) or "no" (must abort). Once a participant votes "yes," it has promised to commit if told to do so, and it cannot unilaterally abort.

Phase 2 (Commit/Abort): If all participants voted "yes," the coordinator writes a "commit" decision to its own durable log and sends "commit" to all participants. Each participant then finalizes its changes. If any participant voted "no," the coordinator sends "abort" and all participants roll back.

The blocking problem: If the coordinator crashes after participants have voted "yes" but before sending the commit/abort decision, participants are stuck. They have promised to commit but do not know the decision. They cannot abort (they promised) and cannot commit (they do not know if everyone voted yes). They must wait for the coordinator to recover, holding locks the entire time.

Three-Phase Commit (3PC)

3PC adds a "pre-commit" phase between the vote and the final commit. After receiving all "yes" votes, the coordinator sends a "pre-commit" message. Participants acknowledge it. Only then does the coordinator send the final "commit." If the coordinator crashes during pre-commit, participants can safely abort because they know the coordinator had not yet committed. This reduces the blocking window but does not eliminate it entirely in the presence of network partitions.

Saga Pattern

A Saga replaces a single distributed transaction with a sequence of local transactions, each with a compensating action:

Order Service: Create order (compensate: cancel order)
Payment Service: Charge payment (compensate: refund payment)
Inventory Service: Reserve items (compensate: release reservation)

If step 3 fails, the Saga runs compensations in reverse order: refund payment, then cancel order. Each step commits independently, so there are no distributed locks.

Orchestration: A central Saga orchestrator tells each service what to do and tracks the state. If a step fails, the orchestrator triggers compensations. This is easier to reason about but creates a single point of coordination.

Choreography: Each service publishes events when it completes its step. The next service listens for the event and acts. If a service fails, it publishes a failure event, and upstream services listen for it and compensate. This is more decoupled but harder to debug and monitor.

Complexity Analysis

Protocol	Message Rounds	Messages	Blocking?
2PC	2 rounds	$4n$ ($n$ = participants)	Yes, if coordinator fails
3PC	3 rounds	$6n$	Reduced, not eliminated
Saga (orchestrated)	$2k$ rounds ($k$ = steps)	$2k$ messages	No (each step is local)

For 2PC with $n$ participants, the total number of messages in the happy path (all vote yes):

$\text{Messages}_{2PC} = n \text{ (prepare)} + n \text{ (votes)} + n \text{ (commit)} + n \text{ (acks)} = 4n$

The latency is dominated by two sequential round-trips:

$\text{Latency}_{2PC} = 2 \times \text{RTT}_{\text{max participant}}$

For a Saga with $k$ steps, the happy-path latency is the sum of all step latencies:

$\text{Latency}_{\text{Saga}} = \sum_{i=1}^{k} \text{latency}_i$

In the failure case, if step $j$ fails, the compensation latency is:

$\text{Compensation latency} = \sum_{i=1}^{j-1} \text{compensation\_latency}_i$

The lock duration comparison is significant. 2PC holds locks across all participants for the entire protocol duration. Sagas hold locks only for each individual local transaction:

$\text{Lock duration}_{2PC} = 2 \times \text{RTT} + \text{processing}$

$\text{Lock duration}_{\text{Saga step}} = \text{local transaction time only}$

Implementation

ALGORITHM TwoPhaseCommitCoordinator(participants, transaction)
INPUT: participants: list of participant nodes, transaction: the distributed transaction
OUTPUT: COMMIT or ABORT
BEGIN
  // Phase 1: Prepare
  LOG_TO_DISK("PREPARE", transaction.id)
  votes <- empty list

  FOR EACH participant IN participants DO
    response <- SEND_PREPARE(participant, transaction)
    APPEND response TO votes
  END FOR

  // Decision
  allYes <- TRUE
  FOR EACH vote IN votes DO
    IF vote != YES THEN
      allYes <- FALSE
      BREAK
    END IF
  END FOR

  IF allYes THEN
    LOG_TO_DISK("COMMIT", transaction.id)
    decision <- COMMIT
  ELSE
    LOG_TO_DISK("ABORT", transaction.id)
    decision <- ABORT
  END IF

  // Phase 2: Commit or Abort
  FOR EACH participant IN participants DO
    SEND_DECISION(participant, transaction.id, decision)
  END FOR

  // Wait for acknowledgments
  FOR EACH participant IN participants DO
    AWAIT_ACK(participant, transaction.id)
  END FOR

  LOG_TO_DISK("COMPLETE", transaction.id)
  RETURN decision
END

ALGORITHM TwoPhaseCommitParticipant(transaction, prepareRequest)
INPUT: transaction: transaction data, prepareRequest: prepare message from coordinator
OUTPUT: YES or NO vote
BEGIN
  TRY
    EXECUTE_LOCALLY(transaction)
    LOG_TO_DISK("PREPARED", transaction.id)
    RETURN YES
  CATCH error
    LOG_TO_DISK("ABORTED", transaction.id)
    ROLLBACK_LOCAL(transaction)
    RETURN NO
  END TRY
END

ALGORITHM SagaOrchestrator(steps)
INPUT: steps: list of { execute: function, compensate: function } pairs
OUTPUT: success or failure with compensations applied
BEGIN
  completedSteps <- empty list

  FOR i <- 0 TO LENGTH(steps) - 1 DO
    result <- EXECUTE(steps[i].execute)

    IF result is failure THEN
      // Compensate in reverse order
      FOR j <- LENGTH(completedSteps) - 1 DOWNTO 0 DO
        compensateResult <- EXECUTE(completedSteps[j].compensate)
        IF compensateResult is failure THEN
          LOG_ERROR("Compensation failed for step " + j)
          ALERT_OPERATOR("Manual intervention needed")
        END IF
      END FOR
      RETURN failure("Step " + i + " failed, compensations applied")
    END IF

    APPEND steps[i] TO completedSteps
  END FOR

  RETURN success
END

ALGORITHM SagaChoreographyHandler(event, localStep, compensateStep, nextEventType)
INPUT: event: incoming event, localStep: this service's transaction, compensateStep: compensation function, nextEventType: event to publish on success
OUTPUT: published event
BEGIN
  IF event.type = "EXECUTE" THEN
    result <- EXECUTE(localStep, event.data)
    IF result is success THEN
      PUBLISH_EVENT(nextEventType, result.data)
    ELSE
      PUBLISH_EVENT("COMPENSATION_NEEDED", event.data)
    END IF

  ELSE IF event.type = "COMPENSATION_NEEDED" THEN
    EXECUTE(compensateStep, event.data)
    PUBLISH_EVENT("COMPENSATED", event.data)
  END IF
END

ALGORITHM IdempotentExecution(operationId, operation, idempotencyStore)
INPUT: operationId: unique ID for this operation, operation: the function to execute, idempotencyStore: persistent store of completed operations
OUTPUT: result
BEGIN
  // Check if already executed
  existing <- idempotencyStore.get(operationId)
  IF existing is not NIL THEN
    RETURN existing.result  // return cached result
  END IF

  result <- EXECUTE(operation)
  idempotencyStore.put(operationId, result)
  RETURN result
END

Real-World Applications

Banking wire transfers: traditional banks use 2PC (via the XA protocol) to coordinate debits and credits across different database systems, ensuring that money is never created or destroyed
E-commerce order processing: platforms like Amazon use Sagas to coordinate order creation, payment, inventory reservation, and shipping; each service commits independently with compensations for failures
Google Spanner: uses a variant of 2PC combined with Paxos for fault tolerance; the coordinator's decision is replicated via Paxos so that coordinator failure does not block participants
Microservices choreography: event-driven architectures use Kafka or similar message brokers to implement choreography-based Sagas, where each service reacts to events and publishes its own
Travel booking systems: booking a trip involves reserving a flight, hotel, and car rental across different providers; Sagas with compensations (cancel flight, cancel hotel) handle partial failures
CockroachDB: implements distributed transactions using a parallel commit protocol that reduces the latency of 2PC by pipelining the prepare and commit phases

Key Takeaways

Distributed transactions ensure atomicity across multiple nodes or services: either all participants commit or all abort
Two-phase commit (2PC) provides strong atomicity but is a blocking protocol; if the coordinator crashes after participants vote "yes," participants are stuck holding locks until recovery
Three-phase commit (3PC) reduces the blocking window by adding a pre-commit phase, but does not fully solve the problem under network partitions
The Saga pattern replaces distributed locking with a sequence of local transactions and compensating actions, trading strong atomicity for availability and loose coupling
Orchestrated Sagas use a central coordinator for easier reasoning; choreographed Sagas use events for looser coupling but are harder to debug
Idempotency is essential for all distributed transaction patterns because network failures can cause retries, and each operation must be safe to execute multiple times

2021-02-25

Vector Clocks and Conflict Resolution: Tracking Causality in Distributed Systems

How vector clocks, version vectors, and CRDTs track causality and resolve conflicts in distributed systems where there is no global clock.

2021-02-26

Pessimistic vs Optimistic Locking: Controlling Concurrent Access

How pessimistic locking (SELECT FOR UPDATE) and optimistic locking (version columns, CAS-style updates) prevent race conditions in concurrent systems, when each strategy wins, and the trade-offs in throughput, latency, and deadlock risk.

2021-02-27

Distributed Locks and Why They Are Hard

How lease-based distributed locks work, why process pauses and clock skew make them dangerous, the Redis SETNX pattern, the Redlock algorithm and its controversy, and when to use distributed locks vs when to avoid them entirely.