Distributed Transactions: Coordinating Writes Across Services
How distributed systems coordinate multi-node writes using two-phase commit, three-phase commit, and the Saga pattern with compensating transactions.
Terminology
- Distributed transaction: a transaction that spans multiple nodes or services, requiring all participants to either commit or abort together to maintain data consistency
- Atomicity: the guarantee that a transaction either completes entirely (all operations succeed) or has no effect at all (all operations are rolled back)
- Coordinator: the node that orchestrates a distributed transaction by collecting votes from participants and deciding whether to commit or abort
- Participant: a node that performs part of a distributed transaction and votes on whether it can commit its portion
- Two-phase commit (2PC): a protocol where the coordinator first asks all participants to prepare (vote), then instructs them all to commit or abort based on the votes
- Three-phase commit (3PC): an extension of 2PC that adds a pre-commit phase to reduce the window where participants are blocked waiting for the coordinator
- Prepare (vote) phase: the first phase of 2PC where each participant writes its changes to a durable log and votes "yes" (can commit) or "no" (must abort)
- Commit phase: the second phase of 2PC where the coordinator tells all participants to finalize their changes (if all voted yes) or roll back (if any voted no)
- Blocking protocol: a protocol where participants may be stuck waiting indefinitely for the coordinator's decision if the coordinator crashes; 2PC is a blocking protocol
- Saga: a sequence of local transactions where each step has a corresponding compensating transaction that undoes its effect if a later step fails
- Compensating transaction: a transaction that semantically reverses the effect of a previously committed local transaction; it does not literally undo the operation but applies a corrective action
- Choreography: a Saga coordination style where each service listens for events and decides independently what to do next, with no central coordinator
- Orchestration: a Saga coordination style where a central orchestrator directs each service to execute its step and handles failures by triggering compensations
- Idempotency: the property that executing an operation multiple times produces the same result as executing it once; critical for retry safety in distributed transactions
- In-doubt transaction: a transaction that has voted "yes" in the prepare phase but has not yet received the coordinator's commit or abort decision
What & Why
A single database transaction is straightforward: the database guarantees atomicity using its local write-ahead log. But when a business operation spans multiple databases or services, no single database can guarantee atomicity for the whole operation.
Consider an e-commerce order: the order service creates the order, the payment service charges the card, and the inventory service reserves the items. If the payment succeeds but the inventory reservation fails, you need to refund the payment. Without coordination, you risk inconsistent state: money charged but no items reserved, or items reserved but no payment.
Distributed transactions solve this coordination problem. Two-phase commit (2PC) is the classic approach: a coordinator ensures all participants agree before any of them commit. It provides strong atomicity but has significant drawbacks: it blocks if the coordinator crashes, it holds locks across all participants for the duration of the protocol, and it does not scale well across services with different availability requirements.
The Saga pattern is the modern alternative for microservices. Instead of locking everything and committing atomically, a Saga breaks the operation into a sequence of local transactions. Each step commits independently. If a later step fails, the Saga runs compensating transactions to undo the effects of earlier steps. This trades strong atomicity for availability and loose coupling.
How It Works
Two-Phase Commit (2PC)
Phase 1 (Prepare): The coordinator sends a "prepare" message to all participants. Each participant executes the transaction locally (without committing), writes the changes to a durable log, and responds with "yes" (ready to commit) or "no" (must abort). Once a participant votes "yes," it has promised to commit if told to do so, and it cannot unilaterally abort.
Phase 2 (Commit/Abort): If all participants voted "yes," the coordinator writes a "commit" decision to its own durable log and sends "commit" to all participants. Each participant then finalizes its changes. If any participant voted "no," the coordinator sends "abort" and all participants roll back.
The blocking problem: If the coordinator crashes after participants have voted "yes" but before sending the commit/abort decision, participants are stuck. They have promised to commit but do not know the decision. They cannot abort (they promised) and cannot commit (they do not know if everyone voted yes). They must wait for the coordinator to recover, holding locks the entire time.
Three-Phase Commit (3PC)
3PC adds a "pre-commit" phase between the vote and the final commit. After receiving all "yes" votes, the coordinator sends a "pre-commit" message. Participants acknowledge it. Only then does the coordinator send the final "commit." If the coordinator crashes during pre-commit, participants can safely abort because they know the coordinator had not yet committed. This reduces the blocking window but does not eliminate it entirely in the presence of network partitions.
Saga Pattern
A Saga replaces a single distributed transaction with a sequence of local transactions, each with a compensating action:
- Order Service: Create order (compensate: cancel order)
- Payment Service: Charge payment (compensate: refund payment)
- Inventory Service: Reserve items (compensate: release reservation)
If step 3 fails, the Saga runs compensations in reverse order: refund payment, then cancel order. Each step commits independently, so there are no distributed locks.
Orchestration: A central Saga orchestrator tells each service what to do and tracks the state. If a step fails, the orchestrator triggers compensations. This is easier to reason about but creates a single point of coordination.
Choreography: Each service publishes events when it completes its step. The next service listens for the event and acts. If a service fails, it publishes a failure event, and upstream services listen for it and compensate. This is more decoupled but harder to debug and monitor.
Complexity Analysis
| Protocol | Message Rounds | Messages | Blocking? |
|---|---|---|---|
| 2PC | 2 rounds | $4n$ ($n$ = participants) | Yes, if coordinator fails |
| 3PC | 3 rounds | $6n$ | Reduced, not eliminated |
| Saga (orchestrated) | $2k$ rounds ($k$ = steps) | $2k$ messages | No (each step is local) |
For 2PC with $n$ participants, the total number of messages in the happy path (all vote yes):
The latency is dominated by two sequential round-trips:
For a Saga with $k$ steps, the happy-path latency is the sum of all step latencies:
In the failure case, if step $j$ fails, the compensation latency is:
The lock duration comparison is significant. 2PC holds locks across all participants for the entire protocol duration. Sagas hold locks only for each individual local transaction:
Implementation
ALGORITHM TwoPhaseCommitCoordinator(participants, transaction)
INPUT: participants: list of participant nodes, transaction: the distributed transaction
OUTPUT: COMMIT or ABORT
BEGIN
// Phase 1: Prepare
LOG_TO_DISK("PREPARE", transaction.id)
votes <- empty list
FOR EACH participant IN participants DO
response <- SEND_PREPARE(participant, transaction)
APPEND response TO votes
END FOR
// Decision
allYes <- TRUE
FOR EACH vote IN votes DO
IF vote != YES THEN
allYes <- FALSE
BREAK
END IF
END FOR
IF allYes THEN
LOG_TO_DISK("COMMIT", transaction.id)
decision <- COMMIT
ELSE
LOG_TO_DISK("ABORT", transaction.id)
decision <- ABORT
END IF
// Phase 2: Commit or Abort
FOR EACH participant IN participants DO
SEND_DECISION(participant, transaction.id, decision)
END FOR
// Wait for acknowledgments
FOR EACH participant IN participants DO
AWAIT_ACK(participant, transaction.id)
END FOR
LOG_TO_DISK("COMPLETE", transaction.id)
RETURN decision
END
ALGORITHM TwoPhaseCommitParticipant(transaction, prepareRequest)
INPUT: transaction: transaction data, prepareRequest: prepare message from coordinator
OUTPUT: YES or NO vote
BEGIN
TRY
EXECUTE_LOCALLY(transaction)
LOG_TO_DISK("PREPARED", transaction.id)
RETURN YES
CATCH error
LOG_TO_DISK("ABORTED", transaction.id)
ROLLBACK_LOCAL(transaction)
RETURN NO
END TRY
END
ALGORITHM SagaOrchestrator(steps)
INPUT: steps: list of { execute: function, compensate: function } pairs
OUTPUT: success or failure with compensations applied
BEGIN
completedSteps <- empty list
FOR i <- 0 TO LENGTH(steps) - 1 DO
result <- EXECUTE(steps[i].execute)
IF result is failure THEN
// Compensate in reverse order
FOR j <- LENGTH(completedSteps) - 1 DOWNTO 0 DO
compensateResult <- EXECUTE(completedSteps[j].compensate)
IF compensateResult is failure THEN
LOG_ERROR("Compensation failed for step " + j)
ALERT_OPERATOR("Manual intervention needed")
END IF
END FOR
RETURN failure("Step " + i + " failed, compensations applied")
END IF
APPEND steps[i] TO completedSteps
END FOR
RETURN success
END
ALGORITHM SagaChoreographyHandler(event, localStep, compensateStep, nextEventType)
INPUT: event: incoming event, localStep: this service's transaction, compensateStep: compensation function, nextEventType: event to publish on success
OUTPUT: published event
BEGIN
IF event.type = "EXECUTE" THEN
result <- EXECUTE(localStep, event.data)
IF result is success THEN
PUBLISH_EVENT(nextEventType, result.data)
ELSE
PUBLISH_EVENT("COMPENSATION_NEEDED", event.data)
END IF
ELSE IF event.type = "COMPENSATION_NEEDED" THEN
EXECUTE(compensateStep, event.data)
PUBLISH_EVENT("COMPENSATED", event.data)
END IF
END
ALGORITHM IdempotentExecution(operationId, operation, idempotencyStore)
INPUT: operationId: unique ID for this operation, operation: the function to execute, idempotencyStore: persistent store of completed operations
OUTPUT: result
BEGIN
// Check if already executed
existing <- idempotencyStore.get(operationId)
IF existing is not NIL THEN
RETURN existing.result // return cached result
END IF
result <- EXECUTE(operation)
idempotencyStore.put(operationId, result)
RETURN result
END
Real-World Applications
- Banking wire transfers: traditional banks use 2PC (via the XA protocol) to coordinate debits and credits across different database systems, ensuring that money is never created or destroyed
- E-commerce order processing: platforms like Amazon use Sagas to coordinate order creation, payment, inventory reservation, and shipping; each service commits independently with compensations for failures
- Google Spanner: uses a variant of 2PC combined with Paxos for fault tolerance; the coordinator's decision is replicated via Paxos so that coordinator failure does not block participants
- Microservices choreography: event-driven architectures use Kafka or similar message brokers to implement choreography-based Sagas, where each service reacts to events and publishes its own
- Travel booking systems: booking a trip involves reserving a flight, hotel, and car rental across different providers; Sagas with compensations (cancel flight, cancel hotel) handle partial failures
- CockroachDB: implements distributed transactions using a parallel commit protocol that reduces the latency of 2PC by pipelining the prepare and commit phases
Key Takeaways
- Distributed transactions ensure atomicity across multiple nodes or services: either all participants commit or all abort
- Two-phase commit (2PC) provides strong atomicity but is a blocking protocol; if the coordinator crashes after participants vote "yes," participants are stuck holding locks until recovery
- Three-phase commit (3PC) reduces the blocking window by adding a pre-commit phase, but does not fully solve the problem under network partitions
- The Saga pattern replaces distributed locking with a sequence of local transactions and compensating actions, trading strong atomicity for availability and loose coupling
- Orchestrated Sagas use a central coordinator for easier reasoning; choreographed Sagas use events for looser coupling but are harder to debug
- Idempotency is essential for all distributed transaction patterns because network failures can cause retries, and each operation must be safe to execute multiple times