
Distributed Locks and Why They Are Hard

How lease-based distributed locks work, why process pauses and clock skew make them dangerous, the Redis SETNX pattern, the Redlock algorithm and its controversy, and when to use distributed locks vs when to avoid them entirely.

Tags: Distributed Systems, Distributed Locks, Redlock, Lease, Redis


What & Why

Why Single-Node Locks Break Down

On a single machine, a mutex or file lock works because the operating system enforces it. If a process holds a lock, no other process can acquire it, period. The OS kernel is the single authority.

In a distributed system, there is no single authority. The lock server might crash, the network might partition, or the client holding the lock might pause for garbage collection. Every assumption that makes single-node locks reliable falls apart when you add a network boundary.

The Lease Model

The standard solution to orphaned locks is the lease: a lock with a built-in expiration time (TTL). If the holder crashes, the lease eventually expires and another client can acquire it. This prevents deadlocks from crashed processes.

But leases introduce a new problem. What if the holder is still alive and working, but the lease expires before it finishes? Now two clients both believe they hold the lock.

The Process Pause Problem

The most dangerous failure mode is the process pause. A client acquires a lock, then experiences a long GC pause (or VM scheduling delay, or page fault storm). During the pause, the lease expires. Another client acquires the lock. The first client resumes, unaware that its lease expired, and writes to the shared resource. Both clients have now written, and data is corrupted.

This is not a theoretical concern. JVM garbage collection pauses of 10+ seconds are documented in production systems. Even non-GC languages can experience similar pauses from VM live migration or OS-level memory pressure.

How It Works

Redis SETNX: The Simple Approach

The simplest distributed lock uses a single Redis node. The client calls SET key value NX PX ttl, which atomically sets the key only if it does not exist, with a TTL in milliseconds. To release, the client deletes the key, first verifying via the stored value that it still holds the lock; the check and the delete must happen atomically, typically via a Lua script.

This works well for efficiency-type locks (where occasional double-execution is tolerable), but it has a critical weakness: if the single Redis node crashes, the lock disappears entirely. There is no redundancy, and a restart without the key, or a failover to a replica that has not yet received it, lets a second client acquire a lock that is still held.
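As a concrete illustration, here is a minimal sketch of the pattern using the redis-py client (the client setup, helper names, and Lua release script are assumptions for illustration, not part of the original notes):

import uuid

import redis  # assumes redis-py and a Redis server on localhost:6379

r = redis.Redis(host="localhost", port=6379)

# Release must be check-and-delete in one atomic step, so it runs as a
# Lua script inside Redis: delete only if the stored value is ours.
RELEASE_SCRIPT = """
if redis.call('get', KEYS[1]) == ARGV[1] then
    return redis.call('del', KEYS[1])
else
    return 0
end
"""

def acquire(key, ttl_ms):
    # A unique value per acquisition lets us verify ownership at release
    token = str(uuid.uuid4())
    # SET key token NX PX ttl: set only if absent, with a millisecond TTL
    if r.set(key, token, nx=True, px=ttl_ms):
        return token
    return None

def release(key, token):
    r.eval(RELEASE_SCRIPT, 1, key, token)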

The Redlock Algorithm

Redlock addresses the single point of failure by using N independent Redis nodes (typically 5). The client tries to acquire the lock on all N nodes sequentially, and considers the lock acquired only if it succeeds on a majority ($\lfloor N/2 \rfloor + 1$, so 3 out of 5).

The critical detail is the validity time calculation. The lock is only valid for:

$T_{\text{valid}} = T_{\text{ttl}} - (T_{\text{acquire end}} - T_{\text{acquire start}}) - T_{\text{drift}}$

Where $T_{\text{drift}}$ accounts for clock skew across nodes. If the acquisition took too long (network delays, slow nodes), the remaining validity time may be too short to do useful work, and the client should release and retry.
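For concreteness: with a 10,000 ms TTL, an acquisition round that takes 300 ms, and the drift allowance used in the pseudocode below (1% of the TTL plus 2 ms, so 102 ms), the validity time is $10000 - 300 - 102 = 9598$ ms. Had the acquisition round taken 9,950 ms instead, the validity time would be negative and the client should release and retry.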

Redlock Step-by-Step

Walking through a concrete Redlock acquisition with 5 nodes, where 2 nodes fail:
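(An illustrative trace; the node names and numbers are assumptions consistent with the algorithm above.)

1. The client records the start time and generates a unique lock value.
2. It sends SET lock_key value NX PX 10000 to nodes A through E in turn. A, B, and C reply OK; D times out, and E refuses because another client's value is already set there.
3. Three successes meet the quorum of $\lfloor 5/2 \rfloor + 1 = 3$, despite two failed nodes.
4. The client computes the validity time: TTL minus elapsed acquisition time minus the drift allowance. If it is positive, the lock is held for that long; if not, the client releases the key on A, B, and C and retries.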

Lock Implementation Comparison

Four common approaches to distributed locking, each with different trade-offs in complexity, guarantees, and failure modes:
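In outline (reconstructing the comparison from points made elsewhere in these notes):

Approach               Complexity  Guarantees                         Key failure mode
Single Redis (SETNX)   Low         Weakest; efficiency locks only     Single node crash loses the lock
Redlock (N=5 Redis)    Moderate    Tolerates a minority of failures   Unsafe under long pauses / clock skew
ZooKeeper              Higher      Consensus-backed                   Consensus rounds add latency
etcd                   Higher      Consensus-backed (Raft)            Consensus rounds add latency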

The Redlock Controversy

The Redlock algorithm sparked one of the most important debates in distributed systems. Martin Kleppmann argued that Redlock is fundamentally unsafe because it depends on timing assumptions that real systems cannot guarantee, prompting a detailed rebuttal from Redis's author, Salvatore Sanfilippo.

Kleppmann's core argument: even if Redlock acquires the lock correctly, a process pause after acquisition can outlast the lease. The lock expires, another client acquires it, and both clients proceed. The fix is not a better lock algorithm, but fencing tokens at the storage layer.
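To make the fencing-token idea concrete, here is a minimal sketch (the class, token values, and scenario are illustrative assumptions):

class FencedStore:
    # Accepts a write only if its fencing token is strictly higher than any
    # seen so far. The lock service issues a monotonically increasing token
    # with each grant, so a paused client whose lease has expired carries a
    # stale, lower token and gets rejected.
    def __init__(self):
        self.max_token = -1
        self.value = None

    def write(self, token, value):
        if token <= self.max_token:
            return False  # stale holder: its lease expired and was re-granted
        self.max_token = token
        self.value = value
        return True

store = FencedStore()
assert store.write(33, "client A's update")            # A holds token 33
assert store.write(34, "client B's update")            # B acquired after A's lease lapsed
assert not store.write(33, "A resumes after a pause")  # rejected: token 33 is stale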

Lock Acquisition Latency

Stronger guarantees come at a latency cost. Single Redis is fastest but least safe. ZooKeeper and etcd provide stronger guarantees but require consensus rounds.

Distributed Lock Lifecycle

A distributed lock moves through several states with timing constraints at each transition. The key insight is that the "Held" state has a maximum duration equal to the remaining validity time, after which it transitions to "Expired" regardless of whether the holder is done.
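In outline, the lifecycle reads: Free → Acquiring (quorum and validity checks) → Held (for at most the remaining validity time) → Released on explicit unlock, or Expired when the TTL elapses → back to Free.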

Complexity Analysis

The cost of distributed locking depends on the implementation. More nodes and stronger consensus protocols increase latency and network overhead.

$\text{Quorum size} = \lfloor N/2 \rfloor + 1$

For Redlock with $N = 5$ nodes, the quorum is 3. Assuming independent failures, the probability of failing to acquire the lock when $F$ nodes are down is:

$P(\text{fail}) = P(F \geq \lceil N/2 \rceil) = \sum_{k=\lceil N/2 \rceil}^{N} \binom{N}{k} p^k (1-p)^{N-k}$

where $p$ is the probability of any single node being unavailable.
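To put numbers on this, a short sketch in Python (the function name is an assumption):

from math import comb

def p_acquire_fail(n, p):
    # Fails when at least ceil(n/2) nodes are down, leaving fewer than
    # the floor(n/2) + 1 nodes needed for a quorum
    k_min = (n + 1) // 2  # ceil(n/2) for integer n
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k_min, n + 1))

print(p_acquire_fail(5, 0.01))  # ~9.85e-06 with 1% per-node unavailability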

Implementation

Pseudocode for the Redlock algorithm. The key details are the timing checks and the release-on-failure behavior.

FUNCTION redlock_acquire(nodes[], key, ttl, drift_factor):
    start_time = NOW()
    successes = 0
    value = GENERATE_UNIQUE_ID()

    FOR EACH node IN nodes:
        // Use a per-node timeout much smaller than ttl, so one
        // unreachable node cannot consume the whole lease during acquisition
        IF SET(node, key, value, NX, PX=ttl) == OK:
            successes = successes + 1

    elapsed = NOW() - start_time
    drift = ttl * drift_factor + 2  // milliseconds
    validity_time = ttl - elapsed - drift

    IF successes >= FLOOR(LEN(nodes) / 2) + 1 AND validity_time > 0:
        RETURN {acquired: true, validity: validity_time, value: value}
    ELSE:
        // Quorum or validity check failed: release every node we did
        // acquire (check-and-delete; must be atomic in practice, see below)
        FOR EACH node IN nodes:
            IF GET(node, key) == value:
                DEL(node, key)
        RETURN {acquired: false}

FUNCTION redlock_release(nodes[], key, value):
    FOR EACH node IN nodes:
        // Only delete if we still hold the lock. The GET and DEL must run
        // atomically (e.g., as a Redis Lua script), or we could delete a
        // lock another client acquired between the two calls
        IF GET(node, key) == value:
            DEL(node, key)

FUNCTION use_lock_safely(nodes[], key, ttl):
    result = redlock_acquire(nodes, key, ttl, 0.01)
    IF NOT result.acquired:
        RETURN FAILURE("could not acquire lock")

    TRY:
        // Do work, staying within result.validity ms; beyond that the
        // lease may have expired and the lock no longer protects anything
        do_critical_section()
    FINALLY:
        redlock_release(nodes, key, result.value)


Key Takeaways

Leases prevent deadlock when a holder crashes, but they trade it for split-brain: a long enough process pause can outlive the lease, leaving two clients that both believe they hold the lock. A single-node Redis lock is acceptable for efficiency-type locks where occasional double-execution is tolerable; Redlock removes the single point of failure but still rests on timing assumptions. For correctness, pair any lease-based lock with fencing tokens enforced at the storage layer, so a stale holder's writes are rejected.