
Caching: Strategies, Eviction, and Invalidation

How caches accelerate reads with strategies like LRU and LFU, how write-through and write-back policies keep data consistent, and why cache invalidation is one of the hardest problems in computing.

2021-02-13
System Design · caching

Terminology

  • Cache: a fast, temporary storage layer that holds copies of frequently accessed data so future requests can be served without hitting the slower origin (database, disk, remote API)
  • Cache hit: a request where the requested data is found in the cache, avoiding a trip to the origin
  • Cache miss: a request where the data is not in the cache, requiring a fetch from the origin
  • Hit rate: the fraction of total requests that result in cache hits, expressed as $\text{hits} / (\text{hits} + \text{misses})$
  • Eviction: the process of removing an entry from the cache to make room for new data when the cache is full
  • LRU (Least Recently Used): an eviction policy that removes the entry that has not been accessed for the longest time
  • LFU (Least Frequently Used): an eviction policy that removes the entry with the lowest access count
  • TTL (Time to Live): a duration after which a cached entry is considered stale and eligible for removal or refresh
  • Write-through: a write policy where every write updates both the cache and the origin synchronously before returning success
  • Write-back (write-behind): a write policy where writes update the cache immediately and flush to the origin asynchronously at a later time
  • Write-around: a write policy where writes go directly to the origin, bypassing the cache; the cache is only populated on reads
  • Cache invalidation: the process of removing or marking stale entries in the cache when the underlying data changes
  • Cache-aside (lazy loading): a pattern where the application checks the cache first, fetches from the origin on a miss, and then populates the cache
  • Read-through: a pattern where the cache itself fetches from the origin on a miss, transparent to the application
  • Thundering herd: a scenario where many concurrent requests for the same uncached key all miss simultaneously and flood the origin
  • Cold start: the period after a cache is deployed (or flushed) when the hit rate is low because no data has been loaded yet

What & Why

Databases are slow. Disk seeks take milliseconds. Network round trips to a remote datastore add more. When the same data is requested thousands of times per second, hitting the origin every time wastes resources and adds latency. A cache sits between the requester and the origin, holding copies of hot data in fast storage (usually memory) so repeated reads return in microseconds instead of milliseconds.

Caching is effective because of a property called locality of reference. In most systems, a small fraction of data accounts for the majority of reads. Social media timelines, product catalog pages, user sessions, and configuration values are all read far more often than they are written. A cache that holds just the hottest 1-5% of data can absorb 80-99% of read traffic.

The challenge is keeping the cache consistent with the origin. When the underlying data changes, the cache must be updated or invalidated. If it is not, clients see stale data. This is the cache invalidation problem, famously described as one of the two hard things in computer science. Different write policies (write-through, write-back, write-around) and invalidation strategies (TTL, event-driven purge, versioning) offer different trade-offs between consistency, latency, and complexity.

Eviction is the other core concern. Caches have limited capacity. When the cache is full and a new entry needs to be stored, something must be removed. The eviction policy determines which entry to discard. LRU (Least Recently Used) is the most common choice because it approximates optimal eviction well for most workloads. LFU (Least Frequently Used) works better when access frequency matters more than recency.

How It Works

Cache-Aside Pattern

Cache-aside is the most common caching pattern: the application owns the interaction with both the cache and the origin.

  1. Application receives a read request for key K.
  2. Application checks the cache for K.
  3. If K is in the cache (hit), return the cached value.
  4. If K is not in the cache (miss), fetch from the origin, store the result in the cache with a TTL, and return it.

On writes, the application updates the origin and then either invalidates the cache entry (delete) or updates it.

Write Policies

[Diagram] Write-through: the app writes to the cache and the origin synchronously. Write-back: the app writes to the cache; the flush to the origin is asynchronous. Write-around: the app writes to the origin, bypassing the cache; the cache is populated only on subsequent reads.

Write-through writes to the cache and origin synchronously. Every write is durable immediately, and the cache is always consistent. The cost is higher write latency because both stores must acknowledge before the operation completes.

Write-back writes to the cache only and returns immediately. Dirty entries are flushed to the origin asynchronously (on eviction, periodically, or in batches). This gives the lowest write latency but risks data loss if the cache crashes before flushing.

Write-around bypasses the cache entirely on writes. Data goes straight to the origin. The cache is only populated on reads (cache-aside). This avoids polluting the cache with data that may never be read again, but the first read after a write always misses.
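The contrast between write-through and write-back can be sketched in a few lines of Python. The class and method names here are illustrative, and the origin is assumed to be any dict-like store:

```python
class WriteThroughCache:
    """Every write hits both the origin and the cache before returning."""
    def __init__(self, origin):
        self.origin = origin   # assumed: any dict-like store
        self.cache = {}

    def write(self, key, value):
        self.origin[key] = value   # synchronous: origin must acknowledge
        self.cache[key] = value    # cache is always consistent

    def read(self, key):
        if key in self.cache:
            return self.cache[key]
        value = self.origin[key]
        self.cache[key] = value
        return value


class WriteBackCache:
    """Writes land in the cache only; dirty entries flush to the origin later."""
    def __init__(self, origin):
        self.origin = origin
        self.cache = {}
        self.dirty = set()

    def write(self, key, value):
        self.cache[key] = value    # returns immediately: lowest write latency
        self.dirty.add(key)

    def flush(self):
        # Called periodically, in batches, or on eviction.
        # Data written since the last flush is lost if the cache crashes.
        for key in self.dirty:
            self.origin[key] = self.cache[key]
        self.dirty.clear()
```

The `dirty` set is what distinguishes write-back: until `flush()` runs, the origin and cache disagree, which is exactly the durability risk the prose above describes.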

LRU Eviction

LRU tracks the order in which entries were last accessed. When the cache is full, the entry that has not been accessed for the longest time is evicted. The standard implementation uses a hash map for O(1) lookups combined with a doubly linked list for O(1) reordering. On every access, the entry is moved to the head of the list. On eviction, the tail entry is removed.
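In Python, the hash-map-plus-doubly-linked-list combination is available off the shelf as `collections.OrderedDict`, so an LRU cache can be sketched in a few lines (class name is illustrative):

```python
from collections import OrderedDict

class LRUCache:
    """LRU cache backed by OrderedDict, which is itself a hash map
    over a doubly linked list, giving O(1) get, put, and evict."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()   # insertion order = recency order

    def get(self, key):
        if key not in self.data:
            return None                      # miss
        self.data.move_to_end(key)           # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        elif len(self.data) >= self.capacity:
            self.data.popitem(last=False)    # evict least recently used
        self.data[key] = value
```

`move_to_end` and `popitem(last=False)` correspond to the "move to head" and "remove tail" operations in the description above.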

LFU Eviction

LFU tracks how many times each entry has been accessed. When the cache is full, the entry with the lowest access count is evicted. A naive implementation uses a min-heap, but the O(1) LFU variant uses a hash map plus a frequency-indexed doubly linked list structure. Each frequency bucket holds entries with that access count. On access, an entry moves from its current frequency bucket to the next one.

Complexity Analysis

Let $n$ be the number of entries in the cache and $c$ be the cache capacity.

  Operation             LRU (HashMap + DLL)   LFU (O(1) variant)   FIFO
  Get (hit)             $O(1)$                $O(1)$               $O(1)$
  Put (no eviction)     $O(1)$                $O(1)$               $O(1)$
  Put (with eviction)   $O(1)$                $O(1)$               $O(1)$
  Delete / Invalidate   $O(1)$                $O(1)$               $O(1)$
  Space                 $O(c)$                $O(c)$               $O(c)$

The expected hit rate for a cache of capacity $c$ over a dataset of size $n$ depends on the access distribution. For a Zipf distribution with parameter $\alpha$, the theoretical hit rate is approximately:

$H(c, n, \alpha) \approx \frac{\sum_{i=1}^{c} i^{-\alpha}}{\sum_{i=1}^{n} i^{-\alpha}}$

For typical web workloads ($\alpha \approx 0.8$ to $1.2$), a cache holding 1% of the dataset can achieve 50-80% hit rates.
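The formula above is straightforward to evaluate numerically. This sketch assumes the cache holds exactly the $c$ most popular items:

```python
def zipf_hit_rate(c, n, alpha):
    """Approximate hit rate for a capacity-c cache over n items whose
    popularity follows Zipf(alpha), assuming the cache retains the
    c most popular items."""
    top = sum(i ** -alpha for i in range((1), c + 1))    # numerator
    total = sum(i ** -alpha for i in range(1, n + 1))    # denominator
    return top / total

# A cache holding 1% of a 100,000-item dataset at alpha = 1.0
# lands around a 62% hit rate, in the middle of the 50-80% range.
print(zipf_hit_rate(1_000, 100_000, 1.0))
```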

Implementation

ALGORITHM LRUCache
STRUCTURE:
  capacity: integer
  map: HashMap of key -> (value, listNode)
  list: DoublyLinkedList  // head = most recent, tail = least recent

FUNCTION GET(key)
INPUT: key to look up
OUTPUT: value if found, NIL if not found
BEGIN
  IF key NOT IN map THEN
    RETURN NIL
  END IF

  node <- map[key].listNode
  MOVE node TO HEAD of list
  RETURN map[key].value
END

FUNCTION PUT(key, value)
INPUT: key and value to store
BEGIN
  IF key IN map THEN
    map[key].value <- value
    MOVE map[key].listNode TO HEAD of list
    RETURN
  END IF

  IF SIZE(map) >= capacity THEN
    tailNode <- REMOVE TAIL of list
    DELETE tailNode.key FROM map
  END IF

  newNode <- CREATE list node with key
  INSERT newNode AT HEAD of list
  map[key] <- (value, newNode)
END

FUNCTION DELETE(key)
INPUT: key to remove
BEGIN
  IF key NOT IN map THEN
    RETURN
  END IF

  REMOVE map[key].listNode FROM list
  DELETE key FROM map
END


ALGORITHM LFUCache
STRUCTURE:
  capacity: integer
  minFreq: integer
  keyMap: HashMap of key -> (value, frequency)
  freqMap: HashMap of frequency -> DoublyLinkedList of keys

FUNCTION GET(key)
INPUT: key to look up
OUTPUT: value if found, NIL if not found
BEGIN
  IF key NOT IN keyMap THEN
    RETURN NIL
  END IF

  oldFreq <- keyMap[key].frequency
  keyMap[key].frequency <- oldFreq + 1

  REMOVE key FROM freqMap[oldFreq]
  IF freqMap[oldFreq] IS EMPTY THEN
    DELETE freqMap[oldFreq]
    IF minFreq = oldFreq THEN
      minFreq <- oldFreq + 1
    END IF
  END IF

  APPEND key TO freqMap[oldFreq + 1]
  RETURN keyMap[key].value
END

FUNCTION PUT(key, value)
INPUT: key and value to store
BEGIN
  IF capacity = 0 THEN RETURN END IF

  IF key IN keyMap THEN
    keyMap[key].value <- value
    GET(key)  // updates frequency
    RETURN
  END IF

  IF SIZE(keyMap) >= capacity THEN
    evictKey <- REMOVE TAIL FROM freqMap[minFreq]
    DELETE evictKey FROM keyMap
    IF freqMap[minFreq] IS EMPTY THEN
      DELETE freqMap[minFreq]
    END IF
  END IF

  keyMap[key] <- (value, 1)
  APPEND key TO freqMap[1]
  minFreq <- 1
END
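The LFU pseudocode above translates directly to Python, using an `OrderedDict` per frequency bucket so that ties within the minimum frequency are broken by recency (names are illustrative):

```python
from collections import defaultdict, OrderedDict

class LFUCache:
    """O(1) LFU: a key map plus frequency-indexed buckets of keys."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.min_freq = 0
        self.key_map = {}                          # key -> (value, freq)
        self.freq_map = defaultdict(OrderedDict)   # freq -> keys, LRU order

    def get(self, key):
        if key not in self.key_map:
            return None
        value, freq = self.key_map[key]
        # Move the key to the next frequency bucket.
        del self.freq_map[freq][key]
        if not self.freq_map[freq]:
            del self.freq_map[freq]
            if self.min_freq == freq:
                self.min_freq = freq + 1
        self.freq_map[freq + 1][key] = None
        self.key_map[key] = (value, freq + 1)
        return value

    def put(self, key, value):
        if self.capacity == 0:
            return
        if key in self.key_map:
            _, freq = self.key_map[key]
            self.key_map[key] = (value, freq)
            self.get(key)                          # bump frequency
            return
        if len(self.key_map) >= self.capacity:
            # Evict the oldest key in the minimum-frequency bucket.
            evict_key, _ = self.freq_map[self.min_freq].popitem(last=False)
            del self.key_map[evict_key]
            if not self.freq_map[self.min_freq]:
                del self.freq_map[self.min_freq]
        self.key_map[key] = (value, 1)
        self.freq_map[1][key] = None
        self.min_freq = 1                          # new entries start at 1
```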


ALGORITHM CacheAside
FUNCTION READ(key)
INPUT: key to look up
OUTPUT: value
BEGIN
  value <- CACHE.GET(key)
  IF value IS NOT NIL THEN
    RETURN value   // cache hit
  END IF

  value <- ORIGIN.FETCH(key)   // cache miss
  CACHE.PUT(key, value, TTL)
  RETURN value
END

FUNCTION WRITE(key, value)
INPUT: key and value to write
BEGIN
  ORIGIN.WRITE(key, value)
  CACHE.DELETE(key)   // invalidate
END
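A runnable Python version of the cache-aside pattern, adding the TTL bookkeeping the pseudocode glosses over. The `CacheAside` class is a sketch; the origin is assumed to be any dict-like store:

```python
import time

class CacheAside:
    """Cache-aside with TTL expiry; `origin` is assumed dict-like."""
    def __init__(self, origin, ttl_seconds=60):
        self.origin = origin
        self.ttl = ttl_seconds
        self.cache = {}   # key -> (value, expires_at)

    def read(self, key):
        entry = self.cache.get(key)
        if entry is not None:
            value, expires_at = entry
            if time.monotonic() < expires_at:
                return value             # hit
            del self.cache[key]          # expired: treat as a miss
        value = self.origin[key]         # miss: fetch from the origin
        self.cache[key] = (value, time.monotonic() + self.ttl)
        return value

    def write(self, key, value):
        self.origin[key] = value
        self.cache.pop(key, None)        # invalidate; repopulated on next read
```

Deleting on write rather than updating avoids a race where a concurrent reader repopulates the cache with a value fetched before the write committed; the stale window collapses to the next read.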

Real-World Applications

  • Web application caching: Redis and Memcached sit between application servers and databases, caching query results, session data, and computed values to reduce database load by 80-95%
  • CDN edge caches: content delivery networks cache static assets (images, CSS, JavaScript) at edge locations worldwide, serving content from the nearest cache instead of the origin server
  • CPU caches: L1, L2, and L3 caches on the processor use LRU-like eviction to keep frequently accessed memory lines close to the CPU, bridging the speed gap between registers and main memory
  • DNS caching: DNS resolvers cache domain-to-IP mappings with TTLs, preventing repeated lookups to authoritative nameservers for every request
  • Database buffer pools: databases like PostgreSQL and MySQL maintain an in-memory buffer pool of recently accessed disk pages, using clock-sweep (an LRU approximation) to decide which pages to evict
  • Browser caches: web browsers cache HTTP responses locally based on Cache-Control headers, avoiding redundant network requests for unchanged resources

Key Takeaways

  • Caching exploits locality of reference: a small cache holding hot data can absorb the majority of read traffic and dramatically reduce origin load
  • LRU eviction (hash map + doubly linked list) provides $O(1)$ operations and works well for most workloads; LFU is better when access frequency matters more than recency
  • Write-through guarantees consistency but adds write latency; write-back minimizes latency but risks data loss; write-around avoids cache pollution but causes misses after writes
  • Cache invalidation is the hard part: TTL-based expiry is simple but allows staleness windows, while event-driven invalidation is precise but adds system complexity
  • The thundering herd problem (many concurrent misses for the same key) can be mitigated with request coalescing, locking, or pre-warming strategies
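The request-coalescing mitigation mentioned in the takeaways can be sketched as a "single-flight" loader: the first thread to miss a key fetches it, and concurrent requesters for the same key wait on that fetch instead of stampeding the origin. The class name is illustrative, and error handling on the fetch path is omitted for brevity:

```python
import threading

class CoalescingLoader:
    """Single-flight loader: concurrent misses for one key share one fetch."""
    def __init__(self, fetch):
        self.fetch = fetch        # assumed: fetch(key) -> value
        self.cache = {}
        self.lock = threading.Lock()
        self.in_flight = {}       # key -> Event that waiters block on

    def get(self, key):
        with self.lock:
            if key in self.cache:
                return self.cache[key]           # hit
            event = self.in_flight.get(key)
            if event is None:
                event = threading.Event()
                self.in_flight[key] = event      # this thread leads the fetch
                leader = True
            else:
                leader = False                   # another fetch is in flight
        if leader:
            value = self.fetch(key)              # only one origin hit per key
            with self.lock:
                self.cache[key] = value
                del self.in_flight[key]
            event.set()                          # wake the waiters
            return value
        event.wait()                             # block until the leader finishes
        with self.lock:
            return self.cache[key]
```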