Amortized Analysis: The True Cost of Operations
Understanding amortized analysis through the aggregate method, accounting method, and potential method, with dynamic array and splay tree examples showing why worst-case per-operation analysis can be misleading.
Terminology
| Term | Definition |
|---|---|
| Amortized analysis | A technique for analyzing the average cost per operation over a worst-case sequence of operations, giving a tighter bound than worst-case per-operation analysis |
| Aggregate method | Compute the total cost of $n$ operations, then divide by $n$ to get the amortized cost per operation |
| Accounting method | Assign each operation an amortized cost (possibly different from actual cost); cheap operations "save" credit that expensive operations "spend" |
| Potential method | Define a potential function $\Phi$ on the data structure's state; the amortized cost is the actual cost plus the change in potential |
| Potential function ($\Phi$) | A function mapping data structure states to non-negative real numbers, representing "stored energy" available for future expensive operations |
| Dynamic array | An array that doubles its capacity when full; individual inserts are $O(n)$ worst case but $O(1)$ amortized |
| Splay tree | A self-adjusting BST that moves accessed nodes to the root via rotations; $O(\log n)$ amortized per operation despite $O(n)$ worst case |
| Credit (accounting) | Prepaid work stored during cheap operations that pays for future expensive operations; total credit must never go negative |
What & Why
Worst-case analysis can be misleading. A dynamic array's push operation is $O(n)$ in the worst case (when it needs to resize), but most pushes are $O(1)$. If you analyze a sequence of $n$ pushes using worst-case per-operation analysis, you get $O(n^2)$. But the actual total cost is $O(n)$, giving an amortized cost of $O(1)$ per push.
Amortized analysis gives you the true cost of a sequence of operations. It does not use probability or averages over random inputs. It guarantees that for any sequence of $n$ operations, the total cost is bounded, and the per-operation average of that bound is the amortized cost.
This matters whenever you use data structures with occasional expensive operations: dynamic arrays, hash tables with rehashing, splay trees, Fibonacci heaps, or union-find with path compression. The amortized cost tells you the real performance story.
How It Works
Method 1: Aggregate Method
Count the total cost of $n$ operations, then divide by $n$.
Dynamic array example: Start with capacity 1. When full, double the capacity and copy all elements.
For $n$ pushes, a resize occurs each time the array is full, at sizes 1, 2, 4, 8, 16, ..., costing $1 + 2 + 4 + 8 + \cdots$ copies in total. This geometric series sums to at most $2n$. Adding the $n$ individual push costs: total $\le 3n$. Amortized cost: $3n / n = O(1)$.
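The aggregate bound is easy to check empirically. A minimal sketch (function and variable names are my own) that counts one unit of work per element written, including copies during a resize, and verifies total cost $\le 3n$:

```python
# Count the actual work done by n pushes into a doubling array,
# then check the aggregate bound: total cost <= 3n.
def total_push_cost(n):
    size, capacity, cost = 0, 1, 0
    for _ in range(n):
        if size == capacity:
            cost += size      # copy every existing element to the new array
            capacity *= 2     # double the capacity
        cost += 1             # write the new element
        size += 1
    return cost

for n in [1, 10, 1000, 10**6]:
    assert total_push_cost(n) <= 3 * n
```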
Method 2: Accounting Method
Assign each operation an amortized cost. Cheap operations pay more than their actual cost, banking credit. Expensive operations use that credit.
Dynamic array: Charge each push an amortized cost of 3 units:
- 1 unit for the push itself
- 2 units saved as credit (1 for copying this element later, 1 for copying a matching old element)
When a resize happens at capacity $k$, we need to copy $k$ elements. But we have $k/2$ elements that each saved 2 credits (they were inserted since the last resize), giving $k$ credits total. The resize is fully paid for.
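The accounting argument can be traced directly: each push deposits 3 units into a "bank," and the push itself plus any resize withdraw the actual cost. A sketch (names are illustrative) whose invariant is that the bank never goes negative:

```python
# Trace the credit balance under the 3-units-per-push charging scheme.
def credit_trace(n):
    size, capacity, bank = 0, 1, 0
    balances = []
    for _ in range(n):
        bank += 3             # amortized charge for this push
        if size == capacity:
            bank -= size      # pay for copying the existing elements
            capacity *= 2
        bank -= 1             # pay for the push itself
        size += 1
        balances.append(bank)
    return balances

# Total credit never goes negative, so 3 units per push covers everything.
assert all(b >= 0 for b in credit_trace(10**4))
```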
Method 3: Potential Method
Define a potential function $\Phi(D_i)$ on the state of the data structure after operation $i$. The amortized cost of operation $i$, with actual cost $c_i$, is:

$$\hat{c}_i = c_i + \Phi(D_i) - \Phi(D_{i-1})$$

The total amortized cost telescopes:

$$\sum_{i=1}^{n} \hat{c}_i = \sum_{i=1}^{n} c_i + \Phi(D_n) - \Phi(D_0)$$

If $\Phi(D_n) \ge \Phi(D_0)$ (typically $\Phi(D_0) = 0$ and $\Phi \ge 0$ throughout), then the total amortized cost is an upper bound on the total actual cost.
Dynamic array potential: Let $\Phi = 2 \cdot \text{size} - \text{capacity}$.
- After a push without resize: $\Phi$ increases by 2 (size goes up by 1). Amortized cost: $1 + 2 = 3$.
- After a push with resize from capacity $k$ to $2k$: actual cost is $k + 1$ (copy $k$ elements + 1 push). $\Phi$ changes from $2k - k = k$ to $2(k+1) - 2k = 2$. Change in $\Phi$: $2 - k$. Amortized cost: $(k+1) + (2 - k) = 3$.
Both cases give amortized cost 3, consistent with the other methods.
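The per-operation arithmetic above can be verified numerically. A sketch that computes $\hat{c}_i = c_i + \Phi(D_i) - \Phi(D_{i-1})$ for every push (here the array starts at capacity 1, so the initial potential is $-1$ rather than 0, but each push still comes out to exactly 3):

```python
# Verify the potential-method calculation with Phi = 2*size - capacity:
# every push should have amortized cost exactly 3.
def amortized_costs(n):
    size, capacity = 0, 1
    costs = []
    for _ in range(n):
        phi_before = 2 * size - capacity
        if size == capacity:
            actual = size + 1     # copy `size` elements + 1 write
            capacity *= 2
        else:
            actual = 1            # just the write
        size += 1
        phi_after = 2 * size - capacity
        costs.append(actual + (phi_after - phi_before))
    return costs

assert all(c == 3 for c in amortized_costs(1000))
```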
Complexity Analysis
| Data Structure | Operation | Worst Case | Amortized |
|---|---|---|---|
| Dynamic array | Push | $O(n)$ | $O(1)$ |
| Hash table (with rehash) | Insert | $O(n)$ | $O(1)$ |
| Splay tree | Search/Insert/Delete | $O(n)$ | $O(\log n)$ |
| Fibonacci heap | Decrease-key | $O(n)$ | $O(1)$ |
| Union-Find (path compression) | Find/Union | $O(\log n)$ | $O(\alpha(n)) \approx O(1)$ |
Implementation
ALGORITHM DynamicArrayPush(array, element)
INPUT: array: dynamic array with size and capacity, element: value to add
OUTPUT: updated array
BEGIN
    IF array.size = array.capacity THEN
        newCapacity <- array.capacity * 2
        newData <- ALLOCATE array of size newCapacity
        FOR i <- 0 TO array.size - 1 DO
            newData[i] <- array.data[i]
        END FOR
        array.data <- newData
        array.capacity <- newCapacity
    END IF
    array.data[array.size] <- element
    array.size <- array.size + 1
END
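A direct Python rendering of the DynamicArrayPush pseudocode, as a sketch (class and field names are my own; Python lists already implement a similar growth strategy internally):

```python
# Runnable version of DynamicArrayPush: double capacity when full, copy, then write.
class DynamicArray:
    def __init__(self):
        self.capacity = 1
        self.size = 0
        self.data = [None] * self.capacity

    def push(self, element):
        if self.size == self.capacity:            # full: double and copy
            new_data = [None] * (self.capacity * 2)
            for i in range(self.size):
                new_data[i] = self.data[i]
            self.data = new_data
            self.capacity *= 2
        self.data[self.size] = element
        self.size += 1

arr = DynamicArray()
for x in range(100):
    arr.push(x)
assert arr.size == 100 and arr.capacity == 128    # capacity doubled 1 -> 128
```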
ALGORITHM AmortizedCostAggregate(operations, n)
INPUT: operations: sequence of n operations, n: count
OUTPUT: amortized cost per operation
BEGIN
    totalCost <- 0
    FOR i <- 1 TO n DO
        totalCost <- totalCost + ActualCost(operations[i])
    END FOR
    RETURN totalCost / n
END
ALGORITHM PotentialMethodCost(actualCost, phiBefore, phiAfter)
INPUT: actualCost: real cost of operation, phiBefore: potential before, phiAfter: potential after
OUTPUT: amortized cost of this operation
BEGIN
    RETURN actualCost + (phiAfter - phiBefore)
END
ALGORITHM MultiPopStack_Amortized(stack, k)
INPUT: stack: stack with n elements, k: number to pop
OUTPUT: popped elements
BEGIN
    popped <- empty list
    count <- MIN(k, stack.size)
    FOR i <- 1 TO count DO
        popped <- popped + [POP(stack)]
    END FOR
    RETURN popped
END
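MultiPop is the classic aggregate-method exercise: a single MultiPop is $O(n)$ worst case, but each element is pushed once and popped at most once, so any mix of $n$ pushes and MultiPops does at most $2n$ work, i.e. $O(1)$ amortized. A sketch checking that bound (operation encoding is my own):

```python
# Count total work for a mix of pushes and multipops: each element is
# pushed once and popped at most once, so work <= 2 * (number of pushes).
def run_ops(ops):
    stack, work = [], 0
    for op, arg in ops:
        if op == "push":
            stack.append(arg)
            work += 1
        else:                                     # ("multipop", k)
            for _ in range(min(arg, len(stack))):
                stack.pop()
                work += 1
    return work

ops = [("push", i) for i in range(1000)] + [("multipop", 600), ("multipop", 600)]
assert run_ops(ops) <= 2 * 1000
```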
Real-World Applications
- Dynamic arrays everywhere: JavaScript arrays, Python lists, Java ArrayLists, and C++ vectors all use doubling strategies with $O(1)$ amortized push, justified by amortized analysis
- Hash table rehashing: when a hash table's load factor exceeds a threshold, it rehashes all entries into a larger table; amortized analysis proves this keeps insert at $O(1)$ average
- Splay trees in caching: splay trees provide $O(\log n)$ amortized access and naturally move frequently accessed items to the root, making them effective for cache implementations
- Garbage collection: generational garbage collectors amortize the cost of collection over many allocations; most collections are cheap (young generation), with rare expensive full collections
- Network buffer management: network stacks use amortized analysis to justify buffer allocation strategies where occasional large allocations are spread across many small packet arrivals
Key Takeaways
- Amortized analysis gives the true average cost per operation over a worst-case sequence, not a probabilistic average
- The three methods (aggregate, accounting, potential) are different routes to proving the same bounds; choose whichever is most natural for the problem
- The potential method is the most general: define $\Phi$ so that cheap operations increase potential and expensive operations decrease it
- Dynamic array push is the canonical example: $O(n)$ worst case but $O(1)$ amortized, because resizes are exponentially rare
- Amortized analysis is essential for understanding the real performance of hash tables, splay trees, Fibonacci heaps, and union-find
Read More
- Randomized Algorithms and Probabilistic Analysis (2025-10-24): Las Vegas vs Monte Carlo algorithms, expected running time, randomized quicksort, skip list analysis, and why adding randomness can make algorithms simpler and faster.
- Information Theory Basics: Entropy, Compression, and Shannon's Theorem (2025-10-25): Shannon entropy, the source coding theorem, compression limits, Huffman coding, and why you can never compress truly random data.
- Big-O, Big-Theta, Big-Omega: Formal Asymptotic Analysis (2025-10-15): Formal definitions of Big-O, Big-Theta, and Big-Omega notation with limit-based proofs, common growth rates, and why Big-O dominates interviews while Big-Theta is more precise.