Amortized Analysis: The True Cost of Operations
Understanding amortized analysis through the aggregate method, accounting method, and potential method, with dynamic array and splay tree examples showing why worst-case per-operation analysis can be misleading.
Terminology
| Term | Definition |
|---|---|
| Amortized analysis | A technique for analyzing the average cost per operation over a worst-case sequence of operations, giving a tighter bound than worst-case per-operation analysis |
| Aggregate method | Compute the total cost of $n$ operations, then divide by $n$ to get the amortized cost per operation |
| Accounting method | Assign each operation an amortized cost (possibly different from actual cost); cheap operations "save" credit that expensive operations "spend" |
| Potential method | Define a potential function $\Phi$ on the data structure's state; the amortized cost is the actual cost plus the change in potential |
| Potential function ($\Phi$) | A function mapping data structure states to non-negative real numbers, representing "stored energy" available for future expensive operations |
| Dynamic array | An array that doubles its capacity when full; individual inserts are $O(n)$ worst case but $O(1)$ amortized |
| Splay tree | A self-adjusting BST that moves accessed nodes to the root via rotations; $O(\log n)$ amortized per operation despite $O(n)$ worst case |
| Credit (accounting) | Prepaid work stored during cheap operations that pays for future expensive operations; total credit must never go negative |
What & Why
Worst-case analysis can be misleading. A dynamic array's push operation is O(n) in the worst case (when it needs to resize), but most pushes are O(1). If you analyze a sequence of n pushes using worst-case per-operation analysis, you get O(n^2). But the actual total cost is O(n), giving an amortized cost of O(1) per push.
Amortized analysis gives you the true cost of a sequence of operations. It does not rely on probability or averaging over random inputs: it is a worst-case guarantee that for any sequence of n operations, the total cost is at most n times the amortized cost per operation.
This matters whenever you use data structures with occasional expensive operations: dynamic arrays, hash tables with rehashing, splay trees, Fibonacci heaps, or union-find with path compression. The amortized cost tells you the real performance story.
How It Works
Method 1: Aggregate Method
Count the total cost of n operations, then divide by n.
Dynamic array example: Start with capacity 1. When full, double the capacity and copy all elements.
For n pushes starting from capacity 1, a resize occurs each time the array fills, copying 1, 2, 4, 8, ..., up to at most n elements in the final resize. This geometric series sums to less than 2n. Adding the n unit costs of the pushes themselves, the total is at most 3n. Amortized cost: 3n / n = O(1).
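The aggregate argument can be checked by simulation. A minimal Python sketch (the function name is illustrative) that counts every unit of work done by n pushes into a doubling array:

```python
def total_push_cost(n):
    """Simulate n pushes into a doubling dynamic array and count unit costs."""
    size, capacity, total = 0, 1, 0
    for _ in range(n):
        if size == capacity:      # array is full: resize, copying `size` elements
            total += size
            capacity *= 2
        total += 1                # unit cost of writing the new element
        size += 1
    return total
```

For any n, the total stays at or below 3n, matching the aggregate bound.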
Method 2: Accounting Method
Assign each operation an amortized cost. Cheap operations pay more than their actual cost, banking credit. Expensive operations use that credit.
Dynamic array: Charge each push an amortized cost of 3 units:
- 1 unit for the push itself
- 2 units saved as credit (1 for copying this element later, 1 for copying a matching old element)
When a resize happens at capacity k, we need to copy k elements. But we have k/2 elements that each saved 2 credits (they were inserted since the last resize), giving k credits total. The resize is fully paid for.
Method 3: Potential Method
Define a potential function $\Phi(D_i)$ on the state of the data structure after operation $i$. The amortized cost of operation $i$ is:

$$\hat{c}_i = c_i + \Phi(D_i) - \Phi(D_{i-1})$$

The total amortized cost telescopes:

$$\sum_{i=1}^{n} \hat{c}_i = \sum_{i=1}^{n} c_i + \Phi(D_n) - \Phi(D_0)$$

If $\Phi(D_n) \ge \Phi(D_0)$ (typically $\Phi(D_0) = 0$ and $\Phi \ge 0$ throughout), then the total amortized cost is an upper bound on the total actual cost.
Dynamic array potential: Let $\Phi = 2 \cdot \text{size} - \text{capacity}$.
- After a push without resize: $\Phi$ increases by 2 (size goes up by 1, capacity is unchanged). Amortized cost: $1 + 2 = 3$.
- After a push with resize from capacity $k$ to $2k$: actual cost is $k + 1$ (copy $k$ elements + 1 push). $\Phi$ changes from $2k - k = k$ to $2(k+1) - 2k = 2$, a change of $2 - k$. Amortized cost: $(k+1) + (2 - k) = 3$.

Both cases give amortized cost 3, consistent with the other methods.
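The two cases can be checked mechanically. A Python sketch (function name illustrative) that computes actual cost plus change in potential for every push:

```python
def push_amortized_costs(n):
    """Amortized cost of each push using the potential Phi = 2*size - capacity."""
    size, capacity = 0, 1
    costs = []
    for _ in range(n):
        phi_before = 2 * size - capacity
        actual = 1                # unit cost of the push itself
        if size == capacity:      # resize: copy all `size` elements first
            actual += size
            capacity *= 2
        size += 1
        phi_after = 2 * size - capacity
        costs.append(actual + (phi_after - phi_before))
    return costs
```

Every push, with or without a resize, comes out to amortized cost exactly 3.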
Complexity Analysis
| Data Structure | Operation | Worst Case | Amortized |
|---|---|---|---|
| Dynamic array | Push | $O(n)$ | $O(1)$ |
| Hash table (with rehash) | Insert | $O(n)$ | $O(1)$ |
| Splay tree | Search/Insert/Delete | $O(n)$ | $O(\log n)$ |
| Fibonacci heap | Decrease-key | $O(n)$ | $O(1)$ |
| Union-Find (path compression) | Find/Union | $O(\log n)$ | $O(\alpha(n)) \approx O(1)$ |
Implementation
ALGORITHM DynamicArrayPush(array, element)
INPUT: array: dynamic array with size and capacity, element: value to add
OUTPUT: updated array
BEGIN
IF array.size = array.capacity THEN
newCapacity <- array.capacity * 2
newData <- ALLOCATE array of size newCapacity
FOR i <- 0 TO array.size - 1 DO
newData[i] <- array.data[i]
END FOR
array.data <- newData
array.capacity <- newCapacity
END IF
array.data[array.size] <- element
array.size <- array.size + 1
END
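The pseudocode above can be rendered in Python with an explicit fixed-capacity backing list, which makes the copy cost visible (Python's built-in list already does this amortization internally); the class name is illustrative:

```python
class DynamicArray:
    """Doubling dynamic array mirroring the DynamicArrayPush pseudocode."""

    def __init__(self):
        self.size = 0
        self.capacity = 1
        self.data = [None] * self.capacity

    def push(self, element):
        if self.size == self.capacity:
            new_capacity = self.capacity * 2
            new_data = [None] * new_capacity
            for i in range(self.size):     # O(size) copy, paid for amortized
                new_data[i] = self.data[i]
            self.data = new_data
            self.capacity = new_capacity
        self.data[self.size] = element
        self.size += 1
```

Ten pushes starting from capacity 1 trigger resizes at sizes 1, 2, 4, and 8, leaving a capacity of 16.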
ALGORITHM AmortizedCostAggregate(operations, n)
INPUT: operations: sequence of n operations, n: count
OUTPUT: amortized cost per operation
BEGIN
totalCost <- 0
FOR i <- 1 TO n DO
totalCost <- totalCost + ActualCost(operations[i])
END FOR
RETURN totalCost / n
END
ALGORITHM PotentialMethodCost(actualCost, phiBefore, phiAfter)
INPUT: actualCost: real cost of operation, phiBefore: potential before, phiAfter: potential after
OUTPUT: amortized cost of this operation
BEGIN
RETURN actualCost + (phiAfter - phiBefore)
END
ALGORITHM MultiPopStack_Amortized(stack, k)
INPUT: stack: stack with n elements, k: number to pop
OUTPUT: popped elements
BEGIN
popped <- empty list
count <- MIN(k, stack.size)
FOR i <- 1 TO count DO
popped <- popped + [POP(stack)]
END FOR
RETURN popped
END
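The multipop stack is the classic aggregate-method example: each element is pushed once and popped at most once, so any sequence of n operations performs at most 2n element moves, i.e. O(1) amortized per operation. A Python sketch (function name illustrative) that counts actual element moves over a sequence of operations:

```python
def multipop_total_cost(operations):
    """Count element moves for a sequence of ('push', x) / ('multipop', k) ops."""
    stack, total = [], 0
    for op in operations:
        if op[0] == 'push':
            stack.append(op[1])
            total += 1                # one move per push
        else:                         # ('multipop', k)
            count = min(op[1], len(stack))
            for _ in range(count):    # one move per popped element
                stack.pop()
            total += count
    return total

# 100 pushes followed by one huge multipop: 200 moves for 101 operations.
ops = [('push', i) for i in range(100)] + [('multipop', 1000)]
```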
Real-World Applications
- Dynamic arrays everywhere: JavaScript arrays, Python lists, Java ArrayLists, and C++ vectors all use doubling strategies with $O(1)$ amortized push, justified by amortized analysis
- Hash table rehashing: when a hash table's load factor exceeds a threshold, it rehashes all entries into a larger table; amortized analysis proves this keeps insert at $O(1)$ average
- Splay trees in caching: splay trees provide $O(\log n)$ amortized access and naturally move frequently accessed items to the root, making them effective for cache implementations
- Garbage collection: generational garbage collectors amortize the cost of collection over many allocations; most collections are cheap (young generation), with rare expensive full collections
- Network buffer management: network stacks use amortized analysis to justify buffer allocation strategies where occasional large allocations are spread across many small packet arrivals
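The hash table rehashing pattern follows the same shape as the dynamic array. A Python sketch counting total work over n inserts; the 0.75 load-factor threshold and initial 8 buckets are illustrative assumptions, not any particular library's values:

```python
def hash_insert_total_cost(n, threshold=0.75):
    """Count unit work for n inserts into a table that rehashes past `threshold`."""
    buckets, items, total = 8, 0, 0
    for _ in range(n):
        if (items + 1) / buckets > threshold:   # rehash: move every existing item
            total += items
            buckets *= 2
        total += 1                              # insert the new item
        items += 1
    return total
```

Total work stays linear in n, so insert remains O(1) amortized despite occasional full rehashes.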
Key Takeaways
- Amortized analysis gives the true average cost per operation over a worst-case sequence, not a probabilistic average
- The three methods (aggregate, accounting, potential) are equally powerful and, when set up for the same structure, yield the same bound; choose whichever is most natural for the problem
- The potential method is the most general: define $\Phi$ so that cheap operations increase potential and expensive operations decrease it
- Dynamic array push is the canonical example: $O(n)$ worst case but $O(1)$ amortized, because resizes are exponentially rare
- Amortized analysis is essential for understanding the real performance of hash tables, splay trees, Fibonacci heaps, and union-find