Randomized Algorithms and Probabilistic Analysis
Las Vegas vs Monte Carlo algorithms, expected running time, randomized quicksort, skip list analysis, and why adding randomness can make algorithms simpler and faster.
Terminology
| Term | Definition |
|---|---|
| Randomized algorithm | An algorithm that uses random coin flips (random bits) as part of its logic, making its behavior or output depend on randomness |
| Las Vegas algorithm | A randomized algorithm that always produces the correct answer but whose running time is a random variable; e.g. randomized quicksort |
| Monte Carlo algorithm | A randomized algorithm that runs in bounded time but may produce an incorrect answer with some probability; e.g. Miller-Rabin primality test |
| Expected running time | The average running time over all possible random choices the algorithm makes, for a fixed input |
| Indicator random variable | A random variable that is 1 if an event occurs and 0 otherwise; useful for computing expected values via linearity of expectation |
| Linearity of expectation | $E[X + Y] = E[X] + E[Y]$ regardless of whether $X$ and $Y$ are independent; a powerful tool for analyzing randomized algorithms |
| High probability bound | A guarantee that holds with probability $\ge 1 - 1/n^c$ for some constant $c$; stronger than expected-case analysis |
| Derandomization | Converting a randomized algorithm into a deterministic one, often at the cost of increased complexity or running time |
What & Why
Deterministic quicksort with a fixed pivot choice (say, always the first element) runs in $O(n^2)$ time on already-sorted input. Randomized quicksort picks a uniformly random pivot, and its expected running time is $O(n \log n)$ on every input. No adversary can construct a bad input, because the algorithm's behavior depends on its own random choices, not on the input structure.
Randomness gives algorithms two superpowers: avoiding worst-case inputs (no adversary can predict your random choices) and simplifying algorithm design (random sampling, random hashing, and random walks often lead to elegant solutions).
The distinction between Las Vegas and Monte Carlo is important. Las Vegas algorithms (randomized quicksort, randomized selection) always give the right answer but take variable time. Monte Carlo algorithms (Miller-Rabin, random sampling) run in fixed time but may be wrong with bounded probability. You can often boost Monte Carlo accuracy by repeating the algorithm.
How It Works
Randomized Quicksort
Instead of picking a fixed pivot, choose a uniformly random element. This ensures that, in expectation, the pivot splits the array reasonably well.
Expected comparisons: for an array of $n$ distinct elements, the expected number of comparisons is

$$E[X] = 2n \ln n + O(n) \approx 1.39\, n \log_2 n$$

The analysis uses indicator random variables: let $X_{ij} = 1$ if the $i$-th and $j$-th smallest elements are ever compared. These two elements are compared exactly when one of them is the first pivot chosen among the elements ranked between them (inclusive), so $\Pr[X_{ij} = 1] = \frac{2}{j - i + 1}$. By linearity of expectation:

$$E[X] = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \frac{2}{j - i + 1} \le 2(n-1) H_n = O(n \log n)$$

where $H_n = 1 + \tfrac{1}{2} + \cdots + \tfrac{1}{n} \le \ln n + 1$ is the $n$-th harmonic number.
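To sanity-check the $2n \ln n$ estimate empirically, here is a small Python simulation (a sketch, not part of the pseudocode in the Implementation section) that counts comparisons made by Lomuto-partition quicksort with random pivots; the function name is illustrative:

```python
import math
import random

def quicksort_comparisons(values, rng):
    """Sort a copy with random pivots; return the number of element comparisons."""
    arr = list(values)
    count = 0

    def sort(lo, hi):
        nonlocal count
        if lo >= hi:
            return
        pivot_index = rng.randint(lo, hi)
        arr[pivot_index], arr[hi] = arr[hi], arr[pivot_index]
        pivot = arr[hi]
        i = lo - 1
        for j in range(lo, hi):
            count += 1                        # one comparison against the pivot
            if arr[j] <= pivot:
                i += 1
                arr[i], arr[j] = arr[j], arr[i]
        arr[i + 1], arr[hi] = arr[hi], arr[i + 1]
        sort(lo, i)                           # elements left of the pivot
        sort(i + 2, hi)                       # elements right of the pivot

    sort(0, len(arr) - 1)
    return count

rng = random.Random(0)
n = 2000
avg = sum(quicksort_comparisons(range(n), rng) for _ in range(20)) / 20
# avg lands near 2n ln n; the exact expectation, 2(n+1)H_n - 4n, is a bit below it
```

Note that the input is already sorted (`range(n)`), the classic worst case for a fixed first-element pivot: randomizing the pivot makes the comparison count concentrate near $2n \ln n$ anyway.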
Skip List Analysis
A skip list is a randomized data structure in which each element is promoted to the next higher level with probability 1/2. The expected height is $O(\log n)$, and search, insert, and delete all take $O(\log n)$ expected time.
The key insight: the expected number of levels an element participates in is $1 + 1/2 + 1/4 + \cdots = 2$, so the structure uses only $O(n)$ expected space. Moreover, the maximum height over $n$ elements is $O(\log n)$ with high probability: by a union bound, the probability that any element reaches level $c \log_2 n$ is at most $n \cdot 2^{-c \log_2 n} = n^{1-c}$.
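The geometric level distribution is easy to simulate. A minimal sketch, where the `random_level` helper is hypothetical but mirrors the promotion rule above:

```python
import random

def random_level(rng):
    """Level of a new skip-list node: keep promoting with probability 1/2.

    Level 1 means the node appears only in the bottom list.
    """
    level = 1
    while rng.random() < 0.5:
        level += 1
    return level

rng = random.Random(42)
n = 100_000
levels = [random_level(rng) for _ in range(n)]
avg_level = sum(levels) / n   # expectation is 1 + 1/2 + 1/4 + ... = 2
height = max(levels)          # concentrates near log2(n) ~ 16.6 for this n
```

The average level converges to 2 (constant expected space per element), while the maximum stays within a small constant factor of $\log_2 n$, matching the high-probability height bound.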
Boosting Monte Carlo Accuracy
If a Monte Carlo algorithm with two-sided error is wrong with probability $\le 1/3$, running it $k$ times independently and taking the majority answer reduces the error to $\le e^{-k/18}$ (a Hoeffding bound: the majority is wrong only if at least half the runs err, a deviation of $1/6$ from the mean). With $k = O(\log(1/\delta))$ repetitions, you achieve error probability $\le \delta$.
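A quick illustration in Python, using a hypothetical noisy oracle that returns the parity of its input but lies with probability 1/3 (all names here are made up for the demo):

```python
import random

def noisy_parity(x, rng):
    """Hypothetical Monte Carlo oracle: returns x % 2, but lies with probability 1/3."""
    truth = x % 2
    return truth if rng.random() < 2 / 3 else 1 - truth

def boosted(x, k, rng):
    """Run the oracle k times and return the majority answer."""
    ones = sum(noisy_parity(x, rng) for _ in range(k))
    return 1 if ones > k / 2 else 0

rng = random.Random(7)
trials = 2000
single_errors = sum(noisy_parity(6, rng) != 0 for _ in range(trials))
boosted_errors = sum(boosted(6, 51, rng) != 0 for _ in range(trials))
# single_errors / trials stays near 1/3; boosted_errors / trials collapses toward zero
```

With $k = 51$ repetitions the majority is wrong only when at least 26 of 51 independent runs err, which for per-run error 1/3 happens well under 1% of the time.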
Complexity Analysis
| Algorithm | Type | Expected Time | Worst Case |
|---|---|---|---|
| Randomized Quicksort | Las Vegas | $O(n \log n)$ | $O(n^2)$ (extremely unlikely) |
| Randomized Select (median) | Las Vegas | $O(n)$ | $O(n^2)$ (extremely unlikely) |
| Skip List operations | Las Vegas | $O(\log n)$ | $O(n)$ (extremely unlikely) |
| Miller-Rabin (k rounds) | Monte Carlo | $O(k \log^3 n)$ | Same (fixed time) |
| Random sampling (median est.) | Monte Carlo | $O(n)$ | Same (fixed time) |
Implementation
ALGORITHM RandomizedQuicksort(array, low, high)
INPUT: array: list of comparable elements, low: start index, high: end index
OUTPUT: array sorted in place
BEGIN
IF low < high THEN
pivotIndex <- RANDOM_INT(low, high)
SWAP(array[pivotIndex], array[high])
pivot <- array[high]
i <- low - 1
FOR j <- low TO high - 1 DO
IF array[j] <= pivot THEN
i <- i + 1
SWAP(array[i], array[j])
END IF
END FOR
SWAP(array[i + 1], array[high])
p <- i + 1
RandomizedQuicksort(array, low, p - 1)
RandomizedQuicksort(array, p + 1, high)
END IF
END
ALGORITHM MillerRabin(n, k)
INPUT: n: integer to test for primality, k: number of rounds
OUTPUT: "probably prime" or "composite"
BEGIN
IF n < 2 THEN RETURN "composite" END IF
IF n = 2 OR n = 3 THEN RETURN "probably prime" END IF
IF n MOD 2 = 0 THEN RETURN "composite" END IF
Write n - 1 as 2^r * d where d is odd
FOR i <- 1 TO k DO
a <- RANDOM_INT(2, n - 2)
x <- MODULAR_POWER(a, d, n)
IF x = 1 OR x = n - 1 THEN
CONTINUE
END IF
FOR j <- 1 TO r - 1 DO
x <- (x * x) MOD n
IF x = n - 1 THEN
CONTINUE outer FOR loop (this round passes; skip the RETURN below)
END IF
END FOR
RETURN "composite"
END FOR
RETURN "probably prime"
END
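The Miller-Rabin pseudocode above translates to Python almost line for line; a sketch using the built-in three-argument `pow` for modular exponentiation:

```python
import random

def miller_rabin(n, k=20, rng=None):
    """Return True if n is probably prime, False if n is certainly composite."""
    rng = rng or random.Random()
    if n < 2:
        return False
    if n in (2, 3):
        return True
    if n % 2 == 0:
        return False
    # write n - 1 as 2^r * d with d odd
    r, d = 0, n - 1
    while d % 2 == 0:
        r += 1
        d //= 2
    for _ in range(k):
        a = rng.randint(2, n - 2)
        x = pow(a, d, n)            # modular exponentiation a^d mod n
        if x == 1 or x == n - 1:
            continue
        for _ in range(r - 1):
            x = x * x % n
            if x == n - 1:
                break               # this round passes; try the next witness
        else:
            return False            # a witnesses that n is composite
    return True                     # probably prime
```

With $k$ rounds the chance that a composite survives is at most $4^{-k}$; for example the Carmichael number 561, which fools the simpler Fermat test, is reliably rejected.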
ALGORITHM RandomizedSelect(array, low, high, k)
INPUT: array: list, low/high: bounds, k: rank of desired element
OUTPUT: the k-th smallest element
BEGIN
IF low = high THEN
RETURN array[low]
END IF
pivotIndex <- RANDOM_INT(low, high)
p <- Partition(array, low, high, pivotIndex)
rank <- p - low + 1
IF k = rank THEN
RETURN array[p]
ELSE IF k < rank THEN
RETURN RandomizedSelect(array, low, p - 1, k)
ELSE
RETURN RandomizedSelect(array, p + 1, high, k - rank)
END IF
END
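A runnable Python version of the same selection logic (Lomuto partition, rank $k$ counted from 1, matching the pseudocode above); a sketch under those conventions:

```python
import random

def randomized_select(values, k, rng=None):
    """Return the k-th smallest element (1-indexed) in expected O(n) time."""
    rng = rng or random.Random()
    arr = list(values)

    def select(lo, hi, k):
        if lo == hi:
            return arr[lo]
        pivot_index = rng.randint(lo, hi)
        arr[pivot_index], arr[hi] = arr[hi], arr[pivot_index]
        pivot = arr[hi]
        i = lo - 1
        for j in range(lo, hi):
            if arr[j] <= pivot:
                i += 1
                arr[i], arr[j] = arr[j], arr[i]
        arr[i + 1], arr[hi] = arr[hi], arr[i + 1]
        p = i + 1
        rank = p - lo + 1              # rank of the pivot within arr[lo..hi]
        if k == rank:
            return arr[p]
        if k < rank:
            return select(lo, p - 1, k)
        return select(p + 1, hi, k - rank)

    return select(0, len(arr) - 1, k)
```

Unlike quicksort, only one side of the partition is recursed into, which is why the expected time drops from $O(n \log n)$ to $O(n)$.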
Real-World Applications
- Cryptographic key generation: RSA key generation uses Miller-Rabin to find large probable primes; the probability of a composite slipping through $k$ rounds is at most $4^{-k}$
- Load balancing: the "power of two choices" paradigm randomly picks two servers and assigns the job to the less loaded one, achieving exponentially better balance than single random choice
- Hash functions: universal hashing picks a random hash function from a family, guaranteeing expected $O(1)$ operations regardless of the input distribution
- Distributed consensus: randomized protocols like Ben-Or's consensus algorithm use coin flips to break symmetry and achieve agreement in asynchronous networks
- Machine learning: stochastic gradient descent randomly samples mini-batches, trading exact gradient computation for much faster convergence in practice
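The "power of two choices" effect from the load-balancing bullet is easy to demonstrate with a balls-into-bins simulation (helper names here are illustrative): throw $n$ jobs at $n$ servers and compare the maximum load when each job picks one random server versus the less loaded of two random servers.

```python
import random

def max_load(n, choices, rng):
    """Throw n balls into n bins; each ball goes to the least loaded of `choices` random bins."""
    bins = [0] * n
    for _ in range(n):
        candidates = [rng.randrange(n) for _ in range(choices)]
        best = min(candidates, key=lambda b: bins[b])
        bins[best] += 1
    return max(bins)

rng = random.Random(0)
n = 100_000
one = max_load(n, 1, rng)   # ~ ln n / ln ln n with high probability
two = max_load(n, 2, rng)   # ~ log2 log2 n + O(1): exponentially better
```

The single-choice maximum load grows like $\ln n / \ln \ln n$, while two choices bring it down to roughly $\log_2 \log_2 n$, which is why the trick is so effective in practice.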
Key Takeaways
- Randomized algorithms use random bits to avoid worst-case inputs (Las Vegas) or trade correctness for speed (Monte Carlo)
- Randomized quicksort achieves $O(n \log n)$ expected time on every input, with no adversarial worst case
- Linearity of expectation is the key analysis tool: $E[X + Y] = E[X] + E[Y]$ even when $X$ and $Y$ are dependent
- Monte Carlo error can be reduced exponentially by repetition and majority voting
- Randomness is not just a theoretical trick: hash tables, skip lists, primality testing, and SGD all rely on randomization for practical performance