
Randomized Algorithms and Probabilistic Analysis

Las Vegas vs Monte Carlo algorithms, expected running time, randomized quicksort, skip list analysis, and why adding randomness can make algorithms simpler and faster.

2025-10-24
Theory of Computation · randomized-algorithms · probabilistic-analysis

Terminology

| Term | Definition |
| --- | --- |
| Randomized algorithm | An algorithm that uses random coin flips (random bits) as part of its logic, making its behavior or output depend on randomness |
| Las Vegas algorithm | A randomized algorithm that always produces the correct answer but whose running time is a random variable; e.g. randomized quicksort |
| Monte Carlo algorithm | A randomized algorithm that runs in bounded time but may produce an incorrect answer with some probability; e.g. Miller-Rabin primality test |
| Expected running time | The average running time over all possible random choices the algorithm makes, for a fixed input |
| Indicator random variable | A random variable that is 1 if an event occurs and 0 otherwise; useful for computing expected values via linearity of expectation |
| Linearity of expectation | $E[X + Y] = E[X] + E[Y]$ regardless of whether $X$ and $Y$ are independent; a powerful tool for analyzing randomized algorithms |
| High probability bound | A guarantee that holds with probability $\ge 1 - 1/n^c$ for some constant $c$; stronger than expected-case analysis |
| Derandomization | Converting a randomized algorithm into a deterministic one, often at the cost of increased complexity or running time |

What & Why

Deterministic quicksort with a fixed pivot choice (say, always the first element) has an $O(n^2)$ worst case on already-sorted input. Randomized quicksort picks a random pivot, and its expected running time is $O(n \log n)$ on every input. No adversary can construct a bad input because the algorithm's behavior depends on its own random choices, not the input structure.

Randomness gives algorithms two superpowers: avoiding worst-case inputs (no adversary can predict your random choices) and simplifying algorithm design (random sampling, random hashing, and random walks often lead to elegant solutions).

The distinction between Las Vegas and Monte Carlo is important. Las Vegas algorithms (randomized quicksort, randomized selection) always give the right answer but take variable time. Monte Carlo algorithms (Miller-Rabin, random sampling) run in fixed time but may be wrong with bounded probability. You can often boost Monte Carlo accuracy by repeating the algorithm.

How It Works

Randomized Quicksort

Instead of picking a fixed pivot, choose a uniformly random element. This ensures that, in expectation, the pivot splits the array reasonably well.

Expected comparisons: For an array of n elements, the expected number of comparisons is:

$E[\text{comparisons}] = 2n \ln n + O(n) \approx 1.39 \, n \log_2 n$

The analysis uses indicator random variables: let $X_{ij} = 1$ if the $i$-th and $j$-th smallest elements are ever compared. These two elements are compared exactly when one of them is the first pivot chosen among the elements of rank $i$ through $j$, which happens with probability $2/(j - i + 1)$. By linearity of expectation:

$E\left[\sum_{i < j} X_{ij}\right] = \sum_{i < j} \Pr[i \text{ and } j \text{ are compared}] = \sum_{i < j} \frac{2}{j - i + 1}$
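
As a sanity check on this formula, a short simulation (not part of the original analysis; function names are illustrative) counts comparisons empirically and compares against $2n \ln n$:

```python
import math
import random

def quicksort_comparisons(arr):
    """Count comparisons made by one run of randomized quicksort on arr."""
    if len(arr) <= 1:
        return 0
    pivot = random.choice(arr)
    # Each non-pivot element is compared against the pivot once.
    comparisons = len(arr) - 1
    less = [x for x in arr if x < pivot]
    greater = [x for x in arr if x > pivot]
    return comparisons + quicksort_comparisons(less) + quicksort_comparisons(greater)

n = 1000
trials = 50
avg = sum(quicksort_comparisons(list(range(n))) for _ in range(trials)) / trials
print(f"average comparisons: {avg:.0f}, 2n ln n = {2 * n * math.log(n):.0f}")
```

Note that the linear lower-order term in the exact expectation is negative, so the measured average sits somewhat below $2n \ln n$.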
(Figure: Las Vegas vs Monte Carlo. Las Vegas: always correct, random running time; examples: randomized quicksort, randomized selection, skip lists. Monte Carlo: fixed running time, may be incorrect with bounded probability; examples: Miller-Rabin primality, Bloom filters, random sampling.)

Skip List Analysis

A skip list is a randomized data structure where each element is promoted to higher levels with probability $1/2$. The expected height is $O(\log n)$, and search, insert, and delete all take $O(\log n)$ expected time.

The key insight: the expected number of levels an element participates in is $1 + 1/2 + 1/4 + \cdots = 2$. The maximum height across $n$ elements is $O(\log n)$ with high probability.
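
The geometric promotion scheme is easy to simulate; this sketch (names are illustrative) checks both claims: average height per element near 2, and maximum height near $\log_2 n$:

```python
import math
import random

def random_height():
    """Keep promoting with probability 1/2; return number of levels occupied."""
    h = 1
    while random.random() < 0.5:
        h += 1
    return h

n = 4096
heights = [random_height() for _ in range(n)]
avg = sum(heights) / n   # expectation: 1 + 1/2 + 1/4 + ... = 2
tallest = max(heights)   # O(log n) with high probability; log2(4096) = 12
print(f"average height: {avg:.2f}, tallest: {tallest}, log2(n) = {math.log2(n):.0f}")
```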

Boosting Monte Carlo Accuracy

If a Monte Carlo algorithm has error probability $\le 1/3$, running it $k$ times and taking the majority answer reduces the error to $\le e^{-k/18}$ (by a Chernoff–Hoeffding bound). With $k = O(\log(1/\delta))$ repetitions, you achieve error probability $\le \delta$.
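
This boosting argument can be demonstrated with a toy Monte Carlo "algorithm" that answers correctly with probability $2/3$ (a hypothetical stand-in, not any specific algorithm from the text):

```python
import random

def noisy_oracle(p_wrong=1/3):
    """A toy Monte Carlo procedure: returns the correct bit (1) with prob 2/3."""
    return 1 if random.random() > p_wrong else 0

def majority_of(k):
    """Run the oracle k times and take the majority answer."""
    return 1 if sum(noisy_oracle() for _ in range(k)) > k / 2 else 0

def error_rate(k, trials=2000):
    """Empirical probability that the majority vote is wrong."""
    return sum(1 for _ in range(trials) if majority_of(k) == 0) / trials

for k in (1, 11, 41):
    print(f"k = {k:3d}: empirical error {error_rate(k):.3f}")
```

The empirical error should fall rapidly with $k$, matching the exponential bound.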

(Figure: error reduction by repetition. Majority voting over $k$ independent trials drives the error probability down exponentially in $k$, e.g. from 33% at one trial to under 2% at ten trials.)

Complexity Analysis

| Algorithm | Type | Expected Time | Worst Case |
| --- | --- | --- | --- |
| Randomized Quicksort | Las Vegas | $O(n \log n)$ | $O(n^2)$ (extremely unlikely) |
| Randomized Select (median) | Las Vegas | $O(n)$ | $O(n^2)$ (extremely unlikely) |
| Skip List operations | Las Vegas | $O(\log n)$ | $O(n)$ (extremely unlikely) |
| Miller-Rabin ($k$ rounds) | Monte Carlo | $O(k \cdot \log^2 n)$ | Same (fixed time) |
| Random sampling (median est.) | Monte Carlo | $O(n)$ | Same (fixed time) |

Implementation

ALGORITHM RandomizedQuicksort(array, low, high)
INPUT: array: list of comparable elements, low: start index, high: end index
OUTPUT: array sorted in place
BEGIN
  IF low < high THEN
    pivotIndex <- RANDOM_INT(low, high)
    SWAP(array[pivotIndex], array[high])
    pivot <- array[high]
    i <- low - 1

    FOR j <- low TO high - 1 DO
      IF array[j] <= pivot THEN
        i <- i + 1
        SWAP(array[i], array[j])
      END IF
    END FOR

    SWAP(array[i + 1], array[high])
    p <- i + 1

    RandomizedQuicksort(array, low, p - 1)
    RandomizedQuicksort(array, p + 1, high)
  END IF
END
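
For reference, a direct Python transcription of the pseudocode above (a sketch, not a tuned library sort):

```python
import random

def randomized_quicksort(a, low=0, high=None):
    """In-place quicksort with a uniformly random pivot (Lomuto partition)."""
    if high is None:
        high = len(a) - 1
    if low < high:
        # Move a random pivot to the end, then partition around it.
        pivot_index = random.randint(low, high)
        a[pivot_index], a[high] = a[high], a[pivot_index]
        pivot = a[high]
        i = low - 1
        for j in range(low, high):
            if a[j] <= pivot:
                i += 1
                a[i], a[j] = a[j], a[i]
        a[i + 1], a[high] = a[high], a[i + 1]
        p = i + 1
        randomized_quicksort(a, low, p - 1)
        randomized_quicksort(a, p + 1, high)
    return a
```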

ALGORITHM MillerRabin(n, k)
INPUT: n: integer to test for primality, k: number of rounds
OUTPUT: "probably prime" or "composite"
BEGIN
  IF n < 2 THEN RETURN "composite" END IF
  IF n = 2 OR n = 3 THEN RETURN "probably prime" END IF
  IF n MOD 2 = 0 THEN RETURN "composite" END IF

  Write n - 1 as 2^r * d where d is odd

  FOR i <- 1 TO k DO
    a <- RANDOM_INT(2, n - 2)
    x <- MODULAR_POWER(a, d, n)

    IF x = 1 OR x = n - 1 THEN
      CONTINUE
    END IF

    FOR j <- 1 TO r - 1 DO
      x <- (x * x) MOD n
      IF x = n - 1 THEN
        CONTINUE to next i
      END IF
    END FOR

    RETURN "composite"
  END FOR

  RETURN "probably prime"
END
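
A Python version of the same test, using the built-in `pow(a, d, n)` for modular exponentiation (a sketch, not a vetted cryptographic implementation):

```python
import random

def miller_rabin(n, k=20):
    """Return True if n is probably prime; wrong with probability <= 4^-k."""
    if n < 2:
        return False
    if n in (2, 3):
        return True
    if n % 2 == 0:
        return False
    # Write n - 1 as 2^r * d with d odd.
    r, d = 0, n - 1
    while d % 2 == 0:
        r += 1
        d //= 2
    for _ in range(k):
        a = random.randint(2, n - 2)
        x = pow(a, d, n)          # built-in modular exponentiation
        if x in (1, n - 1):
            continue
        for _ in range(r - 1):
            x = (x * x) % n
            if x == n - 1:
                break             # this round passes; try the next witness
        else:
            return False          # a is a witness that n is composite
    return True                   # probably prime
```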

ALGORITHM RandomizedSelect(array, low, high, k)
INPUT: array: list, low/high: bounds, k: rank of desired element
OUTPUT: the k-th smallest element
BEGIN
  IF low = high THEN
    RETURN array[low]
  END IF

  pivotIndex <- RANDOM_INT(low, high)
  p <- Partition(array, low, high, pivotIndex)
  rank <- p - low + 1

  IF k = rank THEN
    RETURN array[p]
  ELSE IF k < rank THEN
    RETURN RandomizedSelect(array, low, p - 1, k)
  ELSE
    RETURN RandomizedSelect(array, p + 1, high, k - rank)
  END IF
END
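
An iterative Python rendering of the same idea (the recursion above is tail recursion, so a loop suffices; a sketch with illustrative names):

```python
import random

def randomized_select(a, k):
    """Return the k-th smallest element (1-indexed) in expected O(n) time."""
    a = list(a)                   # work on a copy so the caller's list is untouched
    low, high = 0, len(a) - 1
    while True:
        if low == high:
            return a[low]
        # Lomuto partition around a uniformly random pivot.
        pivot_index = random.randint(low, high)
        a[pivot_index], a[high] = a[high], a[pivot_index]
        pivot = a[high]
        i = low - 1
        for j in range(low, high):
            if a[j] <= pivot:
                i += 1
                a[i], a[j] = a[j], a[i]
        a[i + 1], a[high] = a[high], a[i + 1]
        p = i + 1
        rank = p - low + 1
        if k == rank:
            return a[p]
        elif k < rank:
            high = p - 1          # recurse into the left part
        else:
            k -= rank             # recurse into the right part
            low = p + 1
```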

Real-World Applications

  • Cryptographic key generation: RSA key generation uses Miller-Rabin to find large probable primes; the probability of a composite slipping through $k$ rounds is at most $4^{-k}$
  • Load balancing: the "power of two choices" paradigm randomly picks two servers and assigns the job to the less loaded one, achieving exponentially better balance than single random choice
  • Hash functions: universal hashing picks a random hash function from a family, guaranteeing expected $O(1)$ operations regardless of the input distribution
  • Distributed consensus: randomized protocols like Ben-Or's consensus algorithm use coin flips to break symmetry and achieve agreement in asynchronous networks
  • Machine learning: stochastic gradient descent randomly samples mini-batches, trading exact gradient computation for much faster convergence in practice
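
The "power of two choices" claim in the list above is easy to check empirically; this small simulation (illustrative names, modest sizes) compares the maximum bin load under one random choice versus the better of two:

```python
import random

def max_load(n_balls, n_bins, choices):
    """Throw balls into bins; with choices=2, pick the lighter of two random bins."""
    bins = [0] * n_bins
    for _ in range(n_balls):
        candidates = [random.randrange(n_bins) for _ in range(choices)]
        best = min(candidates, key=lambda b: bins[b])
        bins[best] += 1
    return max(bins)

n = 10000
one = max_load(n, n, 1)   # roughly ln n / ln ln n
two = max_load(n, n, 2)   # roughly ln ln n / ln 2, exponentially better
print(f"max load, one choice: {one}; two choices: {two}")
```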

Key Takeaways

  • Randomized algorithms use random bits to avoid worst-case inputs (Las Vegas) or trade correctness for speed (Monte Carlo)
  • Randomized quicksort achieves $O(n \log n)$ expected time on every input, with no adversarial worst case
  • Linearity of expectation is the key analysis tool: $E[X + Y] = E[X] + E[Y]$ even when $X$ and $Y$ are dependent
  • Monte Carlo error can be reduced exponentially by repetition and majority voting
  • Randomness is not just a theoretical trick: hash tables, skip lists, primality testing, and SGD all rely on randomization for practical performance