Randomized Algorithms and Probabilistic Analysis
Las Vegas vs Monte Carlo algorithms, expected running time, randomized quicksort, skip list analysis, and why adding randomness can make algorithms simpler and faster.
Terminology
| Term | Definition |
|---|---|
| Randomized algorithm | An algorithm that uses random coin flips (random bits) as part of its logic, making its behavior or output depend on randomness |
| Las Vegas algorithm | A randomized algorithm that always produces the correct answer but whose running time is a random variable; e.g. randomized quicksort |
| Monte Carlo algorithm | A randomized algorithm that runs in bounded time but may produce an incorrect answer with some probability; e.g. Miller-Rabin primality test |
| Expected running time | The average running time over all possible random choices the algorithm makes, for a fixed input |
| Indicator random variable | A random variable that is 1 if an event occurs and 0 otherwise; useful for computing expected values via linearity of expectation |
| Linearity of expectation | $E[X + Y] = E[X] + E[Y]$ regardless of whether $X$ and $Y$ are independent; a powerful tool for analyzing randomized algorithms |
| High probability bound | A guarantee that holds with probability $\ge 1 - 1/n^c$ for some constant $c$; stronger than expected-case analysis |
| Derandomization | Converting a randomized algorithm into a deterministic one, often at the cost of increased complexity or running time |
What & Why
Deterministic quicksort with a fixed pivot choice (say, always the first element) runs in $O(n^2)$ time on already-sorted input. Randomized quicksort picks a uniformly random pivot, and its expected running time is $O(n \log n)$ on every input. No adversary can construct a bad input, because the algorithm's behavior depends on its own random choices, not on the input structure.
Randomness gives algorithms two superpowers: avoiding worst-case inputs (no adversary can predict your random choices) and simplifying algorithm design (random sampling, random hashing, and random walks often lead to elegant solutions).
The distinction between Las Vegas and Monte Carlo is important. Las Vegas algorithms (randomized quicksort, randomized selection) always give the right answer but take variable time. Monte Carlo algorithms (Miller-Rabin, random sampling) run in fixed time but may be wrong with bounded probability. You can often boost Monte Carlo accuracy by repeating the algorithm.
How It Works
Randomized Quicksort
Instead of picking a fixed pivot, choose a uniformly random element. This ensures that, in expectation, the pivot splits the array reasonably well.
Expected comparisons: for an array of $n$ distinct elements, the expected number of comparisons is

$$E[X] = 2n \ln n + O(n) \approx 1.39\, n \log_2 n$$

The analysis uses indicator random variables: let $X_{ij} = 1$ if the $i$-th and $j$-th smallest elements are ever compared. These two elements are compared exactly when one of them is the first pivot chosen among the elements ranked between them (inclusive), so $\Pr[X_{ij} = 1] = \frac{2}{j - i + 1}$. By linearity of expectation:

$$E[X] = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \frac{2}{j - i + 1} \le 2(n-1) H_n = O(n \log n)$$

where $H_n = 1 + \tfrac{1}{2} + \cdots + \tfrac{1}{n} \le \ln n + 1$ is the $n$-th harmonic number.
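To sanity-check the $2n \ln n$ estimate empirically, here is a small Python simulation (a sketch, not part of the pseudocode in the Implementation section) that counts comparisons made by Lomuto-partition quicksort with random pivots; the function name is illustrative:

```python
import math
import random

def quicksort_comparisons(values, rng):
    """Sort a copy with random pivots; return the number of element comparisons."""
    arr = list(values)
    count = 0

    def sort(lo, hi):
        nonlocal count
        if lo >= hi:
            return
        pivot_index = rng.randint(lo, hi)
        arr[pivot_index], arr[hi] = arr[hi], arr[pivot_index]
        pivot = arr[hi]
        i = lo - 1
        for j in range(lo, hi):
            count += 1                        # one comparison against the pivot
            if arr[j] <= pivot:
                i += 1
                arr[i], arr[j] = arr[j], arr[i]
        arr[i + 1], arr[hi] = arr[hi], arr[i + 1]
        sort(lo, i)                           # elements left of the pivot
        sort(i + 2, hi)                       # elements right of the pivot

    sort(0, len(arr) - 1)
    return count

rng = random.Random(0)
n = 2000
avg = sum(quicksort_comparisons(range(n), rng) for _ in range(20)) / 20
# avg lands near 2n ln n; the exact expectation, 2(n+1)H_n - 4n, is a bit below it
```

Note that the input is already sorted (`range(n)`), the classic worst case for a fixed first-element pivot: randomizing the pivot makes the comparison count concentrate near $2n \ln n$ anyway.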
Skip List Analysis
A skip list is a randomized data structure in which each element is promoted to the next higher level with probability 1/2. The expected height is $O(\log n)$, and search, insert, and delete all take $O(\log n)$ expected time.
The key insight: the expected number of levels an element participates in is $1 + 1/2 + 1/4 + \cdots = 2$, so the structure uses only $O(n)$ expected space. Moreover, the maximum height over $n$ elements is $O(\log n)$ with high probability: by a union bound, the probability that any element reaches level $c \log_2 n$ is at most $n \cdot 2^{-c \log_2 n} = n^{1-c}$.
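The geometric level distribution is easy to simulate. A minimal sketch, where the `random_level` helper is hypothetical but mirrors the promotion rule above:

```python
import random

def random_level(rng):
    """Level of a new skip-list node: keep promoting with probability 1/2.

    Level 1 means the node appears only in the bottom list.
    """
    level = 1
    while rng.random() < 0.5:
        level += 1
    return level

rng = random.Random(42)
n = 100_000
levels = [random_level(rng) for _ in range(n)]
avg_level = sum(levels) / n   # expectation is 1 + 1/2 + 1/4 + ... = 2
height = max(levels)          # concentrates near log2(n) ~ 16.6 for this n
```

The average level converges to 2 (constant expected space per element), while the maximum stays within a small constant factor of $\log_2 n$, matching the high-probability height bound.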
Boosting Monte Carlo Accuracy
If a Monte Carlo algorithm with two-sided error is wrong with probability $\le 1/3$, running it $k$ times independently and taking the majority answer reduces the error to $\le e^{-k/18}$ (a Hoeffding bound: the majority is wrong only if at least half the runs err, a deviation of $1/6$ from the mean). With $k = O(\log(1/\delta))$ repetitions, you achieve error probability $\le \delta$.
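A quick illustration in Python, using a hypothetical noisy oracle that returns the parity of its input but lies with probability 1/3 (all names here are made up for the demo):

```python
import random

def noisy_parity(x, rng):
    """Hypothetical Monte Carlo oracle: returns x % 2, but lies with probability 1/3."""
    truth = x % 2
    return truth if rng.random() < 2 / 3 else 1 - truth

def boosted(x, k, rng):
    """Run the oracle k times and return the majority answer."""
    ones = sum(noisy_parity(x, rng) for _ in range(k))
    return 1 if ones > k / 2 else 0

rng = random.Random(7)
trials = 2000
single_errors = sum(noisy_parity(6, rng) != 0 for _ in range(trials))
boosted_errors = sum(boosted(6, 51, rng) != 0 for _ in range(trials))
# single_errors / trials stays near 1/3; boosted_errors / trials collapses toward zero
```

With $k = 51$ repetitions the majority is wrong only when at least 26 of 51 independent runs err, which for per-run error 1/3 happens well under 1% of the time.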
Complexity Analysis
| Algorithm | Type | Expected Time | Worst Case |
|---|---|---|---|
| Randomized Quicksort | Las Vegas | $O(n \log n)$ | $O(n^2)$ (extremely unlikely) |
| Randomized Select (median) | Las Vegas | $O(n)$ | $O(n^2)$ (extremely unlikely) |
| Skip List operations | Las Vegas | $O(\log n)$ | $O(n)$ (extremely unlikely) |
| Miller-Rabin (k rounds) | Monte Carlo | $O(k \log^3 n)$ | Same (fixed time) |
| Random sampling (median est.) | Monte Carlo | $O(n)$ | Same (fixed time) |
Implementation
ALGORITHM RandomizedQuicksort(array, low, high)
INPUT: array: list of comparable elements, low: start index, high: end index
OUTPUT: array sorted in place
BEGIN
IF low < high THEN
pivotIndex <- RANDOM_INT(low, high)
SWAP(array[pivotIndex], array[high])
pivot <- array[high]
i <- low - 1
FOR j <- low TO high - 1 DO
IF array[j] <= pivot THEN
i <- i + 1
SWAP(array[i], array[j])
END IF
END FOR
SWAP(array[i + 1], array[high])
p <- i + 1
RandomizedQuicksort(array, low, p - 1)
RandomizedQuicksort(array, p + 1, high)
END IF
END
ALGORITHM MillerRabin(n, k)
INPUT: n: integer to test for primality, k: number of rounds
OUTPUT: "probably prime" or "composite"
BEGIN
IF n < 2 THEN RETURN "composite" END IF
IF n = 2 OR n = 3 THEN RETURN "probably prime" END IF
IF n MOD 2 = 0 THEN RETURN "composite" END IF
Write n - 1 as 2^r * d where d is odd
FOR i <- 1 TO k DO
a <- RANDOM_INT(2, n - 2)
x <- MODULAR_POWER(a, d, n)
IF x = 1 OR x = n - 1 THEN
CONTINUE
END IF
FOR j <- 1 TO r - 1 DO
x <- (x * x) MOD n
IF x = n - 1 THEN
CONTINUE outer FOR loop (this round passes; skip the RETURN below)
END IF
END FOR
RETURN "composite"
END FOR
RETURN "probably prime"
END
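The Miller-Rabin pseudocode above translates to Python almost line for line; a sketch using the built-in three-argument `pow` for modular exponentiation:

```python
import random

def miller_rabin(n, k=20, rng=None):
    """Return True if n is probably prime, False if n is certainly composite."""
    rng = rng or random.Random()
    if n < 2:
        return False
    if n in (2, 3):
        return True
    if n % 2 == 0:
        return False
    # write n - 1 as 2^r * d with d odd
    r, d = 0, n - 1
    while d % 2 == 0:
        r += 1
        d //= 2
    for _ in range(k):
        a = rng.randint(2, n - 2)
        x = pow(a, d, n)            # modular exponentiation a^d mod n
        if x == 1 or x == n - 1:
            continue
        for _ in range(r - 1):
            x = x * x % n
            if x == n - 1:
                break               # this round passes; try the next witness
        else:
            return False            # a witnesses that n is composite
    return True                     # probably prime
```

With $k$ rounds the chance that a composite survives is at most $4^{-k}$; for example the Carmichael number 561, which fools the simpler Fermat test, is reliably rejected.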
ALGORITHM RandomizedSelect(array, low, high, k)
INPUT: array: list, low/high: bounds, k: rank of desired element
OUTPUT: the k-th smallest element
BEGIN
IF low = high THEN
RETURN array[low]
END IF
pivotIndex <- RANDOM_INT(low, high)
p <- Partition(array, low, high, pivotIndex)
rank <- p - low + 1
IF k = rank THEN
RETURN array[p]
ELSE IF k < rank THEN
RETURN RandomizedSelect(array, low, p - 1, k)
ELSE
RETURN RandomizedSelect(array, p + 1, high, k - rank)
END IF
END
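A runnable Python version of the same selection logic (Lomuto partition, rank $k$ counted from 1, matching the pseudocode above); a sketch under those conventions:

```python
import random

def randomized_select(values, k, rng=None):
    """Return the k-th smallest element (1-indexed) in expected O(n) time."""
    rng = rng or random.Random()
    arr = list(values)

    def select(lo, hi, k):
        if lo == hi:
            return arr[lo]
        pivot_index = rng.randint(lo, hi)
        arr[pivot_index], arr[hi] = arr[hi], arr[pivot_index]
        pivot = arr[hi]
        i = lo - 1
        for j in range(lo, hi):
            if arr[j] <= pivot:
                i += 1
                arr[i], arr[j] = arr[j], arr[i]
        arr[i + 1], arr[hi] = arr[hi], arr[i + 1]
        p = i + 1
        rank = p - lo + 1              # rank of the pivot within arr[lo..hi]
        if k == rank:
            return arr[p]
        if k < rank:
            return select(lo, p - 1, k)
        return select(p + 1, hi, k - rank)

    return select(0, len(arr) - 1, k)
```

Unlike quicksort, only one side of the partition is recursed into, which is why the expected time drops from $O(n \log n)$ to $O(n)$.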
Real-World Applications
- Cryptographic key generation: RSA key generation uses Miller-Rabin to find large probable primes; the probability of a composite slipping through $k$ rounds is at most $4^{-k}$
- Load balancing: the "power of two choices" paradigm randomly picks two servers and assigns the job to the less loaded one, achieving exponentially better balance than single random choice
- Hash functions: universal hashing picks a random hash function from a family, guaranteeing expected $O(1)$ operations regardless of the input distribution
- Distributed consensus: randomized protocols like Ben-Or's consensus algorithm use coin flips to break symmetry and achieve agreement in asynchronous networks
- Machine learning: stochastic gradient descent randomly samples mini-batches, trading exact gradient computation for much faster convergence in practice
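The "power of two choices" effect from the load-balancing bullet is easy to demonstrate with a balls-into-bins simulation (helper names here are illustrative): throw $n$ jobs at $n$ servers and compare the maximum load when each job picks one random server versus the less loaded of two random servers.

```python
import random

def max_load(n, choices, rng):
    """Throw n balls into n bins; each ball goes to the least loaded of `choices` random bins."""
    bins = [0] * n
    for _ in range(n):
        candidates = [rng.randrange(n) for _ in range(choices)]
        best = min(candidates, key=lambda b: bins[b])
        bins[best] += 1
    return max(bins)

rng = random.Random(0)
n = 100_000
one = max_load(n, 1, rng)   # ~ ln n / ln ln n with high probability
two = max_load(n, 2, rng)   # ~ log2 log2 n + O(1): exponentially better
```

The single-choice maximum load grows like $\ln n / \ln \ln n$, while two choices bring it down to roughly $\log_2 \log_2 n$, which is why the trick is so effective in practice.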
Key Takeaways
- Randomized algorithms use random bits to avoid worst-case inputs (Las Vegas) or trade correctness for speed (Monte Carlo)
- Randomized quicksort achieves $O(n \log n)$ expected time on every input, with no adversarial worst case
- Linearity of expectation is the key analysis tool: $E[X + Y] = E[X] + E[Y]$ even when $X$ and $Y$ are dependent
- Monte Carlo error can be reduced exponentially by repetition and majority voting
- Randomness is not just a theoretical trick: hash tables, skip lists, primality testing, and SGD all rely on randomization for practical performance