
Non-Comparison Sorting: Counting Sort, Radix Sort, and Bucket Sort

Breaking the n log n barrier by exploiting structure in the data, with Counting Sort, Radix Sort, and Bucket Sort.

2021-04-02
Algorithms · sorting · counting-sort · radix-sort

Terminology

  • Non-comparison sort: a sorting algorithm that does not rely on pairwise element comparisons. Instead, it uses properties of the data (like digit values or ranges) to place elements directly.
  • Stable sort: a sort that preserves the relative order of elements with equal keys. Stability is critical when Radix Sort uses Counting Sort as a subroutine.
  • Key range ($k$): the difference between the maximum and minimum possible values in the input. Counting Sort's performance depends on $k$.
  • Radix: the base of a number system (e.g., base 10, base 256). Radix Sort processes one digit at a time in a chosen radix.
  • LSD (Least Significant Digit): a Radix Sort variant that processes digits from right to left, relying on a stable subroutine sort.
  • MSD (Most Significant Digit): a Radix Sort variant that processes digits from left to right, recursively sorting each bucket.
  • Bucket: a container (often a list) that collects elements falling within a certain range or sharing a certain digit value.
  • Uniform distribution: a probability distribution where every value in a range is equally likely. Bucket Sort achieves $O(n)$ expected time under this assumption.

What & Why

Comparison-based sorts like Merge Sort and Quick Sort cannot beat $\Omega(n \log n)$ in the worst case. But that lower bound only applies when the algorithm learns about element order exclusively through comparisons. If we know something about the structure of the data, such as the range of values or the number of digits, we can sort faster.

Non-comparison sorts exploit this extra information to achieve linear or near-linear time:

  • Counting Sort: sorts integers in $O(n + k)$ time, where $k$ is the value range
  • Radix Sort: sorts integers or strings in $O(d \cdot (n + k))$ time, where $d$ is the number of digits
  • Bucket Sort: sorts uniformly distributed data in $O(n)$ expected time

The trade-off is that these algorithms have constraints on their input. Counting Sort needs a bounded integer range. Radix Sort needs fixed-length keys or bounded digit counts. Bucket Sort needs a known distribution. When those conditions hold, they outperform comparison-based sorts significantly.

How It Works

Counting Sort

Counting Sort works in three passes over the data:

  1. Count: scan the input and count how many times each value appears
  2. Accumulate: convert counts into starting positions using a prefix sum
  3. Place: walk through the input again, placing each element at its computed position

Counting Sort example (input values 0..5):

  Input:   4 2 2 0 3 1 5 2
  Counts:  value 0 1 2 3 4 5  ->  count 1 1 3 1 1 1
  Prefix (starting positions): 0 1 2 5 6 7
  Place each element at its prefix position, then advance that position.
  Output:  0 1 2 2 2 3 4 5

Radix Sort (LSD)

Radix Sort processes digits from least significant to most significant. For each digit position, it uses a stable sort (typically Counting Sort) to rearrange elements by that digit. Because the subroutine is stable, earlier digit orderings are preserved when later digits are equal.

For example, sorting three-digit numbers: first sort by the ones digit, then by the tens digit, then by the hundreds digit. After all passes, the array is fully sorted.
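
The stability requirement is easy to see in code. This sketch uses Python's built-in `sorted` (which is guaranteed stable) as the per-digit subroutine and prints the array after each pass; note how ties on the current digit keep the order established by earlier passes:

```python
# LSD radix passes over three-digit numbers, one base-10 digit at a time.
# sorted() is stable, so equal digits preserve the previous pass's order.
nums = [329, 457, 657, 839, 436, 720, 355]
for d in range(3):  # d = 0 (ones), 1 (tens), 2 (hundreds)
    nums = sorted(nums, key=lambda x: (x // 10 ** d) % 10)
    print(f"after digit {d}: {nums}")
# The final line is the fully sorted array.
```

Replacing `sorted` with an unstable sort breaks this: the hundreds-digit pass would scramble the ordering that the earlier passes established among numbers sharing the same hundreds digit.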

Bucket Sort

Bucket Sort distributes elements into buckets based on their value (the implementation below uses $n$ buckets over the input range), sorts each bucket individually (often with Insertion Sort), then concatenates the buckets. When the input is uniformly distributed, each bucket receives $O(1)$ elements in expectation, making the total work $O(n)$.

Complexity Analysis

Algorithm      Time                  Space       Stable?
Counting Sort  $O(n + k)$            $O(n + k)$  Yes
Radix Sort     $O(d \cdot (n + k))$  $O(n + k)$  Yes
Bucket Sort    $O(n)$ expected       $O(n + b)$  Depends on the per-bucket sort

(where $b$ is the number of buckets)

Counting Sort Analysis

Counting Sort makes three linear passes: one over the input ($n$ elements), one over the count array ($k$ slots), and one more over the input for placement. Total:

$T(n, k) = O(n + k)$

When $k = O(n)$, this is linear. When $k \gg n$ (e.g., sorting 10 elements with values up to $10^9$), the count array dominates and Counting Sort becomes impractical.

Radix Sort Analysis

Radix Sort runs Counting Sort $d$ times (once per digit), where each digit has $k$ possible values:

$T(n, d, k) = O(d \cdot (n + k))$

For $w$-bit integers using radix $r = 2^b$, we have $d = w/b$ digits each with $k = 2^b$ values:

$T = O\!\left(\frac{w}{b} \cdot (n + 2^b)\right)$

Choosing $b = \log_2 n$ gives $d = w / \log_2 n$ and $k = n$, yielding $O(wn / \log_2 n)$. For 32-bit integers, this beats $O(n \log n)$ comparison sorts when $n$ is large.
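
To make the trade-off concrete, here is a back-of-the-envelope calculation of the operation count $d \cdot (n + 2^b)$ for a few radix choices (illustrative arithmetic with assumed sample sizes, not a benchmark):

```python
import math

def radix_cost(n, w, b):
    """Operation count d * (n + 2^b) for w-bit keys with radix 2^b."""
    d = math.ceil(w / b)        # number of digit passes
    return d * (n + 2 ** b)    # each pass: n placements + 2^b counter slots

n, w = 1_000_000, 32
for b in (4, 8, 16):
    print(f"b={b:2d}: cost={radix_cost(n, w, b):,}")
```

For a million 32-bit keys, a larger radix (fewer passes) wins until the $2^b$ counter array starts to dominate, which matches the $b = \log_2 n$ analysis above.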

Bucket Sort Expected Time

With $n$ elements distributed uniformly into $n$ buckets, the expected number of elements per bucket is 1. Sorting each bucket with Insertion Sort costs $O(m^2)$ for $m$ elements, but the expected total is:

$E\left[\sum_{i=0}^{n-1} O(m_i^2)\right] = O(n)$

In the worst case (all elements in one bucket), Bucket Sort degrades to $O(n^2)$.
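
A minimal sketch of how the uniformity assumption can fail: when all inputs are (nearly) identical, every element maps to the same bucket index, so a single bucket holds all $n$ elements and the per-bucket Insertion Sort does $O(n^2)$ work:

```python
import math

n = 8
values = [0.5] * n                        # degenerate, non-uniform input
buckets = [[] for _ in range(n)]
for v in values:
    buckets[math.floor(v * n)].append(v)  # every element lands in bucket 4

print([len(b) for b in buckets])          # one bucket holds everything
```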

Implementation

Counting Sort

ALGORITHM CountingSort(A, n, k)
  INPUT: array A of n non-negative integers, max value k
  OUTPUT: sorted array B

  CREATE count array C of size k + 1, initialized to 0
  CREATE output array B of size n

  // Step 1: Count occurrences
  FOR i = 0 TO n - 1 DO
    C[A[i]] = C[A[i]] + 1
  END FOR

  // Step 2: Prefix sum (cumulative count)
  FOR i = 1 TO k DO
    C[i] = C[i] + C[i - 1]
  END FOR

  // Step 3: Place elements (traverse backward for stability)
  FOR i = n - 1 DOWNTO 0 DO
    B[C[A[i]] - 1] = A[i]
    C[A[i]] = C[A[i]] - 1
  END FOR

  RETURN B
END
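
The pseudocode above translates almost line for line into Python. A sketch (the function name `counting_sort` and the decrement-before-place variant of Step 3 are this sketch's choices; it is equivalent to the pseudocode's place-then-decrement):

```python
def counting_sort(a, k):
    """Stable counting sort of non-negative integers in [0, k]."""
    count = [0] * (k + 1)
    for x in a:                    # Step 1: count occurrences
        count[x] += 1
    for i in range(1, k + 1):      # Step 2: prefix sums -> one-past-end positions
        count[i] += count[i - 1]
    out = [0] * len(a)
    for x in reversed(a):          # Step 3: traverse backward for stability
        count[x] -= 1
        out[count[x]] = x
    return out

print(counting_sort([4, 2, 2, 0, 3, 1, 5, 2], k=5))  # [0, 1, 2, 2, 2, 3, 4, 5]
```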

Radix Sort (LSD)

ALGORITHM RadixSort(A, n, d)
  INPUT: array A of n integers, d = number of digits
  OUTPUT: A sorted in ascending order

  FOR digit = 0 TO d - 1 DO
    // Use stable Counting Sort on the current digit
    StableCountingSortByDigit(A, n, digit)
  END FOR

  RETURN A
END

ALGORITHM StableCountingSortByDigit(A, n, digit)
  INPUT: array A, size n, digit position to sort by
  OUTPUT: A rearranged stably by the given digit

  k = 9  // for base-10 digits
  CREATE count array C of size k + 1, initialized to 0
  CREATE output array B of size n

  FOR i = 0 TO n - 1 DO
    v = GET_DIGIT(A[i], digit)
    C[v] = C[v] + 1
  END FOR

  FOR i = 1 TO k DO
    C[i] = C[i] + C[i - 1]
  END FOR

  FOR i = n - 1 DOWNTO 0 DO
    v = GET_DIGIT(A[i], digit)
    B[C[v] - 1] = A[i]
    C[v] = C[v] - 1
  END FOR

  COPY B into A
END
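
A runnable Python sketch of the same two routines, hard-coded to base 10 (the names `radix_sort` and `_counting_sort_by_digit` are this sketch's, not a standard API):

```python
def _counting_sort_by_digit(a, d):
    """Stably sort non-negative integers by their d-th base-10 digit."""
    count = [0] * 10
    for x in a:
        count[(x // 10 ** d) % 10] += 1
    for i in range(1, 10):                # prefix sums
        count[i] += count[i - 1]
    out = [0] * len(a)
    for x in reversed(a):                 # backward pass keeps the sort stable
        dig = (x // 10 ** d) % 10
        count[dig] -= 1
        out[count[dig]] = x
    return out

def radix_sort(a, num_digits):
    """LSD radix sort: one stable counting-sort pass per digit."""
    for d in range(num_digits):
        a = _counting_sort_by_digit(a, d)
    return a

print(radix_sort([170, 45, 75, 90, 802, 24, 2, 66], num_digits=3))
```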

Bucket Sort

ALGORITHM BucketSort(A, n)
  INPUT: array A of n floats in range [0, 1)
  OUTPUT: A sorted in ascending order

  CREATE array of n empty lists (buckets)

  // Distribute elements into buckets
  FOR i = 0 TO n - 1 DO
    bucketIndex = FLOOR(A[i] * n)
    APPEND A[i] to buckets[bucketIndex]
  END FOR

  // Sort each bucket (Insertion Sort for small lists)
  FOR i = 0 TO n - 1 DO
    InsertionSort(buckets[i])
  END FOR

  // Concatenate all buckets
  index = 0
  FOR i = 0 TO n - 1 DO
    FOR EACH element IN buckets[i] DO
      A[index] = element
      index = index + 1
    END FOR
  END FOR

  RETURN A
END
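
And a Python sketch of Bucket Sort, using the built-in `list.sort` per bucket in place of Insertion Sort (for the tiny buckets the analysis predicts, either choice is fine):

```python
import math

def bucket_sort(a):
    """Sort floats in [0, 1); near-linear when the input is uniform."""
    n = len(a)
    buckets = [[] for _ in range(n)]
    for x in a:
        buckets[math.floor(x * n)].append(x)  # distribute by value
    for b in buckets:
        b.sort()                              # stand-in for Insertion Sort
    return [x for b in buckets for x in b]    # concatenate in bucket order

print(bucket_sort([0.78, 0.17, 0.39, 0.26, 0.72, 0.94, 0.21, 0.12, 0.23, 0.68]))
```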

Real-World Applications

  • Radix Sort for integers: database engines use Radix Sort for sorting integer keys when the key size is bounded, outperforming comparison sorts on large datasets.
  • Counting Sort for small ranges: histogram computation, character frequency analysis, and sorting exam scores (0-100) all use Counting Sort directly.
  • Suffix arrays: efficient suffix array construction algorithms (like SA-IS) use Radix Sort internally to sort suffixes by their first few characters.
  • Network packet routing: routers sort packets by priority levels (a small fixed range), making Counting Sort a natural fit.
  • Bucket Sort for floating-point data: scientific computing and graphics pipelines sort uniformly distributed floating-point values using Bucket Sort for near-linear performance.

Key Takeaways

  • Non-comparison sorts break the $\Omega(n \log n)$ barrier by exploiting data structure: value ranges, digit counts, or distributions
  • Counting Sort is $O(n + k)$ and works well when the value range $k$ is not much larger than $n$
  • Radix Sort extends Counting Sort to handle larger ranges by processing one digit at a time, achieving $O(d(n + k))$
  • Bucket Sort achieves $O(n)$ expected time for uniformly distributed data, but degrades to $O(n^2)$ in the worst case
  • Stability matters: Radix Sort depends on its subroutine sort being stable to produce correct results