Non-Comparison Sorting: Counting Sort, Radix Sort, and Bucket Sort
Breaking the $n \log n$ barrier by exploiting structure in the data, with Counting Sort, Radix Sort, and Bucket Sort.
Terminology
- Non-comparison sort: a sorting algorithm that does not rely on pairwise element comparisons. Instead, it uses properties of the data (like digit values or ranges) to place elements directly.
- Stable sort: a sort that preserves the relative order of elements with equal keys. Stability is critical when Radix Sort uses Counting Sort as a subroutine.
- Key range ($k$): the difference between the maximum and minimum possible values in the input. Counting Sort's performance depends on $k$.
- Radix: the base of a number system (e.g., base 10, base 256). Radix Sort processes one digit at a time in a chosen radix.
- LSD (Least Significant Digit): a Radix Sort variant that processes digits from right to left, relying on a stable subroutine sort.
- MSD (Most Significant Digit): a Radix Sort variant that processes digits from left to right, recursively sorting each bucket.
- Bucket: a container (often a list) that collects elements falling within a certain range or sharing a certain digit value.
- Uniform distribution: a probability distribution where every value in a range is equally likely. Bucket Sort achieves $O(n)$ expected time under this assumption.
What & Why
Comparison-based sorts like Merge Sort and Quick Sort cannot beat $\Omega(n \log n)$ in the worst case. But that lower bound only applies when the algorithm learns about element order exclusively through comparisons. If we know something about the structure of the data, such as the range of values or the number of digits, we can sort faster.
Non-comparison sorts exploit this extra information to achieve linear or near-linear time:
- Counting Sort: sorts integers in $O(n + k)$ time, where $k$ is the value range
- Radix Sort: sorts integers or strings in $O(d \cdot (n + k))$ time, where $d$ is the number of digits
- Bucket Sort: sorts uniformly distributed data in $O(n)$ expected time
The trade-off is that these algorithms have constraints on their input. Counting Sort needs a bounded integer range. Radix Sort needs fixed-length keys or bounded digit counts. Bucket Sort needs a known distribution. When those conditions hold, they outperform comparison-based sorts significantly.
How It Works
Counting Sort
Counting Sort works in three passes over the data:
- Count: scan the input and count how many times each value appears
- Accumulate: convert counts into starting positions using a prefix sum
- Place: walk through the input again, placing each element at its computed position
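The three passes above can be sketched in Python as follows (a minimal sketch assuming non-negative integer keys; the function name `counting_sort` is illustrative):

```python
def counting_sort(a, k):
    """Stable counting sort of integers in the range [0, k]."""
    count = [0] * (k + 1)
    for x in a:                    # 1. Count: tally occurrences of each value
        count[x] += 1
    for i in range(1, k + 1):      # 2. Accumulate: count[v] = # of elements <= v
        count[i] += count[i - 1]
    out = [0] * len(a)
    for x in reversed(a):          # 3. Place: backward pass preserves stability
        count[x] -= 1
        out[count[x]] = x
    return out

print(counting_sort([4, 2, 2, 8, 3, 3, 1], 8))  # [1, 2, 2, 3, 3, 4, 8]
```

Walking the input backward in the placement pass is what makes the sort stable: equal keys are emitted in the same relative order they appeared.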
Radix Sort (LSD)
Radix Sort processes digits from least significant to most significant. For each digit position, it uses a stable sort (typically Counting Sort) to rearrange elements by that digit. Because the subroutine is stable, earlier digit orderings are preserved when later digits are equal.
For example, sorting three-digit numbers: first sort by the ones digit, then by the tens digit, then by the hundreds digit. After all passes, the array is fully sorted.
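A compact way to demonstrate this pass structure in Python is to lean on `sorted()`, which is guaranteed stable (Timsort), as the per-digit subroutine. This is a sketch of the idea, not the Counting-Sort-based version given later:

```python
def radix_sort_lsd(a, num_digits, base=10):
    """LSD radix sort: one stable pass per digit, least significant first."""
    for d in range(num_digits):
        # Extract digit d (0 = ones place) and sort stably by it
        a = sorted(a, key=lambda x: (x // base**d) % base)
    return a

print(radix_sort_lsd([329, 457, 657, 839, 436, 720, 355], 3))
# [329, 355, 436, 457, 657, 720, 839]
```

Because each pass is stable, ties on the current digit keep the ordering established by the earlier (less significant) passes.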
Bucket Sort
Bucket Sort distributes elements into a fixed number of buckets based on their value range, sorts each bucket individually (often with Insertion Sort), then concatenates the buckets. When the input is uniformly distributed, each bucket gets roughly $O(1)$ elements, making the total work $O(n)$.
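A minimal sketch for floats in $[0, 1)$, using Python's built-in `list.sort` on each bucket in place of Insertion Sort:

```python
def bucket_sort(a):
    """Bucket sort for floats uniformly distributed in [0, 1)."""
    n = len(a)
    buckets = [[] for _ in range(n)]
    for x in a:
        buckets[int(x * n)].append(x)   # bucket index proportional to value
    for b in buckets:
        b.sort()                        # each bucket is tiny in expectation
    return [x for b in buckets for x in b]

print(bucket_sort([0.78, 0.17, 0.39, 0.26, 0.72, 0.94, 0.21, 0.12]))
# [0.12, 0.17, 0.21, 0.26, 0.39, 0.72, 0.78, 0.94]
```

The built-in sort is a pragmatic substitute here; the asymptotic argument in the analysis below assumes Insertion Sort, whose $O(m^2)$ cost is what makes the expected-value calculation work out.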
Complexity Analysis
| Algorithm | Time | Space | Stable? |
|---|---|---|---|
| Counting Sort | $O(n + k)$ | $O(n + k)$ | Yes |
| Radix Sort | $O(d \cdot (n + k))$ | $O(n + k)$ | Yes |
| Bucket Sort | $O(n)$ expected, $O(n^2)$ worst | $O(n + b)$ | Depends |
Counting Sort Analysis
Counting Sort makes three linear passes: one over the input ($n$ elements), one over the count array ($k$ slots), and one more over the input for placement. Total:

$$T(n) = O(n) + O(k) + O(n) = O(n + k)$$
When $k = O(n)$, this is linear. When $k \gg n$ (e.g., sorting 10 elements with values up to $10^9$), the count array dominates and Counting Sort becomes impractical.
Radix Sort Analysis
Radix Sort runs Counting Sort $d$ times (once per digit), where each digit has $k$ possible values:

$$T(n) = O(d \cdot (n + k))$$
For $w$-bit integers using radix $r = 2^b$, we have $d = w/b$ digits each with $k = 2^b$ values:

$$T(n) = O\left(\frac{w}{b} \cdot (n + 2^b)\right)$$
Choosing $b = \log_2 n$ gives $d = w / \log n$ and $k = n$, yielding $O(wn / \log n)$. For 32-bit integers, this beats $O(n \log n)$ comparison sorts when $n$ is large.
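As a rough sanity check on that claim, the pass counts can be compared numerically (constant factors ignored; this is illustrative arithmetic, not a benchmark):

```python
import math

# For w-bit keys with radix chosen as b = log2(n), radix sort does w/b
# linear passes over n items, while a comparison sort does ~log2(n)
# units of work per element.
w = 32
for n in (2**10, 2**16, 2**20):
    radix_work = (w / math.log2(n)) * n   # O(w * n / log n)
    cmp_work = n * math.log2(n)           # O(n log n)
    print(n, radix_work < cmp_work)        # prints True for each n
```

For 32-bit keys the radix-sort estimate already wins once $\log_2 n$ exceeds $\sqrt{w}$, i.e. for fairly modest $n$; in practice, cache behavior and constant factors shift the crossover.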
Bucket Sort Expected Time
With $n$ elements distributed uniformly into $n$ buckets, the expected number of elements per bucket is 1. Sorting bucket $i$ with Insertion Sort costs $O(m_i^2)$ for $m_i$ elements, but under the uniformity assumption $E[m_i^2] = 2 - 1/n$, so the expected total is:

$$E\left[\sum_{i=0}^{n-1} O(m_i^2)\right] = \sum_{i=0}^{n-1} O\left(E[m_i^2]\right) = n \cdot O(2 - 1/n) = O(n)$$
In the worst case (all elements in one bucket), Bucket Sort degrades to $O(n^2)$.
Implementation
Counting Sort
ALGORITHM CountingSort(A, n, k)
INPUT: array A of n non-negative integers, max value k
OUTPUT: sorted array B
CREATE count array C of size k + 1, initialized to 0
CREATE output array B of size n
// Step 1: Count occurrences
FOR i = 0 TO n - 1 DO
C[A[i]] = C[A[i]] + 1
END FOR
// Step 2: Prefix sum (cumulative count)
FOR i = 1 TO k DO
C[i] = C[i] + C[i - 1]
END FOR
// Step 3: Place elements (traverse backward for stability)
FOR i = n - 1 DOWNTO 0 DO
B[C[A[i]] - 1] = A[i]
C[A[i]] = C[A[i]] - 1
END FOR
RETURN B
END
Radix Sort (LSD)
ALGORITHM RadixSort(A, n, d)
INPUT: array A of n integers, d = number of digits
OUTPUT: A sorted in ascending order
FOR digit = 0 TO d - 1 DO
// Use stable Counting Sort on the current digit
StableCountingSortByDigit(A, n, digit)
END FOR
RETURN A
END
ALGORITHM StableCountingSortByDigit(A, n, digit)
INPUT: array A, size n, digit position to sort by
OUTPUT: A rearranged stably by the given digit
k = 9 // for base-10 digits
CREATE count array C of size k + 1, initialized to 0
CREATE output array B of size n
FOR i = 0 TO n - 1 DO
v = GET_DIGIT(A[i], digit)
C[v] = C[v] + 1
END FOR
FOR i = 1 TO k DO
C[i] = C[i] + C[i - 1]
END FOR
FOR i = n - 1 DOWNTO 0 DO
v = GET_DIGIT(A[i], digit)
B[C[v] - 1] = A[i]
C[v] = C[v] - 1
END FOR
COPY B into A
END
Bucket Sort
ALGORITHM BucketSort(A, n)
INPUT: array A of n floats in range [0, 1)
OUTPUT: A sorted in ascending order
CREATE array of n empty lists (buckets)
// Distribute elements into buckets
FOR i = 0 TO n - 1 DO
bucketIndex = FLOOR(A[i] * n)
APPEND A[i] to buckets[bucketIndex]
END FOR
// Sort each bucket (Insertion Sort for small lists)
FOR i = 0 TO n - 1 DO
InsertionSort(buckets[i])
END FOR
// Concatenate all buckets
index = 0
FOR i = 0 TO n - 1 DO
FOR EACH element IN buckets[i] DO
A[index] = element
index = index + 1
END FOR
END FOR
RETURN A
END
Real-World Applications
- Radix Sort for integers: database engines use Radix Sort for sorting integer keys when the key size is bounded, outperforming comparison sorts on large datasets.
- Counting Sort for small ranges: histogram computation, character frequency analysis, and sorting exam scores (0-100) all use Counting Sort directly.
- Suffix arrays: efficient suffix array construction algorithms (like SA-IS) use Radix Sort internally to sort suffixes by their first few characters.
- Network packet routing: routers sort packets by priority levels (a small fixed range), making Counting Sort a natural fit.
- Bucket Sort for floating-point data: scientific computing and graphics pipelines sort uniformly distributed floating-point values using Bucket Sort for near-linear performance.
Key Takeaways
- Non-comparison sorts break the $\Omega(n \log n)$ barrier by exploiting data structure: value ranges, digit counts, or distributions
- Counting Sort is $O(n + k)$ and works well when the value range $k$ is not much larger than $n$
- Radix Sort extends Counting Sort to handle larger ranges by processing one digit at a time, achieving $O(d(n + k))$
- Bucket Sort achieves $O(n)$ expected time for uniformly distributed data, but degrades to $O(n^2)$ in the worst case
- Stability matters: Radix Sort depends on its subroutine sort being stable to produce correct results