Notes/Union-Find (Disjoint Set Union)

Union-Find (Disjoint Set Union)

Learn the Union-Find data structure with path compression and union by rank for connected components, dynamic connectivity, and Kruskal's MST algorithm.

2025-09-22AI-Synthesized from Personal Notes

Coding PatternsUnion FindDisjoint SetLeetcode

Terminology

Term	Definition
Union-Find (DSU)	A data structure that tracks a collection of disjoint sets, supporting two operations: find (which set does an element belong to?) and union (merge two sets).
Find	Returns the representative (root) of the set containing a given element. Two elements are in the same set if and only if they have the same root.
Union	Merges the sets containing two elements into a single set by linking one root to the other.
Path compression	During find, make every node on the path point directly to the root. This flattens the tree and speeds up future queries.
Union by rank	When merging two sets, attach the shorter tree under the taller tree's root. Keeps the tree height logarithmic.
Inverse Ackermann $\alpha(n)$	The amortized time per operation with both optimizations. Grows so slowly that $\alpha(n) \leq 4$ for any practical $n$, effectively constant.
Connected component	A maximal group of nodes where every pair is connected by a path. Union-Find tracks these dynamically as edges are added.
Kruskal's algorithm	A greedy MST algorithm that sorts edges by weight and adds each edge if it connects two different components, using Union-Find for cycle detection.

What & Why

Union-Find answers one question extremely efficiently: "are these two elements in the same group?" It supports two operations: find (look up the group representative) and union (merge two groups). With path compression and union by rank, both operations run in amortized $O(\alpha(n))$ time, which is effectively constant.

This makes Union-Find the go-to structure for dynamic connectivity problems: as edges are added to a graph, you need to quickly determine whether two nodes are already connected. The classic application is Kruskal's minimum spanning tree algorithm, which processes edges in sorted order and uses Union-Find to skip edges that would create a cycle.

In LeetCode, Union-Find appears in problems like "number of connected components," "redundant connection," "accounts merge," and "number of islands" (as an alternative to DFS).

How It Works

The Parent Array

Each element points to a parent. The root of each tree is the set representative (it points to itself). Find follows parent pointers to the root. Union links one root to another.

After path compression, nodes 3 and 4 point directly to root 0, making future find operations $O(1)$.

Complexity Analysis

Operation	Naive	Union by Rank	Both Optimizations
Find	$O(n)$	$O(\log n)$	$O(\alpha(n))$
Union	$O(n)$	$O(\log n)$	$O(\alpha(n))$
$m$ operations on $n$ elements	$O(mn)$	$O(m \log n)$	$O(m \cdot \alpha(n))$

The inverse Ackermann function $\alpha(n)$ grows incredibly slowly:

$\alpha(2^{65536}) = 4$

For all practical purposes, $\alpha(n) \leq 4$, making Union-Find operations effectively $O(1)$.

Implementation

Union-Find with Path Compression and Union by Rank

ALGORITHM InitUnionFind(n)
  parent = array [0, 1, 2, ..., n-1]   // each element is its own root
  rank = array of zeros, size n
  componentCount = n
END

ALGORITHM Find(x)
  IF parent[x] != x THEN
    parent[x] = Find(parent[x])    // path compression
  END IF
  RETURN parent[x]
END

ALGORITHM Union(a, b)
  rootA = Find(a)
  rootB = Find(b)

  IF rootA = rootB THEN RETURN FALSE   // already connected

  // Union by rank
  IF rank[rootA] < rank[rootB] THEN
    parent[rootA] = rootB
  ELSE IF rank[rootA] > rank[rootB] THEN
    parent[rootB] = rootA
  ELSE
    parent[rootB] = rootA
    rank[rootA] = rank[rootA] + 1
  END IF

  componentCount = componentCount - 1
  RETURN TRUE
END

ALGORITHM Connected(a, b)
  RETURN Find(a) = Find(b)
END

Kruskal's MST Using Union-Find

ALGORITHM KruskalMST(edges, n)
  INPUT: list of edges (u, v, weight), n vertices
  OUTPUT: list of MST edges, total weight

  SORT edges by weight ascending
  InitUnionFind(n)
  mstEdges = empty list
  totalWeight = 0

  FOR EACH (u, v, w) IN edges DO
    IF Union(u, v) THEN
      mstEdges.append((u, v, w))
      totalWeight = totalWeight + w
      IF LENGTH(mstEdges) = n - 1 THEN BREAK
    END IF
  END FOR

  RETURN mstEdges, totalWeight
END

Real-World Applications

Network connectivity: determining whether two computers can communicate in a network as links are added or removed.
Image segmentation: grouping pixels into regions based on similarity, merging adjacent similar pixels using union operations.
Kruskal's MST: building minimum spanning trees for network design, circuit wiring, and clustering by processing edges in weight order.
Social networks: "friend of a friend" queries and account merging (e.g., merging accounts that share an email address).
Percolation theory: simulating whether water flows from top to bottom of a porous material by tracking connected open sites.

Key Takeaways

Union-Find tracks disjoint sets with near-constant time find and union operations when using both path compression and union by rank
Path compression flattens the tree during find, making future queries faster. Union by rank keeps trees balanced during merges
The inverse Ackermann function $\alpha(n)$ is at most 4 for any practical input size, making the amortized cost effectively $O(1)$
Union-Find is the backbone of Kruskal's MST algorithm, providing $O(1)$ cycle detection as edges are added
When a problem involves dynamically grouping elements or checking connectivity as edges arrive, Union-Find is usually the right tool

2025-09-23

Interval Merging and Sweep Line

Master interval problems including merging overlapping intervals, meeting rooms scheduling, insert interval, and the event-based sweep line technique.

2025-09-24

Bit Manipulation Patterns

Master essential bit manipulation techniques including XOR tricks, single number detection, counting bits, power of two checks, and bitmask subset enumeration.

2025-09-25

Top-K and Heap Patterns

Solve Top-K problems efficiently using heaps: Kth largest element, merge K sorted lists, median from data stream, and min-heap vs max-heap selection strategies.