Topological Sort, Strongly Connected Components, and Tarjan's Algorithm
Advanced graph algorithms for directed graphs: ordering dependencies with topological sort and finding strongly connected components with Tarjan's algorithm.
Terminology
- Directed Acyclic Graph (DAG): a directed graph with no cycles. Topological sort is only defined for DAGs.
- Topological sort: a linear ordering of vertices in a DAG such that for every directed edge $(u, v)$, vertex $u$ appears before $v$ in the ordering.
- In-degree: the number of incoming edges to a vertex. In Kahn's algorithm, vertices with in-degree 0 have no unprocessed dependencies.
- Strongly connected component (SCC): a maximal set of vertices in a directed graph where every vertex is reachable from every other vertex in the set.
- Component graph (condensation): the DAG formed by contracting each SCC into a single vertex. This graph is always acyclic.
- DFS discovery time: the timestamp when DFS first visits a vertex. Used by Tarjan's algorithm to identify SCCs.
- Low-link value: the smallest discovery time reachable from a vertex through the DFS subtree and back edges. In Tarjan's algorithm, a vertex is the root of an SCC if its low-link equals its discovery time.
- Back edge: an edge from a vertex to one of its ancestors in the DFS tree. The presence of a back edge indicates a cycle in a directed graph.
- Cross edge: an edge between vertices in different DFS subtrees. Tarjan's algorithm must distinguish these from back edges.
What & Why
Directed graphs model dependencies, workflows, and relationships where direction matters. Two fundamental questions arise:
- What order should I process things? If tasks have dependencies (task A must finish before task B), topological sort gives a valid execution order.
- Which parts of the graph are mutually reachable? Strongly connected components identify groups of vertices where every vertex can reach every other, revealing tightly coupled subsystems.
Topological sort is essential for:
- Build systems (compile dependencies in the right order)
- Course scheduling (prerequisites before advanced courses)
- Package managers (install dependencies before dependents)
SCC decomposition is essential for:
- Compiler optimization (identifying loops in control flow graphs)
- Social network analysis (finding tightly connected communities)
- 2-SAT solving (a classic reduction from boolean satisfiability to SCC)
How It Works
Topological Sort (Kahn's Algorithm, BFS-based)
Kahn's algorithm uses in-degrees:
- Compute the in-degree of every vertex
- Add all vertices with in-degree 0 to a queue
- While the queue is not empty: remove a vertex, add it to the result, and decrement the in-degree of all its neighbors. If any neighbor's in-degree becomes 0, add it to the queue.
- If the result contains all vertices, the graph is a DAG. Otherwise, a cycle exists.
Topological Sort (DFS-based)
The DFS approach runs a depth-first search and adds each vertex to the front of the result list when its DFS call finishes (post-order). The reverse post-order is a valid topological sort.
Tarjan's Algorithm for SCCs
Tarjan's algorithm finds all SCCs in a single DFS pass:
- Maintain a stack of vertices in the current DFS path
- For each vertex, track its discovery time and low-link value
- When DFS finishes a vertex: if its low-link equals its discovery time, it is the root of an SCC. Pop all vertices from the stack down to (and including) this vertex; they form one SCC.
The low-link value for a vertex v is the smallest discovery time reachable from v through the DFS subtree and back edges to vertices still on the stack.
Complexity Analysis
| Algorithm | Time | Space | Notes |
|---|---|---|---|
| Topological Sort (Kahn's) | $O(V + E)$ | $O(V + E)$ | BFS-based, detects cycles |
| Topological Sort (DFS) | $O(V + E)$ | $O(V + E)$ | DFS-based, simpler code |
| Tarjan's SCC | $O(V + E)$ | $O(V)$ | Single DFS pass |
| Kosaraju's SCC | $O(V + E)$ | $O(V + E)$ | Two DFS passes, needs transpose graph |
Why O(V + E)?
All four algorithms are based on DFS or BFS, which visits every vertex once and traverses every edge once:
This is optimal: you must examine every vertex and edge at least once to determine the structure of the graph.
SCC Count Bounds
A directed graph with $V$ vertices has between 1 and $V$ strongly connected components. If the graph is strongly connected (one big SCC), there is 1 component. If the graph is a DAG (no cycles at all), there are $V$ components (each vertex is its own SCC).
Implementation
Topological Sort (Kahn's Algorithm)
ALGORITHM KahnTopologicalSort(graph, V)
INPUT: directed graph as adjacency list, V vertices
OUTPUT: topological ordering, or report cycle
CREATE inDegree array of size V, initialized to 0
FOR EACH vertex u DO
FOR EACH neighbor v of u DO
inDegree[v] = inDegree[v] + 1
END FOR
END FOR
CREATE queue Q
FOR EACH vertex u DO
IF inDegree[u] = 0 THEN
ENQUEUE u into Q
END IF
END FOR
result = empty list
count = 0
WHILE Q is not empty DO
u = DEQUEUE from Q
APPEND u to result
count = count + 1
FOR EACH neighbor v of u DO
inDegree[v] = inDegree[v] - 1
IF inDegree[v] = 0 THEN
ENQUEUE v into Q
END IF
END FOR
END WHILE
IF count != V THEN
REPORT "Graph contains a cycle"
END IF
RETURN result
END
Topological Sort (DFS-based)
ALGORITHM DFSTopologicalSort(graph, V)
INPUT: directed graph as adjacency list, V vertices
OUTPUT: topological ordering
CREATE visited array of size V, set all to FALSE
CREATE stack S (for result)
FOR EACH vertex u DO
IF NOT visited[u] THEN
DFSVisit(graph, u, visited, S)
END IF
END FOR
RETURN S (reversed, or read from top to bottom)
END
ALGORITHM DFSVisit(graph, u, visited, S)
visited[u] = TRUE
FOR EACH neighbor v of u DO
IF NOT visited[v] THEN
DFSVisit(graph, v, visited, S)
END IF
END FOR
PUSH u onto S // add to result after all descendants are processed
END
Tarjan's SCC Algorithm
ALGORITHM TarjanSCC(graph, V)
INPUT: directed graph as adjacency list, V vertices
OUTPUT: list of strongly connected components
CREATE disc array of size V, set all to -1
CREATE low array of size V, set all to -1
CREATE onStack array of size V, set all to FALSE
CREATE stack S
timer = 0
sccs = empty list
FOR EACH vertex u DO
IF disc[u] = -1 THEN
TarjanDFS(graph, u, disc, low, onStack, S, timer, sccs)
END IF
END FOR
RETURN sccs
END
ALGORITHM TarjanDFS(graph, u, disc, low, onStack, S, timer, sccs)
disc[u] = timer
low[u] = timer
timer = timer + 1
PUSH u onto S
onStack[u] = TRUE
FOR EACH neighbor v of u DO
IF disc[v] = -1 THEN
// v not yet visited
TarjanDFS(graph, v, disc, low, onStack, S, timer, sccs)
low[u] = MIN(low[u], low[v])
ELSE IF onStack[v] THEN
// v is on the stack, so it is in the current SCC
low[u] = MIN(low[u], disc[v])
END IF
END FOR
// If u is the root of an SCC
IF low[u] = disc[u] THEN
component = empty list
REPEAT
w = POP from S
onStack[w] = FALSE
APPEND w to component
UNTIL w = u
APPEND component to sccs
END IF
END
Real-World Applications
- Build systems: Make, Gradle, and Bazel use topological sort to determine the correct compilation order based on file dependencies.
- Package managers: npm, pip, and apt resolve dependency graphs using topological sort to install packages in the right order.
- Course scheduling: universities use topological sort to ensure prerequisite courses are taken before advanced ones.
- Compiler optimization: SCC detection in control flow graphs identifies natural loops, enabling loop optimizations like unrolling and vectorization.
- 2-SAT solving: the 2-SAT problem (a restricted form of boolean satisfiability) is solvable in linear time by reducing it to SCC detection on an implication graph.
- Web crawling: search engines use SCC analysis to identify clusters of tightly interlinked pages, which can indicate link farms or topical communities.
Key Takeaways
- Topological sort produces a valid dependency ordering for DAGs in $O(V + E)$ time. If the graph has a cycle, no topological sort exists.
- Kahn's algorithm (BFS with in-degrees) naturally detects cycles; the DFS approach is more concise but requires separate cycle detection
- Strongly connected components partition a directed graph into maximal groups of mutually reachable vertices
- Tarjan's algorithm finds all SCCs in a single DFS pass using discovery times and low-link values
- The condensation (component graph) of any directed graph is always a DAG, enabling topological sort on the SCC level