Notes/BFS and DFS: The Two Ways to Explore a Graph

BFS and DFS: The Two Ways to Explore a Graph

A deep dive into breadth-first search and depth-first search, their properties, trade-offs, and the problems each one solves best.

2021-01-19AI-Synthesized from Personal Notes

CS FundamentalsBfsDfsGraphsAlgorithms

Terminology

BFS (Breadth-First Search): A graph traversal that explores all vertices at distance $d$ from the source before any vertex at distance $d+1$. Uses a queue. Guarantees shortest path in unweighted graphs.
DFS (Depth-First Search): A graph traversal that explores as far as possible along each branch before backtracking. Uses a stack (or recursion). Useful for cycle detection, topological sort, and connected components.
Frontier: The set of vertices discovered but not yet fully explored. In BFS this is the queue; in DFS this is the stack (or the current recursion path).
Visited Set: A set tracking which vertices have been discovered, preventing re-exploration and infinite loops in cyclic graphs.
Shortest Path (Unweighted): The path with the fewest edges between two vertices. BFS finds this naturally because it explores level by level.
Topological Sort: A linear ordering of vertices in a directed acyclic graph (DAG) such that for every edge $(u, v)$, $u$ appears before $v$. DFS produces this by recording vertices in reverse finish order.
Connected Component: A maximal set of vertices where every pair is connected by a path. Both BFS and DFS can identify all connected components by running from each unvisited vertex.
Back Edge: An edge from a vertex to one of its ancestors in the DFS tree. The presence of a back edge indicates a cycle in the graph.
Tree Edge: An edge in the DFS or BFS tree that leads to a newly discovered vertex.
Level (BFS): The set of all vertices at the same distance from the source. BFS processes vertices level by level.

What & Why

BFS and DFS are the two fundamental strategies for systematically visiting every vertex in a graph. They differ in one key choice: which vertex to explore next from the frontier.

BFS picks the vertex that was discovered earliest (FIFO, using a queue), so it fans out level by level. DFS picks the vertex that was discovered most recently (LIFO, using a stack), so it dives deep before backtracking.

Why does this matter? This single difference in exploration order gives each algorithm distinct properties that make them suited to different problems:

BFS finds shortest paths in unweighted graphs, computes distances, and detects the minimum number of steps to reach a target
DFS detects cycles, computes topological orderings, finds connected/strongly connected components, and solves backtracking problems

Every graph algorithm you'll encounter in this curriculum builds on one of these two traversals. Dijkstra's algorithm is a weighted generalization of BFS. Tarjan's SCC algorithm is built on DFS. Understanding both deeply is non-negotiable.

Key properties:

Same time complexity: both run in $O(V + E)$ on adjacency lists
Different space profiles: BFS stores the entire frontier (can be wide); DFS stores the current path (can be deep)
BFS = shortest path in unweighted graphs; DFS does not guarantee shortest paths
DFS = cycle detection via back edges; BFS can detect cycles but less naturally

How It Works

BFS: Level-by-Level Exploration

BFS starts at a source vertex, visits all its neighbors (level 1), then all neighbors of those neighbors (level 2), and so on. It uses a queue to maintain the frontier.

At each step, BFS dequeues the front vertex, processes it, and enqueues all its unvisited neighbors. The queue ensures FIFO order, which is why vertices are processed level by level.

DFS: Dive Deep, Then Backtrack

DFS starts at a source vertex and immediately dives into the first neighbor, then that neighbor's first neighbor, and so on until it hits a dead end. Then it backtracks to the most recent vertex with unexplored neighbors and continues.

DFS can be implemented with an explicit stack or with recursion (which uses the call stack implicitly). The recursive version is more common because it's cleaner, but the iterative version avoids stack overflow on deep graphs.

BFS vs DFS: Side-by-Side Comparison

Property	BFS	DFS
Data structure	Queue (FIFO)	Stack (LIFO) / recursion
Exploration order	Level by level	Branch by branch
Shortest path?	Yes (unweighted)	No
Space (worst case)	$O(V)$ (wide frontier)	$O(V)$ (deep path)
Cycle detection	Possible but awkward	Natural (back edges)
Topological sort	Kahn's algorithm (BFS-based)	Reverse postorder

When to Use Which

Use BFS when you need shortest paths, minimum steps, or level-order processing
Use DFS when you need cycle detection, topological ordering, path existence, or backtracking
For connected components, either works; DFS is slightly simpler to implement recursively

DFS Edge Classification

During DFS on a directed graph, every edge falls into one of four categories:

Tree edge: leads to an unvisited vertex (part of the DFS tree)
Back edge: leads to an ancestor in the DFS tree (indicates a cycle)
Forward edge: leads to a descendant already visited via another path
Cross edge: leads to a vertex in a different branch of the DFS tree

In undirected graphs, only tree edges and back edges exist. A back edge in an undirected graph means a cycle.

Complexity Analysis

Algorithm	Time	Space	Notes
BFS	$O(V + E)$	$O(V)$	Queue can hold up to $O(V)$ vertices at widest level
DFS (recursive)	$O(V + E)$	$O(V)$	Call stack depth up to $O(V)$ for linear graphs
DFS (iterative)	$O(V + E)$	$O(V)$	Explicit stack avoids call stack overflow
BFS shortest path	$O(V + E)$	$O(V)$	Parent map adds $O(V)$ space
Find all components	$O(V + E)$	$O(V)$	Run BFS/DFS from each unvisited vertex

Space Trade-off in Practice

BFS space depends on the maximum width of the graph (the largest level). In a balanced binary tree with $n$ nodes, the widest level has $\approx n/2$ nodes, so BFS uses $O(n)$ space. DFS on the same tree uses $O(\log n)$ space (the height).

Conversely, on a long chain graph (path graph), DFS uses $O(n)$ stack space while BFS uses $O(1)$ queue space (each level has exactly one node).

$\text{BFS space} \propto \text{max level width}$

$\text{DFS space} \propto \text{max depth}$

Implementation

BFS (Pseudocode)

FUNCTION bfs(graph, start):
    visited ← set containing {start}
    queue ← empty queue
    ENQUEUE start INTO queue

    WHILE queue IS NOT EMPTY:
        node ← DEQUEUE from queue
        PROCESS node
        FOR EACH neighbor IN graph[node]:
            IF neighbor NOT IN visited THEN
                ADD neighbor TO visited
                ENQUEUE neighbor INTO queue

BFS Shortest Path (Pseudocode)

FUNCTION bfsShortestPath(graph, start, target):
    visited ← set containing {start}
    queue ← empty queue
    parent ← empty map
    ENQUEUE start INTO queue

    WHILE queue IS NOT EMPTY:
        node ← DEQUEUE from queue
        IF node = target THEN
            RETURN reconstructPath(parent, start, target)
        FOR EACH neighbor IN graph[node]:
            IF neighbor NOT IN visited THEN
                ADD neighbor TO visited
                parent[neighbor] ← node
                ENQUEUE neighbor INTO queue

    RETURN NULL   // no path exists

FUNCTION reconstructPath(parent, start, target):
    path ← empty list
    current ← target
    WHILE current ≠ start:
        PREPEND current TO path
        current ← parent[current]
    PREPEND start TO path
    RETURN path

DFS Recursive (Pseudocode)

FUNCTION dfs(graph, start):
    visited ← empty set
    dfsExplore(graph, start, visited)

FUNCTION dfsExplore(graph, node, visited):
    ADD node TO visited
    PROCESS node
    FOR EACH neighbor IN graph[node]:
        IF neighbor NOT IN visited THEN
            dfsExplore(graph, neighbor, visited)

DFS Iterative (Pseudocode)

FUNCTION dfsIterative(graph, start):
    visited ← empty set
    stack ← empty stack
    PUSH start ONTO stack

    WHILE stack IS NOT EMPTY:
        node ← POP from stack
        IF node IN visited THEN CONTINUE
        ADD node TO visited
        PROCESS node
        FOR EACH neighbor IN graph[node]:
            IF neighbor NOT IN visited THEN
                PUSH neighbor ONTO stack

Cycle Detection with DFS (Pseudocode)

FUNCTION hasCycle(graph):
    visited ← empty set
    inStack ← empty set    // tracks current DFS path

    FOR EACH vertex IN graph:
        IF vertex NOT IN visited THEN
            IF dfsCycle(graph, vertex, visited, inStack) THEN
                RETURN true
    RETURN false

FUNCTION dfsCycle(graph, node, visited, inStack):
    ADD node TO visited
    ADD node TO inStack
    FOR EACH neighbor IN graph[node]:
        IF neighbor NOT IN visited THEN
            IF dfsCycle(graph, neighbor, visited, inStack) THEN
                RETURN true
        ELSE IF neighbor IN inStack THEN
            RETURN true    // back edge found = cycle
    REMOVE node FROM inStack
    RETURN false

Connected Components (Pseudocode)

FUNCTION findComponents(graph):
    visited ← empty set
    components ← empty list

    FOR EACH vertex IN graph:
        IF vertex NOT IN visited THEN
            component ← empty list
            bfsCollect(graph, vertex, visited, component)
            APPEND component TO components

    RETURN components

FUNCTION bfsCollect(graph, start, visited, component):
    ADD start TO visited
    queue ← empty queue
    ENQUEUE start INTO queue
    WHILE queue IS NOT EMPTY:
        node ← DEQUEUE from queue
        APPEND node TO component
        FOR EACH neighbor IN graph[node]:
            IF neighbor NOT IN visited THEN
                ADD neighbor TO visited
                ENQUEUE neighbor INTO queue

Real-World Applications

Shortest path in unweighted graphs: BFS finds the minimum number of hops between two nodes, used in social network "degrees of separation," word ladder puzzles, and network hop counting
Web crawling: search engines use BFS to discover pages level by level from seed URLs, ensuring broad coverage before deep crawling
Cycle detection in dependencies: package managers (npm, pip) use DFS to detect circular dependencies before installation
Topological sorting: build systems (Make, Gradle) use DFS-based topological sort to determine compilation order
Maze solving: BFS finds the shortest path through a maze; DFS finds any path (useful for maze generation)
Garbage collection: mark-and-sweep GC uses DFS/BFS from root references to identify reachable objects
Network broadcast: BFS models how information spreads through a network, level by level from the source
Strongly connected components: Tarjan's and Kosaraju's algorithms use DFS to decompose directed graphs into SCCs

Key Takeaways

BFS and DFS both visit every vertex and edge in $O(V + E)$ time, but their exploration order differs fundamentally
BFS uses a queue and explores level by level, making it the right choice for shortest paths in unweighted graphs
DFS uses a stack (or recursion) and dives deep before backtracking, making it natural for cycle detection, topological sort, and backtracking problems
BFS space is proportional to the widest level; DFS space is proportional to the maximum depth
DFS classifies edges into tree, back, forward, and cross edges; back edges indicate cycles

2021-01-01

Arrays: The Foundation of Everything

Understanding contiguous memory, indexing, and dynamic arrays: the building block behind nearly every data structure.

2021-01-03

Linked Lists: Pointers All the Way Down

Singly, doubly, and circular linked lists: how pointer-based data structures trade random access for efficient insertion and deletion.

2021-01-05

Stacks: Last In, First Out

Understanding stacks, the LIFO data structure behind call stacks, expression evaluation, undo systems, and backtracking algorithms.