Notes/Graphs: Representation, BFS, and DFS

Graphs: Representation, BFS, and DFS

How to represent graphs with adjacency lists and matrices, and how breadth-first and depth-first search explore them systematically.

2021-01-15AI-Synthesized from Personal Notes

CS FundamentalsGraphsData Structures

Terminology

Graph: A data structure consisting of a set of vertices (nodes) $V$ and a set of edges $E$ connecting pairs of vertices. Written as $G = (V, E)$.
Vertex (Node): A fundamental unit in a graph, representing an entity. Vertices are connected by edges.
Edge: A connection between two vertices. In a directed graph, an edge has a direction (from $u$ to $v$). In an undirected graph, the edge connects both ways.
Directed Graph (Digraph): A graph where edges have a direction: edge $(u, v)$ goes from $u$ to $v$ but not necessarily from $v$ to $u$.
Undirected Graph: A graph where edges have no direction: edge $\{u, v\}$ connects $u$ and $v$ symmetrically.
Adjacency List: A graph representation where each vertex stores a list of its neighbors. Space-efficient for sparse graphs: $O(V + E)$.
Adjacency Matrix: A graph representation using a $V \times V$ matrix where entry $(i, j)$ indicates whether an edge exists between vertices $i$ and $j$. Uses $O(V^2)$ space.
BFS (Breadth-First Search): A traversal algorithm that explores all neighbors at the current depth before moving to the next depth level. Uses a queue.
DFS (Depth-First Search): A traversal algorithm that explores as far as possible along each branch before backtracking. Uses a stack (or recursion).
Connected Component: A maximal set of vertices such that there is a path between every pair. An undirected graph may have multiple connected components.

What & Why

A graph is the most general-purpose data structure for representing relationships between entities. Unlike trees (which are hierarchical and acyclic), graphs can have cycles, multiple paths between nodes, and no designated root. This flexibility makes them essential for modeling networks, dependencies, maps, social connections, and countless other domains.

Why does this matter? Graphs appear everywhere in computing. Social networks model friendships as edges. GPS navigation finds shortest paths through road graphs. Package managers resolve dependency graphs. Web crawlers traverse the link graph of the internet. Understanding graph representation and traversal is foundational to solving these problems.

Key properties:

Flexible structure: can represent any pairwise relationship, with or without direction and weights
Two standard representations: adjacency lists (space-efficient for sparse graphs) and adjacency matrices (fast edge lookup)
BFS and DFS: the two fundamental traversal strategies, each with distinct properties and use cases
Foundation for advanced algorithms: shortest path, minimum spanning tree, topological sort, and network flow all build on graph basics

How It Works

Graph Representations

Consider this undirected graph with 5 vertices and 6 edges:

Adjacency List representation:

0: [1, 2]
1: [0, 2, 3]
2: [0, 1, 4]
3: [1, 4]
4: [2, 3]

Each vertex maps to a list of its neighbors. Total storage: $O(V + E)$. Checking if edge $(u, v)$ exists requires scanning $u$'s neighbor list: $O(\text{degree}(u))$.

Adjacency Matrix representation:

    0  1  2  3  4
0 [ 0, 1, 1, 0, 0 ]
1 [ 1, 0, 1, 1, 0 ]
2 [ 1, 1, 0, 0, 1 ]
3 [ 0, 1, 0, 0, 1 ]
4 [ 0, 0, 1, 1, 0 ]

A 1 at position $(i, j)$ means an edge exists. Edge lookup is $O(1)$, but storage is $O(V^2)$ regardless of edge count.

When to Use Which

Criteria	Adjacency List	Adjacency Matrix
Space	$O(V + E)$	$O(V^2)$
Edge lookup	$O(\text{deg}(u))$	$O(1)$
Iterate neighbors	$O(\text{deg}(u))$	$O(V)$
Best for	Sparse graphs ($E \ll V^2$)	Dense graphs ($E \approx V^2$)

BFS (Breadth-First Search)

BFS explores the graph level by level, visiting all vertices at distance $d$ from the source before any vertex at distance $d + 1$. It uses a queue to maintain the frontier.

Starting BFS from vertex 0 on the graph above visits: 0, 1, 2, 3, 4.

BFS naturally finds the shortest path (fewest edges) from the source to every reachable vertex in an unweighted graph.

DFS (Depth-First Search)

DFS explores as deep as possible along each branch before backtracking. It uses a stack (explicit or via recursion) to track the current path.

Starting DFS from vertex 0 might visit: 0, 1, 2, 4, 3 (the exact order depends on neighbor iteration order).

DFS is useful for detecting cycles, topological sorting, finding connected components, and solving maze-like problems.

Complexity Analysis

Algorithm	Time	Space	Notes
BFS	$O(V + E)$	$O(V)$	Queue + visited set
DFS	$O(V + E)$	$O(V)$	Stack + visited set
BFS shortest path	$O(V + E)$	$O(V)$	Unweighted graphs only
Connected components	$O(V + E)$	$O(V)$	Run BFS/DFS from each unvisited vertex

Both BFS and DFS visit every vertex and examine every edge exactly once (in an adjacency list representation), giving $O(V + E)$ time. The space is $O(V)$ for the visited set and the queue/stack.

Implementation

Graph Construction (Pseudocode)

FUNCTION buildGraph(edges, directed):
    graph ← empty adjacency list (map of vertex → list of neighbors)
    FOR EACH (u, v) IN edges:
        ADD v TO graph[u]
        IF NOT directed THEN ADD u TO graph[v]
    RETURN graph

BFS (Pseudocode)

FUNCTION bfs(graph, start):
    visited ← set containing {start}
    queue ← empty queue
    order ← empty list

    ENQUEUE start INTO queue

    WHILE queue IS NOT EMPTY:
        node ← DEQUEUE from queue
        APPEND node TO order
        FOR EACH neighbor IN graph[node]:
            IF neighbor NOT IN visited THEN
                ADD neighbor TO visited
                ENQUEUE neighbor INTO queue

    RETURN order

DFS (Pseudocode)

FUNCTION dfs(graph, start):
    visited ← empty set
    order ← empty list
    explore(graph, start, visited, order)
    RETURN order

FUNCTION explore(graph, node, visited, order):
    ADD node TO visited
    APPEND node TO order
    FOR EACH neighbor IN graph[node]:
        IF neighbor NOT IN visited THEN
            explore(graph, neighbor, visited, order)

BFS Shortest Path (Pseudocode)

FUNCTION bfsShortestPath(graph, start, end):
    visited ← set containing {start}
    queue ← empty queue
    parent ← empty map

    ENQUEUE start INTO queue

    WHILE queue IS NOT EMPTY:
        node ← DEQUEUE from queue
        IF node = end THEN
            path ← empty list
            current ← end
            WHILE current IS DEFINED:
                PREPEND current TO path
                current ← parent[current]
            RETURN path
        FOR EACH neighbor IN graph[node]:
            IF neighbor NOT IN visited THEN
                ADD neighbor TO visited
                parent[neighbor] ← node
                ENQUEUE neighbor INTO queue

    RETURN NULL   // no path found

Real-World Applications

Social networks: users are vertices, friendships are edges; BFS finds degrees of separation between people
GPS navigation: road networks are weighted graphs; shortest-path algorithms (Dijkstra, A*) find optimal routes
Web crawling: the internet is a directed graph of pages linked by hyperlinks; crawlers use BFS to discover pages level by level
Package managers: npm, pip, and cargo model dependencies as directed acyclic graphs; topological sort determines installation order
Network routing: routers use graph algorithms (OSPF uses Dijkstra, BGP uses path-vector) to find optimal packet routes
Garbage collection: mark-and-sweep GC uses DFS/BFS from root references to find reachable objects in the object graph

Key Takeaways

Graphs are the most general relationship data structure: vertices connected by edges, with optional direction and weights
Adjacency lists use $O(V + E)$ space and are preferred for sparse graphs; adjacency matrices use $O(V^2)$ but offer $O(1)$ edge lookup
BFS explores level by level using a queue and finds shortest paths in unweighted graphs
DFS explores depth-first using a stack (or recursion) and is the basis for cycle detection, topological sort, and connected components
Both BFS and DFS run in $O(V + E)$ time, visiting every vertex and edge exactly once

2021-01-17

Recursion: Solving Problems by Solving Smaller Problems

How recursive functions work, why they need base cases, and how to think about call stacks, recurrence relations, and memoization.

2021-01-19

BFS and DFS: The Two Ways to Explore a Graph

A deep dive into breadth-first search and depth-first search, their properties, trade-offs, and the problems each one solves best.

2021-01-01

Arrays: The Foundation of Everything

Understanding contiguous memory, indexing, and dynamic arrays: the building block behind nearly every data structure.