Trees: Hierarchical Data and Binary Search Trees
Understanding tree structures, binary trees, BSTs, and the three classic traversal orders that unlock recursive thinking.
Terminology
- Tree
- A connected, acyclic graph with a designated root node. Every node except the root has exactly one parent.
- Root
- The topmost node in a tree, with no parent. All other nodes are descendants of the root.
- Leaf
- A node with no children. Leaves are the terminal nodes of the tree.
- Height
- The number of edges on the longest path from a node down to a leaf. The height of a tree is the height of its root.
- Depth
- The number of edges from the root down to a given node. The root has depth 0.
- Binary Tree
- A tree where each node has at most two children, referred to as the left child and right child.
- Binary Search Tree (BST)
- A binary tree where for every node $x$: all keys in the left subtree are less than $x$'s key, and all keys in the right subtree are greater.
- Traversal
- A systematic way of visiting every node in a tree exactly once. The three classic orderings are inorder, preorder, and postorder.
- Balanced Tree
- A tree where the heights of left and right subtrees differ by at most a constant factor, ensuring $O(\log n)$ operations.
What & Why
A tree is a hierarchical data structure where each element (node) has a parent-child relationship with other elements. Unlike arrays and linked lists, which are linear, trees branch out, making them natural for representing hierarchies: file systems, HTML DOMs, organization charts, and decision processes.
Why does this matter? Trees are the foundation for fast searching, sorting, and hierarchical organization. Binary search trees give $O(\log n)$ lookup when balanced. Traversal algorithms on trees are the gateway to recursive thinking, which is central to CS problem-solving.
Key properties:
- Hierarchical structure: models parent-child relationships naturally
- $O(\log n)$ search in balanced BSTs, compared to $O(n)$ for linked lists
- Recursive definition: a tree is either empty or a node with subtrees, making recursive algorithms elegant
- Foundation for advanced structures: heaps, tries, B-trees, and red-black trees all build on the basic tree concept
How It Works
Binary Tree Structure
Each node stores a value and pointers to its left and right children. A null pointer indicates the absence of a child.
This is a valid BST: for every node, left descendants are smaller and right descendants are larger. For example, node 3 has children 1 (left, smaller) and 6 (right, larger).
BST Search
To find a key $k$ in a BST, start at the root and at each node:
- If $k$ equals the node's key, return the node
- If $k$ is less, recurse into the left subtree
- If $k$ is greater, recurse into the right subtree
- If the subtree is empty, the key is not in the tree
Each comparison eliminates one subtree, halving the search space in a balanced tree. This gives $O(\log n)$ time, similar to binary search on a sorted array.
Traversal Orders
The three classic depth-first traversals differ only in when they visit the current node relative to its children:
- Inorder (Left, Node, Right): visits BST nodes in sorted order. For the tree above: 1, 3, 6, 8, 10, 14
- Preorder (Node, Left, Right): visits the root before its subtrees. Useful for copying or serializing a tree: 8, 3, 1, 6, 10, 14
- Postorder (Left, Right, Node): visits children before the parent. Useful for deletion or evaluating expression trees: 1, 6, 3, 14, 10, 8
BST Insertion and Deletion
Insertion follows the same path as search: walk down the tree comparing keys until you find an empty spot, then place the new node there.
Deletion has three cases:
- Leaf node: simply remove it
- One child: replace the node with its child
- Two children: replace the node's key with its inorder successor (smallest node in the right subtree), then delete that successor
Complexity Analysis
| Operation | Balanced BST | Worst Case (Skewed) | Notes |
|---|---|---|---|
| Search | $O(\log n)$ | $O(n)$ | Degenerates to linked list if unbalanced |
| Insert | $O(\log n)$ | $O(n)$ | Same path as search |
| Delete | $O(\log n)$ | $O(n)$ | Finding successor adds at most $O(h)$ |
| Inorder traversal | $O(n)$ | $O(n)$ | Visits every node exactly once |
| Space | $O(n)$ | $O(n)$ | One node per element |
Height of a Balanced Binary Tree
A balanced binary tree with $n$ nodes has height:
This is why balanced BST operations are $O(\log n)$: the longest path from root to leaf has $\lfloor \log_2 n \rfloor$ edges.
A completely skewed tree (every node has only one child) has height $n - 1$, degrading all operations to $O(n)$. Self-balancing trees like AVL trees and red-black trees prevent this by maintaining balance invariants after every insertion and deletion.
Implementation
BST Node and Core Operations (Pseudocode)
STRUCTURE TreeNode:
value
left ← NULL
right ← NULL
FUNCTION insert(root, val):
IF root = NULL THEN RETURN new TreeNode(val)
IF val < root.value THEN
root.left ← insert(root.left, val)
ELSE IF val > root.value THEN
root.right ← insert(root.right, val)
RETURN root
FUNCTION search(root, val):
IF root = NULL THEN RETURN false
IF val = root.value THEN RETURN true
IF val < root.value THEN RETURN search(root.left, val)
RETURN search(root.right, val)
Traversals (Pseudocode)
FUNCTION inorder(root, result):
IF root = NULL THEN RETURN
inorder(root.left, result)
APPEND root.value TO result
inorder(root.right, result)
FUNCTION preorder(root, result):
IF root = NULL THEN RETURN
APPEND root.value TO result
preorder(root.left, result)
preorder(root.right, result)
FUNCTION postorder(root, result):
IF root = NULL THEN RETURN
postorder(root.left, result)
postorder(root.right, result)
APPEND root.value TO result
BST Deletion (Pseudocode)
FUNCTION delete(root, val):
IF root = NULL THEN RETURN NULL
IF val < root.value THEN
root.left ← delete(root.left, val)
ELSE IF val > root.value THEN
root.right ← delete(root.right, val)
ELSE
// Case 1: leaf node
IF root.left = NULL AND root.right = NULL THEN RETURN NULL
// Case 2: one child
IF root.left = NULL THEN RETURN root.right
IF root.right = NULL THEN RETURN root.left
// Case 3: two children
successor ← findMin(root.right)
root.value ← successor.value
root.right ← delete(root.right, successor.value)
RETURN root
FUNCTION findMin(node):
WHILE node.left ≠ NULL:
node ← node.left
RETURN node
Real-World Applications
- File systems: directories and files form a tree; every path from root to file is unique
- HTML/DOM: the browser parses HTML into a tree of elements, enabling CSS selectors and JavaScript DOM traversal
- Database indexes: B-trees and B+ trees (balanced multi-way trees) are the standard for on-disk database indexes
- Compilers: abstract syntax trees (ASTs) represent parsed source code as a tree of expressions and statements
- Decision trees: machine learning models that split data at each node based on feature thresholds
- Git: commits form a directed acyclic graph, but the repository's file structure at each commit is a tree (tree objects)
Key Takeaways
- Trees model hierarchical relationships and enable $O(\log n)$ search when balanced
- A BST maintains the invariant: left subtree keys < node key < right subtree keys
- Inorder traversal of a BST yields sorted output, making BSTs a natural choice for ordered data
- Unbalanced BSTs degrade to $O(n)$ operations; self-balancing variants (AVL, red-black) solve this
- Traversal order (inorder, preorder, postorder) determines the visit sequence and suits different use cases: sorting, copying, and cleanup respectively
Read More
2021-01-13
Heaps and Priority Queues: Always Know the Extreme
How binary heaps maintain the min or max element at the top in O(log n) time, powering priority queues, heap sort, and scheduling systems.
2021-01-15
Graphs: Representation, BFS, and DFS
How to represent graphs with adjacency lists and matrices, and how breadth-first and depth-first search explore them systematically.
2021-01-17
Recursion: Solving Problems by Solving Smaller Problems
How recursive functions work, why they need base cases, and how to think about call stacks, recurrence relations, and memoization.