Trees: Hierarchical Data and Binary Search Trees
Understanding tree structures, binary trees, BSTs, and the three classic traversal orders that unlock recursive thinking.
Terminology
- Tree
- A connected, acyclic graph with a designated root node. Every node except the root has exactly one parent.
- Root
- The topmost node in a tree, with no parent. All other nodes are descendants of the root.
- Leaf
- A node with no children. Leaves are the terminal nodes of the tree.
- Height
- The number of edges on the longest path from a node down to a leaf. The height of a tree is the height of its root.
- Depth
- The number of edges from the root down to a given node. The root has depth 0.
- Binary Tree
- A tree where each node has at most two children, referred to as the left child and right child.
- Binary Search Tree (BST)
- A binary tree where for every node $x$: all keys in the left subtree are less than $x$'s key, and all keys in the right subtree are greater.
- Traversal
- A systematic way of visiting every node in a tree exactly once. The three classic orderings are inorder, preorder, and postorder.
- Balanced Tree
- A tree where the heights of left and right subtrees differ by at most a constant factor, ensuring $O(\log n)$ operations.
What & Why
A tree is a hierarchical data structure where each element (node) has a parent-child relationship with other elements. Unlike arrays and linked lists, which are linear, trees branch out, making them natural for representing hierarchies: file systems, HTML DOMs, organization charts, and decision processes.
Why does this matter? Trees are the foundation for fast searching, sorting, and hierarchical organization. Binary search trees give O(\log n) lookup when balanced. Traversal algorithms on trees are the gateway to recursive thinking, which is central to CS problem-solving.
Key properties:
- Hierarchical structure: models parent-child relationships naturally
O(\log n)search in balanced BSTs, compared toO(n)for linked lists- Recursive definition: a tree is either empty or a node with subtrees, making recursive algorithms elegant
- Foundation for advanced structures: heaps, tries, B-trees, and red-black trees all build on the basic tree concept
How It Works
Binary Tree Structure
Each node stores a value and pointers to its left and right children. A null pointer indicates the absence of a child.
This is a valid BST: for every node, left descendants are smaller and right descendants are larger. For example, node 3 has children 1 (left, smaller) and 6 (right, larger).
BST Search
To find a key $k$ in a BST, start at the root and at each node:
- If $k$ equals the node's key, return the node
- If $k$ is less, recurse into the left subtree
- If $k$ is greater, recurse into the right subtree
- If the subtree is empty, the key is not in the tree
Each comparison eliminates one subtree, halving the search space in a balanced tree. This gives $O(\log n)$ time, similar to binary search on a sorted array.
Traversal Orders
The three classic depth-first traversals differ only in when they visit the current node relative to its children:
- Inorder (Left, Node, Right): visits BST nodes in sorted order. For the tree above: 1, 3, 6, 8, 10, 14
- Preorder (Node, Left, Right): visits the root before its subtrees. Useful for copying or serializing a tree: 8, 3, 1, 6, 10, 14
- Postorder (Left, Right, Node): visits children before the parent. Useful for deletion or evaluating expression trees: 1, 6, 3, 14, 10, 8
BST Insertion and Deletion
Insertion follows the same path as search: walk down the tree comparing keys until you find an empty spot, then place the new node there.
Deletion has three cases:
- Leaf node: simply remove it
- One child: replace the node with its child
- Two children: replace the node's key with its inorder successor (smallest node in the right subtree), then delete that successor
Complexity Analysis
| Operation | Balanced BST | Worst Case (Skewed) | Notes |
|---|---|---|---|
| Search | $O(\log n)$ | $O(n)$ | Degenerates to linked list if unbalanced |
| Insert | $O(\log n)$ | $O(n)$ | Same path as search |
| Delete | $O(\log n)$ | $O(n)$ | Finding successor adds at most $O(h)$ |
| Inorder traversal | $O(n)$ | $O(n)$ | Visits every node exactly once |
| Space | $O(n)$ | $O(n)$ | One node per element |
Height of a Balanced Binary Tree
A balanced binary tree with $n$ nodes has height:
This is why balanced BST operations are $O(\log n)$: the longest path from root to leaf has $\lfloor \log_2 n \rfloor$ edges.
A completely skewed tree (every node has only one child) has height $n - 1$, degrading all operations to $O(n)$. Self-balancing trees like AVL trees and red-black trees prevent this by maintaining balance invariants after every insertion and deletion.
Implementation
BST Node and Core Operations (Pseudocode)
STRUCTURE TreeNode:
value
left ← NULL
right ← NULL
FUNCTION insert(root, val):
IF root = NULL THEN RETURN new TreeNode(val)
IF val < root.value THEN
root.left ← insert(root.left, val)
ELSE IF val > root.value THEN
root.right ← insert(root.right, val)
RETURN root
FUNCTION search(root, val):
IF root = NULL THEN RETURN false
IF val = root.value THEN RETURN true
IF val < root.value THEN RETURN search(root.left, val)
RETURN search(root.right, val)
Traversals (Pseudocode)
FUNCTION inorder(root, result):
IF root = NULL THEN RETURN
inorder(root.left, result)
APPEND root.value TO result
inorder(root.right, result)
FUNCTION preorder(root, result):
IF root = NULL THEN RETURN
APPEND root.value TO result
preorder(root.left, result)
preorder(root.right, result)
FUNCTION postorder(root, result):
IF root = NULL THEN RETURN
postorder(root.left, result)
postorder(root.right, result)
APPEND root.value TO result
BST Deletion (Pseudocode)
FUNCTION delete(root, val):
IF root = NULL THEN RETURN NULL
IF val < root.value THEN
root.left ← delete(root.left, val)
ELSE IF val > root.value THEN
root.right ← delete(root.right, val)
ELSE
// Case 1: leaf node
IF root.left = NULL AND root.right = NULL THEN RETURN NULL
// Case 2: one child
IF root.left = NULL THEN RETURN root.right
IF root.right = NULL THEN RETURN root.left
// Case 3: two children
successor ← findMin(root.right)
root.value ← successor.value
root.right ← delete(root.right, successor.value)
RETURN root
FUNCTION findMin(node):
WHILE node.left ≠ NULL:
node ← node.left
RETURN node
Real-World Applications
- File systems: directories and files form a tree; every path from root to file is unique
- HTML/DOM: the browser parses HTML into a tree of elements, enabling CSS selectors and JavaScript DOM traversal
- Database indexes: B-trees and B+ trees (balanced multi-way trees) are the standard for on-disk database indexes
- Compilers: abstract syntax trees (ASTs) represent parsed source code as a tree of expressions and statements
- Decision trees: machine learning models that split data at each node based on feature thresholds
- Git: commits form a directed acyclic graph, but the repository's file structure at each commit is a tree (tree objects)
Key Takeaways
- Trees model hierarchical relationships and enable $O(\log n)$ search when balanced
- A BST maintains the invariant: left subtree keys < node key < right subtree keys
- Inorder traversal of a BST yields sorted output, making BSTs a natural choice for ordered data
- Unbalanced BSTs degrade to $O(n)$ operations; self-balancing variants (AVL, red-black) solve this
- Traversal order (inorder, preorder, postorder) determines the visit sequence and suits different use cases: sorting, copying, and cleanup respectively