Notes/Trees: Hierarchical Data and Binary Search Trees

Trees: Hierarchical Data and Binary Search Trees

Understanding tree structures, binary trees, BSTs, and the three classic traversal orders that unlock recursive thinking.

2021-01-11AI-Synthesized from Personal Notes

CS FundamentalsTreesBinary TreesData Structures

Terminology

Tree: A connected, acyclic graph with a designated root node. Every node except the root has exactly one parent.
Root: The topmost node in a tree, with no parent. All other nodes are descendants of the root.
Leaf: A node with no children. Leaves are the terminal nodes of the tree.
Height: The number of edges on the longest path from a node down to a leaf. The height of a tree is the height of its root.
Depth: The number of edges from the root down to a given node. The root has depth 0.
Binary Tree: A tree where each node has at most two children, referred to as the left child and right child.
Binary Search Tree (BST): A binary tree where for every node $x$: all keys in the left subtree are less than $x$'s key, and all keys in the right subtree are greater.
Traversal: A systematic way of visiting every node in a tree exactly once. The three classic orderings are inorder, preorder, and postorder.
Balanced Tree: A tree where the heights of left and right subtrees differ by at most a constant factor, ensuring $O(\log n)$ operations.

What & Why

A tree is a hierarchical data structure where each element (node) has a parent-child relationship with other elements. Unlike arrays and linked lists, which are linear, trees branch out, making them natural for representing hierarchies: file systems, HTML DOMs, organization charts, and decision processes.

Why does this matter? Trees are the foundation for fast searching, sorting, and hierarchical organization. Binary search trees give $O(\log n)$ lookup when balanced. Traversal algorithms on trees are the gateway to recursive thinking, which is central to CS problem-solving.

Key properties:

Hierarchical structure: models parent-child relationships naturally
$O(\log n)$ search in balanced BSTs, compared to $O(n)$ for linked lists
Recursive definition: a tree is either empty or a node with subtrees, making recursive algorithms elegant
Foundation for advanced structures: heaps, tries, B-trees, and red-black trees all build on the basic tree concept

How It Works

Binary Tree Structure

Each node stores a value and pointers to its left and right children. A null pointer indicates the absence of a child.

This is a valid BST: for every node, left descendants are smaller and right descendants are larger. For example, node 3 has children 1 (left, smaller) and 6 (right, larger).

BST Search

To find a key $k$ in a BST, start at the root and at each node:

If $k$ equals the node's key, return the node
If $k$ is less, recurse into the left subtree
If $k$ is greater, recurse into the right subtree
If the subtree is empty, the key is not in the tree

Each comparison eliminates one subtree, halving the search space in a balanced tree. This gives $O(\log n)$ time, similar to binary search on a sorted array.

Traversal Orders

The three classic depth-first traversals differ only in when they visit the current node relative to its children:

Inorder (Left, Node, Right): visits BST nodes in sorted order. For the tree above: 1, 3, 6, 8, 10, 14
Preorder (Node, Left, Right): visits the root before its subtrees. Useful for copying or serializing a tree: 8, 3, 1, 6, 10, 14
Postorder (Left, Right, Node): visits children before the parent. Useful for deletion or evaluating expression trees: 1, 6, 3, 14, 10, 8

BST Insertion and Deletion

Insertion follows the same path as search: walk down the tree comparing keys until you find an empty spot, then place the new node there.

Deletion has three cases:

Leaf node: simply remove it
One child: replace the node with its child
Two children: replace the node's key with its inorder successor (smallest node in the right subtree), then delete that successor

Complexity Analysis

Operation	Balanced BST	Worst Case (Skewed)	Notes
Search	$O(\log n)$	$O(n)$	Degenerates to linked list if unbalanced
Insert	$O(\log n)$	$O(n)$	Same path as search
Delete	$O(\log n)$	$O(n)$	Finding successor adds at most $O(h)$
Inorder traversal	$O(n)$	$O(n)$	Visits every node exactly once
Space	$O(n)$	$O(n)$	One node per element

Height of a Balanced Binary Tree

A balanced binary tree with $n$ nodes has height:

$h = \lfloor \log_2 n \rfloor$

This is why balanced BST operations are $O(\log n)$: the longest path from root to leaf has $\lfloor \log_2 n \rfloor$ edges.

A completely skewed tree (every node has only one child) has height $n - 1$, degrading all operations to $O(n)$. Self-balancing trees like AVL trees and red-black trees prevent this by maintaining balance invariants after every insertion and deletion.

Implementation

BST Node and Core Operations (Pseudocode)

STRUCTURE TreeNode:
    value
    left ← NULL
    right ← NULL

FUNCTION insert(root, val):
    IF root = NULL THEN RETURN new TreeNode(val)
    IF val < root.value THEN
        root.left ← insert(root.left, val)
    ELSE IF val > root.value THEN
        root.right ← insert(root.right, val)
    RETURN root

FUNCTION search(root, val):
    IF root = NULL THEN RETURN false
    IF val = root.value THEN RETURN true
    IF val < root.value THEN RETURN search(root.left, val)
    RETURN search(root.right, val)

Traversals (Pseudocode)

FUNCTION inorder(root, result):
    IF root = NULL THEN RETURN
    inorder(root.left, result)
    APPEND root.value TO result
    inorder(root.right, result)

FUNCTION preorder(root, result):
    IF root = NULL THEN RETURN
    APPEND root.value TO result
    preorder(root.left, result)
    preorder(root.right, result)

FUNCTION postorder(root, result):
    IF root = NULL THEN RETURN
    postorder(root.left, result)
    postorder(root.right, result)
    APPEND root.value TO result

BST Deletion (Pseudocode)

FUNCTION delete(root, val):
    IF root = NULL THEN RETURN NULL
    IF val < root.value THEN
        root.left ← delete(root.left, val)
    ELSE IF val > root.value THEN
        root.right ← delete(root.right, val)
    ELSE
        // Case 1: leaf node
        IF root.left = NULL AND root.right = NULL THEN RETURN NULL
        // Case 2: one child
        IF root.left = NULL THEN RETURN root.right
        IF root.right = NULL THEN RETURN root.left
        // Case 3: two children
        successor ← findMin(root.right)
        root.value ← successor.value
        root.right ← delete(root.right, successor.value)
    RETURN root

FUNCTION findMin(node):
    WHILE node.left ≠ NULL:
        node ← node.left
    RETURN node

Real-World Applications

File systems: directories and files form a tree; every path from root to file is unique
HTML/DOM: the browser parses HTML into a tree of elements, enabling CSS selectors and JavaScript DOM traversal
Database indexes: B-trees and B+ trees (balanced multi-way trees) are the standard for on-disk database indexes
Compilers: abstract syntax trees (ASTs) represent parsed source code as a tree of expressions and statements
Decision trees: machine learning models that split data at each node based on feature thresholds
Git: commits form a directed acyclic graph, but the repository's file structure at each commit is a tree (tree objects)

Key Takeaways

Trees model hierarchical relationships and enable $O(\log n)$ search when balanced
A BST maintains the invariant: left subtree keys < node key < right subtree keys
Inorder traversal of a BST yields sorted output, making BSTs a natural choice for ordered data
Unbalanced BSTs degrade to $O(n)$ operations; self-balancing variants (AVL, red-black) solve this
Traversal order (inorder, preorder, postorder) determines the visit sequence and suits different use cases: sorting, copying, and cleanup respectively

2021-01-13

Heaps and Priority Queues: Always Know the Extreme

How binary heaps maintain the min or max element at the top in O(log n) time, powering priority queues, heap sort, and scheduling systems.

2021-01-15

Graphs: Representation, BFS, and DFS

How to represent graphs with adjacency lists and matrices, and how breadth-first and depth-first search explore them systematically.

2021-01-17

Recursion: Solving Problems by Solving Smaller Problems

How recursive functions work, why they need base cases, and how to think about call stacks, recurrence relations, and memoization.