Notes/Trees: Hierarchical Data and Binary Search Trees
Back to Notes

Trees: Hierarchical Data and Binary Search Trees

Understanding tree structures, binary trees, BSTs, and the three classic traversal orders that unlock recursive thinking.

2021-01-11AI-Synthesized from Personal NotesSource1200+ words of raw notesEnrichmentsCode blocks, GraphicsPipelineMulti-pass AI review · Score: 97/100
Share
CS FundamentalsTreesBinary TreesData Structures

Terminology

Tree
A connected, acyclic graph with a designated root node. Every node except the root has exactly one parent.
Root
The topmost node in a tree, with no parent. All other nodes are descendants of the root.
Leaf
A node with no children. Leaves are the terminal nodes of the tree.
Height
The number of edges on the longest path from a node down to a leaf. The height of a tree is the height of its root.
Depth
The number of edges from the root down to a given node. The root has depth 0.
Binary Tree
A tree where each node has at most two children, referred to as the left child and right child.
Binary Search Tree (BST)
A binary tree where for every node $x$: all keys in the left subtree are less than $x$'s key, and all keys in the right subtree are greater.
Traversal
A systematic way of visiting every node in a tree exactly once. The three classic orderings are inorder, preorder, and postorder.
Balanced Tree
A tree where the heights of left and right subtrees differ by at most a constant factor, ensuring $O(\log n)$ operations.

What & Why

A tree is a hierarchical data structure where each element (node) has a parent-child relationship with other elements. Unlike arrays and linked lists, which are linear, trees branch out, making them natural for representing hierarchies: file systems, HTML DOMs, organization charts, and decision processes.

Why does this matter? Trees are the foundation for fast searching, sorting, and hierarchical organization. Binary search trees give $O(\log n)$ lookup when balanced. Traversal algorithms on trees are the gateway to recursive thinking, which is central to CS problem-solving.

Key properties:

  • Hierarchical structure: models parent-child relationships naturally
  • $O(\log n)$ search in balanced BSTs, compared to $O(n)$ for linked lists
  • Recursive definition: a tree is either empty or a node with subtrees, making recursive algorithms elegant
  • Foundation for advanced structures: heaps, tries, B-trees, and red-black trees all build on the basic tree concept

How It Works

Binary Tree Structure

Each node stores a value and pointers to its left and right children. A null pointer indicates the absence of a child.

8 3 10 1 6 14 root

This is a valid BST: for every node, left descendants are smaller and right descendants are larger. For example, node 3 has children 1 (left, smaller) and 6 (right, larger).

BST Search

To find a key $k$ in a BST, start at the root and at each node:

  1. If $k$ equals the node's key, return the node
  2. If $k$ is less, recurse into the left subtree
  3. If $k$ is greater, recurse into the right subtree
  4. If the subtree is empty, the key is not in the tree

Each comparison eliminates one subtree, halving the search space in a balanced tree. This gives $O(\log n)$ time, similar to binary search on a sorted array.

Traversal Orders

The three classic depth-first traversals differ only in when they visit the current node relative to its children:

  • Inorder (Left, Node, Right): visits BST nodes in sorted order. For the tree above: 1, 3, 6, 8, 10, 14
  • Preorder (Node, Left, Right): visits the root before its subtrees. Useful for copying or serializing a tree: 8, 3, 1, 6, 10, 14
  • Postorder (Left, Right, Node): visits children before the parent. Useful for deletion or evaluating expression trees: 1, 6, 3, 14, 10, 8

BST Insertion and Deletion

Insertion follows the same path as search: walk down the tree comparing keys until you find an empty spot, then place the new node there.

Deletion has three cases:

  1. Leaf node: simply remove it
  2. One child: replace the node with its child
  3. Two children: replace the node's key with its inorder successor (smallest node in the right subtree), then delete that successor

Complexity Analysis

Operation Balanced BST Worst Case (Skewed) Notes
Search $O(\log n)$ $O(n)$ Degenerates to linked list if unbalanced
Insert $O(\log n)$ $O(n)$ Same path as search
Delete $O(\log n)$ $O(n)$ Finding successor adds at most $O(h)$
Inorder traversal $O(n)$ $O(n)$ Visits every node exactly once
Space $O(n)$ $O(n)$ One node per element

Height of a Balanced Binary Tree

A balanced binary tree with $n$ nodes has height:

$h = \lfloor \log_2 n \rfloor$

This is why balanced BST operations are $O(\log n)$: the longest path from root to leaf has $\lfloor \log_2 n \rfloor$ edges.

A completely skewed tree (every node has only one child) has height $n - 1$, degrading all operations to $O(n)$. Self-balancing trees like AVL trees and red-black trees prevent this by maintaining balance invariants after every insertion and deletion.

Implementation

BST Node and Core Operations (Pseudocode)

STRUCTURE TreeNode:
    value
    left ← NULL
    right ← NULL

FUNCTION insert(root, val):
    IF root = NULL THEN RETURN new TreeNode(val)
    IF val < root.value THEN
        root.left ← insert(root.left, val)
    ELSE IF val > root.value THEN
        root.right ← insert(root.right, val)
    RETURN root

FUNCTION search(root, val):
    IF root = NULL THEN RETURN false
    IF val = root.value THEN RETURN true
    IF val < root.value THEN RETURN search(root.left, val)
    RETURN search(root.right, val)

Traversals (Pseudocode)

FUNCTION inorder(root, result):
    IF root = NULL THEN RETURN
    inorder(root.left, result)
    APPEND root.value TO result
    inorder(root.right, result)

FUNCTION preorder(root, result):
    IF root = NULL THEN RETURN
    APPEND root.value TO result
    preorder(root.left, result)
    preorder(root.right, result)

FUNCTION postorder(root, result):
    IF root = NULL THEN RETURN
    postorder(root.left, result)
    postorder(root.right, result)
    APPEND root.value TO result

BST Deletion (Pseudocode)

FUNCTION delete(root, val):
    IF root = NULL THEN RETURN NULL
    IF val < root.value THEN
        root.left ← delete(root.left, val)
    ELSE IF val > root.value THEN
        root.right ← delete(root.right, val)
    ELSE
        // Case 1: leaf node
        IF root.left = NULL AND root.right = NULL THEN RETURN NULL
        // Case 2: one child
        IF root.left = NULL THEN RETURN root.right
        IF root.right = NULL THEN RETURN root.left
        // Case 3: two children
        successor ← findMin(root.right)
        root.value ← successor.value
        root.right ← delete(root.right, successor.value)
    RETURN root

FUNCTION findMin(node):
    WHILE node.left ≠ NULL:
        node ← node.left
    RETURN node

Real-World Applications

  • File systems: directories and files form a tree; every path from root to file is unique
  • HTML/DOM: the browser parses HTML into a tree of elements, enabling CSS selectors and JavaScript DOM traversal
  • Database indexes: B-trees and B+ trees (balanced multi-way trees) are the standard for on-disk database indexes
  • Compilers: abstract syntax trees (ASTs) represent parsed source code as a tree of expressions and statements
  • Decision trees: machine learning models that split data at each node based on feature thresholds
  • Git: commits form a directed acyclic graph, but the repository's file structure at each commit is a tree (tree objects)

Key Takeaways

  • Trees model hierarchical relationships and enable $O(\log n)$ search when balanced
  • A BST maintains the invariant: left subtree keys < node key < right subtree keys
  • Inorder traversal of a BST yields sorted output, making BSTs a natural choice for ordered data
  • Unbalanced BSTs degrade to $O(n)$ operations; self-balancing variants (AVL, red-black) solve this
  • Traversal order (inorder, preorder, postorder) determines the visit sequence and suits different use cases: sorting, copying, and cleanup respectively