Back to Blog

Trees: Hierarchical Data and Binary Search Trees

Understanding tree structures, binary trees, BSTs, and the three classic traversal orders that unlock recursive thinking.

2021-01-11
Share
CS Fundamentalstreesbinary-treesdata-structures

Terminology

Tree
A connected, acyclic graph with a designated root node. Every node except the root has exactly one parent.
Root
The topmost node in a tree, with no parent. All other nodes are descendants of the root.
Leaf
A node with no children. Leaves are the terminal nodes of the tree.
Height
The number of edges on the longest path from a node down to a leaf. The height of a tree is the height of its root.
Depth
The number of edges from the root down to a given node. The root has depth 0.
Binary Tree
A tree where each node has at most two children, referred to as the left child and right child.
Binary Search Tree (BST)
A binary tree where for every node $x$: all keys in the left subtree are less than $x$'s key, and all keys in the right subtree are greater.
Traversal
A systematic way of visiting every node in a tree exactly once. The three classic orderings are inorder, preorder, and postorder.
Balanced Tree
A tree where the heights of left and right subtrees differ by at most a constant factor, ensuring $O(\log n)$ operations.

What & Why

A tree is a hierarchical data structure where each element (node) has a parent-child relationship with other elements. Unlike arrays and linked lists, which are linear, trees branch out, making them natural for representing hierarchies: file systems, HTML DOMs, organization charts, and decision processes.

Why does this matter? Trees are the foundation for fast searching, sorting, and hierarchical organization. Binary search trees give O(\log n) lookup when balanced. Traversal algorithms on trees are the gateway to recursive thinking, which is central to CS problem-solving.

Key properties:

  • Hierarchical structure: models parent-child relationships naturally
  • O(\log n) search in balanced BSTs, compared to O(n) for linked lists
  • Recursive definition: a tree is either empty or a node with subtrees, making recursive algorithms elegant
  • Foundation for advanced structures: heaps, tries, B-trees, and red-black trees all build on the basic tree concept

How It Works

Binary Tree Structure

Each node stores a value and pointers to its left and right children. A null pointer indicates the absence of a child.

8 3 10 1 6 14 root

This is a valid BST: for every node, left descendants are smaller and right descendants are larger. For example, node 3 has children 1 (left, smaller) and 6 (right, larger).

BST Search

To find a key $k$ in a BST, start at the root and at each node:

  1. If $k$ equals the node's key, return the node
  2. If $k$ is less, recurse into the left subtree
  3. If $k$ is greater, recurse into the right subtree
  4. If the subtree is empty, the key is not in the tree

Each comparison eliminates one subtree, halving the search space in a balanced tree. This gives $O(\log n)$ time, similar to binary search on a sorted array.

Traversal Orders

The three classic depth-first traversals differ only in when they visit the current node relative to its children:

  • Inorder (Left, Node, Right): visits BST nodes in sorted order. For the tree above: 1, 3, 6, 8, 10, 14
  • Preorder (Node, Left, Right): visits the root before its subtrees. Useful for copying or serializing a tree: 8, 3, 1, 6, 10, 14
  • Postorder (Left, Right, Node): visits children before the parent. Useful for deletion or evaluating expression trees: 1, 6, 3, 14, 10, 8

BST Insertion and Deletion

Insertion follows the same path as search: walk down the tree comparing keys until you find an empty spot, then place the new node there.

Deletion has three cases:

  1. Leaf node: simply remove it
  2. One child: replace the node with its child
  3. Two children: replace the node's key with its inorder successor (smallest node in the right subtree), then delete that successor

Complexity Analysis

Operation Balanced BST Worst Case (Skewed) Notes
Search $O(\log n)$ $O(n)$ Degenerates to linked list if unbalanced
Insert $O(\log n)$ $O(n)$ Same path as search
Delete $O(\log n)$ $O(n)$ Finding successor adds at most $O(h)$
Inorder traversal $O(n)$ $O(n)$ Visits every node exactly once
Space $O(n)$ $O(n)$ One node per element

Height of a Balanced Binary Tree

A balanced binary tree with $n$ nodes has height:

$h = \lfloor \log_2 n \rfloor$

This is why balanced BST operations are $O(\log n)$: the longest path from root to leaf has $\lfloor \log_2 n \rfloor$ edges.

A completely skewed tree (every node has only one child) has height $n - 1$, degrading all operations to $O(n)$. Self-balancing trees like AVL trees and red-black trees prevent this by maintaining balance invariants after every insertion and deletion.

Implementation

BST Node and Core Operations (Pseudocode)

STRUCTURE TreeNode:
    value
    left ← NULL
    right ← NULL

FUNCTION insert(root, val):
    IF root = NULL THEN RETURN new TreeNode(val)
    IF val < root.value THEN
        root.left ← insert(root.left, val)
    ELSE IF val > root.value THEN
        root.right ← insert(root.right, val)
    RETURN root

FUNCTION search(root, val):
    IF root = NULL THEN RETURN false
    IF val = root.value THEN RETURN true
    IF val < root.value THEN RETURN search(root.left, val)
    RETURN search(root.right, val)

Traversals (Pseudocode)

FUNCTION inorder(root, result):
    IF root = NULL THEN RETURN
    inorder(root.left, result)
    APPEND root.value TO result
    inorder(root.right, result)

FUNCTION preorder(root, result):
    IF root = NULL THEN RETURN
    APPEND root.value TO result
    preorder(root.left, result)
    preorder(root.right, result)

FUNCTION postorder(root, result):
    IF root = NULL THEN RETURN
    postorder(root.left, result)
    postorder(root.right, result)
    APPEND root.value TO result

BST Deletion (Pseudocode)

FUNCTION delete(root, val):
    IF root = NULL THEN RETURN NULL
    IF val < root.value THEN
        root.left ← delete(root.left, val)
    ELSE IF val > root.value THEN
        root.right ← delete(root.right, val)
    ELSE
        // Case 1: leaf node
        IF root.left = NULL AND root.right = NULL THEN RETURN NULL
        // Case 2: one child
        IF root.left = NULL THEN RETURN root.right
        IF root.right = NULL THEN RETURN root.left
        // Case 3: two children
        successor ← findMin(root.right)
        root.value ← successor.value
        root.right ← delete(root.right, successor.value)
    RETURN root

FUNCTION findMin(node):
    WHILE node.left ≠ NULL:
        node ← node.left
    RETURN node

Real-World Applications

  • File systems: directories and files form a tree; every path from root to file is unique
  • HTML/DOM: the browser parses HTML into a tree of elements, enabling CSS selectors and JavaScript DOM traversal
  • Database indexes: B-trees and B+ trees (balanced multi-way trees) are the standard for on-disk database indexes
  • Compilers: abstract syntax trees (ASTs) represent parsed source code as a tree of expressions and statements
  • Decision trees: machine learning models that split data at each node based on feature thresholds
  • Git: commits form a directed acyclic graph, but the repository's file structure at each commit is a tree (tree objects)

Key Takeaways

  • Trees model hierarchical relationships and enable $O(\log n)$ search when balanced
  • A BST maintains the invariant: left subtree keys < node key < right subtree keys
  • Inorder traversal of a BST yields sorted output, making BSTs a natural choice for ordered data
  • Unbalanced BSTs degrade to $O(n)$ operations; self-balancing variants (AVL, red-black) solve this
  • Traversal order (inorder, preorder, postorder) determines the visit sequence and suits different use cases: sorting, copying, and cleanup respectively