
Virtual Memory: Page Tables, TLBs, and Page Faults

How operating systems use virtual memory to give each process its own address space, translate virtual addresses to physical ones, and handle memory that exceeds physical RAM.

2021-03-08
Computer Engineering · os · virtual-memory

Terminology

  • Virtual memory: an abstraction that gives each process the illusion of having its own large, contiguous address space, independent of the actual physical memory available
  • Virtual address: an address generated by the CPU during program execution; it must be translated to a physical address before accessing actual memory
  • Physical address: the actual address in the hardware RAM chips; the memory controller uses this address to read or write data
  • Page: a fixed-size block of virtual memory, typically 4 KB on most systems; the basic unit of memory management
  • Frame (page frame): a fixed-size block of physical memory, the same size as a page; pages are mapped to frames
  • Page table: a data structure maintained by the OS for each process that maps virtual page numbers to physical frame numbers
  • Page Table Entry (PTE): a single entry in the page table containing the physical frame number and metadata bits (valid, dirty, accessed, permissions)
  • TLB (Translation Lookaside Buffer): a small, fast hardware cache that stores recent virtual-to-physical address translations to avoid accessing the page table on every memory reference
  • Page fault: an exception raised by the hardware when a program accesses a virtual page that is not currently mapped to a physical frame; the OS handles it by loading the page from disk or allocating a new frame
  • Demand paging: a strategy where pages are loaded into physical memory only when they are first accessed, rather than loading the entire program at startup
  • Swap space: a region on disk (or a swap file) used to store pages that have been evicted from physical memory to make room for other pages
  • Thrashing: a condition where the system spends more time swapping pages in and out of memory than executing useful work, caused by insufficient physical memory for the active working sets
  • Working set: the set of pages a process is actively using during a given time window
  • Dirty bit: a flag in a PTE that indicates the page has been modified since it was loaded; dirty pages must be written to disk before eviction
  • Valid bit: a flag in a PTE that indicates whether the page is currently in physical memory
  • Multi-level page table: a hierarchical page table structure that saves memory by only allocating page table entries for address ranges actually in use
  • Inverted page table: a page table indexed by physical frame number rather than virtual page number; uses one entry per physical frame regardless of virtual address space size
  • Memory-mapped I/O: a technique where hardware device registers or files are mapped into the virtual address space, allowing programs to access them using regular load/store instructions
  • Copy-on-write (COW): an optimization where forked processes share physical pages until one of them writes, at which point the OS creates a private copy of the modified page
  • ASID (Address Space Identifier): a tag stored in TLB entries that identifies which process owns the translation, allowing TLB entries from multiple processes to coexist without flushing

What & Why

Every program assumes it has access to a large, contiguous block of memory starting at address zero. In reality, physical memory is shared among many processes, is limited in size, and is not contiguous. Virtual memory bridges this gap by adding a layer of indirection: every address the CPU generates is a virtual address that gets translated to a physical address by the hardware and OS working together.

This indirection provides three critical benefits. First, isolation: each process has its own virtual address space, so it cannot read or write another process's memory, even accidentally. Second, the illusion of abundant memory: a process can use more memory than physically available because the OS can swap inactive pages to disk. Third, flexibility: the OS can place pages anywhere in physical memory, defragment without moving data from the program's perspective, and share common pages (like library code) between processes.

The cost of virtual memory is the translation overhead. Every memory access requires converting a virtual address to a physical one. Without optimization, this would double the memory access time (one access to the page table, one to the actual data). The TLB solves this by caching recent translations, achieving hit rates above 99% for most workloads.

Understanding virtual memory is essential for systems programming. It explains why programs can use more memory than RAM, why accessing memory sometimes triggers disk I/O, why fork() is fast (copy-on-write), and why memory-mapped files are efficient.

How It Works

Address Translation

A virtual address is split into two parts: the virtual page number (VPN) and the page offset. The page offset passes through unchanged (it identifies the byte within the page). The VPN is looked up in the page table to find the corresponding physical frame number (PFN). The physical address is formed by concatenating the PFN with the page offset.

  Virtual Address
    [ Virtual Page Number (VPN) | Page Offset ]
              |                       |
              v                       |  (offset passes through unchanged)
         Page Table                   |
    VPN -> PFN + valid, dirty, perms  |
              |                       |
              v                       v
  Physical Address
    [ Physical Frame Number (PFN) | Page Offset ]

For a system with 48-bit virtual addresses and 4 KB pages:

$\text{Page offset bits} = \log_2(4096) = 12 \text{ bits}$
$\text{VPN bits} = 48 - 12 = 36 \text{ bits}$
$\text{Number of virtual pages} = 2^{36} \approx 68 \text{ billion}$

A flat page table for this address space would need $2^{36}$ entries. At 8 bytes per entry, that is 512 GB per process, which is clearly impractical. This is why multi-level page tables exist.
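As a sketch, the split and the flat-table arithmetic look like this in Python (the function name `split_virtual_address` is just for illustration):

```python
PAGE_SIZE = 4096     # 4 KB pages
OFFSET_BITS = 12     # log2(4096)

def split_virtual_address(vaddr: int) -> tuple[int, int]:
    """Split a virtual address into (VPN, page offset)."""
    vpn = vaddr >> OFFSET_BITS          # high bits select the page
    offset = vaddr & (PAGE_SIZE - 1)    # low 12 bits select the byte
    return vpn, offset

vpn, offset = split_virtual_address(0x7F3A_1234_5678)
# offset is the low 12 bits (0x678); the remaining bits form the VPN

# Flat page table size for a 48-bit address space, 8 bytes per entry:
FLAT_TABLE_BYTES = (1 << 36) * 8       # 2^39 bytes = 512 GiB per process
```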

Multi-Level Page Tables

A multi-level page table splits the VPN into multiple indices, each indexing into a different level of the table. Only the portions of the table that correspond to actually used memory are allocated.

x86-64 uses a 4-level page table (with 48-bit virtual addresses):

  • Level 4 (PML4): 9 bits, 512 entries, each pointing to a Level 3 table
  • Level 3 (PDPT): 9 bits, 512 entries, each pointing to a Level 2 table
  • Level 2 (PD): 9 bits, 512 entries, each pointing to a Level 1 table
  • Level 1 (PT): 9 bits, 512 entries, each containing the physical frame number
  • Offset: 12 bits

A process that uses only a small portion of its address space needs only a few page table pages at each level, rather than the full $2^{36}$ entries.
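The four-way split of the VPN can be sketched directly from the bit layout above (a minimal illustration, not how the hardware stores it):

```python
def page_table_indices(vaddr: int) -> dict[str, int]:
    """Extract the four 9-bit table indices and the 12-bit offset
    from a 48-bit x86-64 virtual address."""
    return {
        "pml4": (vaddr >> 39) & 0x1FF,   # Level 4 index
        "pdpt": (vaddr >> 30) & 0x1FF,   # Level 3 index
        "pd":   (vaddr >> 21) & 0x1FF,   # Level 2 index
        "pt":   (vaddr >> 12) & 0x1FF,   # Level 1 index
        "offset": vaddr & 0xFFF,         # byte within the page
    }

# Build an address from known indices and recover them:
idx = page_table_indices((1 << 39) | (2 << 30) | (3 << 21) | (4 << 12) | 5)
# -> pml4=1, pdpt=2, pd=3, pt=4, offset=5
```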

The TLB

The TLB is a small (typically 64-1024 entries) hardware cache, often fully or highly associative, that stores recent VPN-to-PFN translations. On every memory access:

  1. The CPU extracts the VPN from the virtual address
  2. The TLB is checked for a matching entry (TLB hit)
  3. If found, the PFN is used immediately (no page table walk)
  4. If not found (TLB miss), the hardware or OS walks the page table, installs the translation in the TLB, and retries

TLB hit rates typically exceed 99% because of locality: a program's working set usually fits in far fewer pages than the TLB can hold.
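A toy software model makes the locality argument concrete. This is a sketch only: real TLBs are hardware structures, and the LRU policy and `walk_page_table` callback here are illustrative assumptions:

```python
from collections import OrderedDict

class TLB:
    """Toy fully associative TLB with LRU replacement."""
    def __init__(self, capacity: int = 64):
        self.capacity = capacity
        self.entries: OrderedDict[int, int] = OrderedDict()  # vpn -> pfn
        self.hits = self.misses = 0

    def lookup(self, vpn: int, walk_page_table) -> int:
        if vpn in self.entries:
            self.hits += 1
            self.entries.move_to_end(vpn)     # mark most recently used
            return self.entries[vpn]
        self.misses += 1
        pfn = walk_page_table(vpn)            # slow path: page table walk
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        self.entries[vpn] = pfn
        return pfn

# A working set of 16 pages, touched repeatedly, fits easily in 64 entries:
tlb = TLB(capacity=64)
for _ in range(1000):
    for vpn in range(16):
        tlb.lookup(vpn, walk_page_table=lambda v: v + 100)
hit_rate = tlb.hits / (tlb.hits + tlb.misses)
# Only the first pass misses: 16 misses out of 16,000 lookups (99.9% hits)
```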

Page Faults

A page fault occurs when the valid bit in the PTE is 0, meaning the page is not in physical memory. The hardware raises an exception, and the OS page fault handler:

  1. Determines the cause: is the page on disk (swap), is it a new allocation (zero-fill), or is it an invalid access (segmentation fault)?
  2. If the page is on disk, selects a victim frame to evict (using a replacement algorithm like clock or LRU approximation)
  3. If the victim frame is dirty, writes it to disk first
  4. Reads the requested page from disk into the freed frame
  5. Updates the page table entry with the new frame number and sets the valid bit
  6. Restarts the faulting instruction

Page faults are expensive: a disk read takes millions of CPU cycles. This is why the OS tries to keep the working set in memory and why thrashing (constant page faults) is catastrophic for performance.
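A quick expected-value calculation shows why even rare faults matter. The 100-cycle hit and 10-million-cycle fault figures below are illustrative assumptions, not measurements:

```python
def avg_access_cycles(fault_rate: float,
                      hit_cycles: float = 100,
                      fault_cycles: float = 10_000_000) -> float:
    """Expected cycles per memory access for a given page fault rate."""
    return (1 - fault_rate) * hit_cycles + fault_rate * fault_cycles

# A fault on just 1 in 100,000 accesses already doubles the average cost:
avg_access_cycles(1e-5)   # ~200 cycles instead of 100
```

This is why the fault frequencies in the table below must stay far under 0.001% for the machine to feel responsive at all.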

Memory-Mapped I/O and Files

Memory-mapped I/O maps device registers into the virtual address space. Instead of using special I/O instructions, the CPU reads and writes device registers using regular load/store instructions to specific virtual addresses. The page table entries for these addresses point to device memory rather than RAM.

Memory-mapped files (mmap) map a file's contents into the virtual address space. Reading from the mapped region triggers page faults that load file data on demand. Writes to the mapped region are eventually flushed to the file. This avoids explicit read/write system calls and lets the OS manage file caching through the same page replacement mechanisms used for regular memory.
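Python's standard library exposes this mechanism directly through the `mmap` module. A minimal sketch (the temporary file is just scaffolding for the example):

```python
import mmap
import os
import tempfile

# Create a small file to map.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"hello, page cache")
    path = f.name

with open(path, "r+b") as f:
    with mmap.mmap(f.fileno(), 0) as m:   # length 0 maps the whole file
        data = m[:5]                      # a read: faults in file data on demand
        m[0:5] = b"HELLO"                 # a write: dirties the mapped page
        m.flush()                         # force the dirty page back to the file

with open(path, "rb") as f:
    contents = f.read()                   # reflects the write through the mapping
os.unlink(path)
```

Note there is no `read()` or `write()` call on the mapped region itself: the slice operations are ordinary memory accesses, and the OS moves data via the same page fault machinery described above.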

Complexity Analysis

Operation             TLB Hit          TLB Miss                  Page Fault
-------------------   --------------   -----------------------   -------------------------
Translation time      $\sim 1$ cycle   $\sim 10$-$100$ cycles    $\sim 10^6$-$10^7$ cycles
Page table accesses   $0$              $L$ (number of levels)    $L$ + disk I/O
Typical frequency     $> 99\%$         $< 1\%$                   $< 0.001\%$

Effective memory access time with TLB:

$T_{\text{eff}} = h \times (t_{\text{TLB}} + t_{\text{mem}}) + (1 - h) \times (t_{\text{TLB}} + L \times t_{\text{mem}} + t_{\text{mem}})$

Where $h$ is the TLB hit rate, $t_{\text{TLB}}$ is the TLB lookup time, $t_{\text{mem}}$ is the memory access time, and $L$ is the number of page table levels.

Simplified (assuming $t_{\text{TLB}}$ is overlapped with cache access):

$T_{\text{eff}} = h \times t_{\text{mem}} + (1 - h) \times (L + 1) \times t_{\text{mem}}$

For $h = 0.99$, $L = 4$, $t_{\text{mem}} = 100$ ns:

$T_{\text{eff}} = 0.99 \times 100 + 0.01 \times 5 \times 100 = 99 + 5 = 104 \text{ ns}$

Only a 4% overhead despite the 4-level page table, thanks to the high TLB hit rate.
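The simplified formula is easy to check numerically (a direct transcription of the equation above, nothing more):

```python
def effective_access_ns(hit_rate: float, levels: int,
                        t_mem: float = 100) -> float:
    """Simplified effective access time, assuming the TLB lookup is
    overlapped with cache access: a hit costs one memory access, a miss
    costs `levels` page-table accesses plus the data access."""
    return hit_rate * t_mem + (1 - hit_rate) * (levels + 1) * t_mem

effective_access_ns(0.99, 4)   # ~104 ns, matching the worked example
```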

Page table memory overhead for a multi-level table with $L$ levels, each level indexed by $k$ bits (so each table holds $2^k$ entries of $s$ bytes):

$\text{Minimum (one mapped page)} = L \times 2^k \times s \text{ bytes}$
$\text{Sparse, } P \text{ used pages} \approx \left\lceil \frac{P}{2^k} \right\rceil \times 2^k \times s \approx P \times s \text{ bytes}$

A process needs one table at every level for each populated path; once many pages are mapped, the leaf level dominates, costing roughly $s$ bytes of page table per mapped page.

Implementation

ALGORITHM TranslateVirtualAddress(virtualAddr, pageTable, TLB)
INPUT: virtualAddr: virtual address, pageTable: multi-level page table, TLB: translation cache
OUTPUT: physical address or PAGE_FAULT exception
BEGIN
  PAGE_SIZE <- 4096
  OFFSET_BITS <- 12

  vpn <- virtualAddr / PAGE_SIZE
  offset <- virtualAddr MOD PAGE_SIZE

  // Step 1: Check TLB
  tlbEntry <- TLB_LOOKUP(TLB, vpn)
  IF tlbEntry is not NULL AND tlbEntry.valid THEN
    // TLB hit
    pfn <- tlbEntry.pfn
    CHECK_PERMISSIONS(tlbEntry, currentAccessType)
    RETURN pfn * PAGE_SIZE + offset
  END IF

  // Step 2: TLB miss, walk the page table
  pte <- WalkPageTable(virtualAddr, pageTable)

  IF pte is NULL OR pte.valid = false THEN
    RAISE PAGE_FAULT(virtualAddr)
  END IF

  CHECK_PERMISSIONS(pte, currentAccessType)

  // Step 3: Install translation in TLB
  TLB_INSERT(TLB, vpn, pte.pfn, pte.permissions)

  pfn <- pte.pfn
  RETURN pfn * PAGE_SIZE + offset
END

ALGORITHM WalkPageTable(virtualAddr, pageTable)
INPUT: virtualAddr: 48-bit virtual address, pageTable: 4-level page table
OUTPUT: page table entry or NULL
BEGIN
  // Extract indices for each level (9 bits each)
  pml4Index <- (virtualAddr >> 39) AND 0x1FF
  pdptIndex <- (virtualAddr >> 30) AND 0x1FF
  pdIndex   <- (virtualAddr >> 21) AND 0x1FF
  ptIndex   <- (virtualAddr >> 12) AND 0x1FF

  // Level 4: PML4
  pml4Entry <- pageTable.pml4[pml4Index]
  IF pml4Entry.present = false THEN RETURN NULL END IF

  // Level 3: PDPT
  pdpt <- LOAD_TABLE(pml4Entry.address)
  pdptEntry <- pdpt[pdptIndex]
  IF pdptEntry.present = false THEN RETURN NULL END IF

  // Level 2: Page Directory
  pd <- LOAD_TABLE(pdptEntry.address)
  pdEntry <- pd[pdIndex]
  IF pdEntry.present = false THEN RETURN NULL END IF

  // Level 1: Page Table
  pt <- LOAD_TABLE(pdEntry.address)
  ptEntry <- pt[ptIndex]

  RETURN ptEntry
END

ALGORITHM HandlePageFault(virtualAddr, process)
INPUT: virtualAddr: faulting address, process: the faulting process
OUTPUT: page loaded into memory, page table updated
BEGIN
  vpn <- virtualAddr / PAGE_SIZE

  // Determine fault type
  mapping <- FIND_VMA(process.memoryMap, virtualAddr)
  IF mapping is NULL THEN
    SIGNAL(process, SEGFAULT)
    RETURN
  END IF

  // Allocate a physical frame
  frame <- ALLOCATE_FRAME()
  IF frame is NULL THEN
    // No free frames: must evict
    frame <- SELECT_VICTIM_FRAME()
    victimPTE <- GET_PTE_FOR_FRAME(frame)
    swapSlot <- ALLOCATE_SWAP_SLOT()

    IF victimPTE.dirty THEN
      // Write the modified victim page out before reusing the frame
      WRITE_PAGE_TO_SWAP(frame, swapSlot)
    END IF

    victimPTE.valid <- false
    victimPTE.swapLocation <- swapSlot
    INVALIDATE_TLB_ENTRY(victimPTE.vpn)
  END IF

  // Load the page
  IF mapping.type = FILE_BACKED THEN
    READ_PAGE_FROM_FILE(mapping.file, mapping.fileOffset + vpn * PAGE_SIZE, frame)
  ELSE IF mapping.type = SWAP THEN
    READ_PAGE_FROM_SWAP(process.pte[vpn].swapLocation, frame)
  ELSE
    // Anonymous page (heap, stack): zero-fill
    ZERO_FILL(frame)
  END IF

  // Update page table
  pte <- GET_PTE(process.pageTable, virtualAddr)
  pte.pfn <- frame.number
  pte.valid <- true
  pte.dirty <- false
  pte.accessed <- true
  pte.permissions <- mapping.permissions

  // Restart the faulting instruction
  RETURN
END

ALGORITHM ClockPageReplacement(frameList, clockHand)
INPUT: frameList: circular list of physical frames, clockHand: current position
OUTPUT: victim frame to evict
BEGIN
  WHILE true DO
    frame <- frameList[clockHand]
    clockHand <- (clockHand + 1) MOD LENGTH(frameList)

    IF frame.accessed = false THEN
      // Victim found: not referenced since the hand last swept past it
      RETURN frame
    END IF

    // Give second chance: clear accessed bit and keep sweeping
    frame.accessed <- false
  END WHILE
END
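The clock sweep above translates almost line for line into runnable Python (a minimal sketch; the `Frame` class and hand-passing convention are assumptions of this example):

```python
class Frame:
    """A physical frame with the per-frame accessed (reference) bit."""
    def __init__(self, number: int):
        self.number = number
        self.accessed = False

def clock_select_victim(frames: list[Frame], hand: int) -> tuple[Frame, int]:
    """Second-chance (clock) replacement: sweep until a frame with a
    clear accessed bit is found, clearing bits along the way.
    Returns the victim and the new hand position."""
    while True:
        frame = frames[hand]
        hand = (hand + 1) % len(frames)
        if not frame.accessed:
            return frame, hand
        frame.accessed = False   # second chance: clear and keep sweeping

frames = [Frame(i) for i in range(4)]
for f in frames[:3]:
    f.accessed = True            # frames 0-2 were recently referenced
victim, hand = clock_select_victim(frames, 0)
# victim is frame 3, the only frame not accessed since the last sweep;
# frames 0-2 had their bits cleared and will be candidates next time
```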

ALGORITHM CopyOnWriteHandler(virtualAddr, process)
INPUT: virtualAddr: address that triggered a write fault on a COW page, process: faulting process
OUTPUT: private copy of the page for this process
BEGIN
  vpn <- virtualAddr / PAGE_SIZE
  pte <- GET_PTE(process.pageTable, virtualAddr)
  oldFrame <- pte.pfn

  refCount <- GET_REF_COUNT(oldFrame)

  IF refCount = 1 THEN
    // Last reference: just make it writable
    pte.permissions <- READ_WRITE
    pte.dirty <- false
    INVALIDATE_TLB_ENTRY(vpn)
    RETURN
  END IF

  // Multiple references: copy the page
  newFrame <- ALLOCATE_FRAME()
  COPY_FRAME(oldFrame, newFrame)
  DECREMENT_REF_COUNT(oldFrame)

  pte.pfn <- newFrame.number
  pte.permissions <- READ_WRITE
  pte.dirty <- false
  INVALIDATE_TLB_ENTRY(vpn)
END

Real-World Applications

  • Process isolation: every modern OS uses virtual memory to prevent processes from accessing each other's memory; a crash in one application cannot corrupt another's data
  • Memory-mapped files: databases like SQLite and LMDB use mmap to map database files into memory, letting the OS handle caching and avoiding explicit read/write system calls
  • Copy-on-write fork: when a process calls fork(), the OS shares all physical pages between parent and child using COW; this makes fork() nearly instantaneous even for processes with gigabytes of memory
  • Shared libraries: dynamic libraries (.so, .dll) are loaded once into physical memory and mapped into the virtual address space of every process that uses them, saving significant memory
  • Overcommit and lazy allocation: Linux allows processes to allocate more virtual memory than physical RAM (overcommit); pages are only backed by physical frames when first accessed, enabling efficient memory use for sparse allocations
  • Containers and virtualization: virtual machines use nested page tables (EPT on Intel, NPT on AMD) where the hypervisor manages a second level of address translation from guest-physical to host-physical addresses
  • Garbage collectors: language runtimes like the JVM use virtual memory features (guard pages, mprotect) to implement efficient garbage collection barriers and detect stack overflows

Key Takeaways

  • Virtual memory provides each process with an isolated address space by translating virtual addresses to physical addresses through page tables maintained by the OS
  • Multi-level page tables (4 levels on x86-64) avoid the prohibitive memory cost of flat page tables by only allocating entries for address ranges actually in use
  • The TLB caches recent translations and achieves hit rates above 99%, reducing the effective translation overhead to near zero for most workloads
  • Page faults are handled by the OS and cost millions of cycles when they require disk I/O; demand paging loads pages only when accessed, and page replacement algorithms (clock, LRU approximation) choose which pages to evict
  • Copy-on-write allows fork() to be fast by sharing physical pages between parent and child until a write occurs, at which point only the modified page is copied
  • Memory-mapped I/O and files unify device access and file I/O with regular memory operations, simplifying programming and enabling the OS to manage caching transparently