OS Concepts: Processes, Threads, and Scheduling

How operating systems manage concurrent execution through processes, threads, context switching, and scheduling algorithms that balance fairness, throughput, and responsiveness.

2021-03-07
Computer Engineering · os · processes · threads

Terminology

  • Process: an instance of a running program, including its code, data, heap, stack, open file descriptors, and OS-managed metadata; each process has its own isolated virtual address space
  • Thread: a lightweight unit of execution within a process; threads share the process's address space and resources but each has its own stack, program counter, and register state
  • Kernel: the core of the operating system that manages hardware resources, enforces isolation between processes, and provides system calls for user programs
  • Context switch: the act of saving the state (registers, program counter, stack pointer) of the currently running process or thread and restoring the state of another, allowing the CPU to switch between tasks
  • PCB (Process Control Block): a data structure maintained by the OS for each process, containing the process state, program counter, CPU registers, memory management info, scheduling priority, and I/O status
  • Scheduler: the OS component that decides which process or thread runs next on the CPU
  • Preemption: the ability of the OS to forcibly suspend a running process and give the CPU to another process, typically triggered by a timer interrupt
  • Time quantum (time slice): the maximum amount of CPU time a process is allowed before being preempted by the scheduler
  • Turnaround time: the total time from when a process is submitted to when it completes, including waiting time and execution time
  • Response time: the time from when a process is submitted to when it first begins executing
  • Throughput: the number of processes completed per unit of time
  • CPU burst: a period during which a process is actively using the CPU without performing I/O
  • I/O burst: a period during which a process is waiting for an I/O operation to complete
  • Ready queue: the set of processes that are loaded in memory, ready to execute, and waiting for CPU time
  • Blocking: when a process voluntarily gives up the CPU because it is waiting for an event (I/O completion, lock acquisition, signal)
  • Starvation: a condition where a process is perpetually denied CPU time because higher-priority processes always run first
  • Aging: a technique to prevent starvation by gradually increasing the priority of processes that have been waiting for a long time
  • Race condition: a situation where the outcome of a program depends on the relative timing of thread execution, leading to unpredictable behavior
  • Mutex (mutual exclusion): a synchronization primitive that ensures only one thread can access a critical section at a time
  • Deadlock: a state where two or more processes are each waiting for a resource held by another, and none can proceed
  • Fork: a system call that creates a new process by duplicating the calling process; the new process (child) gets a copy of the parent's address space

What & Why

An operating system must run many programs at the same time on a limited number of CPUs. Even a single-core machine appears to run dozens of programs simultaneously because the OS rapidly switches between them, giving each a small slice of CPU time. This illusion of concurrency is built on two abstractions: processes and threads.

A process provides isolation. Each process has its own virtual address space, so a bug in one program cannot corrupt another's memory. The OS enforces this boundary using hardware support (virtual memory, privilege levels). Processes communicate through explicit mechanisms like pipes, sockets, or shared memory.

A thread provides concurrency within a process. Threads share the same address space, so they can access the same data structures without copying. This makes threads cheaper to create and switch between than processes, but it also means threads must coordinate access to shared data using synchronization primitives like mutexes.
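The need for coordination can be shown with a small sketch using Python's `threading` module: four threads increment a shared counter, and a mutex (`threading.Lock`) makes each read-modify-write atomic so no updates are lost.

```python
import threading

counter = 0
lock = threading.Lock()

def increment(n: int) -> None:
    """Add 1 to the shared counter n times, holding the mutex for each update."""
    global counter
    for _ in range(n):
        with lock:            # only one thread may execute the critical section
            counter += 1      # the read-modify-write is now atomic w.r.t. other threads

threads = [threading.Thread(target=increment, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 400000 — without the lock, interleaved updates could lose increments
```

Without the lock, `counter += 1` compiles to separate load, add, and store steps, so two threads can read the same old value and one increment is lost: a race condition.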

The scheduler decides which thread runs on which CPU core and for how long. Different scheduling algorithms optimize for different goals: interactive systems need low response time, batch systems need high throughput, and real-time systems need guaranteed deadlines. Understanding scheduling helps developers reason about why their programs sometimes run slowly, why latency spikes occur, and how to structure concurrent code for the best performance.

How It Works

Process Lifecycle

A process moves through several states during its lifetime:

[State diagram] Transitions: New → Ready (admitted); Ready → Running (dispatch); Running → Ready (preempt: time quantum expired); Running → Blocked (I/O or wait); Blocked → Ready (I/O done); Running → Terminated (exit)

New: The process is being created. The OS allocates a PCB, assigns a process ID, and sets up the initial address space.

Ready: The process is loaded in memory and waiting for CPU time. It sits in the ready queue.

Running: The process is actively executing on a CPU core. Only one process per core can be in this state.

Blocked (Waiting): The process is waiting for an event, such as I/O completion, a timer, or a lock. It cannot run until the event occurs.

Terminated: The process has finished execution. The OS reclaims its resources and removes its PCB.
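The lifecycle above is a small state machine, and can be sketched as one (the event names here are illustrative, not OS API names):

```python
from enum import Enum, auto

class State(Enum):
    NEW = auto()
    READY = auto()
    RUNNING = auto()
    BLOCKED = auto()
    TERMINATED = auto()

# Legal transitions from the lifecycle above: event -> (from-state, to-state).
TRANSITIONS = {
    "admit":    (State.NEW, State.READY),
    "dispatch": (State.READY, State.RUNNING),
    "preempt":  (State.RUNNING, State.READY),    # time quantum expired
    "block":    (State.RUNNING, State.BLOCKED),  # I/O or wait
    "wake":     (State.BLOCKED, State.READY),    # I/O done
    "exit":     (State.RUNNING, State.TERMINATED),
}

def step(state: State, event: str) -> State:
    """Apply an event, refusing transitions the lifecycle does not allow."""
    src, dst = TRANSITIONS[event]
    if state is not src:
        raise ValueError(f"cannot {event} from {state.name}")
    return dst

# A process that runs, blocks on I/O, resumes, and exits:
s = State.NEW
for event in ["admit", "dispatch", "block", "wake", "dispatch", "exit"]:
    s = step(s, event)
print(s.name)  # TERMINATED
```

Note that a blocked process cannot be dispatched directly; it must first be woken back into the ready queue, which is exactly why the diagram has no Blocked → Running edge.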

Context Switching

When the scheduler decides to switch from process A to process B, the CPU must:

  1. Save process A's register state (program counter, stack pointer, general-purpose registers) into A's PCB
  2. Save A's memory management state (page table base register)
  3. Load process B's register state from B's PCB
  4. Load B's memory management state
  5. Flush or tag the TLB (Translation Lookaside Buffer) entries for the new address space
  6. Resume execution at B's saved program counter

A context switch typically takes 1-10 microseconds. The direct cost is the time to save and restore state. The indirect cost is cache pollution: process B's data is not in the cache, so it experiences many cache misses until its working set is loaded. This "cache warmup" cost can dominate the direct switching cost.

Scheduling Algorithms

First-Come, First-Served (FCFS): Processes run in the order they arrive. Simple but suffers from the convoy effect: a long CPU-bound process blocks all shorter processes behind it.

Shortest Job First (SJF): The process with the shortest expected CPU burst runs next. Optimal for minimizing average turnaround time, but requires knowing burst lengths in advance (which is usually impossible).
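The convoy effect and SJF's advantage show up even in a tiny worked example. With hypothetical burst lengths of 24, 3, and 3 ms and all processes arriving at time 0:

```python
def avg_turnaround(bursts):
    """Average turnaround for non-preemptive back-to-back runs (all arrive at t=0)."""
    completions, t = [], 0
    for b in bursts:
        t += b
        completions.append(t)   # turnaround = completion time when arrival is 0
    return sum(completions) / len(completions)

bursts = [24, 3, 3]                    # one long CPU-bound job ahead of two short jobs
print(avg_turnaround(bursts))          # FCFS order: (24 + 27 + 30) / 3 = 27.0
print(avg_turnaround(sorted(bursts)))  # SJF order:  (3 + 6 + 30) / 3  = 13.0
```

Running the long job first more than doubles the average turnaround time, because both short jobs sit in the convoy behind it.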

Round Robin (RR): Each process gets a fixed time quantum q. After q expires, the process is preempted and placed at the back of the ready queue. Good for interactive systems because it bounds response time. If q is too large, RR degrades to FCFS. If q is too small, context switch overhead dominates.

Priority Scheduling: Each process has a priority. The highest-priority ready process runs next. Can cause starvation of low-priority processes, which is solved by aging (gradually increasing the priority of waiting processes).

Multilevel Feedback Queue (MLFQ): Uses multiple queues with different priorities and time quanta. New processes start in the highest-priority queue with a short quantum. If a process uses its entire quantum (CPU-bound), it moves to a lower-priority queue with a longer quantum. I/O-bound processes that block before their quantum expires stay in high-priority queues. This automatically adapts to process behavior without requiring advance knowledge.

Threads vs Processes

Threads within the same process share the heap, global variables, file descriptors, and code. Each thread has its own stack and register state. Creating a thread is faster than creating a process because there is no need to duplicate the address space. Context switching between threads in the same process is faster because the memory mapping does not change (no TLB flush needed).

The trade-off is safety. A bug in one thread (like a buffer overflow) can corrupt data used by other threads in the same process. Processes provide stronger isolation at the cost of higher overhead for creation and communication.

Complexity Analysis

Algorithm | Scheduling Decision | Avg Turnaround | Starvation?
FCFS | $O(1)$ | Poor (convoy effect) | No
SJF | $O(n)$, or $O(\log n)$ with a heap | Optimal | Yes
Round Robin | $O(1)$ | Good for interactive | No
Priority | $O(\log n)$ with a heap | Depends on priorities | Yes (without aging)
MLFQ | $O(1)$ | Adaptive | No (with boost)

Context switch overhead as a fraction of useful work for time quantum $q$ and switch cost $c$:

$\text{CPU utilization} = \frac{q}{q + c}$

For $q = 10$ ms and $c = 0.01$ ms: utilization $= \frac{10}{10.01} \approx 99.9\%$

For $q = 0.1$ ms and $c = 0.01$ ms: utilization $= \frac{0.1}{0.11} \approx 90.9\%$
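The formula is a one-liner to check numerically:

```python
def cpu_utilization(quantum_ms: float, switch_cost_ms: float) -> float:
    """Fraction of CPU time spent on useful work rather than context switching."""
    return quantum_ms / (quantum_ms + switch_cost_ms)

print(round(cpu_utilization(10, 0.01), 4))   # 0.999  -> ~99.9%
print(round(cpu_utilization(0.1, 0.01), 4))  # 0.9091 -> ~90.9%
```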

Round Robin average response time for $n$ equal-length processes each requiring $T$ total CPU time with quantum $q$:

$\text{Response time for process } i = (i - 1) \times q$
$\text{Average response time} = \frac{(n - 1) \times q}{2}$
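A quick simulation confirms both formulas (the helper name is ours; all processes are assumed to arrive at $t = 0$):

```python
from collections import deque

def rr_response_times(n: int, total: int, q: int):
    """First-dispatch time of each process under Round Robin (all arrive at t=0)."""
    queue = deque((i, total) for i in range(n))
    first_run = {}
    t = 0
    while queue:
        pid, remaining = queue.popleft()
        first_run.setdefault(pid, t)          # record the first dispatch only
        ran = min(q, remaining)
        t += ran
        if remaining - ran > 0:
            queue.append((pid, remaining - ran))
    return [first_run[i] for i in range(n)]

times = rr_response_times(n=4, total=20, q=5)
print(times)                    # [0, 5, 10, 15] — the i-th process starts at (i-1)*q
print(sum(times) / len(times))  # 7.5 = (n-1)*q/2
```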

Amdahl's Law for the speedup of a program with fraction $p$ that can be parallelized across $n$ threads:

$\text{Speedup} = \frac{1}{(1 - p) + \frac{p}{n}}$

Even with infinite threads ($n \to \infty$), the maximum speedup is $\frac{1}{1 - p}$. If only 90% of the work is parallelizable, the maximum speedup is $10\times$.
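The diminishing returns are easy to see by evaluating the formula directly:

```python
def speedup(p: float, n: float) -> float:
    """Amdahl's Law: speedup with parallel fraction p spread across n threads."""
    return 1 / ((1 - p) + p / n)

print(round(speedup(0.9, 2), 3))             # 1.818
print(round(speedup(0.9, 16), 3))            # 6.4
print(round(speedup(0.9, float("inf")), 3))  # 10.0 — the 1/(1-p) ceiling
```

Going from 2 to 16 threads gives only a 3.5x further improvement, and no thread count can beat 10x when 10% of the work is serial.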

Implementation

ALGORITHM RoundRobinScheduler(readyQueue, quantum)
INPUT: readyQueue: queue of PCBs, quantum: time slice in ms
OUTPUT: scheduling decisions over time
BEGIN
  currentTime <- 0

  WHILE readyQueue is not empty DO
    process <- DEQUEUE(readyQueue)

    // Restore process state
    LOAD_REGISTERS(process.savedRegisters)
    SET_TIMER(quantum)

    // Run until quantum expires, I/O, or completion
    runResult <- RUN_PROCESS(process)

    // Save process state
    process.savedRegisters <- SAVE_REGISTERS()
    currentTime <- currentTime + runResult.timeUsed

    IF runResult.reason = COMPLETED THEN
      process.state <- TERMINATED
      process.completionTime <- currentTime
      FREE_RESOURCES(process)
    ELSE IF runResult.reason = IO_REQUEST THEN
      process.state <- BLOCKED
      ADD_TO_WAIT_QUEUE(process, runResult.ioDevice)
    ELSE IF runResult.reason = QUANTUM_EXPIRED THEN
      process.state <- READY
      ENQUEUE(readyQueue, process)
    END IF
  END WHILE
END
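The loop above translates almost line for line into a runnable Python simulation. This is a minimal sketch: register save/restore is abstracted into the dataclass, and the `IO_REQUEST` branch is omitted so only completion and quantum expiry occur.

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class PCB:
    pid: int
    remaining: int          # CPU time still needed, in ms
    completion: int = -1    # filled in when the process terminates

def round_robin(processes, quantum):
    """Simulate the Round Robin loop: dequeue, run, then complete or re-enqueue."""
    ready = deque(processes)
    time = 0
    while ready:
        p = ready.popleft()                 # dispatch ("restore state")
        ran = min(quantum, p.remaining)     # run until quantum expiry or completion
        time += ran
        p.remaining -= ran
        if p.remaining == 0:
            p.completion = time             # COMPLETED: record finish time
        else:
            ready.append(p)                 # QUANTUM_EXPIRED: back of the queue
    return time

procs = [PCB(1, 10), PCB(2, 4), PCB(3, 7)]
total = round_robin(procs, quantum=5)
print(total)                                     # 21 ms of CPU time in total
print([(p.pid, p.completion) for p in procs])    # [(1, 19), (2, 9), (3, 21)]
```

Note how the short job (pid 2) finishes at 9 ms even though a 10 ms job arrived first: Round Robin's preemption is what prevents the FCFS convoy effect.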

ALGORITHM MultilevelFeedbackQueue(queues, boostInterval)
INPUT: queues: array of ready queues (index 0 = highest priority), boostInterval: time between priority boosts
OUTPUT: scheduling decisions
BEGIN
  numLevels <- LENGTH(queues)
  quantums <- [8, 16, 32, 64]  // increasing quanta per level; assumes numLevels = 4
  lastBoost <- 0
  currentTime <- 0

  WHILE ANY_PROCESSES_EXIST() DO
    // Periodic priority boost to prevent starvation
    IF currentTime - lastBoost >= boostInterval THEN
      FOR level <- 1 TO numLevels - 1 DO
        WHILE queues[level] is not empty DO
          proc <- DEQUEUE(queues[level])
          proc.currentLevel <- 0
          ENQUEUE(queues[0], proc)
        END WHILE
      END FOR
      lastBoost <- currentTime
    END IF

    // Find highest-priority non-empty queue
    selectedLevel <- -1
    FOR level <- 0 TO numLevels - 1 DO
      IF queues[level] is not empty THEN
        selectedLevel <- level
        BREAK
      END IF
    END FOR

    IF selectedLevel = -1 THEN
      IDLE_CPU()
      currentTime <- currentTime + 1
      CONTINUE
    END IF

    process <- DEQUEUE(queues[selectedLevel])
    q <- quantums[selectedLevel]

    runResult <- RUN_PROCESS_FOR(process, q)
    currentTime <- currentTime + runResult.timeUsed

    IF runResult.reason = COMPLETED THEN
      process.state <- TERMINATED
    ELSE IF runResult.reason = IO_REQUEST THEN
      // I/O-bound: stays at same or higher priority
      process.state <- BLOCKED
      process.currentLevel <- MAX(0, selectedLevel - 1)
      ADD_TO_WAIT_QUEUE(process)
    ELSE IF runResult.reason = QUANTUM_EXPIRED THEN
      // CPU-bound: demote to lower priority
      newLevel <- MIN(selectedLevel + 1, numLevels - 1)
      process.currentLevel <- newLevel
      process.state <- READY
      ENQUEUE(queues[newLevel], process)
    END IF
  END WHILE
END
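The demotion behavior can be simulated compactly. This sketch covers only CPU-bound jobs (no I/O, no priority boost) with the quanta from the pseudocode; `mlfq` and its input format are our own:

```python
from collections import deque

def mlfq(bursts, quantums=(8, 16, 32, 64)):
    """Simulate MLFQ demotion for pure CPU-bound jobs (no I/O, no boost).
    bursts maps pid -> total CPU ms; returns the final queue level per pid."""
    queues = [deque() for _ in quantums]
    remaining = dict(bursts)
    level = {pid: 0 for pid in bursts}
    for pid in bursts:
        queues[0].append(pid)               # every new process starts at the top
    while any(queues):
        lvl = next(i for i, q in enumerate(queues) if q)  # highest non-empty queue
        pid = queues[lvl].popleft()
        ran = min(quantums[lvl], remaining[pid])
        remaining[pid] -= ran
        if remaining[pid] == 0:
            continue                                      # COMPLETED
        new = min(lvl + 1, len(quantums) - 1)             # QUANTUM_EXPIRED: demote
        level[pid] = new
        queues[new].append(pid)
    return level

# A short job finishes before its first quantum expires and never leaves level 0,
# while a long CPU-bound job is demoted step by step to the bottom queue.
print(mlfq({"short": 6, "long": 200}))  # {'short': 0, 'long': 3}
```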

ALGORITHM ContextSwitch(currentProcess, nextProcess)
INPUT: currentProcess: currently running PCB, nextProcess: next PCB to run
OUTPUT: CPU now executing nextProcess
BEGIN
  // Save current process state
  currentProcess.registers <- READ_ALL_REGISTERS()
  currentProcess.programCounter <- READ_PC()
  currentProcess.stackPointer <- READ_SP()
  currentProcess.state <- READY  // assumes preemption; a process switching out to wait would be marked BLOCKED

  // Save memory management state
  currentProcess.pageTableBase <- READ_PAGE_TABLE_REGISTER()

  // Restore next process state
  WRITE_PAGE_TABLE_REGISTER(nextProcess.pageTableBase)
  FLUSH_TLB()  // or use ASID-tagged TLB to avoid full flush

  WRITE_ALL_REGISTERS(nextProcess.registers)
  WRITE_SP(nextProcess.stackPointer)
  nextProcess.state <- RUNNING

  // Jump to next process's saved program counter
  JUMP_TO(nextProcess.programCounter)
END
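Real register save/restore needs assembly, but Python generators give a faithful cooperative analogy: suspending at `yield` saves the frame's instruction pointer and locals, and `next()` restores them, exactly the save/restore pattern above. The task and scheduler names here are illustrative.

```python
from collections import deque

def task(name, steps):
    """A 'process' that yields at each step, as if preempted by a timer."""
    for i in range(steps):
        yield f"{name}:{i}"        # suspend: Python saves this frame's PC and locals

def scheduler(tasks):
    """Round-robin over generators; each next() 'context-switches' a task back in."""
    ready = deque(tasks)
    trace = []
    while ready:
        t = ready.popleft()
        try:
            trace.append(next(t))  # restore saved state, run one step
            ready.append(t)        # preempt and re-enqueue
        except StopIteration:
            pass                   # task terminated; drop it
    return trace

print(scheduler([task("A", 2), task("B", 3)]))
# ['A:0', 'B:0', 'A:1', 'B:1', 'B:2']
```

The analogy is deliberately partial: these switches happen only when a task yields voluntarily, whereas the OS preempts at arbitrary instructions via timer interrupts.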

ALGORITHM ForkProcess(parentProcess)
INPUT: parentProcess: the process calling fork()
OUTPUT: (parentReturn, childProcess)
BEGIN
  childProcess <- NEW_PCB()
  childProcess.pid <- ALLOCATE_PID()
  childProcess.parentPid <- parentProcess.pid
  childProcess.state <- READY

  // Copy-on-write: share pages, mark as read-only
  childProcess.pageTable <- COPY_PAGE_TABLE(parentProcess.pageTable)
  FOR EACH page IN childProcess.pageTable DO
    MARK_READ_ONLY(page)
    MARK_READ_ONLY(parentProcess.pageTable[page.virtualAddress])
    INCREMENT_REF_COUNT(page.physicalFrame)
  END FOR

  // Copy register state
  childProcess.registers <- parentProcess.registers
  childProcess.programCounter <- parentProcess.programCounter
  childProcess.stackPointer <- parentProcess.stackPointer

  // Set return values
  parentProcess.registers.returnValue <- childProcess.pid  // parent gets child PID
  childProcess.registers.returnValue <- 0                   // child gets 0

  ADD_TO_READY_QUEUE(childProcess)
  RETURN (childProcess.pid, childProcess)
END
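On a POSIX system, the double-return behavior in the pseudocode can be observed directly with `os.fork()`: the child sees 0 and the parent sees the child's PID, matching the two `returnValue` assignments above. (This example is Unix-only.)

```python
import os

r, w = os.pipe()          # one-way channel for parent/child communication
pid = os.fork()           # copy-on-write duplicate of the calling process

if pid == 0:
    # Child: shares (read-only, COW) pages with the parent until it writes.
    os.close(r)
    os.write(w, b"hello from child")
    os._exit(0)           # exit immediately, skipping parent cleanup handlers
else:
    # Parent: read the child's message, then reap it so it isn't left a zombie.
    os.close(w)
    msg = os.read(r, 1024)
    _, status = os.waitpid(pid, 0)
    print(pid > 0, msg.decode(), os.WEXITSTATUS(status))
    # True hello from child 0
```

The pipe is one of the explicit IPC mechanisms mentioned earlier: because the two processes have separate address spaces after fork, they cannot simply share a variable the way threads can.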

Real-World Applications

  • Web servers: servers like Nginx use an event-driven, single-threaded model per worker process to handle thousands of concurrent connections, while Apache traditionally used one thread per connection; the choice reflects different trade-offs between context switch overhead and programming simplicity
  • Container orchestration: Docker containers are processes isolated using Linux namespaces and cgroups; Kubernetes schedules these containers across cluster nodes using algorithms similar to OS process scheduling
  • Mobile operating systems: Android and iOS use priority-based scheduling to ensure foreground apps (the one the user is interacting with) get more CPU time than background services, keeping the UI responsive
  • Real-time systems: embedded systems in cars, aircraft, and medical devices use real-time schedulers (Rate Monotonic, Earliest Deadline First) that guarantee tasks complete before their deadlines
  • Database connection pools: databases maintain a pool of worker threads to handle queries; the pool size is tuned to match the number of CPU cores to minimize context switching while maximizing throughput
  • Browser rendering: modern browsers use multiple processes (one per tab) for isolation and multiple threads within each process (main thread, compositor thread, worker threads) for responsive rendering

Key Takeaways

  • Processes provide isolation through separate virtual address spaces; threads provide concurrency within a process by sharing the same address space with independent stacks and register state
  • Context switching saves and restores CPU state between processes; the direct cost is microseconds, but the indirect cost of cache pollution can be much larger
  • Round Robin scheduling bounds response time with a time quantum $q$, but too-small quanta waste CPU on context switches: utilization $= \frac{q}{q + c}$
  • Multilevel Feedback Queues adapt to process behavior automatically, giving short quanta to interactive processes and long quanta to CPU-bound batch processes
  • Amdahl's Law limits parallel speedup: with fraction $p$ parallelizable, maximum speedup is $\frac{1}{1-p}$ regardless of thread count
  • Modern systems use copy-on-write for fork(), sharing physical pages between parent and child until one of them writes, avoiding the cost of copying the entire address space