
Arrays: The Foundation of Everything

Understanding contiguous memory, indexing, and dynamic arrays: the building block behind nearly every data structure.

2021-01-01
Tags: CS Fundamentals · arrays · data-structures

What & Why

An array is the simplest and most fundamental data structure in computer science: a contiguous block of memory that stores elements of the same type at fixed-size intervals. Every element lives right next to its neighbor in memory, which means the CPU can jump to any element instantly using simple arithmetic.

Why does this matter? Arrays are the backbone of nearly every other data structure. Stacks, queues, heaps, hash tables, and even strings are typically implemented on top of arrays. Understanding how arrays work at the memory level gives you intuition for performance trade-offs across all of CS.

Key properties:

  • Fixed-size allocation: a static array has a set capacity at creation time
  • O(1) random access: any element is reachable by index in constant time
  • Cache-friendly: contiguous memory means sequential reads hit the CPU cache
  • Costly insertions/deletions: shifting elements takes O(n) time in the worst case

How It Works

Memory Layout

When you allocate an array of $n$ elements of type $T$, the system reserves a contiguous block of $n \times \text{sizeof}(T)$ bytes. Each element's address is computed directly from the base address:

$$\text{addr}(i) = \text{base} + i \times \text{sizeof}(T)$$

This is why indexing is $O(1)$: it's just one multiplication and one addition, regardless of the array size.
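In CPython, this address arithmetic can be observed directly with the standard-library `array` and `ctypes` modules, since `buffer_info()` exposes the raw base address of the backing buffer. A sketch (CPython-specific; not portable to other Python implementations):

```python
import ctypes
from array import array

arr = array('i', [10, 20, 30, 40, 50, 60])  # contiguous buffer of C ints
base, n = arr.buffer_info()                 # raw base address, element count

for i in range(n):
    addr = base + i * arr.itemsize          # addr(i) = base + i * sizeof(T)
    # Read the int stored at the computed address and compare with indexing.
    assert ctypes.c_int.from_address(addr).value == arr[i]
```

Every element's address follows from the base pointer alone, which is exactly why no traversal is needed to reach index $i$.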

Contiguous memory layout: arr[0] through arr[5] live at addresses 0x00, 0x04, 0x08, 0x0C, 0x10, 0x14 (4-byte integers, base = 0x00).

Dynamic Arrays

Static arrays have a fixed capacity, but most real-world programs need arrays that grow. A dynamic array (like JavaScript's Array, Python's list, or Java's ArrayList) solves this by maintaining:

  1. An internal fixed-size buffer (the backing array)
  2. A size counter tracking how many elements are actually stored
  3. A capacity representing the total allocated slots

When size reaches capacity, the dynamic array doubles its capacity: it allocates a new buffer of $2 \times \text{capacity}$, copies all elements over, and frees the old buffer.

Before push (size = 4, capacity = 4, full): [10, 20, 30, 40]. A push(50) triggers a resize. After push (size = 5, capacity = 8): [10, 20, 30, 40, 50].

The doubling strategy is what gives push its amortized $O(1)$ cost. Most pushes are $O(1)$ (just write to the next slot), and the occasional $O(n)$ resize is spread across all previous cheap pushes.
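The doubling mechanism described above can be sketched as a minimal dynamic array in Python (the class name and the `copies` counter are illustrative, added here to make the amortized cost visible):

```python
class DynamicArray:
    """Minimal dynamic array sketch: fixed backing buffer, doubling on overflow."""

    def __init__(self):
        self._capacity = 1
        self._size = 0
        self._buffer = [None] * self._capacity
        self.copies = 0  # total element copies caused by resizes

    def push(self, value):
        if self._size == self._capacity:      # buffer full: double capacity
            self._resize(2 * self._capacity)
        self._buffer[self._size] = value      # O(1) write into the next slot
        self._size += 1

    def _resize(self, new_capacity):
        new_buffer = [None] * new_capacity
        for i in range(self._size):           # O(n) copy into the new buffer
            new_buffer[i] = self._buffer[i]
            self.copies += 1
        self._buffer = new_buffer
        self._capacity = new_capacity

    def __getitem__(self, i):
        if not 0 <= i < self._size:
            raise IndexError(i)
        return self._buffer[i]

    def __len__(self):
        return self._size
```

After 100 pushes starting from capacity 1, `copies` is 1 + 2 + 4 + ... + 64 = 127, comfortably under the $2n = 200$ bound derived in the amortized analysis below.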

Complexity Analysis

| Operation | Time | Space | Notes |
| --- | --- | --- | --- |
| Access by index | $O(1)$ | $O(1)$ | Direct address calculation |
| Search (unsorted) | $O(n)$ | $O(1)$ | Must scan every element |
| Search (sorted) | $O(\log n)$ | $O(1)$ | Binary search |
| Insert at end | $O(1)$ amortized | $O(n)$ | Resize doubles capacity |
| Insert at position $i$ | $O(n)$ | $O(1)$ | Shift elements right |
| Delete at position $i$ | $O(n)$ | $O(1)$ | Shift elements left |
| Delete at end | $O(1)$ | $O(1)$ | Just decrement size |
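The $O(\log n)$ sorted-search row relies on binary search, which halves the candidate range on every comparison. A minimal sketch:

```python
def binary_search(arr, target):
    """Return the index of target in sorted arr, or -1 if absent. O(log n)."""
    lo, hi = 0, len(arr) - 1
    while lo <= hi:
        mid = (lo + hi) // 2      # O(1) random access makes this step cheap
        if arr[mid] == target:
            return mid
        if arr[mid] < target:
            lo = mid + 1          # target can only be in the right half
        else:
            hi = mid - 1          # target can only be in the left half
    return -1
```

Note that this only works because arrays offer $O(1)$ access to `arr[mid]`; on a linked list, reaching the midpoint alone costs $O(n)$.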

Amortized Analysis of Dynamic Array Push

Using the aggregate method: after $n$ pushes starting from capacity 1, the total number of element copies due to resizing is:

$$\sum_{k=0}^{\lfloor \log_2 n \rfloor} 2^k = 2^{\lfloor \log_2 n \rfloor + 1} - 1 \lt 2n$$

So the total work for $n$ pushes is at most $n + 2n = 3n$, giving an amortized cost of:

$$\frac{3n}{n} = O(1) \text{ per push}$$

Space Overhead

A dynamic array with $n$ elements and doubling strategy uses at most $2n$ slots of memory. The space utilization is always at least 50% after the first resize:

$$\frac{n}{\text{capacity}} \geq \frac{1}{2}$$

Real-World Applications

  • Database row storage: Fixed-size records stored in contiguous pages for fast sequential scans (e.g., PostgreSQL heap pages)
  • Image processing: Pixel buffers are 2D arrays; contiguous layout enables SIMD vectorization
  • Network buffers: Ring buffers (circular arrays) manage incoming packet data in OS kernels
  • JavaScript engines: V8 uses "packed" arrays (dense, contiguous) for performance, falling back to dictionary mode only when arrays become sparse
  • GPU computing: CUDA kernels operate on arrays in global memory; coalesced access patterns depend on contiguous layout
  • Sorting algorithms: Merge sort, quick sort, and heap sort all operate on arrays as their primary data structure
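The ring buffer mentioned above (a circular array, as used for packet queues) wraps its indices with modular arithmetic so that enqueue and dequeue never shift elements. A minimal fixed-capacity sketch (the class and method names are illustrative, not any particular kernel's API):

```python
class RingBuffer:
    """Fixed-capacity FIFO over a circular array: O(1) enqueue and dequeue."""

    def __init__(self, capacity):
        self._buf = [None] * capacity
        self._capacity = capacity
        self._head = 0   # index of the oldest element
        self._size = 0

    def enqueue(self, value):
        if self._size == self._capacity:
            raise OverflowError("buffer full")
        tail = (self._head + self._size) % self._capacity  # wrap around
        self._buf[tail] = value
        self._size += 1

    def dequeue(self):
        if self._size == 0:
            raise IndexError("buffer empty")
        value = self._buf[self._head]
        self._head = (self._head + 1) % self._capacity     # wrap around
        self._size -= 1
        return value
```

Because slots are reused in place, the buffer stays contiguous and cache-friendly without ever allocating or shifting, which is exactly what makes it suitable for high-throughput paths like kernel packet queues.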

Key Takeaways

  • Arrays provide $O(1)$ random access: no other data structure matches this for indexed lookups
  • Contiguous memory layout makes arrays cache-friendly, which matters more than Big-O in practice for sequential workloads
  • Dynamic arrays achieve amortized $O(1)$ append by doubling capacity on overflow
  • The trade-off: insertion and deletion in the middle cost $O(n)$ due to element shifting
  • Arrays are the foundation for stacks, queues, heaps, hash tables, and most other data structures you'll encounter in this curriculum