Notes/SQL vs NoSQL vs NewSQL: Choosing the Right Database Paradigm

SQL vs NoSQL vs NewSQL: Choosing the Right Database Paradigm

Understanding the relational model, document/key-value/graph/column-family stores, ACID vs BASE guarantees, and how to pick the right database for your workload.

2025-10-01AI-Synthesized from Personal Notes

Database InternalsSqlNosqlNewsqlDatabases

Terminology

Term	Definition	Trade-off / Gotcha
Relational Model	A data model that organizes information into tables (relations) with rows and columns, enforcing structure through a fixed schema.	Schema changes require migrations, which can lock tables on large datasets.
ACID	Atomicity, Consistency, Isolation, Durability: the four guarantees that relational transactions provide.	Stronger guarantees mean higher coordination cost and lower write throughput.
BASE	Basically Available, Soft state, Eventually consistent: a relaxed consistency model favored by many NoSQL systems.	Reads may return stale data; application must handle conflicts.
CAP Theorem	Brewer's theorem stating a distributed system can guarantee at most two of Consistency, Availability, and Partition tolerance simultaneously.	Not a menu: partitions always happen, so the real choice is C vs A in failures.
Sharding	Splitting data across multiple machines by a partition key so no single node holds the entire dataset.	Cross-shard queries and joins become expensive or impossible.
Schema-on-Write	Enforcing a fixed schema when data is written; rejects malformed records. Used by SQL databases.	Rigid: every schema change requires a coordinated migration.
Schema-on-Read	Storing data without enforcing structure at write time; the application interprets structure at read time. Used by document stores.	No safety net for malformed data; bugs surface at read time.
Denormalization	Duplicating data across records to avoid joins, trading storage for read performance.	Update anomalies: changing one fact requires updating every copy.
Linearizability	The strongest single-object consistency model: every read returns the most recent write, as if operations execute in a single global order.	Requires consensus coordination, adding latency to every write.
NewSQL	A class of databases that combine the SQL interface and ACID guarantees of relational systems with the horizontal scalability of NoSQL.	Younger ecosystem, higher ops complexity, consensus overhead on writes.

What & Why

Every application stores data, and the database you choose shapes everything: your schema, your query patterns, your scaling strategy, and your failure modes. For decades, relational databases (SQL) were the default. Then the internet scaled beyond what a single machine could handle, and NoSQL emerged as an alternative that traded consistency and structure for horizontal scalability and flexibility. More recently, NewSQL systems appeared, promising the best of both worlds: SQL semantics with distributed scalability.

The choice between SQL, NoSQL, and NewSQL is not about which is "better." Each paradigm makes different trade-offs along three axes: data model flexibility, consistency guarantees, and scalability. Understanding these trade-offs is the difference between a system that scales gracefully and one that collapses under load or drowns in data inconsistencies.

This post breaks down the three paradigms, compares their internal architectures, and gives you a practical framework for choosing the right one.

How It Works

The Three Paradigms at a Glance

How Queries Flow Through Each Database Type

The internal architecture of each paradigm determines how a query travels from the application to the data and back.

In a SQL database, the query goes through a parser, optimizer, and index lookup on a single machine. In NoSQL, a router determines which shard holds the data and forwards the request. In NewSQL, the query is parsed like SQL but the planner coordinates across distributed nodes using consensus protocols.

ACID vs BASE: The Consistency Spectrum

ACID and BASE represent two ends of a consistency spectrum. ACID guarantees that every transaction is all-or-nothing, isolated from concurrent transactions, and durable once committed. BASE relaxes these guarantees: the system is always available, state may be temporarily inconsistent, but it will converge to consistency eventually.

The NoSQL Data Model Zoo

NoSQL is not a single model. It is an umbrella term covering four distinct data models, each optimized for different access patterns.

Row Storage vs Column Storage vs Document Nesting

The physical layout of data on disk determines which operations are fast and which are slow. Relational databases typically use row-oriented storage, column-family stores use column-oriented storage, and document stores nest related data inside a single record.

Row storage reads entire rows at once, making it fast for transactional workloads that access all columns of a record. Column storage reads only the columns needed, making it fast for analytical queries that aggregate a single column across millions of rows. Document nesting eliminates joins by embedding related data inside a single record, making it fast for reads that always fetch the full entity.

The CAP Trade-off in Practice

The CAP theorem states that in the presence of a network partition, a distributed database must choose between consistency (every read sees the latest write) and availability (every request gets a response). No system avoids this trade-off entirely, but different databases make different choices.

SQL databases traditionally sacrifice partition tolerance: they run on a single node and simply do not partition. NoSQL databases sacrifice consistency: they remain available during partitions but may serve stale reads. NewSQL databases attempt to provide all three by using consensus protocols (Raft, Paxos) to maintain consistency across partitions, at the cost of higher write latency during network issues.

Evolution of Database Paradigms

Deciding Which Database to Pick

The right database depends on your workload characteristics, consistency requirements, and scaling needs. Here is a practical decision framework:

Complexity Analysis

The performance characteristics of each database type vary significantly depending on the operation. Below is a comparison of typical read, write, and scan complexities, where $n$ is the total number of records and $k$ is the number of results returned.

Operation	SQL (B-Tree Index)	Key-Value Store	Document Store	Column-Family
Point read (by PK)	$O(\log n)$	$O(1)$	$O(\log n)$	$O(\log n)$
Range scan	$O(\log n + k)$	$O(n)$ (no range)	$O(\log n + k)$	$O(\log n + k)$
Write (insert)	$O(\log n)$	$O(1)$	$O(\log n)$	$O(\log n)$ amortized via LSM
Join (two tables)	$O(n \cdot m)$ worst, $O(n \log m)$ with index	N/A	N/A	N/A
Full table scan	$O(n)$	$O(n)$	$O(n)$	$O(n)$ per column family

Key-value stores achieve $O(1)$ point reads and writes by using hash-based partitioning, but sacrifice range queries entirely. SQL databases use B-tree indexes that give $O(\log n)$ for both point and range operations. Column-family stores use LSM trees that batch writes for high throughput but may have read amplification during compaction.

$\text{Read Amplification (LSM)} = O(L) \text{ where } L = \text{number of levels}$

$\text{Write Amplification (B-Tree)} = O(\log_B n) \text{ where } B = \text{branching factor}$

For NewSQL systems, point reads and writes have the same asymptotic complexity as SQL ($O(\log n)$) but add a constant factor for distributed consensus (typically one network round-trip for Raft leader writes).

Implementation

Pseudocode: Choosing a Database for a Given Workload

ALGORITHM ChooseDatabase(workload)
INPUT: workload with properties:
  - needsACID: boolean
  - needsJoins: boolean
  - scaleRequirement: "single-node" | "horizontal"
  - accessPattern: "point-lookup" | "range-scan" | "graph-traversal" | "wide-row-append"
  - schemaStability: "fixed" | "evolving"
OUTPUT: recommended database paradigm

BEGIN
  IF workload.needsACID AND workload.needsJoins THEN
    IF workload.scaleRequirement = "horizontal" THEN
      RETURN "NewSQL (CockroachDB, Spanner, TiDB)"
    ELSE
      RETURN "SQL (PostgreSQL, MySQL)"
    END IF

  ELSE IF workload.accessPattern = "point-lookup" THEN
    IF workload.schemaStability = "fixed" THEN
      RETURN "Key-Value (Redis, DynamoDB)"
    ELSE
      RETURN "Document (MongoDB, Firestore)"
    END IF

  ELSE IF workload.accessPattern = "graph-traversal" THEN
    RETURN "Graph (Neo4j, Neptune)"

  ELSE IF workload.accessPattern = "wide-row-append" THEN
    RETURN "Column-Family (Cassandra, HBase)"

  ELSE IF workload.accessPattern = "range-scan" THEN
    IF workload.needsACID THEN
      RETURN "SQL or NewSQL"
    ELSE
      RETURN "Column-Family or Document"
    END IF

  ELSE
    RETURN "SQL (safe default for unknown workloads)"
  END IF
END

Pseudocode: Simplified ACID Transaction Execution

ALGORITHM ExecuteACIDTransaction(operations)
INPUT: operations: list of read/write operations
OUTPUT: COMMIT or ABORT

BEGIN
  txn_id = GenerateTransactionID()
  WriteToWAL(txn_id, "BEGIN")

  FOR EACH op IN operations DO
    IF op.type = "WRITE" THEN
      AcquireLock(op.key, EXCLUSIVE)
      WriteToWAL(txn_id, op)
      ApplyToBuffer(op)
    ELSE IF op.type = "READ" THEN
      AcquireLock(op.key, SHARED)
      result = ReadFromBufferOrDisk(op.key)
    END IF

    IF LockConflictDetected() THEN
      WriteToWAL(txn_id, "ABORT")
      ReleaseAllLocks(txn_id)
      UndoBufferChanges(txn_id)
      RETURN ABORT
    END IF
  END FOR

  WriteToWAL(txn_id, "COMMIT")
  FlushWALToDisk()
  ReleaseAllLocks(txn_id)
  RETURN COMMIT
END

Pseudocode: Consistent Hashing for NoSQL Shard Routing

ALGORITHM RouteToShard(key, ring)
INPUT: key: the record's partition key
       ring: sorted list of (hash_position, node_id) pairs
OUTPUT: node_id responsible for this key

BEGIN
  key_hash = Hash(key) MOD RING_SIZE

  FOR EACH (position, node) IN ring DO
    IF key_hash <= position THEN
      RETURN node
    END IF
  END FOR

  // Wrap around: key_hash is past the last node
  RETURN ring[0].node
END

Real-World Applications

SQL databases power the majority of business applications: banking systems (where ACID is non-negotiable), e-commerce order management, ERP systems, and any domain where data integrity and complex queries matter more than raw throughput.

Key-value stores like Redis serve as caching layers, session stores, and rate limiters where sub-millisecond latency on simple lookups is critical. DynamoDB powers serverless applications at Amazon scale with single-digit millisecond reads.

Document stores like MongoDB are popular for content management systems, user profiles, and product catalogs where the schema evolves frequently and related data is naturally nested.

Column-family stores like Cassandra handle time-series data (IoT sensor readings, application metrics), messaging systems (Discord uses Cassandra for message storage), and any workload with high write throughput and predictable read patterns.

Graph databases like Neo4j power social network features (friend-of-friend queries), recommendation engines, fraud detection (finding suspicious transaction rings), and knowledge graphs.

NewSQL databases like CockroachDB and Google Spanner serve applications that need both ACID guarantees and global distribution: multi-region financial platforms, global inventory systems, and any workload that has outgrown a single PostgreSQL instance but cannot sacrifice transactional correctness.

Key Takeaways

SQL databases provide ACID transactions, powerful joins, and decades of optimization, but traditionally scale vertically on a single node.
NoSQL is not one thing: it encompasses key-value, document, column-family, and graph models, each optimized for different access patterns.
The ACID vs BASE trade-off is really about how much consistency you are willing to sacrifice for availability and partition tolerance.
NewSQL systems combine SQL semantics with horizontal scalability using distributed consensus, but add latency for coordination.
The CAP theorem is not a menu: in practice, every distributed database must handle partitions, so the real choice is between consistency and availability during failures.
Choose your database based on your access patterns, consistency requirements, and scaling needs, not based on hype or defaults.

2025-10-02

TEST2:B-Tree Indexes Deep Dive: Why Your WHERE Clause Is Fast

How B-trees store sorted data on disk, how page splits work, what fill factor controls, and the difference between clustered and non-clustered indexes.

2025-10-02

TEST:B-Tree Indexes Deep Dive: Why Your WHERE Clause Is Fast

How B-trees store sorted data on disk, how page splits work, what fill factor controls, and the difference between clustered and non-clustered indexes.

2025-10-02

B-Tree Indexes Deep Dive: Why Your WHERE Clause Is Fast

How B-trees store sorted data on disk, how page splits work, what fill factor controls, and the difference between clustered and non-clustered indexes.

SQL vs NoSQL vs NewSQL: Choosing the Right Database Paradigm

Understanding the relational model, document/key-value/graph/column-family stores, ACID vs BASE guarantees, and how to pick the right database for your workload.

Terminology

What & Why

How It Works

The Three Paradigms at a Glance

How Queries Flow Through Each Database Type

ACID vs BASE: The Consistency Spectrum

The NoSQL Data Model Zoo

Row Storage vs Column Storage vs Document Nesting

The CAP Trade-off in Practice

Evolution of Database Paradigms

Deciding Which Database to Pick

Complexity Analysis

Implementation

Pseudocode: Choosing a Database for a Given Workload

Pseudocode: Simplified ACID Transaction Execution

Pseudocode: Consistent Hashing for NoSQL Shard Routing

Real-World Applications

Key Takeaways

Read More