Parsing and Syntax Trees: Where Compilers and NLP Share Roots
From constituency parsing to dependency parsing and the CYK algorithm, exploring how machines discover the grammatical structure of sentences using the same ideas that power programming language compilers.
Terminology
| Term | Definition |
|---|---|
| Parsing | The process of analyzing a string of symbols (words or tokens) according to a formal grammar to produce a structured representation |
| Constituency Parsing | Breaking a sentence into nested phrases (noun phrase, verb phrase, etc.) organized as a tree where leaves are words and internal nodes are phrase labels |
| Dependency Parsing | Identifying directed relationships between words where each word depends on exactly one head word, forming a tree rooted at the main verb |
| Context-Free Grammar (CFG) | A formal grammar with rules of the form $A \rightarrow \alpha$ where $A$ is a non-terminal and $\alpha$ is a sequence of terminals and non-terminals |
| Chomsky Normal Form (CNF) | A restricted CFG form where every rule is either $A \rightarrow BC$ (two non-terminals) or $A \rightarrow a$ (one terminal), required by the CYK algorithm |
| CYK Algorithm | Cocke-Younger-Kasami: a dynamic programming algorithm that parses a string under a CNF grammar in $O(n^3 \cdot |G|)$ time |
| PCFG | Probabilistic Context-Free Grammar: a CFG where each rule has an associated probability, enabling the parser to select the most likely parse tree |
| Parse Tree | A tree structure representing the syntactic analysis of a sentence, with the root as the start symbol and leaves as words |
| Ambiguity | When a sentence has multiple valid parse trees under the grammar, e.g., "I saw the man with the telescope" has two readings |
What & Why
Parsing reveals the hidden structure of sentences. "The cat sat on the mat" is not just a flat sequence of words; it has a hierarchical structure where "the cat" forms a noun phrase, "on the mat" forms a prepositional phrase, and "sat on the mat" forms a verb phrase. This structure determines meaning.
The connection to compilers is direct. Programming language compilers parse source code into abstract syntax trees (ASTs) using context-free grammars, then traverse those trees to generate machine code. NLP parsers do the same thing with natural language, except natural language is ambiguous (one sentence can have multiple valid parse trees) while programming languages are designed to be unambiguous.
Understanding parsing matters because it is the bridge between surface text and structured meaning. Information extraction, question answering, and machine translation all benefit from knowing which words modify which. And the algorithms (CYK, Earley, shift-reduce) are shared between compiler construction and computational linguistics, making this one of the deepest connections between programming and language.
How It Works
Constituency Parsing
A constituency parse tree groups words into nested phrases. Given the grammar:
- S -> NP VP
- NP -> DET NN
- VP -> VBD PP
- PP -> IN NP
- DET -> "the"
- NN -> "cat" | "mat"
- VBD -> "sat"
- IN -> "on"
The sentence "the cat sat on the mat" produces:
(S (NP (DET the) (NN cat))
   (VP (VBD sat)
       (PP (IN on)
           (NP (DET the) (NN mat)))))
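As a minimal sketch, a constituency tree like this can be represented in Python as nested tuples of the form `(label, children...)` — an illustrative encoding, not any particular library's format. Reading off the leaves recovers the original sentence (the tree's yield):

```python
def leaves(tree):
    """Read off the words at the leaves of a parse tree, left to right."""
    if isinstance(tree, str):          # a leaf is just the word itself
        return [tree]
    label, *children = tree
    return [word for child in children for word in leaves(child)]

# The parse of "the cat sat on the mat" under the grammar above.
parse = ("S",
         ("NP", ("DET", "the"), ("NN", "cat")),
         ("VP", ("VBD", "sat"),
                ("PP", ("IN", "on"),
                       ("NP", ("DET", "the"), ("NN", "mat")))))

print(leaves(parse))  # ['the', 'cat', 'sat', 'on', 'the', 'mat']
```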
Dependency Parsing
Instead of phrase structure, dependency parsing identifies head-modifier relationships. Each word has exactly one head (except the root). "sat" is the root; "cat" depends on "sat" as its subject; "mat" depends on "on" as its object.
Dependency trees are more compact and directly encode grammatical relations (subject, object, modifier), making them popular for information extraction.
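A dependency parse can be stored even more compactly than a phrase-structure tree: one head index per word. The sketch below encodes the text's analysis of "the cat sat on the mat" ("sat" as root, "mat" attached to "on") using a dependent-to-head map; the index scheme with a conventional ROOT index of 0 is an illustrative choice.

```python
# Dependency parse of "the cat sat on the mat" as a head map.
# Each word has exactly one head; the root's head is the ROOT index 0.
ROOT = 0
words = {1: "the", 2: "cat", 3: "sat", 4: "on", 5: "the", 6: "mat"}
heads = {1: 2, 2: 3, 3: ROOT, 4: 3, 5: 6, 6: 4}  # dependent -> head

def root_of(heads):
    """Return the index of the single word whose head is ROOT."""
    [r] = [dep for dep, head in heads.items() if head == ROOT]
    return r

def dependents(heads, head):
    """All words directly attached to the given head."""
    return sorted(dep for dep, h in heads.items() if h == head)

print(words[root_of(heads)])                      # 'sat' -- the main verb
print([words[d] for d in dependents(heads, 3)])   # direct dependents of 'sat'
```

The single-head constraint is what makes this map representation lossless: the whole tree is exactly `n` integers.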
The CYK Algorithm
CYK is a bottom-up dynamic programming parser for grammars in Chomsky Normal Form. It fills a triangular table where cell $[i][j]$ contains all non-terminals that can derive the substring from position $i$ to $j$ (inclusive). For each span, it tries every possible split point $k$ between $i$ and $j$, checking whether any rule $A \rightarrow BC$ applies with $B$ in cell $[i][k]$ and $C$ in cell $[k+1][j]$.
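The table-filling idea fits in a few lines of Python. This is a minimal recognizer (it answers yes/no, without probabilities or backpointers); the encoding of the grammar as two rule sets is an assumption made for the sketch.

```python
def cyk_recognize(words, binary, lexical, start="S"):
    """Return True iff `words` is derivable from `start` under a CNF grammar.

    binary : set of (A, B, C) triples for rules A -> B C
    lexical: set of (A, word) pairs for rules A -> word
    """
    n = len(words)
    # table[i][j] = set of non-terminals deriving words[i..j] inclusive
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, w in enumerate(words):                  # diagonal: lexical rules
        table[i][i] = {A for (A, word) in lexical if word == w}
    for span in range(2, n + 1):                   # widening spans, bottom-up
        for i in range(n - span + 1):
            j = i + span - 1
            for k in range(i, j):                  # every split point
                for (A, B, C) in binary:
                    if B in table[i][k] and C in table[k + 1][j]:
                        table[i][j].add(A)
    return start in table[0][n - 1]

# The toy grammar from the constituency parsing section, already in CNF.
binary = {("S", "NP", "VP"), ("NP", "DET", "NN"),
          ("VP", "VBD", "PP"), ("PP", "IN", "NP")}
lexical = {("DET", "the"), ("NN", "cat"), ("NN", "mat"),
           ("VBD", "sat"), ("IN", "on")}

print(cyk_recognize("the cat sat on the mat".split(), binary, lexical))  # True
```

The four nested loops make the $O(n^3 \cdot |G|)$ cost visible: spans × start positions × split points × rules.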
Complexity Analysis
| Operation | Time | Space | Notes |
|---|---|---|---|
| CYK parsing | $O(n^3 \cdot |G|)$ | $O(n^2 \cdot |N|)$ | $n$ = sentence length, $|G|$ = grammar rules, $|N|$ = non-terminals |
| Earley parsing | $O(n^3)$ worst, $O(n^2)$ unambiguous | $O(n^2)$ | Handles any CFG, not just CNF |
| Shift-reduce (dependency) | $O(n)$ | $O(n)$ | Linear time with greedy or beam search decisions |
| CNF conversion | $O(|G|)$ | $O(|G|)$ | One-time preprocessing of the grammar |
For a typical English sentence of length $n = 25$ with a grammar of $|G| = 10{,}000$ rules, CYK performs roughly $25^3 \times 10{,}000 \approx 1.6 \times 10^8$ operations. This is feasible but motivates the use of pruning and beam search in practice.
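The CNF conversion row above deserves a concrete illustration. The core step is binarization: any rule with more than two right-hand-side symbols is split by introducing fresh non-terminals. The sketch below shows only this step (a full CNF conversion also eliminates unit and empty rules, omitted here); the `X1, X2, ...` naming is an illustrative convention.

```python
def binarize(rules):
    """Binarize CFG rules: (lhs, rhs_tuple) -> equivalent list of rules
    whose right-hand sides have at most two symbols."""
    out, fresh = [], 0
    for lhs, rhs in rules:
        while len(rhs) > 2:
            fresh += 1
            new = f"X{fresh}"                  # fresh intermediate non-terminal
            out.append((lhs, (rhs[0], new)))   # A -> B X1
            lhs, rhs = new, rhs[1:]            # continue with X1 -> C D ...
        out.append((lhs, rhs))
    return out

# VP -> VBD NP PP becomes VP -> VBD X1 and X1 -> NP PP.
print(binarize([("VP", ("VBD", "NP", "PP"))]))
```

Each original rule of length $r$ becomes $r - 1$ binary rules, so the conversion stays $O(|G|)$ as the complexity table states.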
Implementation
ALGORITHM CYK(words, grammar)
INPUT: words: list of n tokens
grammar: set of CNF rules, each rule is (A -> B C) or (A -> a) with probability
OUTPUT: parseTable: 2D table where parseTable[i][j] = set of non-terminals deriving words[i..j]
bestTree: most probable parse tree
BEGIN
n <- LENGTH(words)
table <- CREATE 2D array [n x n], each cell is an empty map of {nonTerminal -> (prob, backpointer)}
// Fill diagonal: lexical rules (A -> a)
FOR i FROM 0 TO n - 1 DO
FOR EACH rule (A -> words[i]) WITH probability p IN grammar DO
table[i][i][A] <- (p, leaf: words[i])
END FOR
END FOR
// Fill upper triangle: binary rules (A -> B C)
FOR span FROM 2 TO n DO
FOR i FROM 0 TO n - span DO
j <- i + span - 1
FOR k FROM i TO j - 1 DO
FOR EACH rule (A -> B C) WITH probability p IN grammar DO
IF B IN table[i][k] AND C IN table[k+1][j] THEN
totalProb <- p * table[i][k][B].prob * table[k+1][j][C].prob
IF A NOT IN table[i][j] OR totalProb > table[i][j][A].prob THEN
table[i][j][A] <- (totalProb, split: k, left: B, right: C)
END IF
END IF
END FOR
END FOR
END FOR
END FOR
// Check if start symbol S spans the entire sentence
IF "S" IN table[0][n-1] THEN
bestTree <- RECONSTRUCT tree from backpointers starting at table[0][n-1]["S"]
RETURN table, bestTree
ELSE
RETURN table, NO_PARSE
END IF
END
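The CYK pseudocode above can be turned into runnable Python as follows. The rule probabilities and the nested-tuple tree format are illustrative assumptions; with a grammar this small there is only one parse, so the probabilities just multiply through.

```python
def pcfg_cyk(words, binary, lexical, start="S"):
    """Viterbi CYK: return (probability, tree) of the best parse, or None.

    binary : dict (A, B, C) -> prob  for rules A -> B C
    lexical: dict (A, word) -> prob  for rules A -> word
    """
    n = len(words)
    # chart[i][j] maps non-terminal -> (best prob, backpointer)
    chart = [[{} for _ in range(n)] for _ in range(n)]
    for i, w in enumerate(words):                      # diagonal: lexical rules
        for (A, word), p in lexical.items():
            if word == w:
                chart[i][i][A] = (p, w)                # leaf backpointer
    for span in range(2, n + 1):                       # binary rules, bottom-up
        for i in range(n - span + 1):
            j = i + span - 1
            for k in range(i, j):
                for (A, B, C), p in binary.items():
                    if B in chart[i][k] and C in chart[k + 1][j]:
                        prob = p * chart[i][k][B][0] * chart[k + 1][j][C][0]
                        if A not in chart[i][j] or prob > chart[i][j][A][0]:
                            chart[i][j][A] = (prob, (k, B, C))
    if start not in chart[0][n - 1]:
        return None                                    # NO_PARSE

    def build(i, j, A):                                # follow backpointers
        prob, back = chart[i][j][A]
        if isinstance(back, str):
            return (A, back)
        k, B, C = back
        return (A, build(i, k, B), build(k + 1, j, C))

    return chart[0][n - 1][start][0], build(0, n - 1, start)

binary = {("S", "NP", "VP"): 1.0, ("NP", "DET", "NN"): 1.0,
          ("VP", "VBD", "PP"): 1.0, ("PP", "IN", "NP"): 1.0}
lexical = {("DET", "the"): 1.0, ("NN", "cat"): 0.5, ("NN", "mat"): 0.5,
           ("VBD", "sat"): 1.0, ("IN", "on"): 1.0}

prob, tree = pcfg_cyk("the cat sat on the mat".split(), binary, lexical)
print(prob)   # 0.25: the two NN choices contribute 0.5 each
print(tree)
```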
ALGORITHM ShiftReduceDependencyParse(words, classifier)
INPUT: words: list of n tokens, classifier: trained model that predicts actions
OUTPUT: arcs: set of (head, dependent, relation) triples
BEGIN
stack <- [ROOT]
buffer <- COPY of words
arcs <- empty set
WHILE buffer is not empty OR LENGTH(stack) > 1 DO
features <- EXTRACT features from top of stack and front of buffer
action <- classifier.PREDICT(features)
IF action = SHIFT AND buffer is not empty THEN
PUSH buffer[0] onto stack
REMOVE buffer[0]
ELSE IF action = LEFT_ARC AND LENGTH(stack) >= 2 THEN
// stack[-2] depends on stack[-1]
dependent <- stack[-2]
head <- stack[-1]
ADD (head, dependent, action.relation) TO arcs
REMOVE stack[-2]
ELSE IF action = RIGHT_ARC AND LENGTH(stack) >= 2 THEN
// stack[-1] depends on stack[-2]
dependent <- stack[-1]
head <- stack[-2]
ADD (head, dependent, action.relation) TO arcs
POP stack
END IF
END WHILE
RETURN arcs
END
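A runnable sketch of the arc-standard transitions above: in place of a trained classifier, a hand-written static oracle consults a gold head map — an assumption made so the example is self-contained; a real parser learns this decision from treebank data.

```python
def oracle(stack, buffer, heads, arcs):
    """Static oracle: pick the next action given gold heads."""
    if len(stack) >= 2:
        s1, s2 = stack[-1], stack[-2]
        if heads.get(s2) == s1:
            return "LEFT_ARC"
        # RIGHT_ARC only once s1 has collected all of its own dependents
        if heads.get(s1) == s2 and all(
                heads.get(w) != s1 or (s1, w) in arcs for w in heads):
            return "RIGHT_ARC"
    return "SHIFT"

def parse(n_words, heads):
    """Arc-standard parse of word indices 1..n_words (0 is ROOT).

    heads maps dependent -> gold head; returns the set of (head, dep) arcs.
    """
    stack, buffer, arcs = [0], list(range(1, n_words + 1)), set()
    while buffer or len(stack) > 1:
        action = oracle(stack, buffer, heads, arcs)
        if action == "SHIFT" and buffer:
            stack.append(buffer.pop(0))
        elif action == "LEFT_ARC":
            dependent = stack.pop(-2)          # stack[-2] depends on stack[-1]
            arcs.add((stack[-1], dependent))
        elif action == "RIGHT_ARC":
            dependent = stack.pop()            # stack[-1] depends on stack[-2]
            arcs.add((stack[-1], dependent))
        else:
            break                              # no valid action: give up
    return arcs

# "the cat sat on the mat": 1=the 2=cat 3=sat 4=on 5=the 6=mat
gold = {1: 2, 2: 3, 3: 0, 4: 3, 5: 6, 6: 4}
print(sorted(parse(6, gold)))
```

Because each word is shifted once and reduced once, the loop runs at most $2n$ iterations — the linear time claimed in the complexity table.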
Real-World Applications
- Compilers and interpreters: Programming language parsers use CFGs (often LL or LR variants) to build ASTs from source code, the same theoretical foundation as NLP constituency parsing
- Question answering: Dependency parses identify the subject and object of a question, helping systems locate the answer span in a passage
- Grammar checkers: Tools like Grammarly use parsing to detect structural errors (dangling modifiers, subject-verb disagreement) that surface-level rules miss
- Information extraction: Dependency paths between entities in a sentence serve as features for relation extraction ("Obama" -nsubj-> "visited" -dobj-> "Paris" implies a visit relation)
- Machine translation: Syntax-based MT systems reorder parse tree nodes to handle structural differences between languages
- Sentiment analysis: Recursive neural networks operate on parse trees to compose phrase-level sentiment from word-level sentiment, handling negation and intensifiers
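The dependency-path feature mentioned under information extraction can be sketched concretely: given labeled arcs, recover the chain of relations linking two entities. The words and labels follow the "Obama visited Paris" example above; the breadth-first search over undirected arcs is one common, simple way to find the path.

```python
from collections import deque

def dependency_path(arcs, source, target):
    """Shortest path of relation labels between two words (BFS, undirected)."""
    neighbors = {}
    for head, dep, rel in arcs:
        neighbors.setdefault(dep, []).append((head, f"{rel}->"))   # up to head
        neighbors.setdefault(head, []).append((dep, f"<-{rel}"))   # down to dep
    queue, seen = deque([(source, [])]), {source}
    while queue:
        word, path = queue.popleft()
        if word == target:
            return path
        for nxt, label in neighbors.get(word, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [label]))
    return None                                # no path between the words

arcs = {("visited", "Obama", "nsubj"), ("visited", "Paris", "dobj")}
print(dependency_path(arcs, "Obama", "Paris"))  # ['nsubj->', '<-dobj']
```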
Key Takeaways
- Constituency parsing groups words into nested phrases (NP, VP, PP) forming a tree, while dependency parsing identifies head-modifier relationships between individual words
- The CYK algorithm parses sentences under a CNF grammar in $O(n^3 \cdot |G|)$ time using dynamic programming, filling a triangular table bottom-up
- Natural language is inherently ambiguous: "I saw the man with the telescope" has two valid parse trees, and resolving ambiguity requires probabilistic grammars (PCFGs) or learned models
- Compilers and NLP parsers share the same theoretical foundation (context-free grammars, parse trees), making parsing one of the deepest connections between programming languages and natural language
- Modern dependency parsers use neural shift-reduce models that run in linear time, making them fast enough for real-time applications