Ditching ANTLR: How PostHog Rebuilt Its SQL Parser for a 70x Speedup
Replacing a parser generator with a hand-rolled recursive-descent parser using AI agents, differential testing, and compiler design.
For infrastructure and developer tool builders, parser generators like ANTLR are a double-edged sword. They let you define a grammar declaratively and generate a working parser in minutes. But that convenience comes with a steep performance tax. PostHog, which allows users to query their data directly using SQL, hit this wall. To decouple logical queries from physical database layouts and enforce access controls, PostHog transpiles user-facing SQL into raw ClickHouse SQL. This means every single query must pass through a parser first.
For years, PostHog relied on a C++ parser generated by ANTLR. While functional, the parser was a major bottleneck. Recently, PostHog engineer Robbie Coomber bypassed the traditional trade-offs of parser design, rewriting the parser into a hand-rolled recursive-descent engine that runs 70 times faster. The twist? He did it using parallel AI agent sessions, writing 16,000 lines of parser code and 5,000 lines of custom testing tooling while barely looking at the underlying implementation.
This project offers a blueprint for how developers can use LLM agents not just for boilerplate, but for high-performance systems engineering.
Why Parser Generators Struggle with Speed
To understand why PostHog achieved a 70x speedup, you have to look at how ANTLR operates under the hood. ANTLR 4 owes its flexibility to its adaptive LL(*) parsing strategy, ALL(*), which can look ahead an arbitrary number of tokens to choose between competing grammar alternatives.
To achieve this, ANTLR compiles your declarative grammar (defined in a .g4 file) into an Augmented Transition Network (ATN), which is essentially a non-deterministic finite automaton (NFA) with a stack. At runtime, ANTLR uses a generic interpreter to walk this graph. Every token visited requires traversing this abstract graph, introducing heavy layers of indirection and dynamic memory allocation.
Furthermore, when ANTLR encounters ambiguous paths, it simulates multiple interpretations in lockstep until only one remains valid. ANTLR's runtime is heavily optimized, but a generic graph-walking interpreter will rarely match a hand-rolled recursive-descent parser. In a hand-rolled parser, grammar rules are mapped directly to native function calls (like parse_expression()), and lookahead is hardcoded only where strictly necessary.
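To make the contrast concrete, here is a minimal sketch of the hand-rolled style. It is written in Python for brevity (PostHog's parser is C++), and the toy grammar, the token format, and every name other than parse_expression are assumptions made for illustration. Each grammar rule is an ordinary function, and the only lookahead is a single peek at the next token.

```python
import re

# Toy grammar (an assumption for illustration, not PostHog's grammar):
#   select_stmt := "SELECT" expression ("," expression)* "FROM" IDENT
#   expression  := IDENT | NUMBER

TOKEN_RE = re.compile(r"\s*(?:(\d+)|([A-Za-z_][A-Za-z_0-9]*)|(,))")
KEYWORDS = {"SELECT", "FROM"}


def tokenize(sql):
    """Turn the input string into a list of (kind, text) pairs ending in EOF."""
    tokens, pos = [], 0
    sql = sql.rstrip()
    while pos < len(sql):
        m = TOKEN_RE.match(sql, pos)
        if not m:
            raise SyntaxError(f"unexpected character at offset {pos}")
        number, word, _comma = m.groups()
        if number:
            tokens.append(("NUMBER", number))
        elif word:
            upper = word.upper()
            tokens.append((upper, upper) if upper in KEYWORDS else ("IDENT", word))
        else:
            tokens.append((",", ","))
        pos = m.end()
    tokens.append(("EOF", ""))
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        # One token of lookahead is all this grammar needs.
        return self.tokens[self.pos][0]

    def expect(self, kind):
        tok_kind, text = self.tokens[self.pos]
        if tok_kind != kind:
            raise SyntaxError(f"expected {kind}, got {tok_kind}")
        self.pos += 1
        return text

    def parse_select(self):
        # The rule select_stmt is just a function that calls other rules.
        self.expect("SELECT")
        columns = [self.parse_expression()]
        while self.peek() == ",":
            self.expect(",")
            columns.append(self.parse_expression())
        self.expect("FROM")
        table = self.expect("IDENT")
        self.expect("EOF")
        return {"select": columns, "from": table}

    def parse_expression(self):
        if self.peek() == "NUMBER":
            return {"number": int(self.expect("NUMBER"))}
        return {"column": self.expect("IDENT")}


def parse(sql):
    return Parser(tokenize(sql)).parse_select()
```

Calling parse("SELECT a, 1 FROM events") walks the input with plain function calls and a position counter. There is no graph to interpret and no per-token allocation beyond the result itself.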
The AI Strategy: Parallel Architectures and the Oracle
Writing a 16,000-line recursive-descent parser by hand is a massive, error-prone undertaking that most engineering teams cannot justify. Coomber solved this by using Claude Code in parallel sessions to explore two different architectural paths simultaneously:
- A performance-first approach: A classic recursive-descent parser paired with a Pratt expression loop, using backtracking and lookahead only when necessary.
- A compatibility-first approach: An engine that mirrored ANTLR's state transitions but implemented them as explicit, compiled code paths rather than generic graph traversals.
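The Pratt expression loop named in the first approach is easier to picture with a small example. The sketch below is an assumption-laden illustration, not PostHog's code: the operator set, the binding-power numbers, and the tuple-based AST are all invented for the example.

```python
import re

# Illustrative binding powers (assumptions); a higher number binds tighter.
BINDING_POWER = {"OR": 1, "AND": 2, "=": 3, "+": 4, "-": 4, "*": 5, "/": 5}


def tokenize(text):
    # Toy tokenizer: characters it does not recognise are silently skipped.
    return re.findall(r"\d+|[A-Za-z_]\w*|[-+*/=()]", text)


def parse_atom(tokens, pos):
    """Parse a number, an identifier, or a parenthesised expression."""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    tok = tokens[pos]
    if tok == "(":
        inner, pos = parse_expression(tokens, pos + 1, 0)
        if pos >= len(tokens) or tokens[pos] != ")":
            raise SyntaxError("expected ')'")
        return inner, pos + 1
    if tok.isdigit():
        return int(tok), pos + 1
    if tok[0].isalpha() or tok[0] == "_":
        return tok, pos + 1
    raise SyntaxError(f"unexpected token {tok!r}")


def parse_expression(tokens, pos=0, min_power=0):
    """Parse tokens starting at pos and return (ast, next_pos)."""
    left, pos = parse_atom(tokens, pos)
    # The Pratt loop: keep absorbing operators that bind tighter than
    # the operator to our left (min_power); otherwise hand control back.
    while pos < len(tokens):
        op = tokens[pos].upper()
        power = BINDING_POWER.get(op)
        if power is None or power <= min_power:
            break
        right, pos = parse_expression(tokens, pos + 1, power)
        left = (op, left, right)
    return left, pos


def parse(text):
    tokens = tokenize(text)
    ast, pos = parse_expression(tokens)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return ast
```

The appeal of this design is that operator precedence lives in one table and one loop, so the parser does not need a separate grammar rule and function for every precedence level.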
Surprisingly, both approaches yielded similar performance gains. But the real breakthrough was not the code generation itself; it was the testing strategy.
To make an AI-generated parser production-ready, you need a trusted source of truth. Coomber used the existing C++ ANTLR parser as an "oracle." By adopting a differential testing workflow, the goal became simple: make the new parser agree with the oracle on every input the test harness could throw at it. Whenever the two parsers disagreed on a query, the discrepancy was fed back to the AI agent to patch the parser.
The Developer Angle: Building the Differential Test Harness
If you want to replicate this success in your own tooling, the lesson is clear: your AI is only as good as your test harness. You cannot rely on manual test cases or basic unit tests to validate a parser that handles untrusted user input.
Coomber used Hypothesis, a property-based testing (PBT) library for Python, to find edge cases. Instead of relying on hand-written SQL queries, Hypothesis generates random inputs from defined rules and searches for cases where a stated property fails. In this case, the property was simple: new_parser(sql) == oracle_parser(sql).
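A property like this takes only a few lines to express. The sketch below is a guess at the general shape rather than PostHog's actual harness: both parser functions are hypothetical stand-ins that split the query on whitespace, and the column and table names are made up. In a real setup the two functions would call the ANTLR-generated parser and the hand-rolled parser and return comparable ASTs.

```python
from hypothesis import given, settings, strategies as st


# Hypothetical stand-ins for the two real parsers (assumptions for the demo).
def oracle_parser(sql):
    return sql.split()


def new_parser(sql):
    return sql.split()


# A deliberately small strategy for SELECT statements; all names are made up.
columns = st.sampled_from(["id", "name", "created_at", "*"])
tables = st.sampled_from(["table_a", "table_b"])
queries = st.builds(
    lambda cols, table, limit: f"SELECT {', '.join(cols)} FROM {table} LIMIT {limit}",
    st.lists(columns, min_size=1, max_size=4),
    tables,
    st.integers(min_value=0, max_value=1000),
)


@settings(max_examples=200, deadline=None)
@given(sql=queries)
def test_parsers_agree(sql):
    # The differential property: both parsers must produce the same result.
    assert new_parser(sql) == oracle_parser(sql)
```

Running test_parsers_agree() directly, or through pytest, makes Hypothesis generate queries and check the property on each one. When the property fails, Hypothesis shrinks the input toward a smaller failing example, which makes for a much shorter bug report.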
To feed Hypothesis, Coomber and Claude built a custom tool to parse PostHog's .g4 grammar file and automatically generate valid SQL queries. They then added mutation steps, swapping tokens and inserting random parentheses to stress-test the parser.
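The generate-then-mutate idea can be sketched without any grammar tooling at all. The example below does not read a .g4 file. It uses a tiny hand-written grammar table as a stand-in, and the rule names, tokens, and mutation choices are all assumptions made for illustration.

```python
import random

# Toy grammar (an assumption): each rule maps to a list of alternatives,
# and each alternative is a sequence of rule names or literal tokens.
GRAMMAR = {
    "query": [["SELECT", "expr_list", "FROM", "table"]],
    "expr_list": [["expr"], ["expr", ",", "expr_list"]],
    "expr": [["column"], ["number"], ["(", "expr", "+", "expr", ")"]],
    "column": [["a"], ["b"]],
    "number": [["1"], ["42"]],
    "table": [["events"]],
}


def generate(rule, rng, depth=0, max_depth=6):
    """Expand a rule into a flat list of tokens."""
    alternatives = GRAMMAR[rule]
    # Past max_depth, always take the first alternative, which is
    # non-recursive for every rule in this toy grammar, so expansion ends.
    alt = alternatives[0] if depth >= max_depth else rng.choice(alternatives)
    tokens = []
    for symbol in alt:
        if symbol in GRAMMAR:
            tokens.extend(generate(symbol, rng, depth + 1, max_depth))
        else:
            tokens.append(symbol)
    return tokens


def mutate(tokens, rng):
    """Return a copy with two tokens swapped or one stray parenthesis added."""
    tokens = list(tokens)
    if rng.choice(["swap", "paren"]) == "swap" and len(tokens) >= 2:
        i, j = rng.sample(range(len(tokens)), 2)
        tokens[i], tokens[j] = tokens[j], tokens[i]
    else:
        tokens.insert(rng.randrange(len(tokens) + 1), rng.choice("()"))
    return tokens


rng = random.Random(0)  # a fixed seed keeps any failure reproducible
valid = generate("query", rng)
broken = mutate(valid, rng)
print(" ".join(valid))
print(" ".join(broken))
```

The valid queries check that both parsers build the same tree. The mutated ones check something just as important: that both parsers reject, or accept, the same malformed input.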
For developers looking to adopt this workflow, here is how to structure your refactoring pipeline:
- Isolate the Interface: Ensure your legacy component has a clean, isolated input-output boundary. This is your oracle.
- Generate the Input Space: If you have a schema or grammar, write a generator. If you are refactoring an API, use your production access logs to replay real-world traffic.
- Run Differential Loops: Set up a script that runs both implementations in parallel, diffs the output, and automatically formats the failure cases into prompts for your AI agent.
- Handle the Edge Cases: Expect the AI to struggle with complex backtracking. When the agent hits a wall, you must step in to simplify the grammar or manually guide the lookahead logic.
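The differential loop in the third step might look roughly like the sketch below. Everything here is hypothetical: the two parser functions are trivial stand-ins (one carries a deliberate bug), the loop runs them one after the other rather than in parallel, and the prompt wording is just one possible format.

```python
def run_parser(parser, sql):
    """Normalise a parser call into a comparable outcome, errors included."""
    try:
        return ("ok", parser(sql))
    except Exception as exc:
        return ("error", type(exc).__name__)


def find_disagreements(queries, oracle, candidate):
    """Run both parsers on every query and collect the mismatches."""
    failures = []
    for sql in queries:
        expected = run_parser(oracle, sql)
        actual = run_parser(candidate, sql)
        if expected != actual:
            failures.append({"sql": sql, "expected": expected, "actual": actual})
    return failures


def format_prompt(failures, limit=5):
    """Turn the first few mismatches into text an AI agent can act on."""
    lines = ["The new parser disagrees with the oracle on these inputs:"]
    for failure in failures[:limit]:
        lines.append(f"- input: {failure['sql']!r}")
        lines.append(f"  oracle: {failure['expected']!r}")
        lines.append(f"  new parser: {failure['actual']!r}")
    lines.append("Fix the new parser so its output matches the oracle exactly.")
    return "\n".join(lines)


# Hypothetical stand-ins for the legacy and rewritten parsers.
def oracle(sql):
    return sql.split()


def buggy_candidate(sql):
    # Deliberate bug for the demo: opening parentheses are dropped.
    return sql.replace("(", " ").split()


failures = find_disagreements(["SELECT 1", "SELECT (1)"], oracle, buggy_candidate)
print(format_prompt(failures))
```

Comparing outcomes rather than raw return values matters here: a parser that raises where the oracle succeeds, or succeeds where the oracle raises, is a disagreement too.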
The Trade-offs of Hand-Rolled Code
While a 70x speedup is a massive win, it comes with architectural trade-offs that teams must weigh carefully.
The original ANTLR parser was defined in a single, highly readable .g4 grammar file. If PostHog needs to add a new SQL feature, updating the grammar file is relatively straightforward. With the new hand-rolled parser, they now maintain 16,000 lines of complex, low-level parser code.
This shift increases the maintenance burden. Any future grammar changes will require running the same AI-assisted pipeline to regenerate the corresponding code paths, or developers will have to manually modify the recursive-descent logic. For PostHog, where SQL parsing sits directly on the critical path of every user dashboard and analytics query, the performance gain easily justifies the maintenance overhead. For projects where parsing is not a bottleneck, the declarative simplicity of ANTLR remains the better choice.
Moving Beyond Parser Generators
PostHog's experiment suggests that AI agents are changing the economics of systems engineering. Tasks that once required months of meticulous compiler design can now be executed in days, provided you build the right validation guardrails. By combining LLM code generation with property-based differential testing, developers can aggressively optimize critical bottlenecks without sacrificing correctness.
Sources & further reading
- I rewrote PostHog's SQL parser, 70x faster, while barely looking at the code — posthog.com
- How We Improved Our SQL Parser Speed by 70x | Bytebase — bytebase.com
- Raw SQL queries support · Issue #3548 · PostHog/posthog — github.com
- Hacker News — hacker-news.penportal.net
Lenn writes about cloud platforms, Kubernetes internals, and the infrastructure decisions that quietly make or break engineering organizations. Based in Berlin's vibrant tech scene, they have a talent for turning dense platform-engineering topics into prose that people actually finish reading.