Why We Chose DuckDB: A Technical Decision Story
We started with SQLite and switched to DuckDB. Here's the detailed reasoning, the benchmarks, and what it taught us about picking databases for AI-native apps.
Database decisions are some of the most consequential in software architecture. They're also some of the hardest to change later. When we started building DenchClaw, we chose SQLite. After about three months, we switched to DuckDB. Here's the detailed story of why.
Starting Point: Why SQLite Made Sense
SQLite was the obvious choice at the start. It's the most deployed database in history. It's been in production for twenty years. It's embedded, zero-dependency, well-understood. If you want a local database that "just works," SQLite is the answer that everyone reaches for and almost no one regrets.
We had a specific use case: a CRM that stores contacts, companies, deals, and activities. All relational data. All queryable. SQLite handles this fine. For basic CRUD operations — insert a contact, update a deal stage, read a contact record — SQLite is plenty fast and completely reliable.
So why did we switch?
The Analytical Query Problem
The inflection point came when we started building the reporting and analytics features.
A CRM isn't just a data store. It's also an analysis tool. "Which stage of my pipeline has the most deals stuck over 30 days?" is a question a VP of Sales asks every week. "What's my conversion rate from qualified lead to proposal sent, broken down by industry?" is a real query that has real business value.
These are analytical queries. They involve aggregations, GROUP BY, window functions, date arithmetic. SQLite can do these things. But SQLite is a row-store database — it stores data row by row — and analytical queries that scan many rows and read few columns are exactly the workload that row stores handle worst.
Here's what I noticed: when I ran a query that aggregated our deals by stage with some date filtering — maybe 5,000 records — SQLite took around 80ms. That's not terrible. But it's not good either. And when I ran the same query against DuckDB, it took 3ms.
The difference is DuckDB's columnar storage format. For a query that reads two columns (stage, close_date) out of a ten-column table, DuckDB reads 20% of the data that SQLite reads. For analytical workloads, this isn't a small optimization — it's a fundamental architectural advantage.
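To make the workload concrete, here is a minimal sketch of that kind of pipeline aggregation, run against SQLite's standard-library driver. The table and column names are illustrative, not DenchClaw's actual schema:

```python
import sqlite3

# Illustrative ten-column deals table; the query below reads only two columns.
con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE deals (
        id INTEGER PRIMARY KEY, name TEXT, stage TEXT, amount REAL,
        owner TEXT, company_id INTEGER, source TEXT, notes TEXT,
        created_at TEXT, close_date TEXT
    )
""")
con.executemany(
    "INSERT INTO deals (stage, close_date) VALUES (?, ?)",
    [("qualified", "2024-01-10"), ("proposal", "2024-01-15"),
     ("proposal", "2024-02-01"), ("closed_won", "2024-02-20")],
)

# The analytical query: scans every row but touches only two columns.
# A row store must read whole rows to answer it; a column store reads
# just 'stage' and 'close_date'.
rows = con.execute("""
    SELECT stage, COUNT(*) AS n
    FROM deals
    WHERE close_date >= '2024-01-01'
    GROUP BY stage
    ORDER BY n DESC
""").fetchall()
print(rows)
```

The same SQL runs unchanged on both engines; only the storage layout underneath differs.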
The PIVOT Problem
There's a specific pattern we use in DenchClaw's data model that made the choice even clearer: EAV (Entity-Attribute-Value) with PIVOT views.
Because DenchClaw lets users create custom CRM schemas — add your own fields to any object — we needed a flexible storage model. EAV is the standard approach: instead of fixed columns, you store each field value as a separate row with (entity_id, field_name, field_value). Then you PIVOT this into a flat table for querying.
PIVOT operations are expensive. In a traditional row-store database, pivoting 50,000 entries with 20 fields each means reading 1 million rows and reshaping them into 50,000 wide rows. This is a lot of work.
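The manual row-store version of this pivot looks roughly like the sketch below (hypothetical field names, and far fewer fields than a real EAV table carries):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE eav (entity_id INTEGER, field_name TEXT, field_value TEXT)")
con.executemany(
    "INSERT INTO eav VALUES (?, ?, ?)",
    [(1, "name", "Sara Chen"), (1, "company", "Stripe"),
     (2, "name", "Ben Ito"),   (2, "company", "Vercel")],
)

# Manual pivot: one MAX(CASE WHEN ...) per field turns tall EAV rows into
# one wide row per entity. Every additional field means another CASE
# expression evaluated against every input row.
rows = con.execute("""
    SELECT entity_id,
           MAX(CASE WHEN field_name = 'name'    THEN field_value END) AS name,
           MAX(CASE WHEN field_name = 'company' THEN field_value END) AS company
    FROM eav
    GROUP BY entity_id
    ORDER BY entity_id
""").fetchall()
print(rows)  # [(1, 'Sara Chen', 'Stripe'), (2, 'Ben Ito', 'Vercel')]
```

With 20 custom fields this query grows to 20 CASE expressions, each scanning the full million-row input.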
DuckDB has native PIVOT support and handles this with its columnar engine. The same pivot that took 800ms in SQLite with manual CASE WHEN expressions takes 12ms in DuckDB's native PIVOT: nearly two orders of magnitude faster.
This matters because the PIVOT views are the primary read path for everything in DenchClaw. Every contact lookup, every pipeline view, every filter — it goes through a PIVOT view. If that view is slow, the whole application feels slow.
The Full-Text Search Story
Full-text search is another place where DuckDB surprised us.
SQLite has FTS5, which is a decent full-text search extension. We used it initially for searching contacts. It worked. But it required a separate FTS index that had to be kept in sync with the main table. If a contact was updated, the FTS index had to be updated too. Managing this correctly was non-trivial.
DuckDB's full-text search is provided as an extension that runs inside the main query engine, and its performance on our dataset was significantly better than FTS5. More importantly, it handled the partial and approximate matches that contact search demands: people type names partially, misspell them, use abbreviations. Exact-match FTS is not good enough for real CRM search.
The DuckDB query:
```sql
SELECT *, fts_main_v_people.match_bm25(entry_id, 'sara chen stripe') AS score
FROM v_people
WHERE score IS NOT NULL
ORDER BY score DESC
LIMIT 10;
```

This returns relevant results even for partial matches, handles typos reasonably, and does it in under 20ms on our test dataset. SQLite FTS5 required more boilerplate and performed worse on partial matches.
The Ecosystem and SQL Completeness
One of the reasons DuckDB is so powerful is its SQL completeness. SQLite has gained window functions and CTEs over the years, but gaps remain: no LATERAL joins, no native PIVOT, and a much thinner type system.
DuckDB covers far more of modern analytical SQL, including:
- Window functions (essential for analytics like rolling averages, rank within groups)
- Common table expressions (recursive CTEs for hierarchical data)
- Native PIVOT and UNPIVOT syntax
- Array and struct types
- JSON processing built-in
- Parquet and CSV reading as first-class citizens
For a CRM that needs to answer analytical questions, SQL completeness isn't a nice-to-have. It's what separates "we can answer this question" from "we'd need to write application code to answer this question."
Performance Benchmarks (Real Numbers)
Let me be specific about performance differences we measured on our production dataset (approximately 10,000 contacts, 5,000 companies, 15,000 deals).
| Operation | SQLite | DuckDB | Speedup |
|---|---|---|---|
| PIVOT view (full read) | 820ms | 11ms | 75x |
| Full-text search | 180ms | 18ms | 10x |
| Pipeline aggregation | 75ms | 3ms | 25x |
| Single record lookup | 2ms | 1ms | 2x |
| Insert (single row) | 1ms | 3ms | 0.3x |
The last row is important: DuckDB is slower for single-row inserts, about 3x in our measurements. This makes sense: its columnar format is optimized for reads, not writes. For a write-heavy transactional workload, SQLite would be faster. For a read-heavy analytical workload, DuckDB is dramatically faster.
CRM usage patterns are read-heavy. You insert a contact once; you read it hundreds of times. You run pipeline reports multiple times per day. The read/write ratio strongly favors DuckDB's optimization.
What We Gave Up
Switching from SQLite to DuckDB wasn't free. Here's what we gave up:
Ubiquity. SQLite is everywhere. DuckDB is still relatively new (2018). The tooling ecosystem for DuckDB is less mature than for SQLite — fewer GUI clients, fewer backup tools, fewer integrations.
Write performance. As noted above, DuckDB is slower on single-row inserts. For a CRM, this is acceptable because inserts are rare relative to reads. But it's a real tradeoff.
Some familiarity. More engineers know SQLite than DuckDB. When someone new joins the team and needs to debug a database issue, SQLite has more resources and more community knowledge to draw on.
File compatibility. SQLite's .db file format is universally understood. DuckDB's .duckdb file format requires DuckDB to open. For users who want to inspect their data directly, DuckDB requires installing the DuckDB CLI, while SQLite files can be opened by dozens of tools.
Why We Made the Right Call
Despite these costs, I'm confident the switch was right for DenchClaw.
The performance difference on analytical queries is categorical, not marginal. When a pipeline report goes from 800ms to 11ms, that's not a 98% improvement in speed — it's a change in the user's mental model of what's possible. At 800ms, you wait. At 11ms, you explore. Users run more queries, discover more insights, build more intuition about their data.
The PIVOT view performance directly enables the customizable schema architecture. If EAV pivots were slow, we'd have had to use a fixed schema and limit customization. DuckDB makes the flexible schema fast, which makes the product fundamentally better.
The SQL completeness means we can answer any question a sales team might have with standard SQL rather than application logic. When an analyst asks "what's my 30-day rolling average deal close rate by stage?", I can write that as a window function query in DuckDB, not as JavaScript code that manually computes rolling averages.
The Lesson: Match Your Database to Your Workload
The deeper lesson from this experience isn't specific to DuckDB. It's about matching your database choice to your actual workload rather than defaulting to the familiar.
SQLite is the right answer for many workloads — transactional, write-heavy, simple query patterns. DuckDB is the right answer for analytical workloads with complex queries and read-heavy patterns. Both are embedded, both are file-based, both are appropriate for local-first applications. But they're optimized for different things.
The mistake we almost made was assuming that "embedded local database" was a single category with one right answer. It's not. The class of workloads that are poorly served by SQLite and excellently served by DuckDB is large, and CRM analytics is squarely in that class.
If you're building something similar — local-first analytics, flexible schema with PIVOT views, complex reporting — I'd recommend starting with DuckDB rather than SQLite. The initial learning curve is worth it.
Frequently Asked Questions
Can I use DuckDB for transactional workloads too?
Yes, but with caveats. DuckDB supports transactions, but its write performance is slower than SQLite for single-row operations. For mixed workloads, you might want to use DuckDB for analytics and a different store for write-heavy transactional operations, though for most CRM use cases DuckDB alone is sufficient.
Is DuckDB production-ready?
Yes. DuckDB reached a stable 1.0 release in 2024 and is used in production by many companies, including major data platforms. It is backed by an extensive test suite.
Can I open DuckDB files without DuckDB?
DuckDB database files use DuckDB's own storage format, so you need DuckDB to read them. The DuckDB CLI is a single binary download. DuckDB also supports exporting to open formats (Parquet, CSV, JSON) for interoperability.
How does DuckDB handle concurrent access?
DuckDB supports single-writer, multiple-reader access. For DenchClaw's use case (one agent writing, UI reading), this is sufficient. It's not designed for high-concurrency multi-writer scenarios.
What's the memory footprint of DuckDB?
DuckDB is designed to work within available RAM but can spill to disk for large queries. For a typical CRM dataset (tens of thousands of records), it runs comfortably within the default macOS memory constraints.
Ready to try DenchClaw? Install in one command: npx denchclaw. Full setup guide →
