Running DuckDB in Production: What You Need to Know

Running DuckDB in production is different from Postgres or MySQL. Here's what you need to know about concurrency, backups, and architecture before you ship.

Mark Rachapoom
·7 min read

DuckDB is excellent for analytics. But running it in production requires understanding its specific constraints and design philosophy. It's not a drop-in replacement for PostgreSQL — it's a different tool, optimized for different workloads.

Here's the honest picture of what works, what doesn't, and how to architect production systems that leverage DuckDB's strengths.

The Key Constraint: Single Writer#

DuckDB supports multiple concurrent readers, but only one writer at a time. This is the most important thing to understand before deploying DuckDB in production.

If two processes try to open a DuckDB file in read-write mode simultaneously, the second one gets an error:

Database "mydb.duckdb" is already opened by another process.

This is by design. When a process opens the file in read-write mode, DuckDB takes an exclusive lock on the database file to enforce single-writer semantics.

What this means in practice:

  • ✅ One application process + multiple read-only connections: works
  • ✅ Single writer + separate read replica: works with periodic file copy
  • ✅ Multiple readers with READ_ONLY flag: works
  • ❌ Multiple application servers writing to the same DuckDB file: doesn't work
  • ❌ Direct multi-writer concurrent access: doesn't work

Architecture Patterns That Work#

Pattern 1: Embedded Analytics in a Single Service#

The simplest and most common pattern. One service owns the DuckDB file and all read/write access goes through it.

[App Server] → [DuckDB file] (read/write)
     ↑
[API clients]

This works great for:

  • Internal dashboards and analytics tools
  • Single-tenant SaaS where each customer gets their own DuckDB file
  • DenchClaw-style personal CRM where one user owns one database
  • Data pipelines run by a single process

Pattern 2: Read Replicas via File Copy#

For read-heavy workloads, periodically copy the DuckDB file to read replicas:

# Flush the WAL into the main file, then copy while the writer is idle
duckdb production.duckdb "CHECKPOINT;"
cp production.duckdb replica.duckdb
 
# Or export the database to a directory
duckdb production.duckdb "EXPORT DATABASE '/backup/snapshot';"
[Writer Service] → [production.duckdb]
                           ↓ (copy every N minutes)
[Reader Service 1] → [replica.duckdb] (READ_ONLY)
[Reader Service 2] → [replica.duckdb] (READ_ONLY)
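
The copy step can be made safe for readers with an atomic swap. A stdlib sketch (the `refresh_replica` helper is hypothetical), assuming the writer has just checkpointed or is idle:

```python
import os
import shutil

def refresh_replica(source: str, replica: str) -> None:
    """Copy the database file to a temporary name, then atomically swap
    it into place so readers never open a half-copied replica."""
    tmp = replica + ".tmp"
    shutil.copyfile(source, tmp)  # may take a while for large files
    os.replace(tmp, replica)      # atomic rename: readers see old or new, never partial
```

Note that a reader holding an open handle keeps seeing the old file contents until it reconnects, so replica services should reopen the file in READ_ONLY mode after each swap.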

Pattern 3: Write-Through Cache#

Application writes go to a primary OLTP database (Postgres, MySQL), and a background process syncs to DuckDB for analytics:

[App] → [PostgreSQL] (writes)
              ↓ (sync every hour)
         [DuckDB] (analytics queries only)
[Dashboard] → [DuckDB] (fast reads)

This is the pattern used in most production analytics stacks. PostgreSQL handles the transactional writes; DuckDB handles the analytics.
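
One way to implement the sync step is inside DuckDB itself, via its postgres extension. A sketch with placeholder connection details and table names:

```sql
-- Run on a schedule (cron, Airflow, etc.); requires the postgres extension
INSTALL postgres;
LOAD postgres;

-- Connection string and table names are placeholders
ATTACH 'dbname=app host=localhost user=analytics' AS pg (TYPE postgres, READ_ONLY);

-- Rebuild the analytics copy from the OLTP source
CREATE OR REPLACE TABLE deals AS SELECT * FROM pg.public.deals;

DETACH pg;
```

`CREATE OR REPLACE TABLE` swaps the analytics copy in a single transaction, so dashboard queries never see a half-loaded table.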

Pattern 4: One DuckDB Per Tenant#

For multi-tenant SaaS, give each tenant their own DuckDB file:

/data/tenants/
  acme-corp.duckdb
  globex.duckdb
  initech.duckdb

Each tenant gets isolated storage, easy backup, easy deletion, and no contention with other tenants. This scales horizontally — add tenants, add files.
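
A small helper can map tenant IDs to files while rejecting IDs that could escape the tenant directory; the `tenant_db_path` function and `/data/tenants` root below are illustrative:

```python
import re
from pathlib import Path

# Assumption: all tenant databases live under one directory
TENANT_ROOT = Path("/data/tenants")

def tenant_db_path(tenant_id: str, root: Path = TENANT_ROOT) -> Path:
    """Map a tenant ID to its DuckDB file, rejecting IDs that could
    escape the tenant directory (path traversal, absolute paths)."""
    if not re.fullmatch(r"[a-z0-9][a-z0-9-]*", tenant_id):
        raise ValueError(f"invalid tenant id: {tenant_id!r}")
    return root / f"{tenant_id}.duckdb"
```

Validating the ID before touching the filesystem matters here: the tenant ID comes from outside, and it becomes part of a path.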

Concurrency Within a Single Process#

Within a single process, DuckDB handles concurrency well:

import duckdb
import threading
 
# One read-write connection, owned by the process
con = duckdb.connect('production.duckdb')
 
# Don't reopen the same file with different flags from the same process —
# DuckDB rejects mismatched configurations. Instead, give each thread its
# own cursor; cursors share the database instance
cur1 = con.cursor()
cur2 = con.cursor()
 
# Both cursors can run queries in parallel

DuckDB automatically uses multiple CPU cores for query execution. A single query can parallelize across all available cores.

Read-Only Connections#

Open DuckDB in read-only mode to allow concurrent access for analytics:

# Python
con = duckdb.connect('production.duckdb', read_only=True)
 
# CLI
duckdb -readonly production.duckdb
 
# SQL
ATTACH 'production.duckdb' AS db (READ_ONLY);

Within a single process, readers coexist with the writer: they don't block writes, and writes don't block them (each query sees a consistent snapshot). Across processes, read-only connections can share a file with each other, but not with a process that has the file open in read-write mode — the writer's lock is exclusive.

Memory Configuration#

DuckDB will use available RAM aggressively. Configure limits for production:

-- Limit memory to 4GB
SET memory_limit = '4GB';
 
-- Set temp directory for spilling to disk
SET temp_directory = '/tmp/duckdb_temp';
 
-- Control parallelism
SET threads = 4;

In a containerized environment, always set memory_limit to less than your container's memory limit to avoid OOM kills.

Persistence and Durability#

DuckDB uses a Write-Ahead Log (WAL) for durability:

  • Changes are written to the WAL first
  • WAL is checkpointed into the main database file periodically
  • On crash, DuckDB replays the WAL on next open

This means DuckDB is ACID-compliant for single-writer scenarios. Transactions either commit or roll back — no partial writes.

BEGIN;
INSERT INTO deals VALUES ('deal-123', 'Acme Corp', 50000, 'Proposal');
UPDATE pipeline_stats SET total_value = total_value + 50000;
COMMIT;
-- Both statements commit together or neither does

Backup Strategy#

See the full DuckDB backup guide, but the quick summary:

-- Checkpoint first (flush WAL)
CHECKPOINT;
 
-- Then copy the .duckdb file
-- cp production.duckdb backup-2026-03-26.duckdb
 
-- Or export to directory
EXPORT DATABASE '/backups/2026-03-26/' (FORMAT PARQUET);

Monitoring DuckDB in Production#

DuckDB doesn't have built-in monitoring. Here's how to add it:

-- Database size, WAL size, and memory usage
SELECT database_size, wal_size, memory_usage, memory_limit
FROM pragma_database_size();
 
-- Table sizes
SELECT table_name, estimated_size
FROM duckdb_tables()
ORDER BY estimated_size DESC;
 
-- Memory usage broken down by subsystem (DuckDB 0.10+)
SELECT tag, memory_usage_bytes
FROM duckdb_memory()
ORDER BY memory_usage_bytes DESC;

Wrap these in a monitoring script that runs every minute and ships metrics to your observability stack.

When Not to Use DuckDB in Production#

Be clear about DuckDB's limits:

Don't use DuckDB when you need:

  • High-frequency concurrent writes (>100 writes/sec from multiple processes)
  • Real-time row-level security across many users
  • Fine-grained GRANT/REVOKE access control
  • Replication to standby servers
  • Point-in-time recovery at second granularity
  • Distributed query execution across multiple nodes

For these use cases, PostgreSQL, MySQL, or a managed OLAP store is more appropriate.

DuckDB is ideal when you need:

  • Fast analytical queries on a dataset that fits on one machine
  • Embedded analytics in a single-process application
  • Local-first data storage with zero server overhead
  • Per-tenant isolated databases

DuckDB in Production at DenchClaw#

DenchClaw runs DuckDB in production — embedded in each user's local workspace. The design is exactly Pattern 1 above: one DenchClaw process, one DuckDB file, all access through the agent.

The EAV schema is designed for single-writer access with optimized read paths through PIVOT views. All write operations go through the DenchClaw agent; all read operations hit the v_* views.

This is a clean production pattern: single writer, structured schema, no external database server, runs anywhere Node.js runs.

Frequently Asked Questions#

Can I use DuckDB as a production database for a web app?#

Yes, with the single-writer constraint in mind. A single web server process can read and write DuckDB. Multiple web server processes cannot share the same DuckDB file for writes.

What happens if my app crashes mid-write?#

DuckDB replays the WAL on next open and recovers to the last committed transaction. Uncommitted transactions are rolled back.

How do I scale DuckDB beyond one machine?#

You don't — DuckDB is designed for single-machine use. For distributed analytics, look at ClickHouse, Apache Druid, or BigQuery.

Is DuckDB suitable for OLTP workloads?#

No. DuckDB is optimized for OLAP (analytical) queries. For OLTP (transactional) workloads with many small reads/writes, use PostgreSQL or SQLite.

Can I run DuckDB in a serverless environment?#

Yes. AWS Lambda, Google Cloud Functions, and similar environments work fine — each invocation opens DuckDB, reads/writes, and closes. Use S3 + httpfs for persistent storage across invocations.

Ready to try DenchClaw? Install in one command: npx denchclaw. Full setup guide →

Written by

Mark Rachapoom

Building the future of AI CRM software.


© 2026 DenchHQ · San Francisco, CA