Running DuckDB in Production: What You Need to Know

Running DuckDB in production is different from Postgres or MySQL. Here's what you need to know about concurrency, backups, and architecture before you ship.

Mark Rachapoom
·7 min read

DuckDB is excellent for analytics. But running it in production requires understanding its specific constraints and design philosophy. It's not a drop-in replacement for PostgreSQL — it's a different tool, optimized for different workloads.

Here's the honest picture of what works, what doesn't, and how to architect production systems that leverage DuckDB's strengths.

The Key Constraint: Single Writer#

DuckDB supports multiple concurrent readers, but only one writer at a time. This is the most important thing to understand before deploying DuckDB in production.

If two processes try to open a DuckDB file in read-write mode simultaneously, the second one gets an error:

Database "mydb.duckdb" is already opened by another process.

This is by design. When a process opens the file in read-write mode, DuckDB takes an exclusive lock on the database file to enforce single-writer semantics.

What this means in practice:

  • ✅ One application process + multiple read-only connections: works
  • ✅ Single writer + separate read replica: works with periodic file copy
  • ✅ Multiple readers with READ_ONLY flag: works
  • ❌ Multiple application servers writing to the same DuckDB file: doesn't work
  • ❌ Direct multi-writer concurrent access: doesn't work

Architecture Patterns That Work#

Pattern 1: Embedded Analytics in a Single Service#

The simplest and most common pattern. One service owns the DuckDB file and all read/write access goes through it.

[App Server] → [DuckDB file] (read/write)
     ↑
[API clients]

This works great for:

  • Internal dashboards and analytics tools
  • Single-tenant SaaS where each customer gets their own DuckDB file
  • DenchClaw-style personal CRM where one user owns one database
  • Data pipelines run by a single process

Pattern 2: Read Replicas via File Copy#

For read-heavy workloads, periodically copy the DuckDB file to read replicas:

# Flush the WAL into the main file, then copy while the writer is idle
duckdb production.duckdb "CHECKPOINT;"
cp production.duckdb replica.duckdb
 
# Or export the database to a directory
duckdb production.duckdb "EXPORT DATABASE '/backup/snapshot';"
[Writer Service] → [production.duckdb]
                           ↓ (copy every N minutes)
[Reader Service 1] → [replica.duckdb] (READ_ONLY)
[Reader Service 2] → [replica.duckdb] (READ_ONLY)
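
The copy step can be made safe for readers with an atomic swap. A stdlib sketch (the `refresh_replica` helper is hypothetical), assuming the writer has just checkpointed or is idle:

```python
import os
import shutil

def refresh_replica(source: str, replica: str) -> None:
    """Copy the database file to a temporary name, then atomically swap
    it into place so readers never open a half-copied replica."""
    tmp = replica + ".tmp"
    shutil.copyfile(source, tmp)  # may take a while for large files
    os.replace(tmp, replica)      # atomic rename: readers see old or new, never partial
```

Note that a reader holding an open handle keeps seeing the old file contents until it reconnects, so replica services should reopen the file in READ_ONLY mode after each swap.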

Pattern 3: Write-Through Cache#

Application writes go to a primary OLTP database (Postgres, MySQL), and a background process syncs to DuckDB for analytics:

[App] → [PostgreSQL] (writes)
              ↓ (sync every hour)
         [DuckDB] (analytics queries only)
[Dashboard] → [DuckDB] (fast reads)

This is the pattern used in most production analytics stacks. PostgreSQL handles the transactional writes; DuckDB handles the analytics.
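
One way to implement the sync step is inside DuckDB itself, via its postgres extension. A sketch with placeholder connection details and table names:

```sql
-- Run on a schedule (cron, Airflow, etc.); requires the postgres extension
INSTALL postgres;
LOAD postgres;

-- Connection string and table names are placeholders
ATTACH 'dbname=app host=localhost user=analytics' AS pg (TYPE postgres, READ_ONLY);

-- Rebuild the analytics copy from the OLTP source
CREATE OR REPLACE TABLE deals AS SELECT * FROM pg.public.deals;

DETACH pg;
```

`CREATE OR REPLACE TABLE` swaps the analytics copy in a single transaction, so dashboard queries never see a half-loaded table.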

Pattern 4: One DuckDB Per Tenant#

For multi-tenant SaaS, give each tenant their own DuckDB file:

/data/tenants/
  acme-corp.duckdb
  globex.duckdb
  initech.duckdb

Each tenant gets isolated storage, easy backup, easy deletion, and no contention with other tenants. This scales horizontally — add tenants, add files.
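
A small helper can map tenant IDs to files while rejecting IDs that could escape the tenant directory; the `tenant_db_path` function and `/data/tenants` root below are illustrative:

```python
import re
from pathlib import Path

# Assumption: all tenant databases live under one directory
TENANT_ROOT = Path("/data/tenants")

def tenant_db_path(tenant_id: str, root: Path = TENANT_ROOT) -> Path:
    """Map a tenant ID to its DuckDB file, rejecting IDs that could
    escape the tenant directory (path traversal, absolute paths)."""
    if not re.fullmatch(r"[a-z0-9][a-z0-9-]*", tenant_id):
        raise ValueError(f"invalid tenant id: {tenant_id!r}")
    return root / f"{tenant_id}.duckdb"
```

Validating the ID before touching the filesystem matters here: the tenant ID comes from outside, and it becomes part of a path.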

Concurrency Within a Single Process#

Within a single process, DuckDB handles concurrency well:

import duckdb
import threading
 
# One read-write connection, owned by the process
con = duckdb.connect('production.duckdb')
 
# Don't reopen the same file with different flags from the same process —
# DuckDB rejects mismatched configurations. Instead, give each thread its
# own cursor; cursors share the database instance
cur1 = con.cursor()
cur2 = con.cursor()
 
# Both cursors can run queries in parallel

DuckDB automatically uses multiple CPU cores for query execution. A single query can parallelize across all available cores.

Read-Only Connections#

Open DuckDB in read-only mode to allow concurrent access for analytics:

# Python
con = duckdb.connect('production.duckdb', read_only=True)
 
# CLI
duckdb -readonly production.duckdb
 
# SQL
ATTACH 'production.duckdb' AS db (READ_ONLY);

Within a single process, readers coexist with the writer: they don't block writes, and writes don't block them (each query sees a consistent snapshot). Across processes, read-only connections can share a file with each other, but not with a process that has the file open in read-write mode — the writer's lock is exclusive.

Memory Configuration#

DuckDB will use available RAM aggressively. Configure limits for production:

-- Limit memory to 4GB
SET memory_limit = '4GB';
 
-- Set temp directory for spilling to disk
SET temp_directory = '/tmp/duckdb_temp';
 
-- Control parallelism
SET threads = 4;

In a containerized environment, always set memory_limit to less than your container's memory limit to avoid OOM kills.

Persistence and Durability#

DuckDB uses a Write-Ahead Log (WAL) for durability:

  • Changes are written to the WAL first
  • WAL is checkpointed into the main database file periodically
  • On crash, DuckDB replays the WAL on next open

This means DuckDB is ACID-compliant for single-writer scenarios. Transactions either commit or roll back — no partial writes.

BEGIN;
INSERT INTO deals VALUES ('deal-123', 'Acme Corp', 50000, 'Proposal');
UPDATE pipeline_stats SET total_value = total_value + 50000;
COMMIT;
-- Both statements commit together or neither does

Backup Strategy#

See the full DuckDB backup guide, but the quick summary:

-- Checkpoint first (flush WAL)
CHECKPOINT;
 
-- Then copy the .duckdb file
-- cp production.duckdb backup-2026-03-26.duckdb
 
-- Or export to directory
EXPORT DATABASE '/backups/2026-03-26/' (FORMAT PARQUET);

Monitoring DuckDB in Production#

DuckDB doesn't have built-in monitoring. Here's how to add it:

-- Database size, WAL size, and memory usage
SELECT database_size, wal_size, memory_usage, memory_limit
FROM pragma_database_size();
 
-- Table sizes
SELECT table_name, estimated_size
FROM duckdb_tables()
ORDER BY estimated_size DESC;
 
-- Memory usage broken down by subsystem (DuckDB 0.10+)
SELECT tag, memory_usage_bytes
FROM duckdb_memory()
ORDER BY memory_usage_bytes DESC;

Wrap these in a monitoring script that runs every minute and ships metrics to your observability stack.

When Not to Use DuckDB in Production#

Be clear about DuckDB's limits:

Don't use DuckDB when you need:

  • High-frequency concurrent writes (>100 writes/sec from multiple processes)
  • Real-time row-level security across many users
  • Fine-grained GRANT/REVOKE access control
  • Replication to standby servers
  • Point-in-time recovery at second granularity
  • Distributed query execution across multiple nodes

For these use cases, PostgreSQL, MySQL, or a managed OLAP store is more appropriate.

DuckDB is ideal when you need:

  • Fast analytical queries on a dataset that fits on one machine
  • Embedded analytics in a single-process application
  • Local-first data storage with zero server overhead
  • Per-tenant isolated databases

DuckDB in Production at DenchClaw#

DenchClaw runs DuckDB in production — embedded in each user's local workspace. The design is exactly Pattern 1 above: one DenchClaw process, one DuckDB file, all access through the agent.

The EAV schema is designed for single-writer access with optimized read paths through PIVOT views. All write operations go through the DenchClaw agent; all read operations hit the v_* views.

This is a clean production pattern: single writer, structured schema, no external database server, runs anywhere Node.js runs.

Frequently Asked Questions#

Can I use DuckDB as a production database for a web app?#

Yes, with the single-writer constraint in mind. A single web server process can read and write DuckDB. Multiple web server processes cannot share the same DuckDB file for writes.

What happens if my app crashes mid-write?#

DuckDB replays the WAL on next open and recovers to the last committed transaction. Uncommitted transactions are rolled back.

How do I scale DuckDB beyond one machine?#

You don't — DuckDB is designed for single-machine use. For distributed analytics, look at ClickHouse, Apache Druid, or BigQuery.

Is DuckDB suitable for OLTP workloads?#

No. DuckDB is optimized for OLAP (analytical) queries. For OLTP (transactional) workloads with many small reads/writes, use PostgreSQL or SQLite.

Can I run DuckDB in a serverless environment?#

Yes. AWS Lambda, Google Cloud Functions, and similar environments work fine — each invocation opens DuckDB, reads/writes, and closes. Use S3 + httpfs for persistent storage across invocations.

Ready to try DenchClaw? Install in one command: npx denchclaw. Full setup guide →

Written by

Mark Rachapoom

Building the future of AI CRM software.


© 2026 DenchHQ · San Francisco, CA