# DuckDB Backup Strategies for Business Data
DuckDB backup is simpler than backing up PostgreSQL or MySQL, but you need to do it right. Copy the wrong file at the wrong time and you'll end up with a corrupted backup. This guide covers the strategies that work, when to use each, and how to automate them.
## Understanding DuckDB's File Structure
A DuckDB database consists of:
- `mydb.duckdb` — the main database file (checkpointed data)
- `mydb.duckdb.wal` — the Write-Ahead Log (uncommitted/uncheckpointed changes)
For a valid backup, you need either:
- The `.duckdb` file after a checkpoint (the WAL has been flushed into the main file)
- Both files together (main file + WAL)
The most common mistake: copying only the `.duckdb` file while the WAL still holds unsynced changes. Your copy is then an older snapshot than the live database.
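You can guard against that mistake by checking for a non-empty WAL before trusting a plain file copy. A minimal sketch (the `mydb.duckdb` path is an example; point it at your real database):

```shell
#!/bin/sh
# Example path; adjust to your setup
DB="mydb.duckdb"

if [ -s "$DB.wal" ]; then
  # Non-empty WAL: the main file alone is stale; checkpoint (or copy both files) first
  echo "WAL present: checkpoint before copying $DB"
else
  echo "No pending WAL: a plain copy of $DB is a consistent snapshot"
fi
```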
## Strategy 1: File Copy After Checkpoint
The simplest approach. Checkpoint the database (flush WAL to main file), then copy.
```sql
-- From a DuckDB connection
CHECKPOINT;
```

```bash
# Then copy the file
cp ~/.openclaw-dench/workspace/workspace.duckdb \
   ~/Backups/workspace-$(date +%Y%m%d-%H%M%S).duckdb
```

**When to use:** when you have a maintenance window or can tolerate brief write pauses.
Automated version (cron):
```bash
#!/bin/bash
# backup-duckdb.sh
DB_PATH="$HOME/.openclaw-dench/workspace/workspace.duckdb"
BACKUP_DIR="$HOME/Backups/duckdb"
TIMESTAMP=$(date +%Y%m%d-%H%M%S)

mkdir -p "$BACKUP_DIR"

# Checkpoint via CLI, then copy
duckdb "$DB_PATH" "CHECKPOINT;"
cp "$DB_PATH" "$BACKUP_DIR/workspace-$TIMESTAMP.duckdb"

# Keep only the last 30 backups
ls -t "$BACKUP_DIR"/*.duckdb | tail -n +31 | xargs rm -f

echo "Backup complete: workspace-$TIMESTAMP.duckdb"
```

Add to crontab: `0 * * * * /path/to/backup-duckdb.sh` (hourly backups).
## Strategy 2: EXPORT DATABASE
DuckDB has a built-in export command that writes your database to a directory of Parquet files — one per table. This is the most portable backup format.
```sql
EXPORT DATABASE '/path/to/backup/2026-03-26/' (FORMAT PARQUET);
```

The export directory contains:

- One `.parquet` file per table
- A `schema.sql` file with CREATE TABLE statements
- A `load_order.sql` file for restoring in the right order
Restore:
```sql
IMPORT DATABASE '/path/to/backup/2026-03-26/';
```

Or manually:

```sql
-- Run schema.sql first
.read /path/to/backup/2026-03-26/schema.sql
-- Then load data
COPY table_name FROM '/path/to/backup/2026-03-26/table_name.parquet' (FORMAT PARQUET);
```

**Advantages:**
- Human-readable (Parquet files can be queried directly)
- Works across DuckDB versions (not tied to binary format)
- Can restore individual tables
- Compresses well
Disadvantages:
- Slower than file copy for large databases
- Requires DuckDB to be idle during export (no writes)
## Strategy 3: Copy Both Files (WAL-Inclusive)
If you can't checkpoint first (database is busy), copy both the `.duckdb` and `.wal` files together:
```bash
# Copy both files; the WAL may not exist if the database is fully checkpointed
cp mydb.duckdb /backup/mydb.duckdb
cp mydb.duckdb.wal /backup/mydb.duckdb.wal 2>/dev/null || true
```

When DuckDB opens the backup, it automatically replays the WAL. This gives you the most recent state.
Note: This only works reliably if no writes happen between the two copy operations. For production, use a brief file lock or copy during low-traffic windows.
## Strategy 4: Continuous Backup to S3
For production systems, stream backups to S3 using DuckDB's httpfs extension:
```sql
INSTALL httpfs;
LOAD httpfs;
SET s3_region = 'us-east-1';
SET s3_access_key_id = 'YOUR_KEY';
SET s3_secret_access_key = 'YOUR_SECRET';

-- Export directly to S3
EXPORT DATABASE 's3://my-backups/duckdb/2026-03-26-1200/' (FORMAT PARQUET);
```

Automated S3 backup script:
```bash
#!/bin/bash
DB_PATH="$HOME/.openclaw-dench/workspace/workspace.duckdb"
S3_BUCKET="s3://my-company-backups"
DATE=$(date +%Y/%m/%d)
HOUR=$(date +%H)

duckdb "$DB_PATH" "
INSTALL httpfs;
LOAD httpfs;
SET s3_region='us-east-1';
SET s3_access_key_id='${AWS_ACCESS_KEY_ID}';
SET s3_secret_access_key='${AWS_SECRET_ACCESS_KEY}';
CHECKPOINT;
EXPORT DATABASE '${S3_BUCKET}/duckdb/${DATE}/${HOUR}/' (FORMAT PARQUET);
"

echo "S3 backup complete"
```

## Strategy 5: Git-Based Backup
For small databases (under a few hundred MB), commit the DuckDB file to a private Git repository:
```bash
#!/bin/bash
cd /path/to/your/data-repo

# Checkpoint first
duckdb workspace.duckdb "CHECKPOINT;"

# Commit
git add workspace.duckdb
git commit -m "backup: $(date +%Y-%m-%d-%H%M)"
git push origin main
```

This gives you version history, easy rollback, and free hosting on GitHub/GitLab for small databases. It is not suitable for databases over a few hundred MB because of Git's file size limits.
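Since GitHub rejects individual files over 100 MB (and Git handles large binaries poorly in general), it's worth guarding the commit with a size check. A sketch, assuming the same `workspace.duckdb` file:

```shell
#!/bin/sh
DB="workspace.duckdb"
LIMIT=$((100 * 1024 * 1024))  # GitHub's hard per-file limit: 100 MB

size=$(wc -c < "$DB")
if [ "$size" -gt "$LIMIT" ]; then
  echo "Refusing to commit: $DB is ${size} bytes (over 100 MB); use Parquet export or Git LFS" >&2
  exit 1
fi
```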
## Backup Retention Policy
A practical retention schedule for business data:
| Backup Type | Frequency | Retention |
|---|---|---|
| Hourly | Every hour | 24 hours |
| Daily | Midnight | 30 days |
| Weekly | Sunday midnight | 12 weeks |
| Monthly | 1st of month | 12 months |
| Yearly | January 1 | Forever |
A script implementing the hourly and daily tiers (extend the same pattern for the weekly, monthly, and yearly tiers):
```bash
#!/bin/bash
DB="$HOME/.openclaw-dench/workspace/workspace.duckdb"
BACKUP_BASE="$HOME/Backups/duckdb"

duckdb "$DB" "CHECKPOINT;"

# Hourly: 24 rotating slots, one per hour of the day
mkdir -p "$BACKUP_BASE/hourly"
cp "$DB" "$BACKUP_BASE/hourly/$(date +%H).duckdb"

# Daily (run at midnight)
if [ "$(date +%H)" = "00" ]; then
  mkdir -p "$BACKUP_BASE/daily"
  cp "$DB" "$BACKUP_BASE/daily/$(date +%Y-%m-%d).duckdb"
  # Keep 30 days of daily backups
  find "$BACKUP_BASE/daily" -name "*.duckdb" -mtime +30 -delete
fi
```

## Verifying Backups
Always test your backups. A backup that can't be restored is not a backup.
```bash
#!/bin/bash
BACKUP="$1"

# Open read-only and count tables; -csv -noheader makes the output a bare number
RESULT=$(duckdb -readonly -csv -noheader "$BACKUP" \
  "SELECT count(*) FROM duckdb_tables();" 2>&1)

if echo "$RESULT" | grep -q "^[0-9]"; then
  echo "✅ Backup valid: $BACKUP ($RESULT tables)"
else
  echo "❌ Backup invalid: $BACKUP"
  echo "$RESULT"
  exit 1
fi
```

For `EXPORT DATABASE` backups:
```bash
# Create a temp DuckDB to test restoration
duckdb /tmp/restore_test.duckdb "IMPORT DATABASE '/path/to/backup/';"
duckdb /tmp/restore_test.duckdb "SELECT table_name, estimated_size FROM duckdb_tables();"
rm /tmp/restore_test.duckdb
```

## DenchClaw Backup Integration
DenchClaw stores all your CRM data in a DuckDB file at `~/.openclaw-dench/workspace/workspace.duckdb`. Backing this file up protects:
- All CRM objects, fields, and entries
- Contact notes and documents
- Pipeline data and deal history
- Custom views and configurations
The simplest backup for a DenchClaw workspace:
```bash
# Add to crontab: 0 * * * *
duckdb "$HOME/.openclaw-dench/workspace/workspace.duckdb" "CHECKPOINT;"
cp "$HOME/.openclaw-dench/workspace/workspace.duckdb" \
   "$HOME/Backups/denchclaw-$(date +%Y%m%d-%H%M).duckdb"
```

Combined with Time Machine (macOS) or a similar backup tool, this gives you multiple layers of protection.
Also back up the workspace directory itself (markdown files, YAML configs, memory files):
```bash
# In GNU tar, --exclude must come before the paths it applies to
tar -czf "$HOME/Backups/denchclaw-workspace-$(date +%Y%m%d).tar.gz" \
    --exclude="*.duckdb" \
    --exclude=".git" \
    "$HOME/.openclaw-dench/workspace/"
```

## Frequently Asked Questions
### Can I back up DuckDB while it's in use?
Yes, with care. Issue a CHECKPOINT from the writing connection, then copy the file while no writes are in flight (read-only connections are fine). For EXPORT DATABASE, there must be no active write transactions during the export.
### How large will my DuckDB backups be?
DuckDB uses good compression internally. EXPORT DATABASE to Parquet typically compresses 3-5x. A 1GB DuckDB file may produce 200-400MB of Parquet backups.
### How do I restore a backup to a new machine?
Copy the `.duckdb` file to the new machine and open it (the path doesn't need to match). For `EXPORT DATABASE` backups, run `IMPORT DATABASE '/path/to/backup/'` on any DuckDB instance.
### Does DuckDB support incremental backups?
Not natively. The WAL acts as an incremental log within a session, but there's no built-in incremental backup between sessions. Use Parquet exports and sync only changed files as an approximation.
### What's the best backup format for long-term archival?
`EXPORT DATABASE` with Parquet format. Parquet is an open standard, can be read without DuckDB, and compresses well. The `.duckdb` binary format is tied to the DuckDB version.
Ready to try DenchClaw? Install in one command: `npx denchclaw`. Full setup guide →
