
DuckDB Backup Strategies for Business Data

DuckDB backup is simpler than most databases but has specific requirements. Here are the strategies that actually work for protecting your business data.

Mark Rachapoom
·6 min read

DuckDB backup is simpler than backing up PostgreSQL or MySQL, but you need to do it right. Copy the wrong file at the wrong time and you'll end up with a corrupted backup. This guide covers the strategies that work, when to use each, and how to automate them.

Understanding DuckDB's File Structure

A DuckDB database consists of:

  • mydb.duckdb — the main database file (checkpointed data)
  • mydb.duckdb.wal — the Write-Ahead Log (uncommitted/uncheckpointed changes)

For a valid backup, you need either:

  1. The .duckdb file after a checkpoint (WAL has been flushed into the main file)
  2. Both files together (main file + WAL)

The most common mistake: copying only the .duckdb file while the WAL has unsynced changes. You get an older snapshot.
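Before a bare file copy, it's worth checking whether a WAL with pending changes exists. A minimal sketch; the helper name and example path are mine:

```shell
#!/bin/bash
# wal_pending DBFILE — succeeds (exit 0) if a non-empty WAL sits next to
# the database file, meaning a copy of the .duckdb file alone would be stale.
wal_pending() {
    [ -s "${1}.wal" ]
}

# Example (hypothetical path):
if wal_pending "mydb.duckdb"; then
    echo "WAL has pending changes: checkpoint first, or copy both files"
fi
```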

Strategy 1: File Copy After Checkpoint

The simplest approach. Checkpoint the database (flush WAL to main file), then copy.

-- From a DuckDB connection
CHECKPOINT;

# Then copy the file (shell)
cp ~/.openclaw-dench/workspace/workspace.duckdb \
   /Backups/workspace-$(date +%Y%m%d-%H%M%S).duckdb

When to use: When you have a maintenance window or can tolerate brief write pauses.

Automated version (cron):

#!/bin/bash
# backup-duckdb.sh
DB_PATH="$HOME/.openclaw-dench/workspace/workspace.duckdb"
BACKUP_DIR="$HOME/Backups/duckdb"
TIMESTAMP=$(date +%Y%m%d-%H%M%S)
 
mkdir -p "$BACKUP_DIR"
 
# Checkpoint via CLI, then copy
duckdb "$DB_PATH" "CHECKPOINT;"
cp "$DB_PATH" "$BACKUP_DIR/workspace-$TIMESTAMP.duckdb"
 
# Keep only last 30 backups
ls -t "$BACKUP_DIR"/*.duckdb | tail -n +31 | xargs -r rm -f
 
echo "Backup complete: workspace-$TIMESTAMP.duckdb"

Add to crontab: 0 * * * * /path/to/backup-duckdb.sh (hourly backups).
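For visibility into failures, it helps to redirect the job's output to a log file; a sample crontab entry (paths are placeholders):

```shell
# m h  dom mon dow  command
0 * * * * /path/to/backup-duckdb.sh >> /var/log/duckdb-backup.log 2>&1
```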

Strategy 2: EXPORT DATABASE

DuckDB has a built-in export command that writes your database to a directory of Parquet files — one per table. This is the most portable backup format.

EXPORT DATABASE '/path/to/backup/2026-03-26/' (FORMAT PARQUET);

The export directory contains:

  • One .parquet file per table
  • A schema.sql file with CREATE TABLE statements
  • A load_order.sql file for restoring in the right order

Restore:

IMPORT DATABASE '/path/to/backup/2026-03-26/';

Or manually:

-- Run schema.sql first
.read /path/to/backup/2026-03-26/schema.sql
-- Then load data
COPY table_name FROM '/path/to/backup/2026-03-26/table_name.parquet' (FORMAT PARQUET);

Advantages:

  • Human-readable (Parquet files can be queried directly)
  • Works across DuckDB versions (not tied to binary format)
  • Can restore individual tables
  • Compresses well

Disadvantages:

  • Slower than file copy for large databases
  • Requires DuckDB to be idle during export (no writes)

Strategy 3: Copy Both Files (WAL-Inclusive)

If you can't checkpoint first (database is busy), copy both the .duckdb and .wal file together:

# Copy both files back-to-back (not atomic; see the note below)
cp mydb.duckdb /backup/mydb.duckdb
cp mydb.duckdb.wal /backup/mydb.duckdb.wal 2>/dev/null || true

When DuckDB opens the backup, it automatically replays the WAL. This gives you the most recent state.

Note: This only works reliably if no writes happen between the two copy operations. For production, use a brief file lock or copy during low-traffic windows.
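One way to make the two-file copy safer without a lock is to checksum the source pair before and after the copy, and retry if anything changed mid-copy. A best-effort sketch (the helper name is mine; this narrows the race window, it doesn't eliminate it):

```shell
#!/bin/bash
# copy_pair DB DEST_DIR — copy main file + WAL, then verify the source
# pair didn't change during the copy; retry up to 5 times.
copy_pair() {
    local db="$1" dest="$2" tries=5 before after
    while [ "$tries" -gt 0 ]; do
        # Checksum the concatenated pair before and after copying
        before=$(cat "$db" "${db}.wal" 2>/dev/null | cksum)
        cp "$db" "$dest/"
        cp "${db}.wal" "$dest/" 2>/dev/null || true
        after=$(cat "$db" "${db}.wal" 2>/dev/null | cksum)
        [ "$before" = "$after" ] && return 0
        tries=$((tries - 1))
    done
    return 1
}
```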

Strategy 4: Continuous Backup to S3

For production systems, stream backups to S3 using DuckDB's httpfs extension:

INSTALL httpfs;
LOAD httpfs;
 
SET s3_region = 'us-east-1';
SET s3_access_key_id = 'YOUR_KEY';
SET s3_secret_access_key = 'YOUR_SECRET';
 
-- Export directly to S3
EXPORT DATABASE 's3://my-backups/duckdb/2026-03-26-1200/' (FORMAT PARQUET);

Automated S3 backup script:

#!/bin/bash
DB_PATH="$HOME/.openclaw-dench/workspace/workspace.duckdb"
S3_BUCKET="s3://my-company-backups"
DATE=$(date +%Y/%m/%d)
HOUR=$(date +%H)
 
duckdb "$DB_PATH" "
  INSTALL httpfs;
  LOAD httpfs;
  SET s3_region='us-east-1';
  SET s3_access_key_id='${AWS_ACCESS_KEY_ID}';
  SET s3_secret_access_key='${AWS_SECRET_ACCESS_KEY}';
  CHECKPOINT;
  EXPORT DATABASE '${S3_BUCKET}/duckdb/${DATE}/${HOUR}/' (FORMAT PARQUET);
"
 
echo "S3 backup complete"

Strategy 5: Git-Based Backup

For small databases (under a few hundred MB), commit the DuckDB file to a private Git repository:

#!/bin/bash
cd /path/to/your/data-repo
 
# Checkpoint first
duckdb workspace.duckdb "CHECKPOINT;"
 
# Commit
git add workspace.duckdb
git commit -m "backup: $(date +%Y-%m-%d-%H%M)"
git push origin main

This gives you version history, easy rollback, and free hosting on GitHub/GitLab for small databases. Not suitable for databases over a few hundred MB due to Git's size limits.

Backup Retention Policy

A practical retention schedule for business data:

Backup Type | Frequency       | Retention
Hourly      | Every hour      | 24 hours
Daily       | Midnight        | 30 days
Weekly      | Sunday midnight | 12 weeks
Monthly     | 1st of month    | 12 months
Yearly      | January 1       | Forever

Script to implement this:

#!/bin/bash
DB="$HOME/.openclaw-dench/workspace/workspace.duckdb"
BACKUP_BASE="$HOME/Backups/duckdb"
 
duckdb "$DB" "CHECKPOINT;"
 
# Hourly
mkdir -p "$BACKUP_BASE/hourly"
cp "$DB" "$BACKUP_BASE/hourly/$(date +%H).duckdb"
 
# Daily (run at midnight)
if [ "$(date +%H)" = "00" ]; then
    mkdir -p "$BACKUP_BASE/daily"
    cp "$DB" "$BACKUP_BASE/daily/$(date +%Y-%m-%d).duckdb"
    # Keep 30 days
    find "$BACKUP_BASE/daily" -type f -name "*.duckdb" -mtime +30 -delete
fi
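The same pattern extends to the weekly and monthly tiers. For pruning, keeping the newest N files by count is sturdier than `-mtime` alone, since a touched or restored backup won't silently escape deletion. A small helper (the function name is mine):

```shell
#!/bin/bash
# prune_keep DIR N — delete all but the N newest .duckdb files in DIR.
prune_keep() {
    local dir="$1" keep="$2"
    # List newest first, skip the first N, remove the rest (if any)
    ls -t "$dir"/*.duckdb 2>/dev/null | tail -n +"$((keep + 1))" | xargs -r rm -f
}

# Example retention tiers (directory layout follows the script above):
# prune_keep "$HOME/Backups/duckdb/weekly" 12
# prune_keep "$HOME/Backups/duckdb/monthly" 12
```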

Verifying Backups

Always test your backups. A backup that can't be restored is not a backup.

#!/bin/bash
BACKUP="$1"
 
# Try to open and query the backup (-csv -noheader gives a bare number
# instead of the CLI's default box-drawn table, so the grep below works)
RESULT=$(duckdb -csv -noheader "$BACKUP" "SELECT COUNT(*) FROM duckdb_tables();" 2>&1)
 
if echo "$RESULT" | grep -q "^[0-9]"; then
    echo "✅ Backup valid: $BACKUP ($RESULT tables)"
else
    echo "❌ Backup invalid: $BACKUP"
    echo "$RESULT"
    exit 1
fi

For EXPORT DATABASE backups:

# Create a temp DuckDB to test restoration
duckdb /tmp/restore_test.duckdb "IMPORT DATABASE '/path/to/backup/';"
duckdb /tmp/restore_test.duckdb "SELECT table_name, estimated_size FROM duckdb_tables();"
rm /tmp/restore_test.duckdb

DenchClaw Backup Integration

DenchClaw stores all your CRM data in a DuckDB file at ~/.openclaw-dench/workspace/workspace.duckdb. Backing this file up protects:

  • All CRM objects, fields, and entries
  • Contact notes and documents
  • Pipeline data and deal history
  • Custom views and configurations

The simplest backup for a DenchClaw workspace:

# Add to crontab: 0 * * * *
duckdb "$HOME/.openclaw-dench/workspace/workspace.duckdb" "CHECKPOINT;"
cp "$HOME/.openclaw-dench/workspace/workspace.duckdb" \
   "$HOME/Backups/denchclaw-$(date +%Y%m%d-%H%M).duckdb"

Combined with Time Machine (macOS) or a similar backup tool, this gives you multiple layers of protection.

Also back up the workspace directory itself (markdown files, YAML configs, memory files):

tar -czf "$HOME/Backups/denchclaw-workspace-$(date +%Y%m%d).tar.gz" \
    "$HOME/.openclaw-dench/workspace/" \
    --exclude="*.duckdb" \
    --exclude=".git"

Frequently Asked Questions

Can I back up DuckDB while it's in use?

Yes, if you use read-only connections. Open a read-only connection, checkpoint via a separate write connection, then copy the file. For EXPORT DATABASE, the database must not have active write transactions.

How large will my DuckDB backups be?

DuckDB uses good compression internally. EXPORT DATABASE to Parquet typically compresses 3-5x. A 1GB DuckDB file may produce 200-400MB of Parquet backups.

How do I restore a backup to a new machine?

Copy the .duckdb file to the same path on the new machine and open it. For EXPORT DATABASE backups, run IMPORT DATABASE '/path/to/backup/' on any DuckDB instance.

Does DuckDB support incremental backups?

Not natively. The WAL acts as an incremental log within a session, but there's no built-in incremental backup between sessions. Use Parquet exports and sync only changed files as an approximation.
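The "sync only changed files" idea can be approximated with rsync against a Parquet export. A sketch using a local mirror directory as a stand-in for a remote target (both paths are hypothetical; `--checksum` compares file contents rather than timestamps, so unchanged tables aren't re-transferred):

```shell
#!/bin/bash
# Mirror a Parquet export, transferring only files whose content changed.
SRC="/tmp/duckdb-export"   # hypothetical EXPORT DATABASE output directory
DST="/tmp/duckdb-mirror"   # stand-in for a remote backup target
mkdir -p "$SRC" "$DST"
rsync -a --checksum "$SRC/" "$DST/"
```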

What's the best backup format for long-term archival?

EXPORT DATABASE with Parquet format. Parquet is an open standard, can be read without DuckDB, and compresses well. The .duckdb binary format is tied to the DuckDB version.

Ready to try DenchClaw? Install in one command: npx denchclaw. Full setup guide →

Written by

Mark Rachapoom

Building the future of AI CRM software.

© 2026 DenchHQ · San Francisco, CA