
DuckDB Backup Strategies for Business Data

DuckDB backup is simpler than most databases but has specific requirements. Here are the strategies that actually work for protecting your business data.

Mark Rachapoom
·6 min read

DuckDB backup is simpler than backing up PostgreSQL or MySQL, but you need to do it right. Copy the wrong file at the wrong time and you'll end up with a corrupted backup. This guide covers the strategies that work, when to use each, and how to automate them.

Understanding DuckDB's File Structure

A DuckDB database consists of:

  • mydb.duckdb — the main database file (checkpointed data)
  • mydb.duckdb.wal — the Write-Ahead Log (uncommitted/uncheckpointed changes)

For a valid backup, you need either:

  1. The .duckdb file after a checkpoint (WAL has been flushed into the main file)
  2. Both files together (main file + WAL)

The most common mistake: copying only the .duckdb file while the WAL has unsynced changes. You get an older snapshot.
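Before a bare file copy, it's worth checking whether a WAL with pending changes exists. A minimal sketch; the helper name and example path are mine:

```shell
#!/bin/bash
# wal_pending DBFILE — succeeds (exit 0) if a non-empty WAL sits next to
# the database file, meaning a copy of the .duckdb file alone would be stale.
wal_pending() {
    [ -s "${1}.wal" ]
}

# Example (hypothetical path):
if wal_pending "mydb.duckdb"; then
    echo "WAL has pending changes: checkpoint first, or copy both files"
fi
```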

Strategy 1: File Copy After Checkpoint

The simplest approach. Checkpoint the database (flush WAL to main file), then copy.

-- From a DuckDB connection
CHECKPOINT;

# Then copy the file (shell)
cp ~/.openclaw-dench/workspace/workspace.duckdb \
   /Backups/workspace-$(date +%Y%m%d-%H%M%S).duckdb

When to use: When you have a maintenance window or can tolerate brief write pauses.

Automated version (cron):

#!/bin/bash
# backup-duckdb.sh
DB_PATH="$HOME/.openclaw-dench/workspace/workspace.duckdb"
BACKUP_DIR="$HOME/Backups/duckdb"
TIMESTAMP=$(date +%Y%m%d-%H%M%S)
 
mkdir -p "$BACKUP_DIR"
 
# Checkpoint via CLI, then copy
duckdb "$DB_PATH" "CHECKPOINT;"
cp "$DB_PATH" "$BACKUP_DIR/workspace-$TIMESTAMP.duckdb"
 
# Keep only last 30 backups
ls -t "$BACKUP_DIR"/*.duckdb | tail -n +31 | xargs -r rm -f
 
echo "Backup complete: workspace-$TIMESTAMP.duckdb"

Add to crontab: 0 * * * * /path/to/backup-duckdb.sh (hourly backups).
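For visibility into failures, it helps to redirect the job's output to a log file; a sample crontab entry (paths are placeholders):

```shell
# m h  dom mon dow  command
0 * * * * /path/to/backup-duckdb.sh >> /var/log/duckdb-backup.log 2>&1
```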

Strategy 2: EXPORT DATABASE

DuckDB has a built-in export command that writes your database to a directory of Parquet files — one per table. This is the most portable backup format.

EXPORT DATABASE '/path/to/backup/2026-03-26/' (FORMAT PARQUET);

The export directory contains:

  • One .parquet file per table
  • A schema.sql file with CREATE TABLE statements
  • A load_order.sql file for restoring in the right order

Restore:

IMPORT DATABASE '/path/to/backup/2026-03-26/';

Or manually:

-- Run schema.sql first
.read /path/to/backup/2026-03-26/schema.sql
-- Then load data
COPY table_name FROM '/path/to/backup/2026-03-26/table_name.parquet' (FORMAT PARQUET);

Advantages:

  • Human-readable (Parquet files can be queried directly)
  • Works across DuckDB versions (not tied to binary format)
  • Can restore individual tables
  • Compresses well

Disadvantages:

  • Slower than file copy for large databases
  • Requires DuckDB to be idle during export (no writes)

Strategy 3: Copy Both Files (WAL-Inclusive)

If you can't checkpoint first (database is busy), copy both the .duckdb and .wal file together:

# Copy both files back-to-back (not atomic; see the note below)
cp mydb.duckdb /backup/mydb.duckdb
cp mydb.duckdb.wal /backup/mydb.duckdb.wal 2>/dev/null || true

When DuckDB opens the backup, it automatically replays the WAL. This gives you the most recent state.

Note: This only works reliably if no writes happen between the two copy operations. For production, use a brief file lock or copy during low-traffic windows.
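One way to make the two-file copy safer without a lock is to checksum the source pair before and after the copy, and retry if anything changed mid-copy. A best-effort sketch (the helper name is mine; this narrows the race window, it doesn't eliminate it):

```shell
#!/bin/bash
# copy_pair DB DEST_DIR — copy main file + WAL, then verify the source
# pair didn't change during the copy; retry up to 5 times.
copy_pair() {
    local db="$1" dest="$2" tries=5 before after
    while [ "$tries" -gt 0 ]; do
        # Checksum the concatenated pair before and after copying
        before=$(cat "$db" "${db}.wal" 2>/dev/null | cksum)
        cp "$db" "$dest/"
        cp "${db}.wal" "$dest/" 2>/dev/null || true
        after=$(cat "$db" "${db}.wal" 2>/dev/null | cksum)
        [ "$before" = "$after" ] && return 0
        tries=$((tries - 1))
    done
    return 1
}
```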

Strategy 4: Continuous Backup to S3

For production systems, stream backups to S3 using DuckDB's httpfs extension:

INSTALL httpfs;
LOAD httpfs;
 
SET s3_region = 'us-east-1';
SET s3_access_key_id = 'YOUR_KEY';
SET s3_secret_access_key = 'YOUR_SECRET';
 
-- Export directly to S3
EXPORT DATABASE 's3://my-backups/duckdb/2026-03-26-1200/' (FORMAT PARQUET);

Automated S3 backup script:

#!/bin/bash
DB_PATH="$HOME/.openclaw-dench/workspace/workspace.duckdb"
S3_BUCKET="s3://my-company-backups"
DATE=$(date +%Y/%m/%d)
HOUR=$(date +%H)
 
duckdb "$DB_PATH" "
  INSTALL httpfs;
  LOAD httpfs;
  SET s3_region='us-east-1';
  SET s3_access_key_id='${AWS_ACCESS_KEY_ID}';
  SET s3_secret_access_key='${AWS_SECRET_ACCESS_KEY}';
  CHECKPOINT;
  EXPORT DATABASE '${S3_BUCKET}/duckdb/${DATE}/${HOUR}/' (FORMAT PARQUET);
"
 
echo "S3 backup complete"

Strategy 5: Git-Based Backup

For small databases (under a few hundred MB), commit the DuckDB file to a private Git repository:

#!/bin/bash
cd /path/to/your/data-repo
 
# Checkpoint first
duckdb workspace.duckdb "CHECKPOINT;"
 
# Commit
git add workspace.duckdb
git commit -m "backup: $(date +%Y-%m-%d-%H%M)"
git push origin main

This gives you version history, easy rollback, and free hosting on GitHub/GitLab for small databases. Not suitable for databases over a few hundred MB due to Git's size limits.

Backup Retention Policy

A practical retention schedule for business data:

Backup Type | Frequency       | Retention
Hourly      | Every hour      | 24 hours
Daily       | Midnight        | 30 days
Weekly      | Sunday midnight | 12 weeks
Monthly     | 1st of month    | 12 months
Yearly      | January 1       | Forever

Script to implement this:

#!/bin/bash
DB="$HOME/.openclaw-dench/workspace/workspace.duckdb"
BACKUP_BASE="$HOME/Backups/duckdb"
 
duckdb "$DB" "CHECKPOINT;"
 
# Hourly
mkdir -p "$BACKUP_BASE/hourly"
cp "$DB" "$BACKUP_BASE/hourly/$(date +%H).duckdb"
 
# Daily (run at midnight)
if [ "$(date +%H)" = "00" ]; then
    mkdir -p "$BACKUP_BASE/daily"
    cp "$DB" "$BACKUP_BASE/daily/$(date +%Y-%m-%d).duckdb"
    # Keep 30 days
    find "$BACKUP_BASE/daily" -type f -name "*.duckdb" -mtime +30 -delete
fi
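The same pattern extends to the weekly and monthly tiers. For pruning, keeping the newest N files by count is sturdier than `-mtime` alone, since a touched or restored backup won't silently escape deletion. A small helper (the function name is mine):

```shell
#!/bin/bash
# prune_keep DIR N — delete all but the N newest .duckdb files in DIR.
prune_keep() {
    local dir="$1" keep="$2"
    # List newest first, skip the first N, remove the rest (if any)
    ls -t "$dir"/*.duckdb 2>/dev/null | tail -n +"$((keep + 1))" | xargs -r rm -f
}

# Example retention tiers (directory layout follows the script above):
# prune_keep "$HOME/Backups/duckdb/weekly" 12
# prune_keep "$HOME/Backups/duckdb/monthly" 12
```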

Verifying Backups

Always test your backups. A backup that can't be restored is not a backup.

#!/bin/bash
BACKUP="$1"
 
# Try to open and query the backup (-csv -noheader gives a bare number
# instead of the CLI's default box-drawn table, so the grep below works)
RESULT=$(duckdb -csv -noheader "$BACKUP" "SELECT COUNT(*) FROM duckdb_tables();" 2>&1)
 
if echo "$RESULT" | grep -q "^[0-9]"; then
    echo "✅ Backup valid: $BACKUP ($RESULT tables)"
else
    echo "❌ Backup invalid: $BACKUP"
    echo "$RESULT"
    exit 1
fi

For EXPORT DATABASE backups:

# Create a temp DuckDB to test restoration
duckdb /tmp/restore_test.duckdb "IMPORT DATABASE '/path/to/backup/';"
duckdb /tmp/restore_test.duckdb "SELECT table_name, estimated_size FROM duckdb_tables();"
rm /tmp/restore_test.duckdb

DenchClaw Backup Integration

DenchClaw stores all your CRM data in a DuckDB file at ~/.openclaw-dench/workspace/workspace.duckdb. Backing this file up protects:

  • All CRM objects, fields, and entries
  • Contact notes and documents
  • Pipeline data and deal history
  • Custom views and configurations

The simplest backup for a DenchClaw workspace:

# Add to crontab: 0 * * * *
duckdb "$HOME/.openclaw-dench/workspace/workspace.duckdb" "CHECKPOINT;"
cp "$HOME/.openclaw-dench/workspace/workspace.duckdb" \
   "$HOME/Backups/denchclaw-$(date +%Y%m%d-%H%M).duckdb"

Combined with Time Machine (macOS) or a similar backup tool, this gives you multiple layers of protection.

Also back up the workspace directory itself (markdown files, YAML configs, memory files):

tar -czf "$HOME/Backups/denchclaw-workspace-$(date +%Y%m%d).tar.gz" \
    "$HOME/.openclaw-dench/workspace/" \
    --exclude="*.duckdb" \
    --exclude=".git"

Frequently Asked Questions

Can I back up DuckDB while it's in use?

Yes, if you use read-only connections. Open a read-only connection, checkpoint via a separate write connection, then copy the file. For EXPORT DATABASE, the database must not have active write transactions.

How large will my DuckDB backups be?

DuckDB uses good compression internally. EXPORT DATABASE to Parquet typically compresses 3-5x. A 1GB DuckDB file may produce 200-400MB of Parquet backups.

How do I restore a backup to a new machine?

Copy the .duckdb file to the same path on the new machine and open it. For EXPORT DATABASE backups, run IMPORT DATABASE '/path/to/backup/' on any DuckDB instance.

Does DuckDB support incremental backups?

Not natively. The WAL acts as an incremental log within a session, but there's no built-in incremental backup between sessions. Use Parquet exports and sync only changed files as an approximation.
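The "sync only changed files" idea can be approximated with rsync against a Parquet export. A sketch using a local mirror directory as a stand-in for a remote target (both paths are hypothetical; `--checksum` compares file contents rather than timestamps, so unchanged tables aren't re-transferred):

```shell
#!/bin/bash
# Mirror a Parquet export, transferring only files whose content changed.
SRC="/tmp/duckdb-export"   # hypothetical EXPORT DATABASE output directory
DST="/tmp/duckdb-mirror"   # stand-in for a remote backup target
mkdir -p "$SRC" "$DST"
rsync -a --checksum "$SRC/" "$DST/"
```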

What's the best backup format for long-term archival?

EXPORT DATABASE with Parquet format. Parquet is an open standard, can be read without DuckDB, and compresses well. The .duckdb binary format is tied to the DuckDB version.

Ready to try DenchClaw? Install in one command: npx denchclaw. Full setup guide →

Written by

Mark Rachapoom

Building the future of AI CRM software.

© 2026 DenchHQ · San Francisco, CA