Your CRM as a .duckdb File: What That Actually Means
DenchClaw stores all your CRM data in a single .duckdb file. Here's what that means for ownership, portability, performance, and privacy — and why it's the right choice.
There's a single file at ~/.openclaw-dench/workspace/workspace.duckdb. Every contact, every company, every deal, every note in your DenchClaw CRM is in that file. All of it.
That's a deliberate architectural choice, and it has significant implications for how your data works — implications that most CRM users have never had to think about because their data has always lived in someone else's cloud.
Let me walk through what having your CRM as a .duckdb file actually means in practice.
What DuckDB Is
DuckDB is an embedded analytical database. "Embedded" means it runs in-process — no separate server, no daemon, no Docker container. It's like SQLite, but optimized for analytical workloads rather than transactional ones.
DuckDB is:
- Free and open source (MIT license)
- Single-file: the entire database is one .duckdb file
- Full SQL: window functions, CTEs, PIVOT, JSON functions, array operations
- Fast for analytics: columnar storage, vectorized execution, optimized aggregations
- Cross-platform: macOS, Linux, Windows, ARM, x86
- Widely adopted: used at companies like Meta, Amazon, Snowflake
For a CRM workload — millions of rows, complex analytical queries, frequent reads, moderate writes — DuckDB is nearly ideal.
The Schema
DenchClaw stores your CRM data in DuckDB using an EAV (Entity-Attribute-Value) pattern:
-- Every "table" in your CRM is an object
CREATE TABLE objects (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
name VARCHAR NOT NULL, -- "people", "companies", "deals"
icon VARCHAR,
entry_count INTEGER DEFAULT 0,
created_at TIMESTAMP
);
-- Every "column" is a field
CREATE TABLE fields (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
object_id UUID REFERENCES objects(id),
name VARCHAR NOT NULL, -- "Full Name", "Email", "Stage"
type VARCHAR NOT NULL, -- "text", "email", "enum", "relation"
config JSON, -- type-specific config
position INTEGER,
created_at TIMESTAMP
);
-- Every "row" is an entry
CREATE TABLE entries (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
object_id UUID REFERENCES objects(id),
created_at TIMESTAMP,
updated_at TIMESTAMP
);
-- Every "cell value" is an entry_field
CREATE TABLE entry_fields (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
entry_id UUID REFERENCES entries(id),
field_id UUID REFERENCES fields(id),
value VARCHAR,
created_at TIMESTAMP
);
-- Linked markdown documents
CREATE TABLE documents (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
entry_id UUID REFERENCES entries(id),
path VARCHAR NOT NULL,
created_at TIMESTAMP
);
On top of this, DenchClaw generates PIVOT views — flat, readable tables that look like a traditional CRM:
-- Auto-generated view for the "people" object
CREATE VIEW v_people AS
PIVOT (
SELECT e.id, ef.field_name, ef.value
FROM entries e
JOIN entry_fields_with_names ef ON ef.entry_id = e.id
WHERE e.object_id = (SELECT id FROM objects WHERE name = 'people')
) ON field_name
USING FIRST(value);
The result: v_people looks exactly like a traditional SQL table, with one column per field.
What You Can Do With a .duckdb File
This is where it gets powerful. Your CRM data is now queryable by anything that speaks DuckDB or SQL.
Direct Terminal Access
# Open the DuckDB REPL
duckdb ~/.openclaw-dench/workspace/workspace.duckdb
-- Count your leads
D SELECT COUNT(*) FROM v_people WHERE "Status" = 'Lead';
-- Top deals by value
D SELECT "Title", "Value", "Stage", "Close Date"
  FROM v_deals
  ORDER BY CAST("Value" AS NUMERIC) DESC
  LIMIT 10;
-- Contacts not reached in 30 days
D SELECT "Full Name", "Email", "Last Contacted"
  FROM v_people
  WHERE "Status" = 'Active'
    AND ("Last Contacted" < current_date - INTERVAL 30 DAYS
         OR "Last Contacted" IS NULL)
  ORDER BY "Last Contacted" ASC;
Python Analytics
import os
import duckdb
import pandas as pd

# connect() does not expand "~", so expand it explicitly
conn = duckdb.connect(os.path.expanduser("~/.openclaw-dench/workspace/workspace.duckdb"), read_only=True)
# Load into pandas
df = conn.execute("SELECT * FROM v_deals").df()
# Monthly deal velocity
df['close_date'] = pd.to_datetime(df['Close Date'])
df['value'] = pd.to_numeric(df['Value'], errors='coerce')  # EAV values are stored as VARCHAR
monthly = df.groupby(df['close_date'].dt.to_period('M'))['value'].agg(['count', 'sum'])
print(monthly)
Jupyter Notebooks
Run your entire CRM analytics in Jupyter:
# %pip install duckdb
import os
import duckdb

# expand "~" explicitly; read_only avoids contending with the gateway's writer
conn = duckdb.connect(os.path.expanduser("~/.openclaw-dench/workspace/workspace.duckdb"), read_only=True)
# Pipeline funnel analysis
funnel = conn.execute("""
SELECT "Stage", COUNT(*) as count, SUM(CAST("Value" AS NUMERIC)) as total
FROM v_deals
GROUP BY "Stage"
ORDER BY CASE "Stage"
WHEN 'Prospect' THEN 1
WHEN 'Qualified' THEN 2
WHEN 'Proposal Sent' THEN 3
WHEN 'Closed Won' THEN 4
ELSE 5 END
""").df()
import plotly.express as px
px.bar(funnel, x='Stage', y='total', title='Pipeline by Stage')
Export to Any Format
-- Export contacts to CSV
COPY v_people TO 'people.csv' (HEADER, DELIMITER ',');
-- Export deals to Parquet
COPY v_deals TO 'deals.parquet' (FORMAT PARQUET);
-- Export everything to JSON
COPY (SELECT * FROM v_companies) TO 'companies.json' (FORMAT JSON);
You can take these files and import them into any system: Salesforce, HubSpot, Airtable, Google Sheets, whatever.
Performance
DuckDB is fast. Surprisingly fast.
For typical CRM workloads:
- 10,000 contacts: queries run in single-digit milliseconds
- 100,000 contacts: queries run in tens of milliseconds
- 1,000,000 entries: analytical aggregations still under 1 second
The AI agent's natural language → SQL → response loop typically completes in 2-4 seconds total — 1-2s for the AI model call, sub-100ms for the SQL execution.
DenchClaw runs PIVOT views on demand for smaller objects and pre-generates them for larger ones. You can force regeneration:
# Regenerate all PIVOT views
openclaw db refresh-views --profile dench
Backup Strategy
One file = one backup target. It's simple:
# Time Machine backs it up automatically (if workspace is on your main drive)
# Manual backup
cp ~/.openclaw-dench/workspace/workspace.duckdb ~/Desktop/crm-backup-$(date +%Y%m%d).duckdb
# Automated daily backup
# Add to crontab: 0 2 * * * cp ~/.openclaw-dench/workspace/workspace.duckdb /backups/crm-$(date +\%Y\%m\%d).duckdb
# Git-based backup (includes schema + docs)
cd ~/.openclaw-dench/workspace
git add workspace.duckdb && git commit -m "crm backup $(date)"
git push origin main
For recovery: copy the .duckdb file back, start DenchClaw, everything is there.
The Portability Guarantee
The .duckdb file is fully self-contained. To move DenchClaw to a new machine:
# On old machine
tar czf denchclaw-workspace.tar.gz ~/.openclaw-dench/workspace/
# Transfer to new machine
scp denchclaw-workspace.tar.gz newmachine:~
# On new machine
npx denchclaw # installs fresh
tar xzf denchclaw-workspace.tar.gz # restore workspace
# DenchClaw is back exactly as you left it
The Vendor Risk Argument
I've thought a lot about vendor risk since building DenchClaw. Here's the core argument for local-first data:
HubSpot could change their pricing tomorrow. (They have, repeatedly.) If your data is in their cloud and they raise prices 3x, you have no leverage. You can try to export, but exports are often incomplete — no activity history, no custom field types, lossy formatting.
SaaS companies get acquired. When the acquirer kills the product (it happens), you have a 90-day notice window to scramble your export. If the export tools are broken (they often are), you lose data.
SaaS companies get breached. Your contacts, your deal notes, your email history — all exposed. Not your fault, not your decision.
With a .duckdb file:
- You control pricing: it's your file, there is no subscription
- Acquisitions don't matter: the software is MIT licensed
- Breaches affect only your machine: you control the security perimeter
Frequently Asked Questions
Is a single .duckdb file a single point of failure?
Yes — back it up. But "single point of failure" in a local-first context means your local machine, which is already a single point of failure for your other local data. Backups solve this.
What if workspace.duckdb gets corrupted?
DuckDB has a WAL (Write-Ahead Log) that protects against corruption from crashes. For catastrophic corruption, restore from backup. The markdown files (skills, configs, docs) are unaffected.
Can multiple processes write to workspace.duckdb simultaneously?
DuckDB allows multiple readers but only one writer at a time. The DenchClaw gateway manages concurrent access safely. Don't open the file in a second DuckDB process while the gateway is writing.
How large can the .duckdb file get?
DuckDB handles terabytes in production. For a CRM, you're unlikely to exceed 1GB even with millions of entries. The file compresses extremely well.
Can I store the .duckdb file in Dropbox or iCloud Drive?
Yes, but with caution: avoid running DenchClaw when the cloud sync is mid-sync. The safest approach is to put the workspace in a directory that syncs only when DenchClaw is not running, or to use git for sync instead.
Ready to try DenchClaw? Install in one command: npx denchclaw. Full setup guide →
