Open Data Formats for CRM: Why They Matter
Open data formats for CRM matter because they determine whether you can actually use your own data — now, after a vendor switch, and 10 years from now. A CRM that stores your data in a proprietary binary format or behind a locked API is a CRM that owns your customer relationships, not you. This guide compares the formats in use today and explains what to actually look for.
What Makes a Data Format "Open"?
An open data format meets three criteria:
- Documented specification — the format is fully documented and publicly available, so any developer can build a tool to read or write it without permission
- Free to implement — no license fees, no patents blocking independent implementations
- Long-term stability — the format has a realistic path to remaining readable in 5–10 years
By that definition: CSV is open. JSON is open. SQLite is open. DuckDB is open. Salesforce's internal data model is not open. HubSpot's export format — a CSV with their specific field names — is technically open at the format level but not at the schema level.
The distinction matters. You can read a HubSpot CSV export with any spreadsheet. But you can't query it efficiently, restore relational integrity from it, or import it cleanly into a different system without significant manual work.
Format Comparison: What's Actually in Use
CSV (Comma-Separated Values)
Pros: Universal. Every tool reads it. Easy to inspect in a text editor or spreadsheet.
Cons: Flat. No relationships. No types (everything is a string). No schema. Terrible for anything beyond simple contact lists.
CRM use case: Fine as a one-time export for contacts. Insufficient for deals, activities, or anything relational.
Verdict: Necessary floor, insufficient ceiling.
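The "no types" problem is easy to see for yourself. A minimal sketch using Python's standard csv module; the field names and values are hypothetical:

```python
import csv
import io

# A hypothetical two-row contact export with a numeric and a date column.
raw = """name,deal_value,created_at
Sarah Chen,50000,2024-01-15
Marcus Webb,12500,2024-02-03
"""

rows = list(csv.DictReader(io.StringIO(raw)))

# Every value comes back as a string: CSV carries no type information,
# so "adding" to a deal value concatenates instead of doing arithmetic.
print(type(rows[0]["deal_value"]))   # <class 'str'>
print(rows[0]["deal_value"] + "0")   # "500000", not 50000
```

Any importer has to re-infer types (integers, dates, booleans) from strings, which is exactly where silent data corruption creeps in.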
JSON (JavaScript Object Notation)
Pros: Supports nested structures. Human-readable. Native to web APIs. Can represent relationships via embedded objects or reference IDs.
Cons: No native schema enforcement. Gets bulky at scale. Requires parsing — not directly queryable without tooling.
CRM use case: Good for API-based exports with relationship context preserved. Most modern CRM APIs return JSON.
Example of what a good JSON export looks like:

```json
{
  "contact": {
    "id": "cid_12345",
    "name": "Sarah Chen",
    "email": "sarah@acme.com",
    "created_at": "2024-01-15T10:30:00Z",
    "company": {
      "id": "co_789",
      "name": "Acme Corp",
      "domain": "acme.com"
    },
    "deals": ["deal_001", "deal_002"]
  }
}
```

Verdict: Good for transport. Not great for long-term storage or direct querying.
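Consuming an export shaped like the sample above takes only the standard library. A minimal sketch with Python's json module; the deal records here are hypothetical stand-ins for what a full export would also include:

```python
import json

export = json.loads("""{
  "contact": {
    "id": "cid_12345",
    "name": "Sarah Chen",
    "deals": ["deal_001", "deal_002"]
  }
}""")

# Reference IDs like "deals" need a second lookup table to resolve.
# These deal records are hypothetical; a complete export would ship them too.
deals_by_id = {
    "deal_001": {"id": "deal_001", "amount": 50000},
    "deal_002": {"id": "deal_002", "amount": 12500},
}

contact = export["contact"]
linked = [deals_by_id[d] for d in contact["deals"]]
total = sum(d["amount"] for d in linked)
print(f"{contact['name']}: {len(linked)} deals, {total} total")
```

Note the "requires parsing" cost: every question you ask of the data means loading and walking the whole structure in code, which is why JSON works better as a transport format than a query target.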
SQLite
Pros: A real relational database in a single file. Fully queryable with SQL. ACID compliant. Widely supported — billions of devices run SQLite. Stable format: SQLite commits to backward compatibility indefinitely.
Cons: Single-writer limitation at high concurrency. Not designed for analytics workloads.
CRM use case: Excellent for local-first applications. The entire database — contacts, deals, activities, custom fields — lives in one .sqlite file you can copy, inspect, and query with any SQLite client.
Verdict: Strong choice for local-first CRM storage.
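What "fully queryable with SQL" means in practice, sketched with Python's built-in sqlite3 module. The schema and table names are illustrative, not any particular CRM's actual layout:

```python
import sqlite3

# An in-memory database stands in for a single .sqlite file on disk.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contacts (id TEXT PRIMARY KEY, name TEXT, email TEXT);
    CREATE TABLE deals (
        id TEXT PRIMARY KEY,
        contact_id TEXT REFERENCES contacts(id),
        amount INTEGER
    );
    INSERT INTO contacts VALUES ('cid_1', 'Sarah Chen', 'sarah@acme.com');
    INSERT INTO deals VALUES
        ('deal_001', 'cid_1', 50000),
        ('deal_002', 'cid_1', 12500);
""")

# Relationships survive: a join reconstructs contact-to-deal links directly,
# with real types (amount is an INTEGER, not a string).
row = conn.execute("""
    SELECT c.name, COUNT(d.id), SUM(d.amount)
    FROM contacts c JOIN deals d ON d.contact_id = c.id
    GROUP BY c.id
""").fetchone()
print(row)  # ('Sarah Chen', 2, 62500)
```

Contrast this with the CSV case: the relational context and the types are in the file itself, not in your head.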
DuckDB
Pros: SQL-native analytical database in a single file. Extremely fast for analytical queries (aggregations, pivots, joins across large tables). Open source (MIT). Supports Parquet, CSV, and JSON as direct query targets. Growing ecosystem.
Cons: Newer than SQLite — less ubiquitous, though rapidly becoming standard in the data community. Write concurrency limitations similar to SQLite.
CRM use case: DenchClaw uses DuckDB as its underlying storage. This gives you SQL query access to your entire CRM from the terminal, fast analytics over historical data, and a file you can copy anywhere.
```shell
# Query your DenchClaw data directly from the command line
duckdb ~/.openclaw-dench/workspace/workspace.duckdb
> SELECT name, email, company FROM v_contacts WHERE status = 'Lead' ORDER BY created_at DESC;
```

Verdict: Best choice for an analytics-heavy local-first CRM. The format is stable, documented, and open.
Parquet
Pros: Column-oriented format optimized for analytics. Excellent compression. Widely adopted in data engineering (Apache ecosystem, Snowflake, BigQuery all support it natively).
Cons: Not human-readable without tooling. Optimized for read-heavy analytical workloads, not transactional writes.
CRM use case: Excellent for exporting and archiving large volumes of CRM data for analysis. Not suitable as a primary operational database format.
Verdict: Best for analytics exports and long-term archival.
Proprietary Formats
Examples: Salesforce's Apex data model, HubSpot's internal object schema, Pipedrive's database structure.
Cons: You can't read these without the vendor's software. They change without notice. They give you no insight into how your data is actually stored.
Verdict: Vendor lock-in by design. Avoid as your primary data container.
What to Ask Any CRM Vendor
Before signing up — or before renewing — ask these questions:
1. What format is my data stored in internally? A vendor willing to answer this question clearly is a vendor that respects your data ownership. Evasiveness here is a red flag.
2. Can I export everything — all objects, all fields, all relationships — in a single operation? The answer should be yes. If they say "contacts and companies, yes, but deals require a separate export and activities aren't included," that's a problem.
3. What happens to my data if I cancel? 30 days minimum for post-cancellation export access. 90 days is better. "Immediately deleted" is unacceptable.
4. Is there an API I can use to extract data programmatically, with rate limits appropriate for bulk migration? Yes/no question. If yes: what are the rate limits? Is API access available after cancellation? Are there additional fees?
5. Are your export formats documented anywhere? A simple docs link is the right answer. If they have to escalate to engineering to answer this, the export is probably an afterthought.
How DenchClaw Handles Data Formats
DenchClaw's architecture is built around open formats from the ground up:
- Primary storage: DuckDB, a single .duckdb file on your local machine
- Schema: EAV (Entity-Attribute-Value) with typed pivot views, fully documented and queryable
- Import formats: Standard CSV exports from Salesforce, HubSpot, Pipedrive
- Export: Direct DuckDB file access, plus CSV export for any view
- Backup: Copy the file. That's it.
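"Copy the file" really is the whole backup story for a single-file database. A minimal timestamped-snapshot sketch; the paths and function name are hypothetical, not part of any product's CLI:

```python
import shutil
from datetime import datetime
from pathlib import Path

def snapshot(db_path: str, backup_dir: str) -> Path:
    """Copy a single-file database to a timestamped backup file."""
    src = Path(db_path)
    dest_dir = Path(backup_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    # e.g. workspace.duckdb -> backups/workspace-20240115-103000.duckdb
    dest = dest_dir / f"{src.stem}-{stamp}{src.suffix}"
    shutil.copy2(src, dest)  # copy2 preserves file timestamps
    return dest
```

One caveat worth knowing: copy the file while no write is in flight (or use the database's own backup/export command) so you don't capture a torn snapshot.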
There's no export button that might be broken. There's no API rate limit preventing bulk reads. There's no 30-day post-cancellation window to worry about. The file is on your machine and it's yours.
This is what it looks like when data format choices are made in favor of the user rather than the vendor.
Practical Steps to Assess Your Current CRM
Here's a quick audit you can run in an afternoon:
Step 1: Run a test export right now. Don't wait until you need it. Export all your contacts, all your deals, and a sample of your activities. Inspect the output. Do relationships survive? Are custom fields included? Are timestamps present?
Step 2: Count what you get back. Compare record counts between what's in the system and what came out of the export. Discrepancies greater than 1% warrant investigation.
Step 3: Try to import the export into something else. Even if you're not planning to migrate, attempt an import into a test system. You'll immediately discover what breaks. Better to know now.
Step 4: Check API documentation. Does your vendor have documented bulk export endpoints? What are the rate limits? Is there a way to get a full database dump vs. paginated API calls?
Step 5: Calculate your migration cost. Based on your data volume, the export format quality, and available import tools, how long would a migration actually take? This number is your vendor's real leverage over you.
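The discrepancy check in Step 2 is a few lines of logic. A sketch using the 1% threshold from above; the record counts are hypothetical:

```python
def export_discrepancy(in_system: int, exported: int) -> float:
    """Fraction of records missing from (or surplus in) the export."""
    return abs(in_system - exported) / in_system

# Hypothetical audit: 10,000 contacts in the CRM, 9,850 in the export.
rate = export_discrepancy(10_000, 9_850)
print(f"{rate:.1%}")  # 1.5%, above the 1% threshold: investigate
```

Run it per object type (contacts, companies, deals, activities) rather than on the grand total, since a shortfall in one object can hide inside a surplus in another.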
Frequently Asked Questions
Is CSV really that bad for CRM data? For a simple contact list, CSV is fine. For a CRM with deals, activities, custom fields, and relationship history, CSV is severely inadequate. The flat structure means you either get separate files per object (and have to re-link them yourself) or lose relational context entirely.
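The "re-link them yourself" chore from the answer above, sketched with Python's csv module. The file contents and field names are hypothetical examples of a per-object export:

```python
import csv
import io

# Two flat files from a hypothetical per-object CSV export.
contacts_csv = "id,name\ncid_1,Sarah Chen\ncid_2,Marcus Webb\n"
deals_csv = "id,contact_id,amount\ndeal_001,cid_1,50000\ndeal_002,cid_1,12500\n"

contacts = {r["id"]: r for r in csv.DictReader(io.StringIO(contacts_csv))}

# Rebuild the contact-to-deals relationship the flat files dropped.
deals_by_contact = {}
for deal in csv.DictReader(io.StringIO(deals_csv)):
    deals_by_contact.setdefault(deal["contact_id"], []).append(deal)

for cid, contact in contacts.items():
    n = len(deals_by_contact.get(cid, []))
    print(f"{contact['name']}: {n} deals")
```

This is the manual work a relational export format (a SQL dump, a SQLite or DuckDB file) does for you automatically.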
Why don't more CRMs use SQLite or DuckDB? Most cloud CRMs use PostgreSQL or MySQL on their servers — reasonable choices for multi-tenant cloud systems. The issue is what they expose to you. They could export a full SQL dump or provide DuckDB-compatible exports; they choose not to because lock-in is valuable.
Can I query DuckDB files from other tools? Yes. DuckDB has client libraries for Python, R, Java, Node.js, and many others. You can query a DenchClaw database directly from a Jupyter notebook, a Python script, or any BI tool with a DuckDB connector.
What about encryption and security for local files? DuckDB files can be encrypted at rest using filesystem-level encryption (FileVault on Mac, BitLocker on Windows, LUKS on Linux). DenchClaw recommends enabling full-disk encryption on any machine running the CRM.
If I'm already locked into Salesforce, what's the fastest path out? Start with a full API export using Salesforce Data Loader or the Bulk API. Export all objects with relationship IDs preserved. Target a local-first system like DenchClaw for import. Run parallel for 30 days before cutting over. The setup guide covers the import process in detail.
Ready to try DenchClaw? Install in one command: npx denchclaw. Full setup guide →
