DuckDB for Product Analytics: Track Users Without Sending Data to the Cloud
DuckDB handles product analytics locally — funnels, retention, DAU/WAU/MAU, cohort analysis — without sending user data to Mixpanel, Amplitude, or the cloud.
Most product analytics tools — Mixpanel, Amplitude, Heap — work by shipping your user behavior data to their servers. You pay per event, per user, per month. Your most sensitive product data sits on someone else's infrastructure.
DuckDB lets you run the same analyses locally. Funnel analysis, retention cohorts, DAU/WAU/MAU, feature adoption — all in SQL, all on your machine, all free.
Setting Up Your Events Schema#
CREATE TABLE events (
event_id VARCHAR PRIMARY KEY DEFAULT gen_random_uuid(),
user_id VARCHAR NOT NULL,
session_id VARCHAR,
event_name VARCHAR NOT NULL,
properties JSON,
occurred_at TIMESTAMP DEFAULT NOW(),
platform VARCHAR, -- 'web', 'ios', 'android'
app_version VARCHAR
);
CREATE TABLE users (
user_id VARCHAR PRIMARY KEY,
email VARCHAR,
plan VARCHAR, -- 'free', 'pro', 'enterprise'
company VARCHAR,
created_at TIMESTAMP,
country VARCHAR,
is_churned BOOLEAN DEFAULT false
);
Core Metrics#
Daily / Weekly / Monthly Active Users#
-- DAU
SELECT
DATE_TRUNC('day', occurred_at) AS date,
COUNT(DISTINCT user_id) AS dau
FROM events
WHERE occurred_at >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY date
ORDER BY date;
-- WAU and MAU as rolling windows.
-- Window functions can't see user_id after GROUP BY, so instead join
-- each day to its trailing 30 days of raw events and count distinct
-- users per range.
WITH days AS (
  SELECT DISTINCT CAST(occurred_at AS DATE) AS date FROM events
)
SELECT
  d.date,
  COUNT(DISTINCT CASE WHEN CAST(e.occurred_at AS DATE) = d.date THEN e.user_id END) AS dau,
  COUNT(DISTINCT CASE WHEN e.occurred_at >= d.date - INTERVAL '6 days' THEN e.user_id END) AS wau_7d,
  COUNT(DISTINCT e.user_id) AS mau_30d
FROM days d
JOIN events e
  ON e.occurred_at >= d.date - INTERVAL '29 days'
 AND e.occurred_at < d.date + INTERVAL '1 day'
GROUP BY d.date
ORDER BY d.date;
Stickiness (DAU/MAU)#
WITH daily AS (
SELECT DATE_TRUNC('day', occurred_at) AS date, COUNT(DISTINCT user_id) AS dau
FROM events GROUP BY date
),
monthly AS (
SELECT DATE_TRUNC('month', occurred_at) AS month, COUNT(DISTINCT user_id) AS mau
FROM events GROUP BY month
)
SELECT
d.date,
d.dau,
m.mau,
ROUND(d.dau * 100.0 / m.mau, 1) AS stickiness_pct
FROM daily d
JOIN monthly m ON DATE_TRUNC('month', d.date) = m.month
ORDER BY d.date;
Funnel Analysis#
-- Signup → Activation → First Purchase funnel
WITH funnel_steps AS (
SELECT
user_id,
MAX(CASE WHEN event_name = 'signed_up' THEN 1 END) AS step_1,
MAX(CASE WHEN event_name = 'onboarding_completed' THEN 1 END) AS step_2,
MAX(CASE WHEN event_name = 'first_purchase' THEN 1 END) AS step_3
FROM events
WHERE occurred_at >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY user_id
)
SELECT
'Signed Up' AS step, COUNT(*) AS users, 100.0 AS conversion_pct
FROM funnel_steps WHERE step_1 = 1
UNION ALL
SELECT
'Onboarding Completed',
COUNT(*),
ROUND(COUNT(*) * 100.0 / (SELECT COUNT(*) FROM funnel_steps WHERE step_1 = 1), 1)
FROM funnel_steps WHERE step_2 = 1
UNION ALL
SELECT
'First Purchase',
COUNT(*),
ROUND(COUNT(*) * 100.0 / (SELECT COUNT(*) FROM funnel_steps WHERE step_1 = 1), 1)
FROM funnel_steps WHERE step_3 = 1;
Retention Cohort Analysis#
WITH cohorts AS (
SELECT user_id, DATE_TRUNC('week', MIN(occurred_at)) AS cohort_week
FROM events WHERE event_name = 'signed_up'
GROUP BY user_id
),
activity AS (
SELECT
c.user_id,
c.cohort_week,
DATE_TRUNC('week', e.occurred_at) AS activity_week,
DATEDIFF('week', c.cohort_week, DATE_TRUNC('week', e.occurred_at)) AS weeks_since_signup
FROM events e
JOIN cohorts c ON e.user_id = c.user_id
),
cohort_sizes AS (
SELECT cohort_week, COUNT(DISTINCT user_id) AS cohort_size
FROM cohorts GROUP BY cohort_week
)
SELECT
a.cohort_week,
a.weeks_since_signup,
COUNT(DISTINCT a.user_id) AS retained_users,
cs.cohort_size,
ROUND(COUNT(DISTINCT a.user_id) * 100.0 / cs.cohort_size, 1) AS retention_pct
FROM activity a
JOIN cohort_sizes cs ON a.cohort_week = cs.cohort_week
GROUP BY a.cohort_week, a.weeks_since_signup, cs.cohort_size
ORDER BY a.cohort_week, a.weeks_since_signup;
Feature Adoption#
-- Which features are users actually using?
SELECT
event_name AS feature,
COUNT(DISTINCT user_id) AS unique_users,
COUNT(*) AS total_events,
ROUND(COUNT(DISTINCT user_id) * 100.0 / (
SELECT COUNT(DISTINCT user_id) FROM events
WHERE occurred_at >= CURRENT_DATE - INTERVAL '30 days'
), 1) AS adoption_pct
FROM events
WHERE occurred_at >= CURRENT_DATE - INTERVAL '30 days'
AND event_name NOT IN ('page_view', 'session_start', 'session_end')
GROUP BY event_name
ORDER BY unique_users DESC;
User Segmentation#
-- Segment users by engagement
WITH user_stats AS (
SELECT
user_id,
COUNT(*) AS event_count,
MAX(occurred_at) AS last_seen,
DATEDIFF('day', MAX(occurred_at), CURRENT_DATE) AS days_inactive
FROM events
GROUP BY user_id
)
SELECT
CASE
WHEN days_inactive <= 7 THEN 'Active'
WHEN days_inactive <= 30 THEN 'At Risk'
WHEN days_inactive <= 90 THEN 'Dormant'
ELSE 'Churned'
END AS segment,
COUNT(*) AS users,
ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER (), 1) AS pct
FROM user_stats
GROUP BY segment
ORDER BY users DESC;
DenchClaw for Product Analytics#
DenchClaw uses DuckDB as its database. If you're tracking events to DuckDB, the App Builder lets you turn these queries into live dashboards:
// DenchClaw app: product analytics dashboard
const dau = await dench.db.query(`
SELECT DATE_TRUNC('day', occurred_at) AS date, COUNT(DISTINCT user_id) AS dau
FROM events
WHERE occurred_at >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY date ORDER BY date
`);
new Chart(dauChart, {
type: 'line',
data: {
labels: dau.map(r => r.date),
datasets: [{ label: 'DAU', data: dau.map(r => r.dau) }]
}
});
All your product data, all locally owned, all queryable with SQL.
Frequently Asked Questions#
Can DuckDB replace Mixpanel or Amplitude?#
For most early-stage startups, yes. You lose the pre-built UI and mobile SDKs, but gain full ownership of your data, zero monthly cost, and unlimited query flexibility.
How do I collect events into DuckDB from a web app?#
Build a lightweight events API endpoint that writes to DuckDB. Or batch events client-side and send them in bulk every 30 seconds.
What's the scale limit for DuckDB product analytics?#
Hundreds of millions of events on a modern laptop. Billions of events on a server. For truly massive scale (trillions of events), use ClickHouse.
How do I handle event schema changes?#
Use the properties JSON column for flexible attributes that change over time. Core dimensions (user_id, event_name, timestamp) stay in fixed columns.
Can I query DuckDB analytics from a mobile app?#
Via a REST API wrapper — expose DuckDB over HTTP and query it from any client.
Ready to try DenchClaw? Install in one command: npx denchclaw. Full setup guide →
