DuckDB for Time-Series Analysis
DuckDB handles time-series data exceptionally well. Its columnar storage format is ideal for the kind of sequential, timestamp-ordered data that drives business metrics — revenue trends, user activity, pipeline velocity, churn rates. And it does it all without requiring a separate time-series database.
This guide covers the practical patterns for time-series analysis with DuckDB: date arithmetic, window functions, aggregations, and visualization-ready output.
Why DuckDB Works Well for Time-Series#
Most OLTP databases (PostgreSQL, MySQL) store rows together on disk. DuckDB stores columns together. For time-series queries — "give me daily revenue for the last 90 days" — this means DuckDB only reads the timestamp and value columns, skipping everything else. On large datasets, this can be 10-100x faster than a row-oriented database.
DuckDB also ships with a rich set of date and interval functions, and its window function support is excellent. There's no need for a specialized time-series database for most business analytics workloads.
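To see the column-pruning benefit concretely, point DuckDB at a Parquet file and select only the columns you need (the file name here is a placeholder, not from this guide's setup):

```sql
-- Only occurred_at and revenue are read from disk; every other
-- column in the file is skipped. 'events.parquet' is hypothetical.
SELECT
    DATE_TRUNC('day', occurred_at) AS day,
    SUM(revenue) AS daily_revenue
FROM read_parquet('events.parquet')
WHERE occurred_at >= CURRENT_DATE - INTERVAL '90 days'
GROUP BY day
ORDER BY day;
```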
Data Setup#
Let's start with a realistic events table:
CREATE TABLE events (
event_id VARCHAR,
user_id VARCHAR,
event_type VARCHAR,
revenue DECIMAL(10,2),
occurred_at TIMESTAMP
);
And a deals table as you'd find in a DenchClaw CRM:
CREATE TABLE deals (
deal_id VARCHAR,
company_id VARCHAR,
stage VARCHAR,
value DECIMAL(10,2),
created_at DATE,
closed_at DATE
);
Basic Date Operations#
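None of the queries below depend on specific data, but if you want to follow along, a few thousand synthetic rows are easy to generate. The event types, value ranges, and ID formats here are arbitrary choices, not part of the schema above:

```sql
-- Arbitrary synthetic data so the examples can be run end-to-end
INSERT INTO events
SELECT
    'evt_' || i::VARCHAR                                      AS event_id,
    'user_' || (i % 500)::VARCHAR                             AS user_id,
    CASE WHEN i % 4 = 0 THEN 'purchase' ELSE 'page_view' END  AS event_type,
    ROUND(random() * 500, 2)                                  AS revenue,
    TIMESTAMP '2026-01-01' + (i % 90) * INTERVAL 1 DAY        AS occurred_at
FROM generate_series(1, 10000) AS t(i);
```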
Extracting Date Parts#
SELECT
YEAR(occurred_at) AS year,
MONTH(occurred_at) AS month,
DAY(occurred_at) AS day,
DAYOFWEEK(occurred_at) AS day_of_week,
HOUR(occurred_at) AS hour
FROM events
LIMIT 10;
Truncating to Periods#
The DATE_TRUNC function is essential for grouping by time periods:
SELECT
DATE_TRUNC('day', occurred_at) AS day,
COUNT(*) AS event_count,
SUM(revenue) AS daily_revenue
FROM events
WHERE occurred_at >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY day
ORDER BY day;
Supported truncation periods: microsecond, millisecond, second, minute, hour, day, week, month, quarter, year.
Date Arithmetic#
-- Events in the last 7 days
SELECT * FROM events
WHERE occurred_at > NOW() - INTERVAL '7 days';
-- Events between two dates
SELECT * FROM events
WHERE occurred_at BETWEEN '2026-01-01' AND '2026-03-31';
-- Days since each deal was created
SELECT deal_id, DATEDIFF('day', created_at, CURRENT_DATE) AS age_days
FROM deals;
Time-Series Aggregations#
Daily Revenue Trend#
SELECT
DATE_TRUNC('day', occurred_at) AS date,
COUNT(*) AS transactions,
SUM(revenue) AS revenue,
AVG(revenue) AS avg_transaction_value
FROM events
WHERE event_type = 'purchase'
AND occurred_at >= CURRENT_DATE - INTERVAL '90 days'
GROUP BY date
ORDER BY date;
Weekly Cohort Analysis#
SELECT
DATE_TRUNC('week', first_seen) AS cohort_week,
DATE_TRUNC('week', occurred_at) AS activity_week,
DATEDIFF('week', first_seen, occurred_at) AS weeks_since_signup,
COUNT(DISTINCT user_id) AS active_users
FROM events
JOIN (
SELECT user_id, MIN(occurred_at) AS first_seen FROM events GROUP BY user_id
) AS cohorts USING (user_id)
GROUP BY cohort_week, activity_week, weeks_since_signup
ORDER BY cohort_week, weeks_since_signup;
Month-over-Month Growth#
WITH monthly AS (
SELECT
DATE_TRUNC('month', occurred_at) AS month,
SUM(revenue) AS revenue
FROM events
WHERE event_type = 'purchase'
GROUP BY month
)
SELECT
month,
revenue,
LAG(revenue) OVER (ORDER BY month) AS prev_month_revenue,
ROUND(
(revenue - LAG(revenue) OVER (ORDER BY month)) /
LAG(revenue) OVER (ORDER BY month) * 100,
1
) AS mom_growth_pct
FROM monthly
ORDER BY month;
Window Functions for Time-Series#
Window functions let you compute running totals, moving averages, and rankings across ordered rows.
Running Total#
SELECT
DATE_TRUNC('day', occurred_at) AS date,
SUM(revenue) AS daily_revenue,
SUM(SUM(revenue)) OVER (ORDER BY DATE_TRUNC('day', occurred_at)) AS cumulative_revenue
FROM events
GROUP BY date
ORDER BY date;
7-Day Moving Average#
WITH daily AS (
SELECT
DATE_TRUNC('day', occurred_at) AS date,
SUM(revenue) AS daily_revenue
FROM events
GROUP BY date
)
SELECT
date,
daily_revenue,
ROUND(AVG(daily_revenue) OVER (
ORDER BY date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
), 2) AS rolling_7d_avg
FROM daily
ORDER BY date;
Percentile Over Time#
SELECT
DATE_TRUNC('week', occurred_at) AS week,
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY revenue) AS median_deal_value,
PERCENTILE_CONT(0.9) WITHIN GROUP (ORDER BY revenue) AS p90_deal_value
FROM events
WHERE event_type = 'purchase'
GROUP BY week
ORDER BY week;
Filling Gaps in Time-Series#
A common problem: some days have no data, creating gaps in your charts. Here's how to fill them:
-- Generate a series of dates
WITH date_series AS (
SELECT generate_series(
DATE '2026-01-01',
DATE '2026-03-31',
INTERVAL '1 day'
)::DATE AS date
),
daily_revenue AS (
SELECT
DATE_TRUNC('day', occurred_at)::DATE AS date,
SUM(revenue) AS revenue
FROM events
GROUP BY date
)
SELECT
ds.date,
COALESCE(dr.revenue, 0) AS revenue
FROM date_series ds
LEFT JOIN daily_revenue dr ON ds.date = dr.date
ORDER BY ds.date;
generate_series is a DuckDB built-in that produces a sequence of values — integers, dates, or timestamps.
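A few quick examples across types. One detail worth knowing: generate_series includes the stop value, while the related range function excludes it:

```sql
SELECT * FROM generate_series(1, 5);        -- integers 1 through 5 (stop included)

SELECT * FROM generate_series(
    DATE '2026-01-01', DATE '2026-01-07', INTERVAL '1 day'
);                                          -- seven daily dates

SELECT * FROM range(1, 5);                  -- 1 through 4 (stop excluded)
```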
Time-Series for Pipeline Analytics#
In a sales CRM, time-series analysis on your deal pipeline is one of the most valuable things you can do. Here's a pattern for pipeline snapshot analysis:
-- Open pipeline value by stage, grouped by the week each deal was created
-- (a lightweight proxy for true weekly snapshots)
WITH pipeline_snapshots AS (
SELECT
DATE_TRUNC('week', created_at) AS snapshot_week,
stage,
COUNT(*) AS deal_count,
SUM(value) AS pipeline_value
FROM deals
WHERE closed_at IS NULL OR closed_at > DATE_TRUNC('week', created_at)
GROUP BY snapshot_week, stage
)
SELECT
snapshot_week,
stage,
deal_count,
pipeline_value,
SUM(pipeline_value) OVER (PARTITION BY snapshot_week) AS total_pipeline
FROM pipeline_snapshots
ORDER BY snapshot_week, stage;
Deal Velocity#
-- How long deals spend in each stage (assumes a deal_stage_history table
-- with stage, stage_entered_at, and stage_exited_at columns)
SELECT
stage,
AVG(DATEDIFF('day', stage_entered_at, COALESCE(stage_exited_at, CURRENT_DATE))) AS avg_days_in_stage,
PERCENTILE_CONT(0.5) WITHIN GROUP (
ORDER BY DATEDIFF('day', stage_entered_at, COALESCE(stage_exited_at, CURRENT_DATE))
) AS median_days
FROM deal_stage_history
GROUP BY stage
ORDER BY avg_days_in_stage DESC;
Comparing DuckDB to TimescaleDB and InfluxDB#
| Feature | DuckDB | TimescaleDB | InfluxDB |
|---|---|---|---|
| Setup | Embedded | PostgreSQL extension | Separate server |
| Query language | SQL | SQL + hypertable functions | Flux / InfluxQL |
| Compression | Columnar | Time-series chunks | Time-series chunks |
| Streaming ingest | No | Yes | Yes |
| SQL compatibility | Full | Full | Limited |
| Best for | Analytical queries | High-frequency writes | IoT/metrics |
| Cost | Free | Free/paid tiers | Free/paid tiers |
DuckDB wins for batch analytics on historical data. TimescaleDB and InfluxDB win when you need high-frequency writes (>10K rows/sec) and streaming ingest.
For business analytics — daily/weekly aggregations on sales, marketing, and product data — DuckDB is the right choice and requires zero infrastructure.
Exporting for Visualization#
-- Export time-series data to CSV for Grafana or a BI tool
COPY (
SELECT
DATE_TRUNC('day', occurred_at) AS date,
SUM(revenue) AS revenue
FROM events
GROUP BY date
ORDER BY date
) TO 'daily_revenue.csv' (FORMAT CSV, HEADER TRUE);
Or power dashboards directly from DuckDB query results, skipping the export step.
Frequently Asked Questions#
Can DuckDB handle real-time time-series data?#
DuckDB is designed for analytical (OLAP) workloads, not high-frequency writes. It's not suitable as a streaming time-series store. Use it for batch analytics on data you've already collected.
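The usual pattern is to collect events somewhere else (Kafka, application logs, an OLTP store) and bulk-load them into DuckDB on a schedule. A sketch, with a hypothetical file glob:

```sql
-- Append a batch of newly collected events each run;
-- the directory and file name pattern are placeholders
INSERT INTO events
SELECT * FROM read_parquet('incoming/events_*.parquet');
```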
How do I handle timezones in DuckDB?#
Use AT TIME ZONE for conversions: SELECT occurred_at AT TIME ZONE 'America/Los_Angeles' FROM events. DuckDB uses the IANA timezone database, provided by the ICU extension (bundled with official builds).
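A minimal sketch; in some environments the ICU extension must be loaded explicitly before timezone operations work:

```sql
INSTALL icu;  -- no-op if already installed
LOAD icu;

-- Convert a UTC timestamp to local wall-clock time
SELECT TIMESTAMPTZ '2026-01-15 12:00:00+00'
       AT TIME ZONE 'America/Los_Angeles' AS local_time;
```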
Is DuckDB fast enough for large time-series datasets?#
On a modern laptop, DuckDB handles 100-500 million row aggregations in seconds. For datasets that don't fit in RAM, DuckDB spills to disk but remains fast due to columnar compression.
How do I do lag/lead comparisons?#
Use the LAG() and LEAD() window functions with an ORDER BY clause on your timestamp column. See the month-over-month example above.
Can I connect DuckDB time-series data to Grafana?#
Yes, via the DuckDB JDBC driver or by exporting to Parquet/CSV and loading into Grafana's file data source. For live dashboards, expose DuckDB via a REST API wrapper.
Ready to try DenchClaw? Install in one command: npx denchclaw. Full setup guide →
