DuckDB for Time-Series Analysis

DuckDB handles time-series data natively with powerful date functions, window functions, and fast columnar storage. Here's how to use it for business analytics.

Mark Rachapoom · 7 min read

DuckDB handles time-series data exceptionally well. Its columnar storage format is ideal for the kind of sequential, timestamp-ordered data that drives business metrics — revenue trends, user activity, pipeline velocity, churn rates. And it does it all without requiring a separate time-series database.

This guide covers the practical patterns for time-series analysis with DuckDB: date arithmetic, window functions, aggregations, and visualization-ready output.

Why DuckDB Works Well for Time-Series

Most OLTP databases (PostgreSQL, MySQL) store rows together on disk. DuckDB stores columns together. For time-series queries — "give me daily revenue for the last 90 days" — this means DuckDB only reads the timestamp and value columns, skipping everything else. On large datasets, this can be 10-100x faster than a row-oriented database.

DuckDB also ships with a rich set of date and interval functions, and its window function support is excellent. There's no need for a specialized time-series database for most business analytics workloads.

Data Setup

Let's start with a realistic events table:

CREATE TABLE events (
    event_id VARCHAR,
    user_id VARCHAR,
    event_type VARCHAR,
    revenue DECIMAL(10,2),
    occurred_at TIMESTAMP
);

And a deals table as you'd find in a DenchClaw CRM:

CREATE TABLE deals (
    deal_id VARCHAR,
    company_id VARCHAR,
    stage VARCHAR,
    value DECIMAL(10,2),
    created_at DATE,
    closed_at DATE
);

Basic Date Operations

Extracting Date Parts

SELECT 
    YEAR(occurred_at) AS year,
    MONTH(occurred_at) AS month,
    DAY(occurred_at) AS day,
    DAYOFWEEK(occurred_at) AS day_of_week,
    HOUR(occurred_at) AS hour
FROM events
LIMIT 10;

Truncating to Periods

The DATE_TRUNC function is essential for grouping by time periods:

SELECT 
    DATE_TRUNC('day', occurred_at) AS day,
    COUNT(*) AS event_count,
    SUM(revenue) AS daily_revenue
FROM events
WHERE occurred_at >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY day
ORDER BY day;

Supported truncation periods: microsecond, millisecond, second, minute, hour, day, week, month, quarter, year.

Date Arithmetic

-- Events in the last 7 days
SELECT * FROM events
WHERE occurred_at > NOW() - INTERVAL '7 days';
 
-- Events between two dates
SELECT * FROM events
WHERE occurred_at BETWEEN '2026-01-01' AND '2026-03-31';
 
-- Days since each deal was created
SELECT deal_id, DATEDIFF('day', created_at, CURRENT_DATE) AS age_days
FROM deals;

Time-Series Aggregations

Daily Revenue Trend

SELECT 
    DATE_TRUNC('day', occurred_at) AS date,
    COUNT(*) AS transactions,
    SUM(revenue) AS revenue,
    AVG(revenue) AS avg_transaction_value
FROM events
WHERE event_type = 'purchase'
    AND occurred_at >= CURRENT_DATE - INTERVAL '90 days'
GROUP BY date
ORDER BY date;

Weekly Cohort Analysis

SELECT 
    DATE_TRUNC('week', first_seen) AS cohort_week,
    DATE_TRUNC('week', occurred_at) AS activity_week,
    DATEDIFF('week', first_seen, occurred_at) AS weeks_since_signup,
    COUNT(DISTINCT user_id) AS active_users
FROM events
JOIN (
    SELECT user_id, MIN(occurred_at) AS first_seen FROM events GROUP BY user_id
) AS cohorts USING (user_id)
GROUP BY cohort_week, activity_week, weeks_since_signup
ORDER BY cohort_week, weeks_since_signup;

Month-over-Month Growth

WITH monthly AS (
    SELECT 
        DATE_TRUNC('month', occurred_at) AS month,
        SUM(revenue) AS revenue
    FROM events
    WHERE event_type = 'purchase'
    GROUP BY month
)
SELECT 
    month,
    revenue,
    LAG(revenue) OVER (ORDER BY month) AS prev_month_revenue,
    ROUND(
        (revenue - LAG(revenue) OVER (ORDER BY month)) / 
        LAG(revenue) OVER (ORDER BY month) * 100, 
        1
    ) AS mom_growth_pct
FROM monthly
ORDER BY month;

Window Functions for Time-Series

Window functions let you compute running totals, moving averages, and rankings across ordered rows.

Running Total

SELECT 
    DATE_TRUNC('day', occurred_at) AS date,
    SUM(revenue) AS daily_revenue,
    SUM(SUM(revenue)) OVER (ORDER BY DATE_TRUNC('day', occurred_at)) AS cumulative_revenue
FROM events
GROUP BY date
ORDER BY date;

7-Day Moving Average

WITH daily AS (
    SELECT 
        DATE_TRUNC('day', occurred_at) AS date,
        SUM(revenue) AS daily_revenue
    FROM events
    GROUP BY date
)
SELECT 
    date,
    daily_revenue,
    ROUND(AVG(daily_revenue) OVER (
        ORDER BY date 
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ), 2) AS rolling_7d_avg
FROM daily
ORDER BY date;

Percentile Over Time

SELECT 
    DATE_TRUNC('week', occurred_at) AS week,
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY revenue) AS median_deal_value,
    PERCENTILE_CONT(0.9) WITHIN GROUP (ORDER BY revenue) AS p90_deal_value
FROM events
WHERE event_type = 'purchase'
GROUP BY week
ORDER BY week;

Filling Gaps in Time-Series

A common problem: some days have no data, creating gaps in your charts. Here's how to fill them:

-- Generate a series of dates
WITH date_series AS (
    -- call generate_series as a table function so each date is its own row
    SELECT generate_series::DATE AS date
    FROM generate_series(
        DATE '2026-01-01',
        DATE '2026-03-31',
        INTERVAL '1 day'
    )
),
daily_revenue AS (
    SELECT 
        DATE_TRUNC('day', occurred_at)::DATE AS date,
        SUM(revenue) AS revenue
    FROM events
    GROUP BY date
)
SELECT 
    ds.date,
    COALESCE(dr.revenue, 0) AS revenue
FROM date_series ds
LEFT JOIN daily_revenue dr ON ds.date = dr.date
ORDER BY ds.date;

generate_series is a DuckDB built-in that produces a sequence of values — integers, dates, or timestamps.

Time-Series for Pipeline Analytics

In a sales CRM, time-series analysis on your deal pipeline is one of the most valuable things you can do. Here's a pattern for pipeline snapshot analysis:

-- Pipeline value by stage over time (snapshot-based)
WITH pipeline_snapshots AS (
    SELECT 
        DATE_TRUNC('week', created_at) AS snapshot_week,
        stage,
        COUNT(*) AS deal_count,
        SUM(value) AS pipeline_value
    FROM deals
    WHERE closed_at IS NULL OR closed_at > snapshot_week
    GROUP BY snapshot_week, stage
)
SELECT 
    snapshot_week,
    stage,
    deal_count,
    pipeline_value,
    SUM(pipeline_value) OVER (PARTITION BY snapshot_week) AS total_pipeline
FROM pipeline_snapshots
ORDER BY snapshot_week, stage;

Deal Velocity

-- How long deals spend in each stage
SELECT 
    stage,
    AVG(DATEDIFF('day', stage_entered_at, COALESCE(stage_exited_at, CURRENT_DATE))) AS avg_days_in_stage,
    PERCENTILE_CONT(0.5) WITHIN GROUP (
        ORDER BY DATEDIFF('day', stage_entered_at, COALESCE(stage_exited_at, CURRENT_DATE))
    ) AS median_days
FROM deal_stage_history
GROUP BY stage
ORDER BY avg_days_in_stage DESC;

Comparing DuckDB to TimescaleDB and InfluxDB

| Feature | DuckDB | TimescaleDB | InfluxDB |
| --- | --- | --- | --- |
| Setup | Embedded | PostgreSQL extension | Separate server |
| Query language | SQL | SQL + hypertable functions | Flux / InfluxQL |
| Compression | Columnar | Time-series chunks | Time-series chunks |
| Streaming ingest | No | Yes | Yes |
| SQL compatibility | Full | Full | Limited |
| Best for | Analytical queries | High-frequency writes | IoT/metrics |
| Cost | Free | Free/paid tiers | Free/paid tiers |

DuckDB wins for batch analytics on historical data. TimescaleDB and InfluxDB win when you need high-frequency writes (>10K rows/sec) and streaming ingest.

For business analytics — daily/weekly aggregations on sales, marketing, and product data — DuckDB is the right choice and requires zero infrastructure.

Exporting for Visualization

-- Export time-series data to CSV for Grafana or a BI tool
COPY (
    SELECT 
        DATE_TRUNC('day', occurred_at) AS date,
        SUM(revenue) AS revenue
    FROM events
    GROUP BY date
    ORDER BY date
) TO 'daily_revenue.csv' (FORMAT CSV, HEADER TRUE);

Or skip the file export entirely and power dashboards directly from DuckDB query results.

Frequently Asked Questions

Can DuckDB handle real-time time-series data?

DuckDB is designed for analytical (OLAP) workloads, not high-frequency writes. It's not suitable as a streaming time-series store. Use it for batch analytics on data you've already collected.

How do I handle timezones in DuckDB?

Use AT TIME ZONE for conversions: SELECT occurred_at AT TIME ZONE 'America/Los_Angeles' FROM events. DuckDB uses the IANA timezone database.

Is DuckDB fast enough for large time-series datasets?

On a modern laptop, DuckDB handles 100-500 million row aggregations in seconds. For datasets that don't fit in RAM, DuckDB spills to disk but remains fast due to columnar compression.

How do I do lag/lead comparisons?

Use the LAG() and LEAD() window functions with an ORDER BY clause on your timestamp column. See the month-over-month example above.

Can I connect DuckDB time-series data to Grafana?

Yes, via the DuckDB JDBC driver or by exporting to Parquet/CSV and loading into Grafana's file data source. For live dashboards, expose DuckDB via a REST API wrapper.

Ready to try DenchClaw? Install in one command: npx denchclaw. Full setup guide →

Written by Mark Rachapoom, building the future of AI CRM software.

© 2026 DenchHQ · San Francisco, CA