Window functions are the single biggest productivity multiplier for analysts working in SQL. They let you perform calculations across a set of rows that are somehow related to the current row — without collapsing the result set the way GROUP BY does. If you're still writing self-joins or correlated subqueries for ranking, running totals, or period-over-period comparisons, this guide will change how you write SQL.
The Anatomy of a Window Function
Every window function follows the same structure:
function_name(column) OVER (
[PARTITION BY partition_column(s)]
[ORDER BY order_column(s)]
[ROWS/RANGE BETWEEN frame_start AND frame_end]
)
The OVER clause is what makes it a window function. Without it, you'd have a regular aggregate. With it, you get the aggregate computed for each row within a defined "window" of related rows.
Ranking Functions
ROW_NUMBER, RANK, and DENSE_RANK
These three are frequently confused. Here's the definitive breakdown:
SELECT
customer_id,
order_date,
total_amount,
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date) AS row_num,
RANK() OVER (PARTITION BY customer_id ORDER BY total_amount DESC) AS amount_rank,
DENSE_RANK() OVER (PARTITION BY customer_id ORDER BY total_amount DESC) AS amount_dense_rank
FROM orders;
ROW_NUMBER() when you need a unique sequential number per partition (e.g., deduplication). Use RANK() when ties should get the same rank with gaps after (like Olympic medals). Use DENSE_RANK() when ties get the same rank without gaps.
Real Use Case: Deduplication with ROW_NUMBER
One of the most common uses of ROW_NUMBER() is deduplicating records where you want to keep the most recent version of each entity:
-- Keep the latest record per customer_id
WITH ranked AS (
SELECT *,
ROW_NUMBER() OVER (
PARTITION BY customer_id
ORDER BY updated_at DESC
) AS rn
FROM customers_raw
)
SELECT * EXCLUDE rn
FROM ranked
WHERE rn = 1;
Offset Functions: LAG and LEAD
LAG and LEAD let you access values from other rows relative to the current row, without a self-join. They're essential for period-over-period comparisons.
-- Month-over-month revenue change
SELECT
month,
revenue,
LAG(revenue) OVER (ORDER BY month) AS prev_month_revenue,
revenue - LAG(revenue) OVER (ORDER BY month) AS mom_change,
ROUND(
(revenue - LAG(revenue) OVER (ORDER BY month))
/ NULLIF(LAG(revenue) OVER (ORDER BY month), 0) * 100,
2
) AS mom_pct_change
FROM monthly_revenue
ORDER BY month;
Real Use Case: Customer Churn Signals
-- Identify customers whose order gap is growing
SELECT
customer_id,
order_date,
LAG(order_date) OVER (PARTITION BY customer_id ORDER BY order_date) AS prev_order_date,
DATEDIFF(order_date, LAG(order_date) OVER (PARTITION BY customer_id ORDER BY order_date)) AS days_since_last_order
FROM orders
QUALIFY days_since_last_order > 90 -- BigQuery/Snowflake QUALIFY clause
ORDER BY days_since_last_order DESC;
Aggregate Window Functions: Running Totals & Moving Averages
Running totals
SELECT
order_date,
daily_revenue,
SUM(daily_revenue) OVER (ORDER BY order_date) AS cumulative_revenue,
SUM(daily_revenue) OVER (
ORDER BY order_date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) AS rolling_7d_revenue
FROM daily_revenue_summary;
Moving Averages
-- 30-day moving average (handling variable window at start of series)
SELECT
date,
daily_active_users,
AVG(daily_active_users) OVER (
ORDER BY date
ROWS BETWEEN 29 PRECEDING AND CURRENT ROW
) AS rolling_30d_avg_dau
FROM daily_metrics
ORDER BY date;
NTILE: Bucketing Rows into Percentiles
NTILE(n) divides rows into n roughly equal buckets. It's perfect for customer segmentation and RFM analysis:
-- RFM Customer Segmentation
WITH rfm_scores AS (
SELECT
customer_id,
MAX(order_date) AS last_order_date,
COUNT(order_id) AS frequency,
SUM(total_amount) AS monetary,
NTILE(5) OVER (ORDER BY MAX(order_date) DESC) AS recency_score,
NTILE(5) OVER (ORDER BY COUNT(order_id) DESC) AS frequency_score,
NTILE(5) OVER (ORDER BY SUM(total_amount) DESC) AS monetary_score
FROM orders
GROUP BY customer_id
)
SELECT
customer_id,
recency_score,
frequency_score,
monetary_score,
(recency_score + frequency_score + monetary_score) AS rfm_total,
CASE
WHEN recency_score >= 4 AND frequency_score >= 4 THEN 'Champions'
WHEN recency_score >= 3 AND monetary_score >= 4 THEN 'High Value At Risk'
WHEN recency_score <= 2 AND frequency_score >= 3 THEN 'Churned Loyalists'
ELSE 'Standard'
END AS customer_segment
FROM rfm_scores;
First/Last Value and Cohort Analysis
-- Cohort analysis: revenue per cohort month
WITH first_orders AS (
SELECT
customer_id,
DATE_TRUNC('month', MIN(order_date)) AS cohort_month
FROM orders
GROUP BY customer_id
),
cohort_data AS (
SELECT
f.cohort_month,
DATE_TRUNC('month', o.order_date) AS order_month,
COUNT(DISTINCT o.customer_id) AS customers,
SUM(o.total_amount) AS revenue
FROM orders o
JOIN first_orders f USING (customer_id)
GROUP BY 1, 2
)
SELECT
cohort_month,
order_month,
DATEDIFF('month', cohort_month, order_month) AS month_number,
customers,
FIRST_VALUE(customers) OVER (
PARTITION BY cohort_month
ORDER BY order_month
) AS cohort_size,
ROUND(customers::float / FIRST_VALUE(customers) OVER (
PARTITION BY cohort_month ORDER BY order_month
) * 100, 1) AS retention_rate
FROM cohort_data
ORDER BY cohort_month, order_month;
Performance Tips
- Partition wisely — large partition keys spill to disk. Make sure partition columns are indexed or clustered.
- Reuse window specs — in BigQuery and Snowflake you can name window specs with the
WINDOWclause to avoid repetition. - Combine into one pass — multiple window functions with the same
OVERclause are computed in a single scan. - Use QUALIFY in Snowflake/BigQuery — filter on window function results without a subquery.
Wrapping Up
Window functions eliminate entire categories of painful SQL workarounds — self-joins, subqueries, and procedural code. Once you internalize the OVER clause, you'll find yourself reaching for these functions daily. The patterns above cover 90% of real-world analyst use cases. Start with the ones that solve your most immediate problems, and the rest will follow naturally.