Window functions are the single biggest productivity multiplier for analysts working in SQL. They let you perform calculations across a set of rows that are somehow related to the current row — without collapsing the result set the way GROUP BY does. If you're still writing self-joins or correlated subqueries for ranking, running totals, or period-over-period comparisons, this guide will change how you write SQL.

The Anatomy of a Window Function

Every window function follows the same structure:

function_name(column) OVER (
  [PARTITION BY partition_column(s)]
  [ORDER BY order_column(s)]
  [ROWS/RANGE BETWEEN frame_start AND frame_end]
)

The OVER clause is what makes it a window function. Without it, you'd have a regular aggregate. With it, you get the aggregate computed for each row within a defined "window" of related rows.
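To see the difference concretely, here is a runnable sketch using Python's sqlite3 (SQLite supports window functions since 3.25). The table and values are invented for illustration: the same SUM collapses rows under GROUP BY but keeps every row under OVER.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (customer_id TEXT, amount INT)")
con.executemany("INSERT INTO orders VALUES (?, ?)",
                [("a", 10), ("a", 20), ("b", 5)])

# GROUP BY collapses to one row per customer...
grouped = con.execute(
    "SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id"
).fetchall()

# ...while the same SUM with OVER keeps every row and attaches the total.
windowed = con.execute(
    "SELECT customer_id, amount, "
    "SUM(amount) OVER (PARTITION BY customer_id) AS customer_total "
    "FROM orders"
).fetchall()

print(sorted(grouped))   # [('a', 30), ('b', 5)]
print(sorted(windowed))  # [('a', 10, 30), ('a', 20, 30), ('b', 5, 5)]
```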

Ranking Functions

ROW_NUMBER, RANK, and DENSE_RANK

These three are frequently confused. Here's the definitive breakdown:

SELECT
  customer_id,
  order_date,
  total_amount,
  ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date)     AS row_num,
  RANK()       OVER (PARTITION BY customer_id ORDER BY total_amount DESC) AS amount_rank,
  DENSE_RANK() OVER (PARTITION BY customer_id ORDER BY total_amount DESC) AS amount_dense_rank
FROM orders;
When to use which: Use ROW_NUMBER() when you need a unique sequential number per partition (e.g., deduplication). Use RANK() when ties should get the same rank with gaps after (like Olympic medals). Use DENSE_RANK() when ties get the same rank without gaps.
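The tie-handling difference is easiest to see with a concrete tie. A small sketch in Python's sqlite3, with made-up names and scores where two rows tie at the top:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE scores (name TEXT, score INT)")
con.executemany("INSERT INTO scores VALUES (?, ?)",
                [("ann", 90), ("bob", 90), ("cat", 80)])

rows = con.execute("""
    SELECT name,
           ROW_NUMBER() OVER (ORDER BY score DESC) AS row_num,
           RANK()       OVER (ORDER BY score DESC) AS rnk,
           DENSE_RANK() OVER (ORDER BY score DESC) AS dense_rnk
    FROM scores
    ORDER BY row_num
""").fetchall()

# ann and bob tie at 90: ROW_NUMBER still gives unique 1, 2 (tie broken
# arbitrarily); RANK gives 1, 1 then skips to 3; DENSE_RANK gives 1, 1, 2.
for name, row_num, rnk, dense_rnk in rows:
    print(name, row_num, rnk, dense_rnk)
```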

Real Use Case: Deduplication with ROW_NUMBER

One of the most common uses of ROW_NUMBER() is deduplicating records where you want to keep the most recent version of each entity:

-- Keep the latest record per customer_id
WITH ranked AS (
  SELECT *,
    ROW_NUMBER() OVER (
      PARTITION BY customer_id
      ORDER BY updated_at DESC
    ) AS rn
  FROM customers_raw
)
SELECT * EXCLUDE rn  -- EXCLUDE is Snowflake/DuckDB syntax; elsewhere, list the columns explicitly
FROM ranked
WHERE rn = 1;
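The same deduplication pattern, runnable end to end via Python's sqlite3 (columns listed explicitly since SQLite has no EXCLUDE; the table contents are invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE customers_raw (customer_id TEXT, email TEXT, updated_at TEXT)")
con.executemany("INSERT INTO customers_raw VALUES (?, ?, ?)", [
    ("c1", "old@example.com", "2024-01-01"),
    ("c1", "new@example.com", "2024-06-01"),  # latest c1 record wins
    ("c2", "x@example.com",   "2024-03-15"),
])

latest = con.execute("""
    WITH ranked AS (
      SELECT *, ROW_NUMBER() OVER (
        PARTITION BY customer_id
        ORDER BY updated_at DESC
      ) AS rn
      FROM customers_raw
    )
    SELECT customer_id, email, updated_at
    FROM ranked
    WHERE rn = 1
    ORDER BY customer_id
""").fetchall()

print(latest)  # one row per customer, most recent version kept
```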

Offset Functions: LAG and LEAD

LAG and LEAD let you access values from other rows relative to the current row, without a self-join. They're essential for period-over-period comparisons.

-- Month-over-month revenue change
SELECT
  month,
  revenue,
  LAG(revenue) OVER (ORDER BY month)  AS prev_month_revenue,
  revenue - LAG(revenue) OVER (ORDER BY month) AS mom_change,
  ROUND(
    (revenue - LAG(revenue) OVER (ORDER BY month))
    / NULLIF(LAG(revenue) OVER (ORDER BY month), 0) * 100,
    2
  ) AS mom_pct_change
FROM monthly_revenue
ORDER BY month;
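A runnable sketch of the month-over-month pattern in Python's sqlite3, with invented revenue figures. The first month's LAG is NULL, so its change columns come back as NULL rather than erroring:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE monthly_revenue (month TEXT, revenue REAL)")
con.executemany("INSERT INTO monthly_revenue VALUES (?, ?)",
                [("2024-01", 100.0), ("2024-02", 120.0), ("2024-03", 90.0)])

rows = con.execute("""
    SELECT month, revenue,
           revenue - LAG(revenue) OVER (ORDER BY month) AS mom_change,
           ROUND((revenue - LAG(revenue) OVER (ORDER BY month))
                 / NULLIF(LAG(revenue) OVER (ORDER BY month), 0) * 100,
                 2) AS mom_pct_change
    FROM monthly_revenue
    ORDER BY month
""").fetchall()

for month, revenue, change, pct in rows:
    print(month, revenue, change, pct)
# 2024-01 has no prior month, so change and pct are None;
# 2024-02 is +20.0 (+20.0%); 2024-03 is -30.0 (-25.0%).
```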

Real Use Case: Customer Churn Signals

-- Identify customers with more than 90 days between consecutive orders
SELECT
  customer_id,
  order_date,
  LAG(order_date) OVER (PARTITION BY customer_id ORDER BY order_date) AS prev_order_date,
  DATEDIFF('day', LAG(order_date) OVER (PARTITION BY customer_id ORDER BY order_date), order_date) AS days_since_last_order
FROM orders
QUALIFY days_since_last_order > 90  -- QUALIFY: Snowflake/BigQuery/DuckDB; BigQuery uses DATE_DIFF instead of DATEDIFF
ORDER BY days_since_last_order DESC;
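If your engine lacks QUALIFY, the portable form wraps the window function in a CTE and filters with a plain WHERE. A runnable sketch in Python's sqlite3 (which has no QUALIFY or DATEDIFF; julianday subtraction stands in for the day difference, and the order data is invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (customer_id TEXT, order_date TEXT)")
con.executemany("INSERT INTO orders VALUES (?, ?)", [
    ("c1", "2024-01-01"), ("c1", "2024-05-01"),  # 121-day gap
    ("c2", "2024-01-01"), ("c2", "2024-02-01"),  # 31-day gap
])

gaps = con.execute("""
    WITH with_prev AS (
      SELECT customer_id, order_date,
             LAG(order_date) OVER (
               PARTITION BY customer_id ORDER BY order_date
             ) AS prev_order_date
      FROM orders
    )
    SELECT customer_id, order_date,
           CAST(julianday(order_date) - julianday(prev_order_date) AS INT)
             AS gap_days
    FROM with_prev
    WHERE julianday(order_date) - julianday(prev_order_date) > 90
""").fetchall()

print(gaps)  # only c1's 121-day gap survives the filter
```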

Aggregate Window Functions: Running Totals & Moving Averages

Running Totals

When the OVER clause has only an ORDER BY, the default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which produces a cumulative total. Note that rows with equal ORDER BY values are peers and are included together, so make sure the ordering column is unique (as a date is here) if you want a strictly row-by-row running sum.

SELECT
  order_date,
  daily_revenue,
  SUM(daily_revenue) OVER (ORDER BY order_date) AS cumulative_revenue,
  SUM(daily_revenue) OVER (
    ORDER BY order_date
    ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
  ) AS rolling_7d_revenue
FROM daily_revenue_summary;

Moving Averages

-- 30-day moving average (the frame shrinks automatically at the start of the series,
-- so early rows are averaged over fewer than 30 days)
SELECT
  date,
  daily_active_users,
  AVG(daily_active_users) OVER (
    ORDER BY date
    ROWS BETWEEN 29 PRECEDING AND CURRENT ROW
  ) AS rolling_30d_avg_dau
FROM daily_metrics
ORDER BY date;
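The shrinking frame at the start of a series is easy to verify. A sketch in Python's sqlite3 using a 3-day window (rather than 30) and invented DAU numbers — the first two averages are computed over 1 and 2 rows respectively:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE daily_metrics (date TEXT, dau INT)")
con.executemany("INSERT INTO daily_metrics VALUES (?, ?)",
                [("2024-01-01", 100), ("2024-01-02", 200),
                 ("2024-01-03", 300), ("2024-01-04", 400)])

rows = con.execute("""
    SELECT date, dau,
           AVG(dau) OVER (
             ORDER BY date
             ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
           ) AS rolling_3d_avg
    FROM daily_metrics
    ORDER BY date
""").fetchall()

print([r[2] for r in rows])
# day 1 averages 1 row, day 2 averages 2 rows, then the full 3-row frame:
# [100.0, 150.0, 200.0, 300.0]
```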

NTILE: Bucketing Rows into Percentiles

NTILE(n) divides rows into n roughly equal buckets. It's perfect for customer segmentation and RFM analysis:

-- RFM Customer Segmentation
WITH rfm_scores AS (
  SELECT
    customer_id,
    MAX(order_date) AS last_order_date,
    COUNT(order_id) AS frequency,
    SUM(total_amount) AS monetary,
    NTILE(5) OVER (ORDER BY MAX(order_date))   AS recency_score,   -- 5 = most recent
    NTILE(5) OVER (ORDER BY COUNT(order_id))   AS frequency_score, -- 5 = most frequent
    NTILE(5) OVER (ORDER BY SUM(total_amount)) AS monetary_score   -- 5 = highest spend
  FROM orders
  GROUP BY customer_id
)
SELECT
  customer_id,
  recency_score,
  frequency_score,
  monetary_score,
  (recency_score + frequency_score + monetary_score) AS rfm_total,
  CASE
    WHEN recency_score >= 4 AND frequency_score >= 4 THEN 'Champions'
    WHEN recency_score >= 3 AND monetary_score >= 4  THEN 'High Value At Risk'
    WHEN recency_score <= 2 AND frequency_score >= 3 THEN 'Churned Loyalists'
    ELSE 'Standard'
  END AS customer_segment
FROM rfm_scores;
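NTILE's bucketing behavior is worth seeing on toy data. A sketch in Python's sqlite3 with ten invented customers: ascending order means higher spend lands in a higher bucket (score 5 = best), and ten rows split evenly into five buckets of two:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE spend (customer_id TEXT, monetary INT)")
con.executemany("INSERT INTO spend VALUES (?, ?)",
                [(f"c{i}", i * 100) for i in range(1, 11)])

rows = con.execute("""
    SELECT customer_id, monetary,
           NTILE(5) OVER (ORDER BY monetary) AS monetary_score
    FROM spend
    ORDER BY monetary
""").fetchall()

print([r[2] for r in rows])  # [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
```

When the row count doesn't divide evenly, the earlier buckets get one extra row each.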

First/Last Value and Cohort Analysis

-- Cohort analysis: revenue per cohort month
WITH first_orders AS (
  SELECT
    customer_id,
    DATE_TRUNC('month', MIN(order_date)) AS cohort_month
  FROM orders
  GROUP BY customer_id
),
cohort_data AS (
  SELECT
    f.cohort_month,
    DATE_TRUNC('month', o.order_date) AS order_month,
    COUNT(DISTINCT o.customer_id) AS customers,
    SUM(o.total_amount) AS revenue
  FROM orders o
  JOIN first_orders f USING (customer_id)
  GROUP BY 1, 2
)
SELECT
  cohort_month,
  order_month,
  DATEDIFF('month', cohort_month, order_month) AS month_number,
  customers,
  FIRST_VALUE(customers) OVER (
    PARTITION BY cohort_month
    ORDER BY order_month
  ) AS cohort_size,
  ROUND(customers::float / FIRST_VALUE(customers) OVER (
    PARTITION BY cohort_month ORDER BY order_month
  ) * 100, 1) AS retention_rate
FROM cohort_data
ORDER BY cohort_month, order_month;
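The FIRST_VALUE step is the heart of that query: it pins each cohort's month-zero size to every later row so retention is a simple division. A runnable sketch in Python's sqlite3 over an invented pre-aggregated cohort table:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE cohort_data (cohort_month TEXT, order_month TEXT, customers INT)")
con.executemany("INSERT INTO cohort_data VALUES (?, ?, ?)", [
    ("2024-01", "2024-01", 100),  # month zero: cohort size
    ("2024-01", "2024-02", 40),
    ("2024-01", "2024-03", 25),
])

rows = con.execute("""
    SELECT order_month, customers,
           FIRST_VALUE(customers) OVER (
             PARTITION BY cohort_month ORDER BY order_month
           ) AS cohort_size,
           ROUND(customers * 100.0 / FIRST_VALUE(customers) OVER (
             PARTITION BY cohort_month ORDER BY order_month
           ), 1) AS retention_rate
    FROM cohort_data
    ORDER BY order_month
""").fetchall()

for order_month, customers, cohort_size, retention in rows:
    print(order_month, customers, cohort_size, retention)
# cohort_size is 100 on every row; retention falls 100.0 -> 40.0 -> 25.0
```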

Performance Tips

  • Partition wisely — very large partitions can spill to disk. Where your engine supports it, cluster or index tables on the partition and order columns so the sort is cheap.
  • Reuse window specs — PostgreSQL, MySQL 8.0, SQLite, and BigQuery support the standard WINDOW clause, which lets you name a window spec once and reference it from several functions.
  • Combine into one pass — multiple window functions with the same OVER clause are computed in a single scan.
  • Use QUALIFY in Snowflake/BigQuery — filter on window function results without a subquery.
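The named-window tip can be demonstrated in Python's sqlite3, since SQLite also supports the standard WINDOW clause. The spec is defined once and shared by both functions (table and values invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE orders (customer_id TEXT, order_date TEXT, amount INT)")
con.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
    ("c1", "2024-01-01", 10),
    ("c1", "2024-02-01", 30),
])

rows = con.execute("""
    SELECT order_date, amount,
           SUM(amount) OVER w AS running_total,
           AVG(amount) OVER w AS running_avg
    FROM orders
    WINDOW w AS (PARTITION BY customer_id ORDER BY order_date)
    ORDER BY order_date
""").fetchall()

print(rows)  # both functions share the same ordered, partitioned window
```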

Wrapping Up

Window functions eliminate entire categories of painful SQL workarounds — self-joins, subqueries, and procedural code. Once you internalize the OVER clause, you'll find yourself reaching for these functions daily. The patterns above cover 90% of real-world analyst use cases. Start with the ones that solve your most immediate problems, and the rest will follow naturally.