SQL Window Functions and PARTITION BY in Practice

SQL Window Functions and PARTITION BY in Practice

Cindy Manel
• 6 min read

A few months ago, our team faced a deceptively simple client request: clean up historical data in their content management system. They had a table tracking programs assigned to different page slots, locales, and devices. Each assignment created a new record with an updated_at timestamp, building up historical versions over time.

The business logic was straightforward - only the latest program for each unique combination should remain active, with older versions treated as historical records to be purged. We could have built a JavaScript solution to fetch records, group them, and delete the older ones. But at Marmelab, we’ve learned to explore what SQL offers before writing application-level workarounds. We decided to use window functions and PARTITION BY - a SQL feature that provided an elegant approach to identify which records needed processing.

This article introduces window functions to developers who haven’t worked with them before, using our real client scenario to demonstrate their practical value.

Understanding Window Functions and OVER

Window functions are a standard SQL feature available in most relational databases, including PostgreSQL, MySQL, SQLite, Oracle and SQL Server. They let you perform calculations across a set of rows related to the current row, while preserving the detail of each individual row. They use the OVER clause to define which rows the calculation should apply to.

Here’s a simple example using our client’s data structure:

-- Add a sequential number to each row
SELECT program_id, updated_at,
       ROW_NUMBER() OVER (ORDER BY updated_at) as row_number
FROM programs;

Result:

program_id | updated_at        | row_number
101        | 2024-01-01 10:00  | 1
201        | 2024-01-01 16:45  | 2
102        | 2024-01-02 14:30  | 3
202        | 2024-01-02 08:20  | 4
103        | 2024-01-03 09:15  | 5

The OVER (ORDER BY updated_at) tells the database: “For each row, assign a number based on chronological order.” Each row keeps its original data but gains additional computed information about the dataset.

Window functions come in several varieties, each serving different analytical needs:

SELECT program_id, updated_at,
       -- Sequential numbering
       ROW_NUMBER() OVER (ORDER BY updated_at) as row_num,
       -- Ranking (handles ties)
       RANK() OVER (ORDER BY updated_at) as rank,
       -- Running total
       SUM(program_id) OVER (ORDER BY updated_at) as running_total,
       -- Previous row value
       LAG(program_id) OVER (ORDER BY updated_at) as previous_program,
       -- Next row value
       LEAD(program_id) OVER (ORDER BY updated_at) as next_program
FROM programs;

The key insight about the OVER clause is that it defines the “window” of calculation for each row. An empty OVER() means calculate across the entire table, while OVER(ORDER BY) creates cumulative calculations following the specified order.

Grouping Without Losing Detail with PARTITION BY

Our real challenge was more complex than simple chronological ordering: we needed to identify the most recent program for each unique combination of page, slot, locale, and support values.

The solution was to use PARTITION BY, which divides the data into logical groups.

Here’s what the client’s data looked like:

-- Table programs with multiple combinations
program_id | page_id | slot | locale | support | updated_at
101        | home    | top  | fr     | mobile  | 2024-01-01 10:00
102        | home    | top  | fr     | mobile  | 2024-01-02 14:30
103        | home    | top  | fr     | mobile  | 2024-01-03 09:15
201        | home    | hero | fr     | mobile  | 2024-01-01 16:45
202        | home    | hero | fr     | mobile  | 2024-01-02 08:20
301        | about   | top  | en     | desktop | 2024-01-01 12:00

Here’s the query we used:

SELECT
    program_id,
    page_id,
    slot,
    locale,
    support,
    updated_at,
    ROW_NUMBER() OVER (
        PARTITION BY page_id, slot, locale, support
        ORDER BY updated_at DESC
    ) as rank
FROM programs
ORDER BY page_id, slot, locale, support, updated_at DESC;

Result:

program_id | page_id | slot | locale | support | updated_at       | rank
301        | about   | top  | en     | desktop | 2024-01-01 12:00 | 1
202        | home    | hero | fr     | mobile  | 2024-01-02 08:20 | 1
201        | home    | hero | fr     | mobile  | 2024-01-01 16:45 | 2
103        | home    | top  | fr     | mobile  | 2024-01-03 09:15 | 1
102        | home    | top  | fr     | mobile  | 2024-01-02 14:30 | 2
101        | home    | top  | fr     | mobile  | 2024-01-01 10:00 | 3

Think of it as running multiple parallel competitions, each with its own podium. The PARTITION BY clause creates separate ranking systems for each unique combination of page_id, slot, locale, and support.

This solved our problem perfectly because PARTITION BY creates separate ranking systems for each unique combination while preserving all the individual program records we needed to identify for deletion.

Why Not GROUP BY?

A common question when learning about window functions is: Why not just use GROUP BY to find the latest programs? Here’s why that approach has limitations for this type of cleanup task:

-- GROUP BY: One line per combination
SELECT
    page_id,
    slot,
    locale,
    support,
    COUNT(*) as program_count,
    MAX(updated_at) as latest_update
FROM programs
GROUP BY page_id, slot, locale, support;

Result:

page_id | slot | locale | support | program_count| latest_update
home    | top  | fr     | mobile  | 3            | 2024-01-03 09:15
home    | hero | fr     | mobile  | 2            | 2024-01-02 08:20
about   | top  | en     | desktop | 1            | 2024-01-01 12:00

The problem is that we know WHEN the latest update happened, but not WHICH program_id corresponds to that timestamp. GROUP BY aggregates away the individual records we actually needed to target for our operation.

Combining CTEs with Window Functions

Common Table Expressions (CTEs) and window functions solve different problems but work beautifully together. This combination proved valuable while refining our approach to the client’s task.

For our client’s cleanup task, we combined both approaches in the most efficient way:

-- CTE to identify and rank programs
WITH ranked_programs AS (
    SELECT
        program_id,
        page_id,
        slot,
        locale,
        support,
        updated_at,
        ROW_NUMBER() OVER (
            PARTITION BY page_id, slot, locale, support
            ORDER BY updated_at DESC
        ) as rank
    FROM programs
)
-- Delete historical records directly (rank > 1)
DELETE FROM programs
WHERE program_id IN (
    SELECT program_id
    FROM ranked_programs
    WHERE rank > 1
);

This approach proved most effective because a single CTE handles the ranking logic, enabling direct deletion based on rank to eliminate historical records.

Final Thoughts

This client data cleanup reinforced something we’ve experienced at Marmelab: SQL often provides elegant solutions that eliminate the need for complex application code. Window functions proved particularly well-suited for this type of challenge.

PARTITION BY solves a specific class of problems where you need calculations within logical groups while preserving individual record details. Unlike GROUP BY which aggregates and reduces data, the OVER clause creates contextual calculations that enrich your dataset.

For developers encountering similar data challenges, window functions offer a different way of thinking about data relationships. Whether you’re ranking records within categories, calculating running totals across time periods, or identifying duplicates across partitions, these techniques have applications far beyond our specific use case.

For developers new to window functions, start with basic ROW_NUMBER() OVER (ORDER BY) queries to understand the syntax, then gradually explore PARTITION BY for more complex scenarios.

Sometimes the most elegant solution is already waiting in your database.


Have you discovered SQL features that transformed your development approach? I would be glad to hear about it.

Authors

Cindy Manel

Before choosing full-stack development, Cindy was a dentist. No kidding. You can imagine that she's a fast learner.

Comments