SQL Query to Find Top N Records Per Group

SQL Query to Find Top N Records Per Group

You have a sales table with thousands of transactions across dozens of regions. Your manager asks for the top three salespeople in each region by total revenue. Simple enough request. But when you try to write the SQL, you quickly realise that a basic ORDER BY and LIMIT gives you the top three records across the entire table, not the top three within each region separately.

This is one of the most common and most practically useful SQL problems in data analysis. Finding the top N records across a whole table is easy. Finding the top N records within each group independently is where most beginners get stuck, and it comes up constantly in real analytical work. Top five products per category. Top three customers per region. Highest scoring student per class. Most recent order per customer. All of these are the same problem with different numbers and different tables.

The solution uses SQL window functions, specifically ROW_NUMBER, RANK, and DENSE_RANK combined with PARTITION BY, to assign a rank number to every row within each group and then filter to only the rows where that rank is within the range you want. Once you understand how this works, you can apply it to any dataset and any value of N without changing the structure of the query at all.

This guide walks through the full solution step by step with real examples showing when to use each ranking function and what the differences mean for your results.

What Does Top N Records Per Group Mean?

Finding the top N records per group means selecting a fixed number of rows from each distinct group in your data, where the selection is based on ranking each row within its group according to some ordering criterion like revenue, score, date, or count.

Think of it like an awards ceremony where you want to announce the top three finishers in each age category of a race, not the top three finishers overall. The top three overall might all come from the same age group. You want the top three within each group independently so every category gets its own podium. SQL window functions with PARTITION BY are how you divide the data into those independent groups and rank within each one separately.

Setting Up the Example Tables

Two tables are used throughout this guide. The first is a sales transactions table and the second is a student scores table, both common real-world scenarios where top N per group queries are essential.

sql

CREATE TABLE sales (
    sale_id      INT,
    region       VARCHAR(50),
    salesperson  VARCHAR(50),
    product      VARCHAR(50),
    revenue      DECIMAL(10,2),
    sale_date    DATE
);

INSERT INTO sales VALUES
(1,  'Northeast', 'Sarah Mills',  'Laptop',   4200.00, '2024-01-05'),
(2,  'Northeast', 'James Park',   'Phone',    1850.00, '2024-01-06'),
(3,  'Northeast', 'Anna Chen',    'Tablet',   3100.00, '2024-01-07'),
(4,  'Northeast', 'Sarah Mills',  'Monitor',  2200.00, '2024-01-08'),
(5,  'Northeast', 'David Lee',    'Laptop',   5600.00, '2024-01-09'),
(6,  'Southeast', 'Maria Lopez',  'Phone',    2900.00, '2024-01-10'),
(7,  'Southeast', 'Tom Harris',   'Laptop',   4100.00, '2024-01-11'),
(8,  'Southeast', 'Maria Lopez',  'Tablet',   1750.00, '2024-01-12'),
(9,  'Southeast', 'Kevin Brown',  'Monitor',  3300.00, '2024-01-13'),
(10, 'Southeast', 'Tom Harris',   'Phone',    2100.00, '2024-01-14'),
(11, 'Midwest',   'Lisa Wang',    'Laptop',   3800.00, '2024-01-15'),
(12, 'Midwest',   'Chris Evans',  'Phone',    1600.00, '2024-01-16'),
(13, 'Midwest',   'Lisa Wang',    'Tablet',   2700.00, '2024-01-17'),
(14, 'Midwest',   'Nina Patel',   'Monitor',  4500.00, '2024-01-18'),
(15, 'Midwest',   'Chris Evans',  'Laptop',   3200.00, '2024-01-19');

CREATE TABLE student_scores (
    student_id   INT,
    student_name VARCHAR(50),
    subject      VARCHAR(50),
    score        INT,
    exam_date    DATE
);

INSERT INTO student_scores VALUES
(1, 'Alice',   'Math',    95, '2024-03-01'),
(2, 'Bob',     'Math',    88, '2024-03-01'),
(3, 'Carol',   'Math',    92, '2024-03-01'),
(4, 'David',   'Math',    88, '2024-03-01'),
(5, 'Eve',     'Math',    76, '2024-03-01'),
(6, 'Alice',   'Science', 89, '2024-03-02'),
(7, 'Bob',     'Science', 94, '2024-03-02'),
(8, 'Carol',   'Science', 91, '2024-03-02'),
(9, 'David',   'Science', 79, '2024-03-02'),
(10,'Eve',     'Science', 94, '2024-03-02'),
(11,'Alice',   'English', 82, '2024-03-03'),
(12,'Bob',     'English', 90, '2024-03-03'),
(13,'Carol',   'English', 87, '2024-03-03'),
(14,'David',   'English', 95, '2024-03-03'),
(15,'Eve',     'English', 83, '2024-03-03');

Notice the student_scores table deliberately has tied scores in Math where Bob and David both scored 88, and in Science where Bob and Eve both scored 94. These ties are what reveal the important differences between ROW_NUMBER, RANK, and DENSE_RANK.

Step by Step: Finding Top N Records Per Group

Step 1: Understand the Three Ranking Functions

Before writing the query, understanding what each ranking function does with tied values is essential because that difference determines which one belongs in your query.

ROW_NUMBER assigns a unique sequential number to every row within each partition. When two rows have equal values in the ORDER BY column, ROW_NUMBER assigns them different numbers arbitrarily. There are never any ties in the output and there are never any gaps in the numbering. Every partition gets ranks 1, 2, 3, 4 continuously.

RANK assigns the same rank to tied rows and then skips the next rank numbers to account for the tie. If two rows tie for second place they both get rank 2 and the next row gets rank 4. The output has ties but also has gaps in the numbering sequence.

DENSE_RANK assigns the same rank to tied rows but does not skip any rank numbers afterward. If two rows tie for second place they both get rank 2 and the next row gets rank 3. The output has ties but no gaps in the numbering sequence.

Step 2: Assign Row Numbers Within Each Group

The foundational query assigns a row number to every row within each region, ordered by revenue descending:

sql

SELECT
    region,
    salesperson,
    product,
    revenue,
    ROW_NUMBER() OVER (
        PARTITION BY region
        ORDER BY revenue DESC
    ) AS row_num
FROM sales;

Result:

region      salesperson   product    revenue    row_num
Midwest     Nina Patel    Monitor    4500.00    1
Midwest     Lisa Wang     Laptop     3800.00    2
Midwest     Lisa Wang     Tablet     2700.00    3
Midwest     Chris Evans   Laptop     3200.00    4
Midwest     Chris Evans   Phone      1600.00    5
Northeast   David Lee     Laptop     5600.00    1
Northeast   Sarah Mills   Laptop     4200.00    2
Northeast   Anna Chen     Tablet     3100.00    3
Northeast   Sarah Mills   Monitor    2200.00    4
Northeast   James Park    Phone      1850.00    5
Southeast   Tom Harris    Laptop     4100.00    1
Southeast   Kevin Brown   Monitor    3300.00    2
Southeast   Tom Harris    Phone      2100.00    3
Southeast   Maria Lopez   Phone      2900.00    4
Southeast   Maria Lopez   Tablet     1750.00    5

PARTITION BY region resets the row number counter at the start of each region. ORDER BY revenue DESC means row 1 in each partition is the highest revenue transaction. The row numbering runs independently within each region, so every region has its own rows numbered 1 through however many rows it contains.

Step 3: Filter to the Top N Rows Per Group

Wrap the ranked query in a subquery or CTE and filter on the row number. This is the complete top N per group query:

sql

SELECT
    region,
    salesperson,
    product,
    revenue
FROM (
    SELECT
        region,
        salesperson,
        product,
        revenue,
        ROW_NUMBER() OVER (
            PARTITION BY region
            ORDER BY revenue DESC
        ) AS row_num
    FROM sales
) ranked
WHERE row_num <= 3
ORDER BY region, row_num;

Result:

region      salesperson    product    revenue
Midwest     Nina Patel     Monitor    4500.00
Midwest     Lisa Wang      Laptop     3800.00
Midwest     Chris Evans    Laptop     3200.00
Northeast   David Lee      Laptop     5600.00
Northeast   Sarah Mills    Laptop     4200.00
Northeast   Anna Chen      Tablet     3100.00
Southeast   Tom Harris     Laptop     4100.00
Southeast   Kevin Brown    Monitor    3300.00
Southeast   Maria Lopez    Phone      2900.00

Exactly three rows per region, the three highest revenue transactions in each one independently. Change WHERE row_num <= 3 to WHERE row_num <= 5 and you get five rows per region. Change 3 to 1 and you get only the single highest revenue transaction per region. The structure of the query never changes, only the number.

Step 4: Use a CTE for Cleaner Syntax

A Common Table Expression makes the same query easier to read and easier to modify, especially when the ranking logic becomes more complex:

sql

WITH ranked_sales AS (
    SELECT
        region,
        salesperson,
        product,
        revenue,
        ROW_NUMBER() OVER (
            PARTITION BY region
            ORDER BY revenue DESC
        ) AS row_num
    FROM sales
)
SELECT
    region,
    salesperson,
    product,
    revenue
FROM ranked_sales
WHERE row_num <= 3
ORDER BY region, revenue DESC;

The WITH clause defines the ranked_sales CTE. The main SELECT reads from it as if it were a regular table. CTEs are especially useful when you need to reference the ranked result multiple times in the same query or when the ranking logic involves pre-aggregation before ranking.

Step 5: Use RANK to Handle Ties Inclusively

ROW_NUMBER assigns arbitrary rankings to tied values. If you want tied rows to both appear in the top N results, use RANK instead. This example finds the top 3 students per subject including all ties:

sql

WITH ranked_scores AS (
    SELECT
        subject,
        student_name,
        score,
        RANK() OVER (
            PARTITION BY subject
            ORDER BY score DESC
        ) AS score_rank
    FROM student_scores
)
SELECT
    subject,
    student_name,
    score,
    score_rank
FROM ranked_scores
WHERE score_rank <= 3
ORDER BY subject, score_rank;

Result:

subject   student_name   score   score_rank
English   David          95      1
English   Bob            90      2
English   Carol          87      3
Math      Alice          95      1
Math      Carol          92      2
Math      Bob            88      3
Math      David          88      3
Science   Bob            94      1
Science   Eve            94      1
Science   Carol          91      3

In Math, Bob and David both scored 88 and both receive rank 3. In Science, Bob and Eve both scored 94 and both receive rank 1. WHERE score_rank <= 3 includes all of them because the filter is based on rank value and both tied rows have a rank value that satisfies the condition. RANK gives you more than N rows per group when ties exist at the boundary.

Step 6: Use DENSE_RANK for Ties Without Gaps

DENSE_RANK behaves like RANK for ties but eliminates the gaps in the numbering sequence. This matters when you want exactly N distinct rank positions regardless of how many rows share each position:

sql

WITH dense_ranked AS (
    SELECT
        subject,
        student_name,
        score,
        DENSE_RANK() OVER (
            PARTITION BY subject
            ORDER BY score DESC
        ) AS dense_rank_num
    FROM student_scores
)
SELECT
    subject,
    student_name,
    score,
    dense_rank_num
FROM dense_ranked
WHERE dense_rank_num <= 3
ORDER BY subject, dense_rank_num;

The result is identical to the RANK example for this dataset because the ties happen at rank positions where RANK and DENSE_RANK produce the same outcome. The difference becomes visible when ties occur at a rank that causes RANK to skip numbers. For example if two rows tie for rank 2, RANK skips rank 3 and the next row gets rank 4. A filter of WHERE rank_num <= 3 would miss that next row. DENSE_RANK never skips so WHERE dense_rank_num <= 3 always captures exactly the top 3 distinct rank positions.

Step 7: Top N Per Group on Aggregated Data

A very common real-world version of this problem requires aggregating first, then ranking. Finding the top three salespeople per region by total revenue across all their transactions requires summing revenue before ranking:

sql

WITH salesperson_totals AS (
    SELECT
        region,
        salesperson,
        SUM(revenue)   AS total_revenue,
        COUNT(*)       AS transaction_count
    FROM sales
    GROUP BY region, salesperson
),
ranked_totals AS (
    SELECT
        region,
        salesperson,
        total_revenue,
        transaction_count,
        ROW_NUMBER() OVER (
            PARTITION BY region
            ORDER BY total_revenue DESC
        ) AS revenue_rank
    FROM salesperson_totals
)
SELECT
    region,
    salesperson,
    total_revenue,
    transaction_count,
    revenue_rank
FROM ranked_totals
WHERE revenue_rank <= 2
ORDER BY region, revenue_rank;

Two CTEs chained together. The first aggregates transactions to salesperson totals within each region. The second applies ROW_NUMBER over those aggregated totals. The final SELECT filters to the top 2 per region. This two-step CTE pattern, aggregate first then rank, is the standard approach whenever you need to rank based on a calculated metric rather than a raw column value.

ROW_NUMBER vs RANK vs DENSE_RANK Comparison

ScoresROW_NUMBERRANKDENSE_RANK
95111
92222
88333
88433
76554

ROW_NUMBER: unique numbers, no ties, no gaps. Use when you need exactly N rows per group with no exceptions. RANK: ties allowed, gaps appear after ties. Use when tied values should both qualify for the top N positions. DENSE_RANK: ties allowed, no gaps after ties. Use when you want N distinct rank positions and ties at any position should not cause you to miss lower positions.

Common Limitations

You cannot use window functions directly in a WHERE clause. WHERE row_num <= 3 only works when the window function result is in a subquery or CTE. Writing WHERE ROW_NUMBER() OVER (…) <= 3 directly in the main query produces a syntax error in all major databases. The subquery or CTE wrapper is not optional, it is a requirement of how SQL processes window functions.

ROW_NUMBER breaks ties arbitrarily. When two rows have identical values in the ORDER BY column, ROW_NUMBER assigns them different numbers with no guarantee about which one gets the lower number. If you need deterministic tie-breaking, add a secondary column to the ORDER BY inside the OVER clause such as ORDER BY revenue DESC, sale_id ASC so the tie-breaking rule is explicit and consistent across query executions.

Performance depends on partition size and indexing. The window function scans the rows within each partition to assign rankings. For very large tables with many groups, adding an index on the PARTITION BY column significantly improves query performance. Without an index the database performs a full table scan for each partition which becomes slow on tables with millions of rows.

Common Mistakes to Avoid

Using LIMIT instead of a window function for top N per group. LIMIT gives you top N across the entire table, not per group. SELECT region, salesperson FROM sales ORDER BY revenue DESC LIMIT 3 returns the three highest revenue rows globally. It does not give you three rows per region. Window functions with PARTITION BY are the only way to get top N within each group independently.

Filtering on the window function in the wrong place. Putting WHERE row_num <= 3 in the inner query before the window function is calculated produces an error or wrong results depending on the database. The window function must be calculated first in the inner query or CTE, then the filter is applied in the outer query. Always filter on the ranking column in the outer SELECT, never in the same SELECT where the window function is defined.

Choosing ROW_NUMBER when ties matter for your business question. If you are finding the top three scoring students per subject and two students are genuinely tied for third place, using ROW_NUMBER silently excludes one of them based on an arbitrary tiebreaker. For ranking problems where ties have real meaning, RANK or DENSE_RANK is the honest choice.

Forgetting to include the partition column in the final SELECT. If your query partitions by region but the final SELECT does not include region, your results show the top N rows from the entire table with no indication of which group each row belongs to. Always include the PARTITION BY column in the output of a top N per group query.

Top N Per Group SQL Cheat Sheet

TaskSQL Pattern
Top N per group (no ties)ROW_NUMBER() OVER (PARTITION BY grp ORDER BY val DESC)
Top N per group (include ties)RANK() OVER (PARTITION BY grp ORDER BY val DESC)
Top N distinct positions with tiesDENSE_RANK() OVER (PARTITION BY grp ORDER BY val DESC)
Filter to top 3WHERE row_num <= 3
Top 1 per group (latest record)ROW_NUMBER() OVER (PARTITION BY id ORDER BY date DESC) then WHERE row_num = 1
Rank aggregated valuesAggregate in CTE first, then apply ROW_NUMBER in second CTE
Deterministic tie-breakingORDER BY val DESC, id ASC inside OVER clause
Bottom N per groupORDER BY val ASC inside OVER clause

Finding the top N records per group is one of those SQL patterns that appears constantly in real analytical work and trips up almost every beginner the first time they try to write it. The instinct to reach for ORDER BY and LIMIT is natural and correct for the simpler version of the problem. The window function approach with PARTITION BY is what makes the per group version work.

The structure is always the same. Assign a rank number within each group using ROW_NUMBER, RANK, or DENSE_RANK with PARTITION BY. Wrap that in a subquery or CTE. Filter the outer query on the rank column. The only decisions are which ranking function matches your tie-handling requirement and what number N you need for the specific question.

Start with ROW_NUMBER for most cases where you need exactly N rows and ties do not matter. Switch to RANK when tied values genuinely both deserve to appear in the top N. Use DENSE_RANK when you want N distinct ranking positions without gaps causing you to miss rows at the boundary.

Once this pattern is comfortable, it becomes one of the most reusable tools in your SQL toolkit. The top selling product per category, the most recent login per user, the highest rated review per restaurant. Every one of those queries uses the same structure with different column names.

FAQs

How do I find the top N records per group in SQL?

Use a window function to assign row numbers within each group, then filter to rows where the row number is within your desired range. Wrap the ranking in a subquery or CTE: WITH ranked AS (SELECT *, ROW_NUMBER() OVER (PARTITION BY group_col ORDER BY value_col DESC) AS rn FROM table) SELECT * FROM ranked WHERE rn <= N.

What is the difference between ROW_NUMBER, RANK, and DENSE_RANK in SQL?

ROW_NUMBER assigns unique sequential numbers with no ties and no gaps. RANK assigns the same number to tied rows and skips subsequent numbers to account for the tie. DENSE_RANK assigns the same number to tied rows but does not skip any subsequent numbers. Use ROW_NUMBER for exactly N rows per group, RANK or DENSE_RANK when tied values should both qualify.

Why can I not use a window function in a WHERE clause directly?

SQL processes WHERE clauses before window functions are calculated, so the window function result does not exist yet at the point WHERE runs. You must calculate the window function in a subquery or CTE first, then apply the filter in the outer query where the ranking column is already available as a named column.

How do I find the top N records per group based on a SUM or COUNT?

Aggregate in a first CTE using GROUP BY to calculate the sum or count per group, then apply the window function ranking in a second CTE over those aggregated results, then filter in the final SELECT. This two-step CTE pattern, aggregate then rank, handles any case where the ranking criterion is a calculated metric rather than a raw column value.

Does the top N per group pattern work in MySQL?

Yes, but only in MySQL 8.0 and above which introduced window functions. MySQL versions below 8.0 do not support ROW_NUMBER, RANK, or DENSE_RANK and require a more complex workaround using correlated subqueries or user-defined variables. If you are on MySQL 5.7 or earlier, upgrading to MySQL 8.0 is the most practical solution for window function support.

Leave a Comment

Your email address will not be published. Required fields are marked *

Scroll to Top