SQL Query to Find Duplicate Records (With Examples)

Q: How do I find duplicates in SQL?

Use GROUP BY with HAVING COUNT(*) > 1.

Q: What is the best way to remove duplicates?

Use ROW_NUMBER() and delete rows where the number is greater than 1.

Finding duplicate records is a common task in data analysis. Duplicates can lead to incorrect insights, inaccurate reporting, and poor decision-making.

Using SQL, you can easily identify and handle duplicate data.

In this guide, you’ll learn how to find duplicate records in SQL with practical examples.

What Are Duplicate Records?

Duplicate records are rows that have the same values in one or more columns.

For example:

id	name	email
1	John	john@email.com
2	John	john@email.com

These rows are duplicates based on name and email.

Basic Method: Using GROUP BY and HAVING

The most common way to find duplicates is by using GROUP BY with HAVING.

Example Table: `customers`

id	name	email

Query to Find Duplicates

SELECT name, email, COUNT(*) AS duplicate_count
FROM customers
GROUP BY name, email
HAVING COUNT(*) > 1;

Explanation

GROUP BY groups similar records
COUNT(*) counts occurrences
HAVING COUNT(*) > 1 filters duplicates

This query shows all duplicated values.

Finding Duplicate Rows Based on One Column

Example: Duplicate Emails

SELECT email, COUNT(*) AS duplicate_count
FROM customers
GROUP BY email
HAVING COUNT(*) > 1;

This helps when checking unique fields like email or phone number.

Finding Full Duplicate Rows

If you want to see all duplicate rows (not just grouped values), use a subquery.

Example

SELECT *
FROM customers
WHERE email IN (
    SELECT email
    FROM customers
    GROUP BY email
    HAVING COUNT(*) > 1
);

Use Case

This query returns all rows that are duplicates, not just the grouped summary.

Using ROW_NUMBER() to Identify Duplicates

Window functions provide a more advanced way to detect duplicates.

Example

SELECT *,
       ROW_NUMBER() OVER (PARTITION BY email ORDER BY id) AS row_num
FROM customers;

Explanation

PARTITION BY email groups similar emails
ROW_NUMBER() assigns a sequence

Rows with row_num > 1 are duplicates.

Removing Duplicate Records

After identifying duplicates, you may want to remove them.

Example (Using CTE)

WITH duplicates AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY email ORDER BY id) AS row_num
    FROM customers
)
DELETE FROM duplicates
WHERE row_num > 1;

Always back up your data before deleting records.

Common Use Cases

Finding duplicates is useful for:

Data cleaning
Preparing datasets for analysis
Ensuring data quality
Avoiding double counting

Common Mistakes to Avoid

1. Forgetting GROUP BY

You must group columns correctly when counting duplicates.

2. Using WHERE Instead of HAVING

WHERE filters rows before grouping
HAVING filters after grouping

3. Ignoring Multiple Columns

Duplicates may depend on more than one column.

Best Practices

Always define what “duplicate” means in your dataset
Use multiple columns when necessary
Validate results before deleting data
Keep a backup

Finding duplicate records is a key skill in SQL and data analysis.

By using techniques like GROUP BY, HAVING, and window functions, you can quickly identify and manage duplicates in your dataset.

Mastering this will help you maintain clean and reliable data for analysis.

FAQs

How do I find duplicates in SQL?

Use GROUP BY with HAVING COUNT(*) > 1.

Can I find duplicates using one column?

Yes, group by that specific column.

What is the best way to remove duplicates?

Use ROW_NUMBER() and delete rows where the number is greater than 1.

Why are duplicates a problem?

They can lead to incorrect analysis and reporting.

What is the difference between WHERE and HAVING?

WHERE filters rows before grouping, HAVING filters after grouping.

SQL Query to Find Duplicate Records (With Examples)

What Are Duplicate Records?

Basic Method: Using GROUP BY and HAVING

Example Table: `customers`

Query to Find Duplicates

Explanation

Finding Duplicate Rows Based on One Column

Example: Duplicate Emails

Finding Full Duplicate Rows

Example

Use Case

Using ROW_NUMBER() to Identify Duplicates

Example

Explanation

Removing Duplicate Records

Example (Using CTE)

Common Use Cases

Common Mistakes to Avoid

1. Forgetting GROUP BY

2. Using WHERE Instead of HAVING

3. Ignoring Multiple Columns

Best Practices

FAQs

How do I find duplicates in SQL?

Can I find duplicates using one column?

What is the best way to remove duplicates?

Why are duplicates a problem?

What is the difference between WHERE and HAVING?

Leave a Comment Cancel Reply

Copyright © 2026 codewithfimi.com - All Rights Reserved

SQL Query to Find Duplicate Records (With Examples)

What Are Duplicate Records?

Basic Method: Using GROUP BY and HAVING

Example Table: customers

Query to Find Duplicates

Explanation

Finding Duplicate Rows Based on One Column

Example: Duplicate Emails

Finding Full Duplicate Rows

Example

Use Case

Using ROW_NUMBER() to Identify Duplicates

Example

Explanation

Removing Duplicate Records

Example (Using CTE)

Common Use Cases

Common Mistakes to Avoid

1. Forgetting GROUP BY

2. Using WHERE Instead of HAVING

3. Ignoring Multiple Columns

Best Practices

FAQs

How do I find duplicates in SQL?

Can I find duplicates using one column?

What is the best way to remove duplicates?

Why are duplicates a problem?

What is the difference between WHERE and HAVING?

Leave a Comment Cancel Reply

Copyright © 2026 codewithfimi.com - All Rights Reserved

Example Table: `customers`