SQL DELETE Duplicate Rows
In this tutorial, we will learn how to delete duplicate rows from a table using SQL's DELETE statement.
Syntax
The basic syntax of the DELETE statement is:
DELETE FROM table_name WHERE condition;
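Here, table_name is the name of the table from which we want to delete duplicate rows, and condition is the criterion that decides which rows are removed: every row for which condition is true is deleted. To remove duplicates, condition must match only the redundant copies of each row.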
Example
Let's take the following employees table as an example:
id | name | department | salary
----|---------|------------|-------
1 | Alice | HR | 45000
2 | Bob | Marketing | 55000
3 | Charlie | Finance | 65000
4 | Alice | HR | 45000
5 | Bob | Marketing | 55000
We want to delete the duplicate rows based on the combination of the name, department, and salary columns. The query to accomplish this is:
-- Keep the row with the highest id in each group of duplicates; delete the rest.
DELETE FROM employees
WHERE id NOT IN (
    SELECT MAX(id)
    FROM employees
    GROUP BY name, department, salary
);
Output
After executing the above query, the employees table contains:
id | name | department | salary
----|---------|------------|-------
3 | Charlie | Finance | 65000
4 | Alice | HR | 45000
5 | Bob | Marketing | 55000
Explanation
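The query above uses a subquery to find the maximum id for each combination of name, department, and salary. The row with the maximum id in each group is retained, and every other row in that group is deleted. In the example, rows 1 and 2 are removed because rows 4 and 5 hold the same values with higher ids; row 3 has no duplicate and is kept. To keep the earliest row of each group instead, use MIN(id) in place of MAX(id).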
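To check this behaviour without a database server, here is a minimal sketch that runs the same query against an in-memory SQLite database through Python's built-in sqlite3 module. The choice of SQLite and the column types are assumptions made for illustration; the tutorial itself does not name a database engine.

```python
import sqlite3

# In-memory SQLite database, used here only as a convenient stand-in engine.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees ("
    "id INTEGER PRIMARY KEY, name TEXT, department TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [
        (1, "Alice", "HR", 45000),
        (2, "Bob", "Marketing", 55000),
        (3, "Charlie", "Finance", 65000),
        (4, "Alice", "HR", 45000),
        (5, "Bob", "Marketing", 55000),
    ],
)

# The tutorial's query: keep the highest id in each duplicate group.
deleted = conn.execute(
    """
    DELETE FROM employees
    WHERE id NOT IN (
        SELECT MAX(id)
        FROM employees
        GROUP BY name, department, salary
    )
    """
).rowcount

remaining_ids = [
    row[0] for row in conn.execute("SELECT id FROM employees ORDER BY id")
]
print(deleted, remaining_ids)  # 2 [3, 4, 5]
```

Engines differ in what they accept here. MySQL, for instance, rejects a DELETE whose subquery selects from the table being modified unless the subquery is wrapped in a derived table, so it is worth testing the statement against your own database first.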
Use
Removing duplicate rows keeps reporting and analysis accurate, because duplicates distort counts, sums, and averages. It can also improve the efficiency of database operations: fewer redundant records take up less storage space and leave less data to process.
Important Points
- Always back up your data before deleting any records.
- Make sure to specify the correct criteria for identifying duplicate records.
- Use caution when deleting records: once a DELETE is committed, it cannot be undone without a backup.
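The three points above can be combined into one cautious workflow. The sketch below is one possible way to do it, again assuming SQLite through Python's sqlite3 module. The table name employees_backup is hypothetical, and a copy inside the same database is only a stand-in for a real backup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees ("
    "id INTEGER PRIMARY KEY, name TEXT, department TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [
        (1, "Alice", "HR", 45000),
        (2, "Bob", "Marketing", 55000),
        (3, "Charlie", "Finance", 65000),
        (4, "Alice", "HR", 45000),
        (5, "Bob", "Marketing", 55000),
    ],
)
conn.commit()

# 1. Check the criteria: list each duplicate group before changing anything.
dupes = conn.execute(
    """
    SELECT name, department, salary, COUNT(*) AS copies
    FROM employees
    GROUP BY name, department, salary
    HAVING COUNT(*) > 1
    ORDER BY name
    """
).fetchall()
# Each group keeps one row, so it should lose (copies - 1) rows.
expected_deletes = sum(copies - 1 for _, _, _, copies in dupes)

# 2. Back up: copy the table (employees_backup is a hypothetical name).
conn.execute("CREATE TABLE employees_backup AS SELECT * FROM employees")

# 3. Delete inside a transaction; commit only if the count matches the preview.
cur = conn.execute(
    """
    DELETE FROM employees
    WHERE id NOT IN (
        SELECT MAX(id)
        FROM employees
        GROUP BY name, department, salary
    )
    """
)
if cur.rowcount == expected_deletes:
    conn.commit()
else:
    conn.rollback()  # undo the DELETE if it removed an unexpected number of rows

remaining = conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0]
backed_up = conn.execute("SELECT COUNT(*) FROM employees_backup").fetchone()[0]
print(dupes, remaining, backed_up)
```

Previewing with GROUP BY and HAVING shows exactly which column combinations the DELETE will treat as duplicates, and comparing the deleted row count with that preview gives a last chance to roll back before the change becomes permanent.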
Summary
In this tutorial, we learned how to delete duplicate rows from a table using SQL's DELETE statement. We covered the syntax, a worked example with its output and explanation, the reasons for removing duplicates, and the precautions to take before deleting.