How to Delete Duplicate Records in SQL Server

It will support the integrity of all the given data, keeping the SQL Server database free from duplicates.

The concept of CTE utilizing ROW_NUMBER() helps manage the order number for the number of records, so it is perfect for finding and effectively deleting duplicate records in SQL Server.

In this SQL Server tutorial, you will ensure your data set is clean, and your analyses are correct.

Table of Contents

Delete Duplicate Records in SQL Server

Removing duplicate records is one of the most common administrative tasks for any database administrator or SQL Server developer.

Duplicate records in a table can occur for various reasons, such as user errors, system integration problems, data migration problems, etc. It is essential to clean up those duplicate records to ensure that no misleading analytics and reports are conducted.

Detecting Duplicates in a Table

First, you must recognize duplicate records to delete them. In one standard methodology, the window function ROW_NUMBER achieves this result.

It allows the numbering of records in the result set partition so that we can detect duplicates in the form of a unique sequence of integers.

You can use the query below to identify duplicate records.

WITH DuplicateRecords AS (
    SELECT 
        [column1],
        [column2],
        [columnN],
        ROW_NUMBER() OVER (PARTITION BY [column1], [column2], [columnN] ORDER BY [unique_column]) AS row_num
    FROM 
        [table_name]
)
SELECT * 
FROM DuplicateRecords
WHERE row_num > 1;

For example, let’s create a table CustomerOrders using the query below.

CREATE TABLE CustomerOrders (
    OrderID INT PRIMARY KEY,
    CustomerID INT,
    OrderDate DATE,
    ProductID INT,
    Quantity INT,
    Price DECIMAL(10, 2)
);

Insert some records with duplicate values.

INSERT INTO CustomerOrders (OrderID, CustomerID, OrderDate, ProductID, Quantity, Price) VALUES
(1, 101, '2023-05-01', 201, 2, 19.99),
(2, 102, '2023-05-01', 202, 1, 39.99),
(3, 101, '2023-05-01', 201, 2, 19.99),
(4, 103, '2023-05-02', 203, 1, 29.99),
(5, 101, '2023-05-01', 201, 2, 19.99),
(6, 104, '2023-05-03', 204, 3, 15.99);

If you look at the records with OrderID 1, 3, and 5 are duplicates.

First, using the query below, you must identify the duplicate records using the ROW_NUMBER() function.

WITH DuplicateOrders AS (
    SELECT 
        OrderID,
        CustomerID,
        OrderDate,
        ProductID,
        Quantity,
        Price,
        ROW_NUMBER() OVER (PARTITION BY CustomerID, OrderDate, ProductID, Quantity, Price ORDER BY OrderID) AS row_num
    FROM 
        CustomerOrders
)
SELECT * 
FROM DuplicateOrders
WHERE row_num > 1;

Detecting Duplicates in a SQL Server Table

Once found, you can quickly delete them by keeping an original record and deleting the duplicates.

WITH DuplicateOrders AS (
    SELECT 
        OrderID,
        ROW_NUMBER() OVER (PARTITION BY CustomerID, OrderDate, ProductID, Quantity, Price ORDER BY OrderID) AS row_num
    FROM 
        CustomerOrders
)
DELETE FROM DuplicateOrders
WHERE row_num > 1;

SELECT * FROM CustomerOrders;

From the output, you can see that the CustomerOrders table doesn’t contain duplicate records.

Conclusion

This SQL Server tutorial used the CustomerOrders table as an example to give you practical knowledge on handling duplicates. You will now take almost identical skills and work them into your databases, ensuring your data integrity is unchanged.