It will support the integrity of all the given data, keeping the SQL Server database free from duplicates.
The concept of CTE utilizing ROW_NUMBER() helps manage the order number for the number of records, so it is perfect for finding and effectively deleting duplicate records in SQL Server.
In this SQL Server tutorial, you will ensure your data set is clean, and your analyses are correct.
Delete Duplicate Records in SQL Server
Removing duplicate records is one of the most common administrative tasks for any database administrator or SQL Server developer.
Duplicate records in a table can occur for various reasons, such as user errors, system integration problems, data migration problems, etc. It is essential to clean up those duplicate records to ensure that no misleading analytics and reports are conducted.
Detecting Duplicates in a Table
First, you must recognize duplicate records to delete them. In one standard methodology, the window function ROW_NUMBER achieves this result.
It allows the numbering of records in the result set partition so that we can detect duplicates in the form of a unique sequence of integers.
You can use the query below to identify duplicate records.
WITH DuplicateRecords AS (
SELECT
[column1],
[column2],
[columnN],
ROW_NUMBER() OVER (PARTITION BY [column1], [column2], [columnN] ORDER BY [unique_column]) AS row_num
FROM
[table_name]
)
SELECT *
FROM DuplicateRecords
WHERE row_num > 1;
For example, let’s create a table CustomerOrders using the query below.
CREATE TABLE CustomerOrders (
OrderID INT PRIMARY KEY,
CustomerID INT,
OrderDate DATE,
ProductID INT,
Quantity INT,
Price DECIMAL(10, 2)
);
Insert some records with duplicate values.
INSERT INTO CustomerOrders (OrderID, CustomerID, OrderDate, ProductID, Quantity, Price) VALUES
(1, 101, '2023-05-01', 201, 2, 19.99),
(2, 102, '2023-05-01', 202, 1, 39.99),
(3, 101, '2023-05-01', 201, 2, 19.99),
(4, 103, '2023-05-02', 203, 1, 29.99),
(5, 101, '2023-05-01', 201, 2, 19.99),
(6, 104, '2023-05-03', 204, 3, 15.99);
If you look at the records with OrderID 1, 3, and 5 are duplicates.
First, using the query below, you must identify the duplicate records using the ROW_NUMBER() function.
WITH DuplicateOrders AS (
SELECT
OrderID,
CustomerID,
OrderDate,
ProductID,
Quantity,
Price,
ROW_NUMBER() OVER (PARTITION BY CustomerID, OrderDate, ProductID, Quantity, Price ORDER BY OrderID) AS row_num
FROM
CustomerOrders
)
SELECT *
FROM DuplicateOrders
WHERE row_num > 1;
Once found, you can quickly delete them by keeping an original record and deleting the duplicates.
WITH DuplicateOrders AS (
SELECT
OrderID,
ROW_NUMBER() OVER (PARTITION BY CustomerID, OrderDate, ProductID, Quantity, Price ORDER BY OrderID) AS row_num
FROM
CustomerOrders
)
DELETE FROM DuplicateOrders
WHERE row_num > 1;
SELECT * FROM CustomerOrders;
From the output, you can see that the CustomerOrders table doesn’t contain duplicate records.
Conclusion
This SQL Server tutorial used the CustomerOrders table as an example to give you practical knowledge on handling duplicates. You will now take almost identical skills and work them into your databases, ensuring your data integrity is unchanged.
You may like to read:
- How to Delete Stored Procedures in SQL Server
- How to Drop Table If Exists in SQL Server
- SQL Server Drop Column with Constraint
After working for more than 15 years in the Software field, especially in Microsoft technologies, I have decided to share my expert knowledge of SQL Server. Check out all the SQL Server and related database tutorials I have shared here. Most of the readers are from countries like the United States of America, the United Kingdom, New Zealand, Australia, Canada, etc. I am also a Microsoft MVP. Check out more here.