SQL Duplicates optimization

Question

I have the following query:

Original query:

SELECT cd1.cust_number_id, cd1.cust_number_id, cd1.First_Name, cd1.Last_Name
FROM @Customer_Data cd1
INNER JOIN @Customer_Data cd2
    ON cd1.Cd_Id <> cd2.Cd_Id
    AND cd2.cust_number_id <> cd1.cust_number_id
    AND cd2.First_Name = cd1.First_Name
    AND cd2.Last_Name = cd1.Last_Name
INNER JOIN @Customer c1 ON c1.Cust_id = cd1.cust_number_id
INNER JOIN @Customer c2 ON c2.cust_id = cd2.cust_number_id
WHERE c1.cust_number <> c2.cust_number

I optimized it as follows, but there is an error in my optimization and I can't find it:

Optimized query:

SELECT cd1.cust_number_id, cd1.cust_number_id, cd1.First_Name, cd1.Last_Name
FROM (
    SELECT cdResult.cust_number_id, cdResult.First_Name, cdResult.Last_Name,
           COUNT(*) OVER (PARTITION BY cdResult.First_Name, cdResult.Last_Name) AS cnt_name_bday
    FROM @Customer_Data cdResult
    WHERE cdResult.First_Name IS NOT NULL
      AND cdResult.Last_Name IS NOT NULL
) AS cd1
WHERE cd1.cnt_name_bday > 1;

Test data:

DECLARE @Customer_Data TABLE (
    Cd_Id INT,
    cust_number_id INT,
    First_Name NVARCHAR(30),
    Last_Name NVARCHAR(30)
)

INSERT @Customer_Data (Cd_Id, cust_number_id, First_Name, Last_Name)
VALUES (1, 22, N'Alex', N'Bor'),
       (2, 22, N'Alex', N'Bor'),
       (3, 23, N'Alex', N'Bor'),
       (4, 24, N'Tom', N'Cruse'),
       (5, 25, N'Tom', N'Cruse')

DECLARE @Customer TABLE (
    Cust_id INT,
    Cust_number INT
)

INSERT @Customer (Cust_id, Cust_number)
VALUES (22, 022),
       (23, 023),
       (24, 024),
       (25, 025)

The problem is that the original query returns 6 rows (one of the rows is duplicated), while the optimized query returns only the duplicates, without that repeated row. How can I make the optimized query return the duplicated row as well?

Gordon Linoff · Accepted Answer · 2018-09-18 13:46:43Z

4

I would suggest just using window functions:

SELECT cd.cud_customer_id
FROM (
    SELECT cd.*,
           COUNT(*) OVER (PARTITION BY cud_name, cud_birthday) AS cnt_name_bday
    FROM dbo.customer_data cd
) cd
WHERE cnt_name_bday > 1;

Your query is finding duplicates for either name or birthday. You want duplicates with both at the same time.
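Adapted to the table variable and column names in the question as it stands now, the same window-function idea might look like the sketch below. It is an illustration rather than part of the original answer. The second window count and the partner_rows column are assumptions added here to show how many times the self-join emits each row: once per row with the same name but a different cust_number_id. The sketch assumes cust_number_id is never NULL and leaves out the joins to @Customer, which filter nothing for the sample data because every Cust_number is distinct.

```sql
-- Sample data copied from the question.
DECLARE @Customer_Data TABLE (
    Cd_Id INT,
    cust_number_id INT,
    First_Name NVARCHAR(30),
    Last_Name NVARCHAR(30)
);

INSERT @Customer_Data (Cd_Id, cust_number_id, First_Name, Last_Name)
VALUES (1, 22, N'Alex', N'Bor'),
       (2, 22, N'Alex', N'Bor'),
       (3, 23, N'Alex', N'Bor'),
       (4, 24, N'Tom', N'Cruse'),
       (5, 25, N'Tom', N'Cruse');

SELECT cd.cust_number_id, cd.First_Name, cd.Last_Name,
       -- Rows sharing the name minus rows sharing name AND customer
       -- = rows with the same name but a different customer.
       cd.cnt_name - cd.cnt_name_cust AS partner_rows
FROM (
    SELECT d.Cd_Id, d.cust_number_id, d.First_Name, d.Last_Name,
           COUNT(*) OVER (PARTITION BY d.First_Name, d.Last_Name) AS cnt_name,
           COUNT(*) OVER (PARTITION BY d.First_Name, d.Last_Name, d.cust_number_id) AS cnt_name_cust
    FROM @Customer_Data AS d
    WHERE d.First_Name IS NOT NULL
      AND d.Last_Name IS NOT NULL
) AS cd
WHERE cd.cnt_name > cd.cnt_name_cust   -- at least one partner row exists
ORDER BY cd.Cd_Id;
```

For the sample data this returns five rows, and partner_rows is 2 for the cust_number_id 23 row and 1 for every other row. Those values sum to 6, the row count of the original self-join. Actually repeating a row partner_rows times would still need a join, for example against a numbers table.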

3 Comments

Tibomso Over a year ago

Thank you for your help, I have updated the information, could you look it up and give me another tip?

Tibomso Over a year ago

How can I get the duplicated rows? And did I add the NULL check on the records correctly?

Gordon Linoff Over a year ago

@Tibomso . . . I don't understand your comment. This returns duplicated rows.

Yogesh Sharma · Accepted Answer · 2018-09-18 13:50:34Z

You can do this with a single EXISTS:

SELECT cd.cud_customer_id
FROM dbo.customer_data AS cd
WHERE EXISTS (
    SELECT 1
    FROM dbo.customer_data AS c
    WHERE c.cud_name = cd.cud_name
      AND c.cud_birthday = cd.cud_birthday
      AND c.cust_id <> cd.cud_customer_id
);
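Against the table variable from the edited question, the EXISTS form might be sketched as follows. The column mapping (First_Name and Last_Name in place of cud_name and cud_birthday, cust_number_id in place of the customer id) is an assumption based on the question's test data, not something this answer states.

```sql
-- Sample data copied from the question.
DECLARE @Customer_Data TABLE (
    Cd_Id INT,
    cust_number_id INT,
    First_Name NVARCHAR(30),
    Last_Name NVARCHAR(30)
);

INSERT @Customer_Data (Cd_Id, cust_number_id, First_Name, Last_Name)
VALUES (1, 22, N'Alex', N'Bor'),
       (2, 22, N'Alex', N'Bor'),
       (3, 23, N'Alex', N'Bor'),
       (4, 24, N'Tom', N'Cruse'),
       (5, 25, N'Tom', N'Cruse');

SELECT cd.cust_number_id, cd.First_Name, cd.Last_Name
FROM @Customer_Data AS cd
WHERE EXISTS (
    -- Another row with the same name that belongs to a different customer.
    SELECT 1
    FROM @Customer_Data AS c
    WHERE c.First_Name = cd.First_Name
      AND c.Last_Name = cd.Last_Name
      AND c.cust_number_id <> cd.cust_number_id
);
```

Because EXISTS only tests whether a matching row is present, each qualifying row comes back once: five rows for the sample data, compared with the six produced by the self-join.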

Thank you for your help, I have updated the information, could you look it up and give me another tip?
