SQL: select unique rows

Question

This is a "toy" example of a table that has many columns and 100s of thousands of rows.

I want to FILTER OUT any rows containing the same AcctNo, CustomerName and CustomerContact, but KEEP the ID for ONE of the duplicates (so I can access the record later).

Example:

ID	AcctNo	CustomerName	CustomerContact
1	1111	Acme Foods	John Smith
2	1111	Acme Foods	John Smith
3	1111	Acme Foods	Judy Lawson
4	2222	YoyoDyne Inc	Thomas Pynchon
5	2222	YoyoDyne Inc	Thomas Pynchon

I want to save IDs 2, 3, and 5.

Fiddle: https://www.db-fiddle.com/f/bEECHi6XnvKAeXC4Xthrrr/1

Q: What SQL do I need to accomplish this?

And how is ID 3 a duplicate?

– Dale K, Commented Apr 7, 2021 at 22:43
You might consider using the row_number() function.

– Dale K, Commented Apr 7, 2021 at 22:46
Please share what sql you already tried.

– Pradatta, Commented Apr 7, 2021 at 22:47
You need the maximum Id for each group...

– Stu, Commented Apr 7, 2021 at 22:51

Robert Sheahan · Accepted Answer · 2021-04-07 22:51:50Z

2

SELECT MAX(ID) AS KeepID, AcctNo, CustomerName, CustomerContact
FROM test
GROUP BY AcctNo, CustomerName, CustomerContact
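To check the GROUP BY query against the example from the question, here is a minimal sketch that loads the five sample rows into an in-memory SQLite database and runs it. SQLite and Python's `sqlite3` module are assumptions made only so the example is runnable (the engine behind the fiddle is not stated), and the `keep_ids` helper and the added `ORDER BY KeepID` are illustrative, not part of the answer.

```python
import sqlite3

# Sample rows copied from the example table in the question.
ROWS = [
    (1, 1111, "Acme Foods", "John Smith"),
    (2, 1111, "Acme Foods", "John Smith"),
    (3, 1111, "Acme Foods", "Judy Lawson"),
    (4, 2222, "YoyoDyne Inc", "Thomas Pynchon"),
    (5, 2222, "YoyoDyne Inc", "Thomas Pynchon"),
]


def keep_ids(rows):
    """Return the highest ID in each (AcctNo, CustomerName, CustomerContact) group."""
    con = sqlite3.connect(":memory:")  # assumption: SQLite stands in for the real database
    con.execute(
        "CREATE TABLE test (ID INTEGER PRIMARY KEY, AcctNo INTEGER, "
        "CustomerName TEXT, CustomerContact TEXT)"
    )
    con.executemany("INSERT INTO test VALUES (?, ?, ?, ?)", rows)
    cur = con.execute(
        "SELECT MAX(ID) AS KeepID FROM test "
        "GROUP BY AcctNo, CustomerName, CustomerContact "
        "ORDER BY KeepID"  # added only to make the output order deterministic
    )
    ids = [row[0] for row in cur]
    con.close()
    return ids


print(keep_ids(ROWS))  # [2, 3, 5]
```

The result matches the IDs the question asks to keep. Note that ID 3 is returned because it is the only row in its group, not because it duplicates anything.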


Pradatta · Accepted Answer · 2021-04-08 09:01:14Z

What you want is to partition the table by AcctNo, CustomerName and CustomerContact. The question doesn't say how to choose which ID to keep within each group; to control that, you need to modify the following query (for example, by adding an ORDER BY inside the OVER clause). But this should give you a starting point.

SELECT *
FROM test
JOIN (SELECT id,
             ROW_NUMBER() OVER (PARTITION BY acctno, customername, customercontact) rn
      FROM test) A
  ON test.id = A.id
WHERE A.rn = 1

This should return something like this:

ID	AcctNo	CustomerName	CustomerContact	id	rn
1	11111	Acme Foods	John Smith	1	1
3	11111	Acme Foods	Judy Lawson	3	1
4	22222	Yoyodyne Inc.	Thomas Pynchon	4	1

The query first computes a row number within each partition and then keeps only the row numbered 1, so exactly one row per partition is returned.
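As a hedged variation on the query above, the sketch below adds `ORDER BY ID DESC` inside the `OVER` clause so that the row numbered 1 is always the one with the highest ID, and it filters a subquery instead of joining back to `test`. Both changes are mine, not the answerer's. It assumes SQLite 3.25 or later (the first version with window functions), reached through Python's `sqlite3` module; the `keep_rows_by_row_number` helper is a hypothetical name.

```python
import sqlite3

# Sample rows copied from the example table in the question.
ROWS = [
    (1, 1111, "Acme Foods", "John Smith"),
    (2, 1111, "Acme Foods", "John Smith"),
    (3, 1111, "Acme Foods", "Judy Lawson"),
    (4, 2222, "YoyoDyne Inc", "Thomas Pynchon"),
    (5, 2222, "YoyoDyne Inc", "Thomas Pynchon"),
]


def keep_rows_by_row_number(rows):
    """Return one full row per group: the row with the highest ID."""
    con = sqlite3.connect(":memory:")  # assumption: SQLite 3.25+ for ROW_NUMBER()
    con.execute(
        "CREATE TABLE test (ID INTEGER PRIMARY KEY, AcctNo INTEGER, "
        "CustomerName TEXT, CustomerContact TEXT)"
    )
    con.executemany("INSERT INTO test VALUES (?, ?, ?, ?)", rows)
    cur = con.execute(
        """
        SELECT ID, AcctNo, CustomerName, CustomerContact
        FROM (
            SELECT test.*,
                   ROW_NUMBER() OVER (
                       PARTITION BY AcctNo, CustomerName, CustomerContact
                       ORDER BY ID DESC  -- highest ID in each group gets rn = 1
                   ) AS rn
            FROM test
        ) AS ranked
        WHERE rn = 1
        ORDER BY ID
        """
    )
    result = cur.fetchall()
    con.close()
    return result


for row in keep_rows_by_row_number(ROWS):
    print(row)
```

With the sample data this keeps IDs 2, 3 and 5, the same rows the MAX/GROUP BY query selects. Unlike that query, it can return every column of the kept row without a join.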

Please don't use images for data... use formatted/tabular text.
@Pradatta - 1) In answer to your comment above, I tried many things ... but I didn't want to "clutter" my question with a bunch of failed attempts. 2) Regarding your reply: I considered partitions and row_number(), but I would have preferred a simpler solution. Like Robert Sheahan's. 3) Do you know which approach would be more "efficient" (for large-ish datasets)?
@FoggyDay I think MAX with GROUP BY is the more efficient of the two. Here's a nice explanation why: stackoverflow.com/questions/11233125/…
