-1

This is a "toy" example; the real table has many columns and hundreds of thousands of rows.

I want to FILTER OUT any rows containing the same AcctNo, CustomerName and CustomerContact, but KEEP the ID for ONE of the duplicates (so I can access the record later).

  • Example:

    ID  AcctNo  CustomerName  CustomerContact
    1   1111    Acme Foods    John Smith
    2   1111    Acme Foods    John Smith
    3   1111    Acme Foods    Judy Lawson
    4   2222    YoyoDyne Inc  Thomas Pynchon
    5   2222    YoyoDyne Inc  Thomas Pynchon

    <= I want to save IDs 2, 3, and 5
  • Fiddle: https://www.db-fiddle.com/f/bEECHi6XnvKAeXC4Xthrrr/1

Q: What SQL do I need to accomplish this?

4 Comments

  • And how is ID 3 a duplicate? Commented Apr 7, 2021 at 22:43
  • You might consider using the row_number() function. Commented Apr 7, 2021 at 22:46
  • Please share what SQL you already tried. Commented Apr 7, 2021 at 22:47
  • You need the maximum Id for each group... Commented Apr 7, 2021 at 22:51

2 Answers

2
SELECT MAX(ID) AS KeepID, AcctNo, CustomerName, CustomerContact
FROM test
GROUP BY AcctNo, CustomerName, CustomerContact
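
To check this query against the sample rows from the question, here is a minimal sketch that loads them into an in-memory SQLite database from Python. SQLite, the column types, and the helper name `keep_ids_group_by` are assumptions for illustration; the asker's actual engine may differ, although `MAX` with `GROUP BY` is standard SQL.

```python
import sqlite3

# Sample rows copied from the question. The in-memory SQLite database is an
# assumption standing in for the asker's real server.
ROWS = [
    (1, 1111, "Acme Foods", "John Smith"),
    (2, 1111, "Acme Foods", "John Smith"),
    (3, 1111, "Acme Foods", "Judy Lawson"),
    (4, 2222, "YoyoDyne Inc", "Thomas Pynchon"),
    (5, 2222, "YoyoDyne Inc", "Thomas Pynchon"),
]


def keep_ids_group_by(rows):
    """Return one row per distinct (AcctNo, CustomerName, CustomerContact),
    carrying the largest ID found in that group."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE test (ID INTEGER PRIMARY KEY, AcctNo INTEGER, "
        "CustomerName TEXT, CustomerContact TEXT)"
    )
    conn.executemany("INSERT INTO test VALUES (?, ?, ?, ?)", rows)
    query = """
        SELECT MAX(ID) AS KeepID, AcctNo, CustomerName, CustomerContact
        FROM test
        GROUP BY AcctNo, CustomerName, CustomerContact
        ORDER BY KeepID
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in keep_ids_group_by(ROWS):
        print(row)
```

On the sample data, the kept IDs are 2, 3 and 5, which matches what the question asks for.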

1

So basically, what you want is to partition your table by AcctNo, CustomerName and CustomerContact. The question doesn't say how you want to choose which ID to keep, so you will need to modify the following query for that, but it should give you a starting point.

SELECT *
FROM test
JOIN (
    SELECT id,
           ROW_NUMBER() OVER (PARTITION BY acctno, customername, customercontact) rn
    FROM test
) A ON test.id = A.id
WHERE A.rn = 1

This should return something like this:

ID  AcctNo  CustomerName   CustomerContact  id  rn
1   11111   Acme Foods     John Smith       1   1
3   11111   Acme Foods     Judy Lawson      3   1
4   22222   Yoyodyne Inc.  Thomas Pynchon   4   1

This first computes a row number within each partition and then keeps only one row per partition (the one with rn = 1). Without an ORDER BY inside the OVER clause, which row in a partition gets rn = 1 is not guaranteed.
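
If the goal is to keep the highest ID in each group, as the question's expected result (IDs 2, 3 and 5) suggests, one possible modification is to add `ORDER BY ID DESC` inside the window. The sketch below tries this on the question's sample rows using an in-memory SQLite database from Python. SQLite (3.25 or later for window functions), the choice of ordering, and the helper name `keep_rows_row_number` are assumptions, not part of the original answer.

```python
import sqlite3

# Sample rows copied from the question. SQLite is an assumed stand-in engine.
ROWS = [
    (1, 1111, "Acme Foods", "John Smith"),
    (2, 1111, "Acme Foods", "John Smith"),
    (3, 1111, "Acme Foods", "Judy Lawson"),
    (4, 2222, "YoyoDyne Inc", "Thomas Pynchon"),
    (5, 2222, "YoyoDyne Inc", "Thomas Pynchon"),
]


def keep_rows_row_number(rows):
    """Return the full row with the largest ID in each
    (AcctNo, CustomerName, CustomerContact) partition."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE test (ID INTEGER PRIMARY KEY, AcctNo INTEGER, "
        "CustomerName TEXT, CustomerContact TEXT)"
    )
    conn.executemany("INSERT INTO test VALUES (?, ?, ?, ?)", rows)
    query = """
        SELECT ID, AcctNo, CustomerName, CustomerContact
        FROM (
            SELECT t.*,
                   ROW_NUMBER() OVER (
                       PARTITION BY AcctNo, CustomerName, CustomerContact
                       ORDER BY ID DESC  -- assumed rule: keep the highest ID
                   ) AS rn
            FROM test AS t
        ) AS ranked
        WHERE rn = 1
        ORDER BY ID
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in keep_rows_row_number(ROWS):
        print(row)
```

Because the window query already exposes every column of the row, the join back to `test` is not needed in this variant. Switching to `ORDER BY ID ASC` would keep the lowest ID per group instead.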

4 Comments

Please don't use images for data... use formatted/tabular text.
@Pradatta - 1) In answer to your comment above, I tried many things ... but I didn't want to "clutter" my question with a bunch of failed attempts. 2) Regarding your reply: I considered partitions and row_number(), but I would have preferred a simpler solution. Like Robert Sheahan's. 3) Do you know which approach would be more "efficient" (for large-ish datasets)?
@FoggyDay The MAX with Group By is more efficient I think. Here's a nice explanation why: stackoverflow.com/questions/11233125/…
Excellent citation - Thank you.
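
For anyone who wants to check the efficiency question on their own data rather than take it on faith, here is a rough sketch that runs both approaches on synthetic rows, confirms they select the same IDs, and reports wall-clock times. The in-memory SQLite engine, the synthetic data generator, and the function names are all assumptions; timings depend heavily on the engine, indexes and data distribution, so treat any numbers as indicative only.

```python
import random
import sqlite3
import time

GROUP_BY_SQL = """
    SELECT MAX(ID) FROM test
    GROUP BY AcctNo, CustomerName, CustomerContact
"""

ROW_NUMBER_SQL = """
    SELECT ID FROM (
        SELECT ID,
               ROW_NUMBER() OVER (
                   PARTITION BY AcctNo, CustomerName, CustomerContact
                   ORDER BY ID DESC
               ) AS rn
        FROM test
    ) AS ranked
    WHERE rn = 1
"""


def build_table(n_rows, seed=0):
    """Create an in-memory table filled with synthetic, duplicate-heavy rows."""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE test (ID INTEGER PRIMARY KEY, AcctNo INTEGER, "
        "CustomerName TEXT, CustomerContact TEXT)"
    )
    rows = [
        (i, rng.randint(1, 50),
         f"Customer {rng.randint(1, 20)}", f"Contact {rng.randint(1, 20)}")
        for i in range(1, n_rows + 1)
    ]
    conn.executemany("INSERT INTO test VALUES (?, ?, ?, ?)", rows)
    return conn


def timed_ids(conn, sql):
    """Run a query and return (sorted list of IDs, elapsed seconds)."""
    start = time.perf_counter()
    ids = sorted(row[0] for row in conn.execute(sql))
    return ids, time.perf_counter() - start


def compare(n_rows=20000):
    """Return (same_ids, group_by_seconds, row_number_seconds)."""
    conn = build_table(n_rows)
    group_ids, group_secs = timed_ids(conn, GROUP_BY_SQL)
    window_ids, window_secs = timed_ids(conn, ROW_NUMBER_SQL)
    conn.close()
    return group_ids == window_ids, group_secs, window_secs


if __name__ == "__main__":
    same, group_secs, window_secs = compare()
    print(f"same IDs: {same}")
    print(f"GROUP BY: {group_secs:.4f}s, ROW_NUMBER: {window_secs:.4f}s")
```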
