I have a DolphinDB table with an array vector column. I need to remove duplicate rows based on subset relationships within that column.

Sample Input:

sym prices
a [3,4,5,6]
a [3,4,5]
a [2,4,5,6]
a [5,6]
a [7,9]
a [7,9]

Expected Output:

sym prices
a [3,4,5,6]
a [2,4,5,6]
a [7,9]

Deduplication Logic:

  1. Subset Removal: If a row's prices array is a subset (i.e., fully contained) of another row's prices array, remove the subset row. In the example, [3,4,5] is a subset of [3,4,5,6], so it is removed; similarly, [5,6] is also a subset of [3,4,5,6] and is removed.

  2. Full Duplicate Removal: If multiple rows have identical prices arrays, keep only one.

What I've Tried:

I considered using group by to remove exact duplicates, but this approach cannot handle subset relationships.

Core Question:
How can I perform this subset-based deduplication?

1 Answer

Disclaimer: I don't know DolphinDB.

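You want to remove proper subsets from the table. According to the docs (https://docs.dolphindb.com/en/Programming/Operators/OperatorReferences/lt.html), the less-than operator tests for a proper subset when both operands are sets, so you may be able to use it for this. Please verify that it behaves the same way on an array vector column; if it does not, the prices would have to be converted to sets first: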

delete from mytable subset
where exists (
  select *
  from mytable superset
  where subset.prices < superset.prices
);

(If you only want to compare price vectors within the same sym, you must also add the condition subset.sym = superset.sym to the subquery.)

You also want to remove duplicate sets and keep only one of each. For this you'd need <= instead of <, but then you'd also need some ID to tell one row from the other; otherwise every copy of a duplicated set would be deleted. Some DBMSs have a built-in unique row ID. I don't know whether DolphinDB does, so you may need a custom ID column in your table. With such a column you can extend the statement above as follows:

delete from mytable subset
where exists (
  select *
  from mytable superset
  where subset.prices < superset.prices
     or (subset.prices = superset.prices and subset.id < superset.id)
);
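
To sanity-check the logic independently of DolphinDB, here is a minimal Python sketch of the same predicate applied to the sample data from the question. It is not DolphinDB code. The id column is hypothetical (it is not in the original table), the helper name should_delete is made up for illustration, and the sketch assumes the prices arrays can be treated as sets, i.e. that order and repeated values do not matter.

```python
# Rows as (id, sym, prices). The id column is an assumption: it is not part of
# the original table and exists only to tell identical rows apart.
rows = [
    (1, "a", [3, 4, 5, 6]),
    (2, "a", [3, 4, 5]),
    (3, "a", [2, 4, 5, 6]),
    (4, "a", [5, 6]),
    (5, "a", [7, 9]),
    (6, "a", [7, 9]),
]


def should_delete(row, table):
    """Mirror the EXISTS condition: is there another row that dominates this one?"""
    row_id, sym, prices = row
    current = set(prices)  # assumption: prices are compared as sets
    for other_id, other_sym, other_prices in table:
        if other_sym != sym:
            continue  # only compare rows with the same sym
        other = set(other_prices)
        # Proper subset of another row, or an exact duplicate with a lower id.
        if current < other or (current == other and row_id < other_id):
            return True
    return False


kept = [row for row in rows if not should_delete(row, rows)]
for row_id, sym, prices in kept:
    print(row_id, sym, prices)
```

With the sample data this keeps the rows with ids 1, 3 and 6, which matches the expected output in the question. Of two identical rows, the one with the higher id survives; flip the id comparison if you would rather keep the first occurrence.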