How to remove duplicate rows based on array column subset relationship in DolphinDB?

Question

I have a DolphinDB table with an array vector column. I need to remove duplicate rows based on subset relationships within that column.

Sample Input:

sym	prices
a	`[3,4,5,6]`
a	`[3,4,5]`
a	`[2,4,5,6]`
a	`[5,6]`
a	`[7,9]`
a	`[7,9]`

Expected Output:

sym	prices
a	`[3,4,5,6]`
a	`[2,4,5,6]`
a	`[7,9]`

Deduplication Logic:

Subset Removal: If a row's prices array is a subset (i.e., fully contained) of another row's prices array, remove the subset row. In the example, [3,4,5] is a subset of [3,4,5,6], so it is removed; similarly, [5,6] is also a subset of [3,4,5,6] and is removed.
Full Duplicate Removal: If multiple rows have identical prices arrays, keep only one.
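
To make the two rules concrete, here is a minimal Python sketch of the intended logic. It is not DolphinDB code, only an illustration. It assumes that each prices array can be treated as a set (element order and repeated values are ignored) and that rows are only compared within the same sym; the function name `dedupe_by_subset` is made up for this example.

```python
def dedupe_by_subset(rows):
    """Drop exact duplicates, then drop rows whose prices are a proper
    subset of another row's prices (same sym only, by assumption)."""
    # Full duplicate removal: keep the first occurrence of each (sym, set).
    seen = set()
    unique = []
    for sym, prices in rows:
        key = (sym, frozenset(prices))
        if key not in seen:
            seen.add(key)
            unique.append((sym, prices))

    # Subset removal: `a < b` on frozensets means "a is a proper subset of b".
    result = []
    for sym, prices in unique:
        current = frozenset(prices)
        is_subset = any(
            sym == other_sym and current < frozenset(other)
            for other_sym, other in unique
        )
        if not is_subset:
            result.append((sym, prices))
    return result


sample = [
    ("a", [3, 4, 5, 6]),
    ("a", [3, 4, 5]),
    ("a", [2, 4, 5, 6]),
    ("a", [5, 6]),
    ("a", [7, 9]),
    ("a", [7, 9]),
]
result = dedupe_by_subset(sample)
```

For the sample input, `result` holds the three rows listed under Expected Output, in their original order.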

What I've Tried:

I considered using group by to remove exact duplicates, but this approach cannot handle subset relationships.

Core Question:
How can I perform this subset-based deduplication?

Thorsten Kettner · Accepted Answer · 2025-11-20 22:14:28Z

Disclaimer: I don't know DolphinDB.

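You want to remove proper subsets from the table, i.e. rows whose prices are strictly contained in another row's prices. According to the docs (https://docs.dolphindb.com/en/Programming/Operators/OperatorReferences/lt.html), the less-than operator tests for a proper subset when both operands are sets, so you can use it for this: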

delete from mytable subset
where exists (
  select *
  from mytable superset
  where subset.prices < superset.prices
);

(If you only want to compare price vectors for the same sym, you must of course add `and subset.sym = superset.sym` to the subquery.)

You also want to remove duplicate sets and keep only one of each. For this you'd need <= instead of <, but then you'd also need some ID to tell one row from the other. Some DBMS have a built-in unique row ID. I don't know whether DolphinDB does, so you may need a custom ID column in your table. Then you can extend the statement above as follows:

delete from mytable subset
where exists (
  select *
  from mytable superset
  where subset.prices < superset.prices
     or (subset.prices = superset.prices and subset.id < superset.id)
);
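
To check which rows that predicate would delete, the following Python sketch mirrors it row by row. It is only a simulation of the logic, not DolphinDB code. The `id` values 1 to 6 and the helper name `should_delete` are hypothetical, and the price vectors are compared as sets.

```python
# Sample rows from the question with a hypothetical custom id column.
rows = [
    {"id": 1, "sym": "a", "prices": [3, 4, 5, 6]},
    {"id": 2, "sym": "a", "prices": [3, 4, 5]},
    {"id": 3, "sym": "a", "prices": [2, 4, 5, 6]},
    {"id": 4, "sym": "a", "prices": [5, 6]},
    {"id": 5, "sym": "a", "prices": [7, 9]},
    {"id": 6, "sym": "a", "prices": [7, 9]},
]


def should_delete(sub, table):
    """Mirror of the EXISTS condition: true if some other row is a proper
    superset, or an identical set with a higher id."""
    sub_set = set(sub["prices"])
    for sup in table:
        sup_set = set(sup["prices"])
        if sub_set < sup_set:
            return True  # proper subset of another row
        if sub_set == sup_set and sub["id"] < sup["id"]:
            return True  # duplicate; the row with the higher id survives
    return False


kept_ids = [row["id"] for row in rows if not should_delete(row, rows)]
```

With these ids, `kept_ids` is `[1, 3, 6]`: among the two identical [7,9] rows, the one with the higher id is kept. A row never deletes itself, because a set is not a proper subset of itself and an id is not less than itself.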
