Unique elements from a list according to a subset of fields

Question

Given a record like

data Foo = Foo { fooName :: Text, fooAge :: Int, fooCity :: Text }

With a list of such elements, is there a function to remove duplicates on a subset of fields only, on the model of this hypothetical removeDupBy function?

foos =
  [ Foo "john" 32 "London"
  , Foo "joe" 18 "New York"
  , Foo "john" 22 "Paris"
  , Foo "john" 32 "Madrid"
  , Foo "joe" 17 "Los Angeles"
  , Foo "joe" 18 "Berlin"
  ]

> removeDupBy (\f -> (fooName f, fooAge f)) foos
[ Foo "john" 32 "London"
, Foo "joe" 18 "New York"
, Foo "john" 22 "Paris"
, Foo "joe" 17 "Los Angeles"
]

I could implement my own, but I would prefer one from a well-established library, which will probably be faster and more robust against edge cases. I was thinking of using nub, but I'm not sure how to map the actual Foo elements to the tuples (fooName, fooAge) that nub would compare.

nubOrdOn from containers.

Commented May 18, 2021 at 7:53 — danidiaz
Just found out about the history of nubOrd / nubOrdOn

Commented May 18, 2021 at 8:15 — Jivan

castletheperson · Accepted Answer · 2021-05-18 08:40:43Z

Since you are dealing with only strings and numbers, you can use the Ord instance to remove duplicates efficiently, or even Hashable, which allows practically constant-time lookups.

Some functions which exactly match your desired signature are:

nubOrdOn from the containers package

Data.Containers.ListUtils> nubOrdOn (\f -> (fooName f, fooAge f)) foos

hashNubOn from the witherable package

Witherable> hashNubOn (\f -> (fooName f, fooAge f)) foos
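For reference, here is a minimal self-contained sketch of the nubOrdOn variant as a module rather than a GHCi one-liner. The names fooKey and uniqueFoos are illustrative and not part of any library, and the OverloadedStrings pragma is only an assumption made so that the Text literals from the question compile as written.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Containers.ListUtils (nubOrdOn)
import Data.Text (Text)

data Foo = Foo { fooName :: Text, fooAge :: Int, fooCity :: Text }
  deriving (Show, Eq)

-- Hypothetical helper: the key used for deduplication.
-- Only name and age take part; the city is ignored.
fooKey :: Foo -> (Text, Int)
fooKey f = (fooName f, fooAge f)

foos :: [Foo]
foos =
  [ Foo "john" 32 "London"
  , Foo "joe" 18 "New York"
  , Foo "john" 22 "Paris"
  , Foo "john" 32 "Madrid"
  , Foo "joe" 17 "Los Angeles"
  , Foo "joe" 18 "Berlin"
  ]

-- nubOrdOn keeps the first element seen for each key and
-- preserves the original order of the survivors.
uniqueFoos :: [Foo]
uniqueFoos = nubOrdOn fooKey foos
```

Loading this in GHCi and evaluating uniqueFoos should give the four elements listed in the question (London, New York, Paris, Los Angeles), since Madrid and Berlin repeat a (name, age) pair that appeared earlier.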

You may find other options by searching on Hoogle for (a -> b) -> [a] -> [a]

If you need to do many operations like this, you may prefer to use Map or HashMap directly.

Mark Seemann · Accepted Answer · 2021-05-18 07:53:26Z

You can use nubBy:

Prelude Data.List> nubBy (\x y -> (fooName x, fooAge x) == (fooName y, fooAge y)) foos
[Foo {fooName = "john", fooAge = 32, fooCity = "London"},
 Foo {fooName = "joe", fooAge = 18, fooCity = "New York"},
 Foo {fooName = "john", fooAge = 22, fooCity = "Paris"},
 Foo {fooName = "joe", fooAge = 17, fooCity = "Los Angeles"}]

(Output formatted for enhanced readability)

This algorithm is O(n^2), but it could be O(n log n) if the Ord or Hashable instances were used.
@4castle would you use Ord/Hashable with nubBy or would it need using nubOrdOn?
@Jivan It would have to be with a different function, like nubOrdOn, or you could use a Map/HashMap (insert all, and then read out the values)
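The Map approach mentioned in the last comment (insert all, then read out the values) could be sketched roughly as follows. The name dedupViaMap is hypothetical, and this is only one way to do it. Unlike nubOrdOn, the result comes back sorted by key rather than in the original list order.

```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical helper: deduplicate by a key using a Map.
-- Map.fromListWith calls the combining function as (f new old),
-- so returning `old` keeps the first element seen for each key.
-- Map.elems then returns the survivors in ascending key order.
dedupViaMap :: Ord k => (a -> k) -> [a] -> [a]
dedupViaMap key xs =
  Map.elems (Map.fromListWith (\_new old -> old) [(key x, x) | x <- xs])
```

For example, dedupViaMap fst [(2, 'a'), (1, 'b'), (2, 'c')] keeps (2, 'a') over (2, 'c') and returns [(1, 'b'), (2, 'a')], with key 1 first because of the Map ordering. This costs O(n log n) comparisons, in line with the complexity comment above.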
