2

Given a record like

data Foo = Foo { fooName :: Text, fooAge :: Int, fooCity :: Text } 

With a list of such elements, is there a function to remove duplicates on a subset of fields only, on the model of this hypothetical removeDupBy function?

foos = [ Foo "john" 32 "London", Foo "joe" 18 "New York", Foo "john" 22 "Paris", Foo "john" 32 "Madrid", Foo "joe" 17 "Los Angeles", Foo "joe" 18 "Berlin" ] > removeDupBy (\(Foo f) -> (fooName, fooAge)) foos [ Foo "john" 32 "London", Foo "joe" 18 "New York", Foo "john" 22 "Paris", Foo "joe" 17 "Los Angeles" ] 

I could implement my own but would prefer using one from a well-established library, which will probably be much more performant and be much more resilient against edge cases. I was thinking of using nub but I'm not sure how to map the actual Foo elements to the tuples (fooName, fooAge) that nub would filter out.

2

2 Answers 2

2

Since you are dealing with only strings and numbers, you can use the Ord instance to remove duplicates efficiently, or even Hashable, which allows practically constant-time lookups.

Some functions which exactly match your desired signature are:

  1. nubOrdOn from the containers package
Data.Containers.ListUtils> nubOrdOn (\f -> (fooName f, fooAge f)) foos 
  1. hashNubOn from the witherable package
Witherable> hashNubOn (\f -> (fooName f, fooAge f)) foos 

You may find other options by searching on Hoogle for (a -> b) -> [a] -> [a]

If you need to do many operations like this, you may prefer to use Map or HashMap directly.

Sign up to request clarification or add additional context in comments.

Comments

1

You can use nubBy:

Prelude Data.List> nubBy (\x y -> (fooName x, fooAge x) == (fooName y, fooAge y)) foos [Foo {fooName = "john", fooAge = 32, fooCity = "London"}, Foo {fooName = "joe", fooAge = 18, fooCity = "New York"}, Foo {fooName = "john", fooAge = 22, fooCity = "Paris"}, Foo {fooName = "joe", fooAge = 17, fooCity = "Los Angeles"}] 

(Output formatted for enhanced readability)

3 Comments

This algorithm is O(n^2), but it could be O(n log n) if the Ord or Hashable instances were used.
@4castle would you use Ord/Hashable with nubBy or would it need using nubOrdOn?
@Jivan It would have to be with a different function, like nubOrdOn, or you could use a Map/HashMap (insert all, and then read out the values)

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.