15
$\begingroup$

I have a dataset ds with 3 columns named A, B, C. Columns A and B have repeated values. How can I obtain another dataset that contains a list of the values in C for each of the unique combinations of A and B?

For example,

ds = Dataset[ {<| "A" -> 2, "B" -> 3, "C" -> 100 |>, <| "A" -> 2, "B" -> 4, "C" -> 200 |>, <| "A" -> 2, "B" -> 3, "C" -> 300 |>}] 

I want to get

Dataset[ {<| "A" -> 2, "B" -> 3, "Clist" -> {100, 300} |>, <| "A" -> 2, "B" -> 4, "Clist" -> {200} |>}] 

How can I do that?

$\endgroup$
3

3 Answers 3

11
$\begingroup$

To get a list of C values for each A/B combination:

ds[GroupBy[{#A, #B}&] /* Values , <| "A" -> Query[First, "A"] , "B" -> Query[First, "B"] , "CList" -> Query[All, "C"] |> ] 

dataset screenshot

This is not as succinct as SQL's GROUP BY operator, but it does allow us to easily perform multiple aggregations if desired:

ds[GroupBy[{#A, #B}&] /* Values , <| "A" -> Query[First, "A"] , "B" -> Query[First, "B"] , "CList" -> Query[All, "C"] , "CMean" -> Query[Mean, "C"] , "CMin" -> Query[Min, "C"] , "CMax" -> Query[Max, "C"] |> ] 

dataset screenshot

$\endgroup$
1
  • 5
    $\begingroup$ I really think you should write a tutorial on Dataset[] at some point... I keep picking up new stuff from your answers on this topic. :) $\endgroup$ Commented Jul 26, 2016 at 14:44
8
$\begingroup$

Failed to find a duplicate:

by = {"A", "B"}; Values @ GroupBy[ds, Query[by], MapAt[First, List /@ by] @* Merge[Identity]] 

enter image description here

or:

ds // GroupBy[Query[{"A", "B"}] -> (#C &)] // KeyValueMap[<|#, "Clist" -> #2|> &] ds // GroupBy[Query[{"A", "B"}]] // KeyValueMap[<|#, "Clist" -> #2[[;; , "C"]]|> &] 
$\endgroup$
4
  • $\begingroup$ Thanks for an accept but it is always good to hold on a day or two. Let's not discourage others. $\endgroup$ Commented Jul 26, 2016 at 9:07
  • $\begingroup$ awesome, what if I wanted to "reduce" the resulting dataset by computing say the mean of the list instead? $\endgroup$ Commented Jul 26, 2016 at 9:07
  • $\begingroup$ @amrods Then take a look at the first topic linked in comments. It seems to be exactly that.Merge[Mean] is nice in a way that it will automatically reduce A and B lists to the repeating element. Here we have a more general case where I have to use First manually. $\endgroup$ Commented Jul 26, 2016 at 9:09
  • 3
    $\begingroup$ Thanks. I really thought there would be a cleaner way of achieving this, since I think is a fairly common operation on a dataset. $\endgroup$ Commented Jul 26, 2016 at 9:21
5
$\begingroup$

You may use GroupBy and Merge.

ds[GroupBy[{#"A", #"B"} &] /* Values, Merge[Identity] /* Query[{"A" -> First, "B" -> First}]] 

enter image description here

Hope this helps

$\endgroup$

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.