$\begingroup$

I have a rather large dataset consisting of an Association of Associations. For better or worse, I've converted it into a Dataset. Here is a simplified version of that dataset:

testdb = Dataset[<|
  "First" -> <|
    "LOCATION" -> GeoPosition[{40.1151, -88.2737}, "NAD27"],
    "TYPE" -> "A",
    "DATA1" -> Range[10],
    "DATA2" -> {0.8, 0.5, 0.2, 0.4, 0.5, 0.8, 0.75, 0.15, 0.95, 0.4}|>,
  "Second" -> <|
    "LOCATION" -> GeoPosition[{40.1123, -89.110}, "NAD27"],
    "TYPE" -> "B",
    "DATA1" -> Range[2, 11],
    "DATA2" -> {0.3, 0.2, 0.24, 0.44, 0.2, 0.81, 0.76, 0.72, 0.88, 0.44}|>,
  "Third" -> <|
    "LOCATION" -> GeoPosition[{40.1123, -89.110}, "NAD27"],
    "TYPE" -> "B",
    "DATA1" -> Range[4, 13],
    "DATA2" -> {0.66, 0.65, 0.21, 0.92, 0.51, 0.44, 0.23, 0.77, 0.85, 0.11}|>
|>]


My end goal is a dataset with the columns {"LOCATION", "NEWDATA"}, where NEWDATA is the total of the "DATA2" values whose corresponding "DATA1" values lie between 5 and 8 inclusive (5 <= x <= 8).

So for the example above, the NEWDATA totals would be 2.2 for "First", 2.21 for "Second", and 2.29 for "Third", each alongside that entry's LOCATION.

The real dataset has 824 entries, each with nested data of 20,000 elements, so the selection and summation need to be fast. While I could do this using Normal, Cases, and the like, my thought was that the Query method would be quicker.

$\endgroup$

1 Answer

$\begingroup$

How about:

testdb[ All, <| "LOCATION" -> "LOCATION", "NEWDATA" -> Total @* (Pick[#DATA2, Between[{5,8}] /@ #DATA1]&) |> ] 
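To make the steps inside that query easier to follow, here is a minimal sketch that applies the same mask, Pick, and Total sequence to a single entry held as a plain Association. The names row, mask, picked, and newdata are illustrative only and do not appear in the original query.

```wolfram
(* The "First" entry from the question, as a plain Association *)
row = <|
  "LOCATION" -> GeoPosition[{40.1151, -88.2737}, "NAD27"],
  "TYPE" -> "A",
  "DATA1" -> Range[10],
  "DATA2" -> {0.8, 0.5, 0.2, 0.4, 0.5, 0.8, 0.75, 0.15, 0.95, 0.4}
|>;

(* Boolean mask: True where the DATA1 value lies in the closed interval [5, 8] *)
mask = Between[{5, 8}] /@ row["DATA1"];

(* Keep the DATA2 entries where the mask is True: {0.5, 0.8, 0.75, 0.15} *)
picked = Pick[row["DATA2"], mask];

(* Sum the selected values *)
newdata = Total[picked];

(* Assemble the two-column result for this entry *)
<|"LOCATION" -> row["LOCATION"], "NEWDATA" -> newdata|>
```

In the Dataset query, the association of operators does this for every entry: the string "LOCATION" extracts that key, and the composed function produces the total.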


If the #DATA1 and #DATA2 lists are very long, you might want to use a vectorized selector such as:

Pick[#DATA2, Unitize @ Clip[#DATA1, {5, 8}, {0, 0}], 1]& 

instead of

Pick[#DATA2, Between[{5,8}] /@ #DATA1]& 
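As a rough way to check that the two selectors agree and to compare their speed, here is a sketch on synthetic data sized like the question's 20,000-element lists, filtering on the DATA1 values as the question requires. The names data1, data2, slow, and fast are hypothetical, the random data is an assumption, and timings will vary by machine and version.

```wolfram
(* Synthetic stand-ins for one entry's DATA1 and DATA2 lists (assumed data) *)
SeedRandom[1];
data1 = RandomInteger[{1, 20}, 20000];
data2 = RandomReal[1, 20000];

(* Element-by-element test with Between, mapped over the list *)
slow[d1_, d2_] := Total @ Pick[d2, Between[{5, 8}] /@ d1];

(* Vectorized test: Clip sends values outside [5, 8] to 0, Unitize turns the rest into 1 *)
fast[d1_, d2_] := Total @ Pick[d2, Unitize @ Clip[d1, {5, 8}, {0, 0}], 1];

(* Both selectors should give the same total *)
slow[data1, data2] == fast[data1, data2]

(* Timings in seconds for each approach *)
First @ RepeatedTiming[slow[data1, data2]]
First @ RepeatedTiming[fast[data1, data2]]
```

One caveat with the Clip and Unitize form: it marks excluded values with 0, so it only works when 0 itself lies outside the interval being selected, as it does for 5 to 8.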
$\endgroup$
  • Awesome. The second operator function is orders of magnitude faster for my case. (Commented Jul 26, 2017 at 18:48)
