15
$\begingroup$

What is the fastest way to make multiple selections from a list? Compiled methods included.

For example, here are two methods for selecting a subset, compared:-

biglist = {{5, "e", 500}, {4, "d", 400}, {3, "c", 300}, {2, "b", 200}, {1, "a", 100}}; subset = {2, 5, 4}; Cases[biglist, {#, __}] & /@ subset 

{{{2, "b", 200}}, {{5, "e", 500}}, {{4, "d", 400}}}

Cases[biglist, {Apply[Alternatives, subset], __}] 

{{5, "e", 500}, {4, "d", 400}, {2, "b", 200}}

Only first method returns items in the order matched in the subset list, but it is much slower:-

n = 10000; biggerlist = Map[{#, FromCharacterCode[Mod[# - 1, 26] + 97], #*100} &, Range[n]]; unsortedbiglist = RandomSample[biggerlist, n]; unsortedsubset = RandomSample[Range[n], Round[n/10]]; Row[{First[Timing[selection1 = Map[Cases[unsortedbiglist, {#, __}] &, unsortedsubset];]], " seconds"}] 

1.123 seconds

Row[{First[Timing[selection2 = Cases[unsortedbiglist, {Apply[Alternatives, unsortedsubset], __}];]], " seconds"}] 

0.171 seconds

The selections are the same, but differently ordered:-

SameQ[Flatten[selection1, 1], Extract[selection2, Flatten[Map[Position[First /@ selection2, #] &, unsortedsubset], 1]]] 

True

Including the sorting routine in the selection process still gives a better timing:-

Row[{First[Timing[selection3 = Function[selection2, Extract[selection2, Flatten[Map[Position[First /@ selection2, #] &, unsortedsubset], 1]]][Cases[unsortedbiglist, {Apply[Alternatives, unsortedsubset], __}]];]], " seconds"}] 

0.359 seconds

SameQ[Flatten[selection1, 1], selection3] 

True

Nevertheless, only the slow Cases method returns {} when there are unmatched subset items, which is sometimes useful.

Ideas for speedy selections would be great.

$\endgroup$

3 Answers 3

15
$\begingroup$

This seems to give a rather decent performance (final version with improvements by jVincent):

Clear[getSubset]; getSubset[input_List,sub_List]:= Module[{inSubQ,sowMatches}, Scan[(inSubQ[#] := True)&,sub]; sowMatches[x_/;inSubQ@First@x] := Sow[x,First@x]; Apply[Sequence, Last@Reap[Scan[sowMatches, input], sub], {2}] ]; 

Benchmarks:

n = 10000; biggerlist = Map[{#, FromCharacterCode[Mod[# - 1, 26] + 97], #*100} &, Range[n]]; unsortedbiglist = RandomSample[biggerlist, n]; unsortedsubset = RandomSample[Range[n], Round[n/10]]; Row[{First[Timing[selection1=Map[Cases[unsortedbiglist,{#,__}]&,unsortedsubset];]]," seconds"}] (* 1.170008 seconds *) (sel1 = getSubset[unsortedbiglist,unsortedsubset])//Short//Timing (* {0.031,{{{8286,r,828600}},<<998>>,{{6420,x,642000}}}} *) selection1===sel1 (* True *) 
$\endgroup$
4
  • $\begingroup$ I updated the code and reran the test and timing. Feel free to revert if it's not to your liking. $\endgroup$ Commented Feb 7, 2013 at 13:37
  • $\begingroup$ @jVincent Thanks, looks good to me. $\endgroup$ Commented Feb 7, 2013 at 13:44
  • $\begingroup$ Very nice. Also, adding ... sub] /. {} -> {{}} to the last line makes it handle non-matching sublist items the same as Cases. $\endgroup$ Commented Feb 7, 2013 at 14:19
  • $\begingroup$ @ChrisDegnen Great. Feel free to update the code in the answer if you feel this change is important. $\endgroup$ Commented Feb 7, 2013 at 14:27
7
$\begingroup$
 Timing[selection3 = Pick[unsortedbiglist, unsortedbiglist[[All, 1]], Alternatives @@ unsortedsubset];] (* {0.218401, Null} -- same as Cases[..., Alternatives@@ ..] *) selection3 == selection2 (* True *) 
$\endgroup$
1
  • $\begingroup$ Yes, same result and timing. $\endgroup$ Commented Feb 7, 2013 at 12:41
6
$\begingroup$

Here is my take on Leonid's method. It's better because it's shorter and uses ~infix~. ;-)
(It's just a little bit faster, too: about 20% on his test.)

getSubset2[input_List, sub_List] := Module[{test}, (test@# = True) & ~Scan~ sub; Apply[Sequence, Reap[Cases[input, x:{y_?test, ___} :> x ~Sow~ y], sub][[2]], {2} ] ] getSubset2[Range@20 ~Partition~ 4, Prime ~Array~ 7] 
{{}, {}, {{5, 6, 7, 8}}, {}, {}, {{13, 14, 15, 16}}, {{17, 18, 19, 20}}} 

Although slower I cannot pass by the more direct implementation without comment:

getSubset3[input_List, sub_List] := Last @ Reap[# ~Sow~ #[[1]] & ~Scan~ input, sub, Sequence @@ #2 &] 

Also slower than getSubset2 but pleasingly clean, Association can be nicely applied to this problem in the form of GroupBy and Lookup.

getSubset4[set_, sub_] := Lookup[set ~GroupBy~ First, sub, {}] getSubset4[Range@20 ~Partition~ 4, Prime ~Array~ 7] 
{{}, {}, {{5, 6, 7, 8}}, {}, {}, {{13, 14, 15, 16}}, {{17, 18, 19, 20}}} 
$\endgroup$
4
  • $\begingroup$ Thanks, I was wondering how Sequence could be used. $\endgroup$ Commented Feb 11, 2013 at 13:02
  • $\begingroup$ This is so useful! Apply[Sequence, {{}, {{}}}, {2}] $\endgroup$ Commented Feb 28, 2013 at 11:21
  • $\begingroup$ @Chris I'm happy to have planted the seed. :-) $\endgroup$ Commented Feb 28, 2013 at 11:23
  • $\begingroup$ @Chris Apply[Sequence, lis, {-3}] is a solution to this question that I don't believe anyone has posted. You should. $\endgroup$ Commented Feb 28, 2013 at 11:28

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.