10
$\begingroup$

Is there a faster way to find the position of certain elements (referred to as target below) from a list (referred to as myList below)?
In my case, myList is a large list with around 200,000 sublists, each containing approximately 10 elements.
The 'target' list consists of around 400,000 elements.
I ran the code Map[First, Position[myList, #]] & /@ target, but it was very slow, and I estimate it would take more than 500 hours to complete.
I checked the PositionIndex function, but it doesn't work in this case. Below is the example code using much smaller data.

myList = {{{1, 2, 3, 4, 5, 6, 7, 8}, {9, 10, 11, 12, 13, 14, 15, 16}, {17, 18, 19, 20, 21, 22, 23, 24}, {25, 26, 27, 28, 29, 30, 31, 32}, {33, 34, 35, 36, 37, 38, 39, 40}, {41, 42, 43, 44, 45, 46, 47, 48}, {49, 50, 51, 52, 53, 54, 55, 56}, {57, 58, 59, 60, 61, 62, 63, 64}, {65, 66, 67, 68, 69, 70, 71, 72}, {73, 74, 75, 76, 77, 78, 79, 80}}, {{1, 2, 3, 4, 5, 6, 7, 8}, {9, 10, 11, 12, 13, 14, 15, 16}, {17, 18, 19, 20, 21, 22, 23, 24}, {25, 26, 27, 28, 29, 30, 31, 32}, {33, 34, 35, 36, 37, 38, 39, 40}, {41, 42, 43, 44, 45, 46, 47, 48}, {49, 50, 51, 52, 53, 54, 55, 56}, {57, 58, 59, 60, 61, 62, 63, 64}, {65, 66, 67, 68, 69, 70, 71, 72}, {73, 74, 75, 76, 77, 78, 79, 80}}, {{1, 2, 3, 4, 5, 6, 7, 8}, {9, 10, 11, 12, 13, 14, 15, 16}, {17, 18, 19, 20, 21, 22, 23, 24}, {25, 26, 27, 28, 29, 30, 31, 32}, {33, 34, 35, 36, 37, 38, 39, 40}, {41, 42, 43, 44, 45, 46, 47, 48}, {49, 50, 51, 52, 53, 54, 55, 56}, {57, 58, 59, 60, 61, 62, 63, 64}, {65, 66, 67, 68, 69, 70, 71, 72}, {73, 74, 75, 76, 77, 78, 79, 80}}}; target = {1, 5, 8, 9, 23, 58}; Map[First, Position[myList, #]] & /@ target 
$\endgroup$
9
  • $\begingroup$ Try this on your large list: Compile[{{myList,_Integer,3}, {target, _Integer, 1}},Map[First, Position[myList, #]] & /@ target][myList,target] I also tried FunctionCompile, but could not get it to work. Looks like FuncionCompile is still quit restricted, unfortunately. $\endgroup$ Commented Sep 24, 2024 at 12:13
  • 3
    $\begingroup$ Dimensions[myList] is {3, 10, 8}. Does your real data also have such kind of structure? $\endgroup$ Commented Sep 24, 2024 at 12:14
  • $\begingroup$ Your code evaluets to {{1, 2, 3}, {1, 2, 3}, {1, 2, 3}, {1, 2, 3}, {1, 2, 3}, {1, 2, 3}} ?? $\endgroup$ Commented Sep 24, 2024 at 12:36
  • $\begingroup$ I'm not sure if I understand correctly, but does this work? Block[{c = Merge[MapIndexed[AssociationThread[Catenate[#1] -> First[#2]] &, myList], Identity]}, Lookup[c, target] ] $\endgroup$ Commented Sep 24, 2024 at 16:02
  • 2
    $\begingroup$ Are the elements all positive, machine integers? Is it a rectangular array? Is it packed? $\endgroup$ Commented Sep 24, 2024 at 16:31

5 Answers 5

16
$\begingroup$

The problem with search elements of target in the expression myList is that naively we have to traverse the whole expression myList for each element of target. So the complexity is roughly Length[target] Length[Flatten[myList].

Instead, we can traverse myList just once and build some kind of index. That is also what PositionIndex does, but the latter cannot handle nestedness. So let's build one on our own. The aim is to create an Association idx, so that for a value t in target the outcome of idx[t] is a list of all positions of t in myList (in the order of appearance). We use an Association because this data structure allows for a fast lookup. (IIRC, Association uses a clever search tree data structure; its complexity for a single lookup should be something like Log[n], where n is the number of distinct entries in myList.

We can build idx by the following (not overly optimized) code:

idx = Association[]; Module[{a, b, c, cidx}, Do[ a = myList[[i]]; Do[ b = a[[j]]; Do[ c = b[[k]]; cidx = Lookup[idx, c, Association[]]; idx[c] = AssociateTo[cidx, Length[cidx] + 1 -> {i, j, k}]; , {k, 1, Length[b]}] , {j, 1, Length[a]} ] , {i, 1, Length[myList]}] ]; idx = Values /@ idx; 

The first crucial line here is

cidx = Lookup[idx, c, Association[]]; 

It is like a lookup idx[c] with the added feature that it will return Association[] if c does not already show up in the list of keys of idx.

The second crucial line is

idx[c] = AssociateTo[cidx, Length[cidx] + 1 -> {i, j, k}]; 

The triple {i, j, k} is the position of the current element. So all we have to do is to append it in idx[c], which we do by AssociateTo. (We could make idx[c] also a List and use Append, but that is probably not very efficient. (It was definitely not efficient in older versions of Mathematica.)

The line

idx = Values /@ idx; 

is to convert the lowest level of Associations back to Lists.

Now, if you want to know all positions where t shows up, you just evaluate idx[t]. It should be equivalent to Position[myList, t] If t is not present in myList, you will get an expression with head Missing, instead. For example, this is the return for the data from you example:

idx[1] 

{{1, 1, 1}, {2, 1, 1}, {3, 1, 1}}.

If you want to look up many target values all at once, then Lookup is your friend. You can also specify what shall be returned when a target value is not found. For example, the line below returns a {} in this case. (But in you example, all target values are found.)

Lookup[idx, target, {}] 

{{{1, 1, 1}, {2, 1, 1}, {3, 1, 1}}, {{1, 1, 5}, {2, 1, 5}, {3, 1, 5}}, {{1, 1, 8}, {2, 1, 8}, {3, 1, 8}}, {{1, 2, 1}, {2, 2, 1}, {3, 2, 1}}, {{1, 3, 7}, {2, 3, 7}, {3, 3, 7}}, {{1, 8, 2}, {2, 8, 2}, {3, 8, 2}}}

So your desired output should be obtained by

Lookup[idx, target, {}][[All, All, 1]] 

{{1, 2, 3}, {1, 2, 3}, {1, 2, 3}, {1, 2, 3}, {1, 2, 3}, {1, 2, 3}}

This lookup should take somewhat like Length[target] * Log[n], where n is the number Length[DeleteDuplicates[Flatten[myList]]] of distinct numbers in myList. In particular, if there are many duplicates, this is way faster than Length[target] * Length[Flatten[myList]] for the naive search. The cost for building idx should be in the order of Length[Flatten[myList]] * Log[n], if I am not mistaken.

Faster implementation

In the meantime I found a better way to create the index idx with GroupBy. For a list of 200000 matrices of $10 \times 10$ matrices it needs about 17 seconds to build the index. Compared to that, the lookup time is totally negligible:

maxval = 100000; n = 200000; m = 400000; myList = RandomInteger[{1, maxval}, {n, 10, 10}]; target = RandomInteger[{1, maxval}, {m}]; idx = GroupBy[ Transpose[{ Flatten[myList], Flatten[ Table[ {i, j, k} , {i, 1, Length[myList]} , {j, 1, Length[myList[[i]]]} , {k, 1, Length[myList[[i, j]]]} ], 2] }], First -> Last ]; // AbsoluteTiming // First result = Lookup[idx, target, {}]; // AbsoluteTiming // First 

17.2046

0.114277

$\endgroup$
8
$\begingroup$

Another way that traverses each list just one time is to create a test for membership in target and run that over myList.

Scan[(hotItem[#] = True) &, target]; Reap[MapIndexed[If[TrueQ[hotItem[#]], Sow[#2]] &, myList, {3}]][[2, 1]] (* Out[32]= {{1, 1, 1}, {1, 1, 5}, {1, 1, 8}, {1, 2, 1}, {1, 3, 7}, {1, 8, 2}, {2, 1, 1}, {2, 1, 5}, {2, 1, 8}, {2, 2, 1}, {2, 3, 7}, {2, 8, 2}, {3, 1, 1}, {3, 1, 5}, {3, 1, 8}, {3, 2, 1}, {3, 3, 7}, {3, 8, 2}} *) 

This does not give the result in the same order as the method shown in the original post though.

--- edit ---

As requested, here is an example with timing. I use random integers to make sure we have hits (with high probability, that is).

myList = RandomInteger[{1, 10^7}, {200000, 10}]; target = RandomInteger[{1, 10^7}, {400000}]; AbsoluteTiming[ Scan[(hotItem[#] = True) &, target]; res = Reap[ MapIndexed[If[TrueQ[hotItem[#]], Sow[#2]] &, myList, {ArrayDepth[myList]}]][[2]]; Clear[hotItem]; If[res =!= {}, res = First[res]]; Length[res]] (* Out[24]= {2.17056, 78488} *) 

--- end edit ---

$\endgroup$
6
$\begingroup$

Here is a simple method that appears to be quite efficient. We replace target values with True and then use Position to find those True elements. The replacement is fast when Dispatch is used.

Here is a (reused) example.

myList = RandomInteger[{1, 10^7}, {200000, 10}]; target = RandomInteger[{1, 10^7}, {400000}]; AbsoluteTiming[ rule = Dispatch[Thread[target -> True]]; res = Position[myList /. rule, True]; Length[res]] (* Out[26]= {0.834693, 78488} *) 
$\endgroup$
4
  • $\begingroup$ Hm. This gives you all positions of all elements from target in myList, but it does not tell where a specific element of target is found, or does it? $\endgroup$ Commented Sep 25, 2024 at 19:14
  • $\begingroup$ @HenrikSchumacher It does not. It was not clear to me that that would be a requirement. I can modify suitably if it is, though it would not run quite so fast. $\endgroup$ Commented Sep 25, 2024 at 19:24
  • 1
    $\begingroup$ Not so important. I just observed that there is a lot of guessing involved in this post and that OP is quite silent about it. $\endgroup$ Commented Sep 25, 2024 at 19:57
  • 3
    $\begingroup$ @HenrikSchumacher I had noticed that too. An almost eerie silence. Suitable for a modern-noir movie, perhaps. $\endgroup$ Commented Sep 26, 2024 at 1:14
3
$\begingroup$

Sometimes the speed of arithmetic trumps poor algorithm complexity.

Assuming the best per the verbal description in the OP:

  • "Elements" are machine integers.
  • myList: 200000 x 10 packed array of integers per the description in the OP. (Not sure why the example has depth 3.)
  • target: packed array of 400000 integers.

Map[Flatten@SparseArray[ Total[ Subtract[1, Unitize@Subtract[#, myList]], {2, ArrayDepth[myList] + 1}] ]["NonzeroPositions"] &, target] 
(* {{1, 2, 3}, {1, 2, 3}, {1, 2, 3}, {1, 2, 3}, {1, 2, 3}, {1, 2, 3}} *) 

Is 25 minutes sufficiently performant?:

myList = RandomReal[{1, 10^7}, {200000, 10}]; target = RandomReal[{1, 10^7}, {400000}]; Map[Flatten@SparseArray[ Total[ Subtract[1, Unitize@Subtract[#, myList]], {2, ArrayDepth[myList] + 1}] ]["NonzeroPositions"] &, target]; // AbsoluteTiming 
(* {1543.14, Null} ~25 min. *) 
$\endgroup$
3
  • 3
    $\begingroup$ I'm getting a 3 seconds on an example of this size, using the code I showed. $\endgroup$ Commented Sep 24, 2024 at 19:54
  • 3
    $\begingroup$ @DanielLichtblau Thanks for the timing. 25 minutes seemed faster than the OP's estimate of 500 hours, and no one else timed their code. Plus the OP didn't give a reasonable example on which to test performance, which seems odd considering the performance-tuning tag. If the OP ever gives an accurate description of their data, consider adding your timing to your answer. It would make it easier for users to compare methods. $\endgroup$ Commented Sep 25, 2024 at 0:56
  • $\begingroup$ Example added. But now I have a better method... $\endgroup$ Commented Sep 25, 2024 at 18:49
1
$\begingroup$

Here is a neat solution from Simon Woods : StackExchange 2013

makePositionFunction[f_Symbol, data_, level_: {-1}] := Block[{}, ClearAll[f]; Reap[ MapIndexed[Sow[#2, #1] &, data, level, Heads -> True], _, (f[#] = #2) &]; f[other_] := Position[data, other, level]] (* positions are collected in a function variable, 'pos' *) Clear[pos] makePositionFunction[pos, myList] (* read positions from pos *) positions = pos /@ target 

{{{1, 1, 1}, {2, 1, 1}, {3, 1, 1}}, . . . , {{1, 8, 2}, {2, 8, 2}, {3, 8, 2}}

positions[[All, All, 1]] 

{{1, 2, 3}, {1, 2, 3}, {1, 2, 3}, {1, 2, 3}, {1, 2, 3}, {1, 2, 3}}

$\endgroup$

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.