$\begingroup$

I have some very large Datasets. Is there any efficient method of getting the positions of results found in a Query? That is, without using Normal and Position again on the results after the query returns.

Here’s an example.

When querying 2D datasets (lists of associations), it can be easier and faster, as with Pandas, to work only with vectors of indices when applying conditions rather than with the results themselves:

d = ExampleData[{"Dataset","Titanic"}]
cases = d @ Query[Select[#sex=="male" && #class=="1st" && #age>70 &]]
Position[Normal[d], #]& /@ Normal[cases] (* doing this is silly *)

I don’t want to call Normal and Position after the fact (which might be prohibitively slow) but rather, I'm looking for Query to return only the indices of the results it finds.

Query acts like Cases, but there's no analog for Position, e.g. something like QueryIndexed or QueryPosition that gives the lists (i.e. part specifications) of the queried dataset contents.

$\endgroup$
  • $\begingroup$ Can you give a specific example? $\endgroup$ Commented Apr 30, 2020 at 14:01
  • 1
    $\begingroup$ If it is an option, you might consider adding a row index to each row as a preprocessing step. $\endgroup$ Commented May 13, 2020 at 5:10
  • 1
    $\begingroup$ Your best bet might be to first modify your original dataset by adding an "index" field, as d = MapIndexed[Append[#1, "index" -> #2] &, d]. This lets you do normal Select querying, and the index field just "comes along for the ride". However, I realize this isn't quite what you're looking for. $\endgroup$ Commented May 13, 2020 at 5:19
  • 2
    $\begingroup$ (MapIndexed also allows you to specify the level on which it's applied (including for nested associations) as an optional last argument, which would let you work with non-2D datasets!) $\endgroup$ Commented May 13, 2020 at 5:23
  • 2
    $\begingroup$ @user5601 I don't believe Dataset maintains any internal row indices. In your case, preprocessing would be the "correct" solution, since you always need to extract the position. You might try creating the Dataset from an Association of the form <|1 -> association-for-row-1-of-d, 2 -> association-for-row-2-of-d, ...|>. In any case, the problem you are describing seems to require a preprocessing step as a solution. Remember, even database indexes are a meta layer over the table. $\endgroup$ Commented May 13, 2020 at 15:15
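
The preprocessing ideas from the comments can be sketched as follows, using the Titanic example from the question. This is only an illustrative sketch, not a built-in "QueryPosition": the names `rows`, `cond`, `indexed` and `keyed` are made up here, and wrapping the condition in `TrueQ` is an assumption so that rows with a missing age count as non-matches. It still calls `Normal` once up front, but not once per result, and no timing claims are made for very large datasets.

```wolfram
d = ExampleData[{"Dataset", "Titanic"}];
rows = Normal[d];  (* one conversion to a list of associations *)

(* The condition from the question as a predicate on a single row.
   TrueQ makes comparisons against Missing[...] evaluate to False. *)
cond = TrueQ[#sex == "male" && #class == "1st" && #age > 70] &;

(* Option 1: add the row number as an "index" field that travels with each row. *)
indexed = Dataset[MapIndexed[Append[#1, "index" -> First[#2]] &, rows]];
indexed[Select[cond], "index"]

(* Option 2: key the rows by position; Select on an Association keeps the keys. *)
keyed = AssociationThread[Range[Length[rows]], rows];
Keys[Select[keyed, cond]]

(* Option 3: ask for positions directly, testing each row once at level 1. *)
Flatten[Position[rows, _?cond, {1}, Heads -> False]]
```

All three variants return row numbers that can be passed back to `d` as part specifications, for example `d[[{i, j}]]` for hypothetical matching rows `i` and `j`. Options 1 and 2 keep the index attached to the data, so later queries do not need to recompute it; Option 3 returns only the positions and leaves the dataset unchanged.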
