I have some very large Datasets. Is there any efficient method of getting the positions of results found in a Query? That is, without using Normal and Position again on the results after the query returns.
Here’s an example.
When querying 2D datasets (lists of associations), it can be easier and faster (as with Pandas) to deal only with vectors of indices when applying conditions, not with the results themselves:
    d = ExampleData[{"Dataset", "Titanic"}]
    cases = d @ Query[Select[#sex == "male" && #class == "1st" && #age > 70 &]]
    Position[Normal[d], #] & /@ Normal[cases] (* doing this is silly *)

I don’t want to call Normal and Position after the fact (which might be prohibitively slow); rather, I'm looking for Query to return only the indices of the results it finds.
Query acts like Cases, but there's no analog for Position, e.g. something like QueryIndexed or QueryPosition that gives the lists (i.e. part specifications) of the queried dataset contents.
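To make the request concrete, here is a minimal sketch of the kind of result such a function would return, written against a plain list of associations rather than a Dataset. The name `rowPositions` and the sample rows are hypothetical, introduced only for illustration; this is not a built-in, and it does not show how Query works internally.

```wolfram
(* Made-up stand-in rows; not the real Titanic data *)
rows = {
   <|"class" -> "1st", "age" -> 71, "sex" -> "male"|>,
   <|"class" -> "3rd", "age" -> 22, "sex" -> "female"|>,
   <|"class" -> "1st", "age" -> 80, "sex" -> "male"|>
   };

(* Hypothetical helper: build a True/False mask with the predicate,
   then pick the row numbers where the mask is True.
   TrueQ turns anything that is not explicitly True into False. *)
rowPositions[data_List, pred_] :=
  Pick[Range[Length[data]], TrueQ /@ (pred /@ data)];

positions = rowPositions[rows,
   #sex == "male" && #class == "1st" && #age > 70 &]
(* {1, 3} *)

(* The positions can be used directly as a part specification *)
rows[[positions]]
```

The mask is computed in a single pass over the rows, so nothing has to be searched for again afterwards.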
d = MapIndexed[Append[#1, "index" -> #2] &, d]. This lets you do normal Select querying, and the index field just "comes along for the ride". However, I realize this isn't quite what you're looking for.

(MapIndexed also allows you to specify the level on which it's applied (including for nested associations) as an optional last argument, which would let you work with non-2D datasets!)

[...] Dataset maintains any internal row indices. In your case, preprocessing would be the "correct" solution, since you always need to extract the position. You might try creating the Dataset from the Association <|1 -> association-for-row-1-of-d, 2 -> association-for-row-2-of-d, ...|>. In any case, the problem you are describing seems to require a preprocessing step as a solution. Remember, even database indexes are a meta layer over the table.
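The two preprocessing ideas from these comments can be sketched roughly as follows. The sample rows and the variable names are assumptions made for illustration. Note also that this sketch stores `First[#2]` (a bare integer) instead of `#2` (a one-element list such as `{1}`), purely so the result is a flat list of row numbers.

```wolfram
(* Made-up stand-in rows; not the real Titanic data *)
rows = {
   <|"class" -> "1st", "age" -> 71, "sex" -> "male"|>,
   <|"class" -> "3rd", "age" -> 22, "sex" -> "female"|>,
   <|"class" -> "1st", "age" -> 80, "sex" -> "male"|>
   };
pred = #sex == "male" && #class == "1st" && #age > 70 &;

(* Idea 1: tag each row with its position once, then query as usual
   and pull out only the "index" column *)
indexed = Dataset[MapIndexed[Append[#1, "index" -> First[#2]] &, rows]];
viaColumn = Normal[indexed[Select[pred], "index"]];

(* Idea 2: key the rows by position; Select on an association keeps
   the keys of the rows that match *)
keyed = Dataset[AssociationThread[Range[Length[rows]], rows]];
viaKeys = Keys[Normal[keyed[Select[pred]]]];

{viaColumn, viaKeys}
(* {{1, 3}, {1, 3}} *)
```

Both variants pay the indexing cost once, when the dataset is built, so every later query returns positions without a second search over the data.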