Time for another of these(1),(2) as yet another new-in-10 function appears to have poor performance compared to older alternatives. This time: Query appears to be orders of magnitude slower than Part when used for simple extraction.
For example:
Needs["GeneralUtilities`"] x = RandomInteger[99, 1*^6]; spans = Span @@@ Partition[Sort @ RandomInteger[{1, 1*^6}, 5000], 2, 1]; Do[x[[s]], {s, spans}] // AccurateTiming Do[Query[s][x], {s, spans}] // AccurateTiming 0.00447013 3.586205
Here Query is 800 times slower than Part. I know that Part is well optimized for packed arrays. Perhaps Query hasn't been similarly optimized yet. Let's try unpackable data:
x = "a" ~CharacterRange~ "z" ~RandomChoice~ 1*^6; Do[x[[s]], {s, spans}] // AccurateTiming Do[Query[s][x], {s, spans}] // AccurateTiming 0.0106673 3.594706
Alright, that seems to account for some of the difference as Part is only 337 times faster than Query here, but that is still a huge difference. The reason I was interested in Query is that by default it will not fail on out-of-bounds span ranges as Part will:
a = Range[9]; a[[5 ;; 11]] Query[5 ;; 11][a] Part::take: Cannot take positions 5 through 11 in {1,2,3,4,5,6,7,8,9}. >>
{1, 2, 3, 4, 5, 6, 7, 8, 9}[[5 ;; 11]] {5, 6, 7, 8, 9}
It also returns a Missing expression for a single part that is out-of-bounds:
Query[12][Range@9] Missing["PartAbsent", 12]
This feature is controlled by PartBehavior. Perhaps its overhead is high so let's turn it off and try again:
SetOptions[Query, PartBehavior -> None]; Do[Query[s][x], {s, spans}] // AccurateTiming 3.568204
Well that doesn't seem to be the case. Thankfully Szabolcs's post in Why is Dataset upset by division by zero? made me take a look at the other options for Query. With MissingBehavior set to Automatic Query will apply special rules for certain operators when there are Missing[] expressions:
Query[Total] @ {1, 2, 3, Missing[]} 6
This should have no role for Span however:
Query[2 ;; 4] @ {1, 2, 3, Missing[]} {2, 3, Missing[]}
Let's turn it off and time again just to be sure:
SetOptions[Query, MissingBehavior -> None]; Do[Query[s][x], {s, spans}] // AccurateTiming 0.355520
It seems we have found the cause of most of the slow-down, yet it makes no sense to me for this to have any effect as there is no special behavior to apply for Missing elements in a Span operation.
With FailureAction -> None this comes down to 0.285016 second, or ~27 times slower than Part on unpacked data.
Questions
Why would
MissingBehavioraffect the speed of aSpanoperation?Why is
Querystill many times slower thanPart, even with all special handling turned off?