Skip to main content
9 of 10
improved formulation
Ray Shadow
  • 8k
  • 1
  • 17
  • 44

Cause of speed up

This is definitely not memoization. The reason for the observed speed up is that for large arrays (e.g. 10^8 elements), the memory clean up operations may take significant time. If one doesn't free memory, one can perform some operations faster.

Here is a simpler example:

Let's create a large array, then perform a calculation, and remove the array:

AbsoluteTiming[ Total[ConstantArray[0, 10^8]]; ] 

{0.422509, Null}

It takes 0.42 seconds. Let's now do the same thing, but keep the array in memory:

AbsoluteTiming[ Total[garbage = ConstantArray[0, 10^8]]; ] 

{0.366755, Null}

This evaluation is noticeably faster.

Let's check how long does it take to remove the large array:

AbsoluteTiming[ Remove[garbage] ] 

{0.061982, Null}

Note that 0.06 seconds is the difference of the calculation times above. This example shows that if we keep the large array instead of removing it, our code can run faster, because we don't need to spent time on freeing memory.

Your example

In the example you provide, removing the result of Unitize@data[[All, n]] from memory takes significant time. If one saves this array in a redundant variable, one avoids memory clean-up and wins some time.

How to make a representative test?

You should put Clear[pick, unitize] inside your timing function. This test will show that the pseudo-memoization technique is actually slower than built-in functions:

Table[ Clear[data]; data=RandomInteger[{0,10},{i*10^7,3}]; { Pick[data,Unitize@data[[All,-1]],1]; // AbsoluteTiming // First , Clear[pick,unitize]; unitize[x_]:=unitize[x]=Unitize[x]; pick[xs_,sel_,patt_]:=pick[xs,sel,patt]=Pick[xs,sel,patt]; pick[data,unitize@data[[All,-1]],1]; // AbsoluteTiming // First }, {i,5}] (* {{0.534744, 0.469538}, {1.03776, 1.05842}, {1.58536, 1.65404}, {2.10422, 2.11284}, {2.48129, 2.71405}} *) 

Technical note: as noted by Carl Woll in comments, if one wants to measure the symbol-removing-time using the following code:

In[1] := garbage = ConstantArray[0, 10^8]; In[2] := AbsoluteTiming[Remove[garbage]] 

one should set $HistoryLength to zero, otherwise the Out[1] variable will retain the contents of the large array. If Out[1] retains the large data, Remove[garbage] will only delete the reference, but not the data itself. Deletion time of a reference is almost zero, but it doesn't correspond to the deletion time for large data.

Ray Shadow
  • 8k
  • 1
  • 17
  • 44