
Given some list of arbitrary-precision numbers (actual lists are 1M+ elements long), e.g.:

test = {0, 2, 2, 47839283, 2, 0, 0, 2, 0, 1, 2, 0}

I need to accumulate the list, but with a specified element value treated as a "clear": wherever that value occurs, the running total is reset to zero and accumulation continues from that point.

With the above example, using zero as the "clear", the result would be:

{0, 2, 4, 47839287, 47839289, 0, 0, 2, 0, 1, 3, 0} 

I'm using

Function[{lst, clr},
  FoldList[If[#2 == clr, 0, +##] &, lst[[1]], Rest@lst]
][test, 0]

It seems pretty snappy, but I'm wondering if there's a more efficient way of doing this.

  • FoldList is what I have used before. I doubt it can be beaten outside of a compiled solution (a sketch of one follows these comments), but as always I'd love to be proven wrong on such things. By the way, it truly pains my terse-obsessed mind every time I see Fold[f, First[list], Rest[list]] or the equivalent written out, as I have customized Fold[f, list] to perform the same. Have you considered that? Commented May 5, 2014 at 8:39
  • @Mr.Wizard: I figured it might be close to optimal, but I'm sure you too have had the "have I missed something obvious..." feeling. Might you illuminate your comment re: terseness? Commented May 5, 2014 at 8:43
  • Please see my Fold customization here: (5433867). Edit: I just updated that post to match the current customization that I use. Commented May 5, 2014 at 8:48
  • For the sake of the experiment, shall we consider the "clear" to occur frequently as shown, or rarely? Commented May 5, 2014 at 9:11
  • @Mr.Wizard: Frequency varies, from quite sparse (nearly non-existent) to fairly heavy (~25%), with no way of knowing the sparsity ahead of time. Commented May 5, 2014 at 20:43
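For reference, the compiled solution alluded to in the comments might look like the sketch below. This is illustrative only (accumClear is a hypothetical name, and Compile restricts it to machine-size integers, so it does not cover the arbitrary-precision inputs in the question):

(* hypothetical compiled variant; machine-size integers only *)
accumClear = Compile[{{lst, _Integer, 1}, {clr, _Integer}},
   Module[{acc = 0},
    Table[
     acc = If[lst[[i]] == clr, 0, acc + lst[[i]]],
     {i, Length[lst]}]]];

accumClear[test, 0]
(* {0, 2, 4, 47839287, 47839289, 0, 0, 2, 0, 1, 3, 0} *)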

2 Answers


Notes:

I completely overlooked the fact that rasher specified arbitrary-precision input when crafting my functions, so the numeric methods I used will not be as fast on it. However, to my surprise, it seems that at least fn2 is still faster than the FoldList method on arbitrary-precision data.

Timings with resets occurring on average about 1 in 40 elements:

big = RandomInteger[{-1*^20, 1*^20}, 1*^6];
big = Clip[big, {-∞, 0.95*^20}, {0, 0}];

fn1[big, 0] // timeAvg
fn2[big] // timeAvg

0.858
0.499

Not quite the performance on packed arrays demonstrated below, but still faster than FoldList in a number of cases.


If the resets are fairly sparse it is possible to beat the FoldList method by accumulating, then setting the first zero of each run of zeros in the original list to an appropriate negative value based on that first pass, and then accumulating again. As a proof of concept, implemented only for resetting on zeros:

fn2[a_List] :=
 Module[{x = a, acc = Accumulate@a, pos},
  pos = Split[#, # + 1 == #2 &][[All, 1]] & @
    SparseArray[Unitize@x, Automatic, 1]["AdjacencyLists"];
  x[[pos]] = -Differences @ Prepend[acc[[pos]], 0];
  Accumulate @ x
 ]
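As a quick sanity check (a usage example of mine, not from the original answer), fn2 reproduces the expected output on the question's test list:

fn2[test]
(* {0, 2, 4, 47839287, 47839289, 0, 0, 2, 0, 1, 3, 0} *)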

Edit: Adopting a method similar to my answer to Find continuous sequences inside a list, this version is faster than the OP's on rather more frequent splits as well:

fn3[a_List] :=
 Module[{x = a, acc = Accumulate@a, pos},
  pos = SparseArray[#]["AdjacencyLists"] &[
    Subtract[1, Unitize@x] * Prepend[Unitize@Differences@x, 1]
  ];
  x[[pos]] = -Differences @ Prepend[acc[[pos]], 0];
  Accumulate @ x
 ]
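To see what the position mask in fn3 computes, here is an illustrative trace on the question's test list (my own worked example, not part of the original answer). The mask is 1 exactly at the first zero of each run of zeros, and pos picks out those reset points:

Subtract[1, Unitize@test] * Prepend[Unitize@Differences@test, 1]
(* {1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1} *)

SparseArray[%]["AdjacencyLists"]
(* {1, 6, 9, 12} *)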

I named your function fn1. The timeAvg code has been posted many times; one common definition is reproduced below for reference.
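(Quoted from memory, so treat it as a sketch: it runs the expression in batches of increasing size until the total exceeds 0.3 seconds, then returns the per-run average.)

SetAttributes[timeAvg, HoldFirst]
timeAvg[func_] := Do[
  If[# > 0.3, Return[#/5^i]] & @@ AbsoluteTiming @ Do[func, {5^i}],
  {i, 0, 15}
]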

On a list that resets on average one in ten elements:

big = RandomInteger[9, 1*^6];

Accumulate[big] // timeAvg
fn1[big, 0] // timeAvg
fn2[big] // timeAvg
fn3[big] // timeAvg

0.004624
0.0656
0.0748
0.02808

Here fn3 is more than twice as fast as your original.
Now with resets on average one in one thousand elements:

big = RandomInteger[999, 1*^6];
(* other lines the same *)

0.004496
0.0718
0.01248
0.0156

Here both fn2 and fn3 are several times faster than the original, but now fn3 is slightly slower than fn2. With increasingly sparse resets fn2 continues to improve over fn3.

  • Just woke up - I had pursued the very same idea of double accumulates and found it slower (at least on v9.x); I'll cut-and-paste your version and test. Yes, both fn2/fn3 are considerably slower, regardless of sparsity. Commented May 5, 2014 at 20:46
  • +1 for the idea though, of course - appreciate the feedback as always! Commented May 5, 2014 at 20:52
  • @rasher Sorry I couldn't be of help. :-/ Commented May 6, 2014 at 1:29

For brevity:

Map[Accumulate, Split[test, #2 != 0 &]] // Flatten

{0, 2, 4, 47839287, 47839289, 0, 0, 2, 0, 1, 3, 0}

... but not as fast as the OP's method. :)

[Thanks to Mr.Wizard: changed SplitBy to Split, which is more efficient.]
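As suggested in the comments below, an even shorter and faster equivalent joins the accumulated runs directly (output as given in the question):

Join @@ Accumulate /@ Split[test, #2 != 0 &]
(* {0, 2, 4, 47839287, 47839289, 0, 0, 2, 0, 1, 3, 0} *)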

  • For what it's worth, this is shorter and faster: Join @@ Accumulate /@ Split[test, #2 != 0 &] Commented May 5, 2014 at 8:46
  • Join @@ is also faster than Flatten on packed arrays. :-) +1 by the way. Commented May 5, 2014 at 8:56
  • @Mr.Wizard As they say: "If you can't flatten them, join them" Commented May 5, 2014 at 9:04
  • Actually that doesn't matter, because Split does not produce packed sublists! For some reason I falsely remembered that it did. Commented May 5, 2014 at 9:07
  • Thanks for the reply - this is basically the same as my original method; as you noted, clean, but it gets slow on huge lists. +1 Commented May 5, 2014 at 20:53
