
Given some list of arbitrary-precision numbers (actual lists are 1M+ elements long), e.g.:

test = {0, 2, 2, 47839283, 2, 0, 0, 2, 0, 1, 2, 0}

I need to accumulate the list, but with a specified element value treated as a "clear": wherever that value occurs, the running total is reset to zero and accumulation continues from that point.

With the above example, using zero as the "clear", the result would be:

{0, 2, 4, 47839287, 47839289, 0, 0, 2, 0, 1, 3, 0} 

I'm using

Function[{lst, clr},
  FoldList[If[#2 == clr, 0, +##] &, lst[[1]], Rest@lst]
][test, 0]

It seems pretty snappy, but I'm wondering if there's a more efficient way of doing this.

  • FoldList is what I have used before. I doubt it can be beaten outside of a compiled solution (a sketch of one follows these comments), but as always I'd love to be proven wrong on such things. By the way, it truly pains my terse-obsessed mind every time I see Fold[f, First[list], Rest[list]] or the equivalent written out, as I have customized Fold[f, list] to perform the same. Have you considered that? Commented May 5, 2014 at 8:39
  • @Mr.Wizard: I figured it might be close to optimal, but I'm sure you too have had the "have I missed something obvious..." feeling. Might you illuminate your comment re: terseness? Commented May 5, 2014 at 8:43
  • Please see my Fold customization here: (5433867). Edit: I just updated that post to match the current customization that I use. Commented May 5, 2014 at 8:48
  • For the sake of the experiment, shall we consider the "clear" to occur frequently as shown, or rarely? Commented May 5, 2014 at 9:11
  • @Mr.Wizard: Frequency varies, from quite sparse (nearly non-existent) to fairly heavy (~25%), with no way of knowing the sparsity ahead of time. Commented May 5, 2014 at 20:43
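For reference, the compiled solution alluded to in the comments might look like the sketch below. This is illustrative only (accumClear is a hypothetical name, and Compile restricts it to machine-size integers, so it does not cover the arbitrary-precision inputs in the question):

(* hypothetical compiled variant; machine-size integers only *)
accumClear = Compile[{{lst, _Integer, 1}, {clr, _Integer}},
   Module[{acc = 0},
    Table[
     acc = If[lst[[i]] == clr, 0, acc + lst[[i]]],
     {i, Length[lst]}]]];

accumClear[test, 0]
(* {0, 2, 4, 47839287, 47839289, 0, 0, 2, 0, 1, 3, 0} *)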

2 Answers


Notes:

I completely overlooked the fact that rasher specified arbitrary-precision input when crafting my functions, so the numeric methods I used will not be as fast on it. However, to my surprise, it seems that at least fn2 is still faster than the FoldList method on arbitrary-precision data.

Timings with resets occurring on average about 1 in 40 elements:

big = RandomInteger[{-1*^20, 1*^20}, 1*^6];
big = Clip[big, {-∞, 0.95*^20}, {0, 0}];

fn1[big, 0] // timeAvg
fn2[big] // timeAvg

0.858
0.499

Not quite the performance on packed arrays demonstrated below, but still faster than FoldList in a number of cases.


If the resets are fairly sparse it is possible to beat the FoldList method by accumulating, then setting the first zero of each run of zeros in the original list to an appropriate negative value based on that first pass, and then accumulating again. As a proof of concept, implemented only for resetting on zeros:

fn2[a_List] :=
 Module[{x = a, acc = Accumulate@a, pos},
  pos = Split[#, # + 1 == #2 &][[All, 1]] & @
    SparseArray[Unitize@x, Automatic, 1]["AdjacencyLists"];
  x[[pos]] = -Differences @ Prepend[acc[[pos]], 0];
  Accumulate @ x
 ]
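As a quick sanity check (a usage example of mine, not from the original answer), fn2 reproduces the expected output on the question's test list:

fn2[test]
(* {0, 2, 4, 47839287, 47839289, 0, 0, 2, 0, 1, 3, 0} *)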

Edit: Adopting a method similar to my answer to Find continuous sequences inside a list, this version is faster than the OP's on rather more frequent splits as well:

fn3[a_List] :=
 Module[{x = a, acc = Accumulate@a, pos},
  pos = SparseArray[#]["AdjacencyLists"] &[
    Subtract[1, Unitize@x] * Prepend[Unitize@Differences@x, 1]
  ];
  x[[pos]] = -Differences @ Prepend[acc[[pos]], 0];
  Accumulate @ x
 ]
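To see what the position mask in fn3 computes, here is an illustrative trace on the question's test list (my own worked example, not part of the original answer). The mask is 1 exactly at the first zero of each run of zeros, and pos picks out those reset points:

Subtract[1, Unitize@test] * Prepend[Unitize@Differences@test, 1]
(* {1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1} *)

SparseArray[%]["AdjacencyLists"]
(* {1, 6, 9, 12} *)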

I named your function fn1. The timeAvg code has been posted many times; one common definition is reproduced below for reference.
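(Quoted from memory, so treat it as a sketch: it runs the expression in batches of increasing size until the total exceeds 0.3 seconds, then returns the per-run average.)

SetAttributes[timeAvg, HoldFirst]
timeAvg[func_] := Do[
  If[# > 0.3, Return[#/5^i]] & @@ AbsoluteTiming @ Do[func, {5^i}],
  {i, 0, 15}
]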

On a list that resets on average one in ten elements:

big = RandomInteger[9, 1*^6];

Accumulate[big] // timeAvg
fn1[big, 0] // timeAvg
fn2[big] // timeAvg
fn3[big] // timeAvg

0.004624
0.0656
0.0748
0.02808

Here fn3 is more than twice as fast as your original.
Now with resets on average one in one thousand elements:

big = RandomInteger[999, 1*^6];
(* other lines the same *)

0.004496
0.0718
0.01248
0.0156

Here both fn2 and fn3 are several times faster than the original, but now fn3 is slightly slower than fn2. With increasingly sparse resets fn2 continues to improve over fn3.

  • Just woke up - I had pursued the very same idea of double accumulates and found it slower (at least on v9.x); I'll cut-and-paste your version and test. Yes, both fn2/fn3 are considerably slower, regardless of sparsity. Commented May 5, 2014 at 20:46
  • +1 for the idea though, of course - appreciate the feedback as always! Commented May 5, 2014 at 20:52
  • @rasher Sorry I couldn't be of help. :-/ Commented May 6, 2014 at 1:29

For brevity:

Map[Accumulate, Split[test, #2 != 0 &]] // Flatten

{0, 2, 4, 47839287, 47839289, 0, 0, 2, 0, 1, 3, 0}

... but not as fast as the OP's method. :)

[Thanks to Mr.Wizard: changed SplitBy to Split, which is more efficient.]
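As suggested in the comments below, an even shorter and faster equivalent joins the accumulated runs directly (output as given in the question):

Join @@ Accumulate /@ Split[test, #2 != 0 &]
(* {0, 2, 4, 47839287, 47839289, 0, 0, 2, 0, 1, 3, 0} *)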

  • For what it's worth, this is shorter and faster: Join @@ Accumulate /@ Split[test, #2 != 0 &] Commented May 5, 2014 at 8:46
  • Join @@ is also faster than Flatten on packed arrays. :-) +1 by the way. Commented May 5, 2014 at 8:56
  • @Mr.Wizard As they say: "If you can't flatten them, join them" Commented May 5, 2014 at 9:04
  • Actually that doesn't matter, because Split does not produce packed sublists! For some reason I falsely remembered that it did. Commented May 5, 2014 at 9:07
  • Thanks for the reply - this is basically the same as my original method; as you noted, clean, but it gets slow on huge lists. +1 Commented May 5, 2014 at 20:53
