
I'm struggling to vectorize the repeated application of the following function, which I have currently implemented as a for loop. This small example is indicative of a problem with a larger dataset, where vectorizing should yield a worthwhile runtime improvement:

action = function(x, y, i) {
  firsttask = cumsum(x[which(x < y[i])])
  secondtask = mean(firsttask)
  thirdtask = min(firsttask[which(firsttask > secondtask)])
  fourthtask = length(firsttask)
  output = list(firsttask,
                data.frame(average = secondtask,
                           min_over_mean = thirdtask,
                           size = fourthtask))
  return(output)
}

thingofinterest = c(1:10)
limits = c(5:10)

test = vector("list", length = length(limits))
for(i in 1:length(limits)) {
  test[[i]] = action(thingofinterest, limits, i)
}
test

I want to replace the for loop with a vectorized command, not any of the apply family of functions, since those do not always improve performance. (I'm not suggesting there is anything wrong with for loops; I just need to optimize for speed in this case. See: Is R's apply family more than syntactic sugar?) How would I do this?

  • For loops can usually be vectorized using apply or do.call. I think in this case, the latter would be most useful. Commented Oct 9, 2015 at 11:44
  • I'm trying to avoid apply, as many functions in that family are just glorified for loops. do.call only outputs action(thingofinterest,5). Commented Oct 9, 2015 at 11:49
  • Why are you trying to avoid loops? There's nothing inherently wrong with implicit or explicit loops. Commented Oct 9, 2015 at 11:55
  • Two things you could easily improve: initialize the test vector to the correct size up front using test = vector("list", length = length(limits)) to avoid growing the object in a loop (which makes it slow!), and, in your action function, compute mean(firsttask) before the second step and use the value stored in a variable. Commented Oct 9, 2015 at 11:56
  • @NigelStackhouse the *apply family are C-implemented loops, calling the corresponding C implementation of the base R function when applicable, so the statement "essentially masked for loops" is really wrong. There's nothing wrong with for loops, nor with *apply calls to custom complex functions, as long as you avoid growing objects within them and repeating the same call again and again. Commented Oct 9, 2015 at 12:18

3 Answers


You need to understand where the bottlenecks are in your code before you start trying to change it to make it faster. For example:

timer <- function(action, thingofinterest, limits) {
  st <- system.time({       # for the wall time
    Rprof(interval = 0.01)  # start R's profile timing
    for(j in 1:1000) {      # 1000 function calls
      test = vector("list")
      for(i in 1:length(limits)) {
        test[[i]] = action(thingofinterest, limits, i)
      }
    }
    Rprof(NULL)             # stop the profiler
  })
  # return profiling results
  list(st, head(summaryRprof()$by.total))
}

action = function(x, y, i) {
  firsttask = cumsum(x[which(x < y[i])])
  secondtask = min(firsttask[which(firsttask > mean(firsttask))])
  thirdtask = mean(firsttask)
  fourthtask = length(firsttask)
  output = list(firsttask,
                data.frame(min_over_mean = secondtask,
                           average = thirdtask,
                           size = fourthtask))
  return(output)
}

timer(action, 1:1000, 50:100)
# [[1]]
#    user  system elapsed
#   9.720   0.012   9.737
#
# [[2]]
#                 total.time total.pct self.time self.pct
# "system.time"         9.72    100.00      0.07     0.72
# "timer"               9.72    100.00      0.00     0.00
# "action"              9.65     99.28      0.24     2.47
# "data.frame"          8.53     87.76      0.84     8.64
# "as.data.frame"       5.50     56.58      0.44     4.53
# "force"               4.40     45.27      0.11     1.13

You can see that very little time is spent outside the call to your action function. Now, for is a special primitive and is therefore not captured by the profiler, but the total time reported by the profiler is very similar to the wall time, so there can't be a lot of time missing from the profiler time.

And the thing that takes the most time in your action function is the call to data.frame. Remove that, and you get an enormous speedup.

action1 = function(x, y, i) {
  firsttask = cumsum(x[which(x < y[i])])
  secondtask = mean(firsttask)
  thirdtask = min(firsttask[which(firsttask > mean(firsttask))])
  fourthtask = length(firsttask)
  list(task = firsttask, average = secondtask,
       min_over_mean = thirdtask, size = fourthtask)
}

timer(action1, 1:1000, 50:100)
# [[1]]
#    user  system elapsed
#   1.020   0.000   1.021
#
# [[2]]
#               total.time total.pct self.time self.pct
# "system.time"       1.01    100.00      0.06     5.94
# "timer"             1.01    100.00      0.00     0.00
# "action"            0.95     94.06      0.17    16.83
# "which"             0.57     56.44      0.23    22.77
# "mean"              0.25     24.75      0.13    12.87
# "<"                 0.20     19.80      0.20    19.80

Now you can also get rid of one of the calls to mean and both calls to which.

action2 = function(x, y, i) {
  firsttask = cumsum(x[x < y[i]])
  secondtask = mean(firsttask)
  thirdtask = min(firsttask[firsttask > secondtask])
  fourthtask = length(firsttask)
  list(task = firsttask, average = secondtask,
       min_over_mean = thirdtask, size = fourthtask)
}

timer(action2, 1:1000, 50:100)
# [[1]]
#    user  system elapsed
#   0.808   0.000   0.808
#
# [[2]]
#               total.time total.pct self.time self.pct
# "system.time"       0.80    100.00      0.12    15.00
# "timer"             0.80    100.00      0.00     0.00
# "action"            0.68     85.00      0.24    30.00
# "<"                 0.20     25.00      0.20    25.00
# "mean"              0.13     16.25      0.08    10.00
# ">"                 0.05      6.25      0.05     6.25

Now you can see there's a "significant" amount of time spent doing stuff outside your action function. I put significant in quotes because it's 15% of the runtime, but only 120 milliseconds. If your actual code took ~12 hours to run, this new action function would finish in ~1 hour.

The results would be marginally better if I pre-allocated the test list outside of the for loop in the timer function, but the call to data.frame is the biggest time-consumer.
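For reference, that pre-allocation is a one-line change to timer (a sketch only; as noted above, the gain in this example would be marginal):

timer2 <- function(action, thingofinterest, limits) {
  st <- system.time({
    Rprof(interval = 0.01)
    for(j in 1:1000) {
      # pre-allocated to its final size instead of grown inside the loop
      test = vector("list", length = length(limits))
      for(i in 1:length(limits)) {
        test[[i]] = action(thingofinterest, limits, i)
      }
    }
    Rprof(NULL)
  })
  list(st, head(summaryRprof()$by.total))
}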


4 Comments

So you're suggesting the same as I did except you removed which? (Btw, I read somewhere that subsetting by integer vector is slightly faster than by logical index?)
@docendodiscimus: I'm suggesting a process, and by using that process I made similar suggestions. I removed the two calls to which because they appeared as 50% of the run time. Subsetting by integer is faster than subsetting by logical, because the logical vector must be converted to its integer index locations before the object can be subset in the C code. That said, the call to which is not free, so you must evaluate the tradeoff between slower logical index subsetting and conversion to integer outside of the [ call (see the sketch after these comments).
Add three zeros to thingofinterest (the OP is, after all, talking about a 13M-length vector) and the cost of the call to which becomes clearly noticeable.
This was a great mini lesson in the value of evaluating my code in depth. The ~12 times speed increase is great.
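To make the tradeoff from these comments concrete, here is a minimal benchmark sketch (the vector size and threshold are hypothetical; actual results depend on your data and the fraction of matches):

library(microbenchmark)

x <- rnorm(1e6)   # hypothetical large vector
y <- 0            # hypothetical threshold

microbenchmark(
  logical = x[x < y],          # logical index: converted to integer positions in C
  integer = x[which(x < y)],   # which() pays the conversion cost up front
  times = 100
)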

Here's a little comparison in reference to my comment above. I've made the changes described there (initialize test up front, reorder the computations in action, and, if you can accept it, remove the data.frame call from the list output of action):

library(microbenchmark)
microbenchmark(f0(), f1())
Unit: microseconds
 expr       min        lq      mean     median        uq       max neval
 f0() 14042.192 14730.036 16091.508 15168.3175 16993.631 28193.770   100
 f1()   894.555   928.738  1094.448   985.2865  1190.252  4710.675   100

These modifications brought a roughly 15-fold speedup.

Functions and data for comparison:

action0 = function(x, y, i) {
  firsttask = cumsum(x[which(x < y[i])])
  secondtask = min(firsttask[which(firsttask > mean(firsttask))])
  thirdtask = mean(firsttask)
  fourthtask = length(firsttask)
  output = list(firsttask,
                data.frame(min_over_mean = secondtask,
                           average = thirdtask,
                           size = fourthtask))
  return(output)
}

f0 <- function() {
  test = vector("list")
  for(i in 1:length(limits)) {
    test[[i]] = action0(thingofinterest, limits, i)
  }
}

thingofinterest = c(1:1000)
limits = c(50:100)

action1 = function(x, y, i) {
  firsttask = cumsum(x[which(x < y[i])])
  thirdtask = mean(firsttask)
  secondtask = min(firsttask[which(firsttask > thirdtask)])
  fourthtask = length(firsttask)
  list(firsttask, min_over_mean = secondtask,
       average = thirdtask, size = fourthtask)
}

f1 <- function() {
  test = vector("list", length = length(limits))
  for(i in 1:length(limits)) {
    test[[i]] = action1(thingofinterest, limits, i)
  }
}

4 Comments

@JoshuaUlrich, I know, that's because I removed the data.frame call as mentioned at the top. I was assuming this might be acceptable for faster computation.
Removing the data.frame call accounts for nearly all of that speedup.
I guess if you test with a larger data set, the initialization of test will gain importance. And in general, that might still be an interesting insight for the OP.
The prior initialization of test is interesting, although in my actual problem (not the example provided), I don't know the length of limits ahead of time (it depends on a number of other features). I'll see if I can work this into the reproducible example. Ultimately, I want to do away with the for loop entirely. Even with this speed up, applying it to my larger dataset still takes too long.
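If the final length genuinely can't be known up front, a common compromise (a sketch; n_max and keep_going() are hypothetical placeholders, not from this thread) is to pre-allocate a generous upper bound and trim the unused slots afterwards:

n_max <- 1e4                    # hypothetical upper bound on the result count
test  <- vector("list", n_max)  # allocate once instead of growing
k <- 0
while (keep_going()) {          # hypothetical stopping rule
  k <- k + 1
  test[[k]] <- action1(thingofinterest, limits, k)
}
test <- test[seq_len(k)]        # drop the unused slots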

Just to add a comparison point with the *apply family, I used this code (results verified with identical(f1(), f2()); f3 returns a different layout).

After testing, the which call gives some speed increase on a large thingofinterest vector.

thingofinterest = c(1:100000)
limits = c(50:1000)

action1 = function(x, y, i) {
  firsttask = cumsum(x[which(x < y[i])])
  thirdtask = mean(firsttask)
  secondtask = min(firsttask[which(firsttask > thirdtask)])
  fourthtask = length(firsttask)
  list(firsttask, min_over_mean = secondtask,
       average = thirdtask, size = fourthtask)
}

f1 <- function() {
  test = vector("list", length = length(limits))
  for(i in 1:length(limits)) {
    test[[i]] = action1(thingofinterest, limits, i)
  }
  return(test)
}

action2 <- function(x, y) {
  firsttask = cumsum(x[which(x < y)])
  thirdtask = mean(firsttask)
  secondtask = min(firsttask[which(firsttask > thirdtask)])
  fourthtask = length(firsttask)
  list(firsttask, min_over_mean = secondtask,
       average = thirdtask, size = fourthtask)
}

f2 <- function() {
  test <- lapply(limits, action2, x = thingofinterest)
  return(test)
}

f3 <- function() {
  test <- sapply(limits, action2, x = thingofinterest)
  return(test)
}

For a 1M-entry thingofinterest and 950 limits, here are the results on my machine:

> microbenchmark(f1(), f2(), f3(), times = 10)
Unit: seconds
 expr      min       lq     mean   median       uq      max neval
 f1() 4.303302 4.336767 4.373119 4.380383 4.403434 4.441945    10
 f2() 4.267922 4.327208 4.450175 4.399422 4.423191 5.041011    10
 f3() 4.240551 4.293855 4.412548 4.362949 4.468117 4.730717    10

So a cleanly done for loop is not that bad in this case.

I have the feeling there's probably a data.table way to do the "action" work in one pass, but that's outside my knowledge area for now.

On the speed topic more generally, I see no way to really vectorize this. Since those vectors are not subsets of each other, their cumsum calls can't be "cut" to avoid recomputing the common sequences.

As you say in the comments that limits usually has between 90 and 110 entries, parallel processing could be a sensible approach: compute each iteration on a different core, since each iteration is independent of the others. (I'm thinking of mclapply, but there may be others better adapted to your use case; a sketch follows.)
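A minimal sketch of that idea with parallel::mclapply (fork-based, so not available on Windows; mc.cores = 4 is a placeholder for your actual core count):

library(parallel)

f4 <- function() {
  # one limit per task; mclapply returns results in the same order as limits
  mclapply(limits, action2, x = thingofinterest, mc.cores = 4)
}

# each iteration is independent, so identical(f1(), f4()) should still hold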

1 Comment

I was hoping for something like a data.table approach, but clearly that would be added effort for little return. An efficient for loop works just fine for the problem at hand; thanks for the thorough explanation.
