You need to understand where the bottlenecks are in your code before you start trying to change it to make it faster. For example:
```r
timer <- function(action, thingofinterest, limits) {
  st <- system.time({       # for the wall time
    Rprof(interval = 0.01)  # start R's profiler
    for (j in 1:1000) {     # 1000 function calls
      test = vector("list")
      for (i in 1:length(limits)) {
        test[[i]] = action(thingofinterest, limits, i)
      }
    }
    Rprof(NULL)             # stop the profiler
  })
  # return profiling results
  list(st, head(summaryRprof()$by.total))
}

action = function(x, y, i) {
  firsttask = cumsum(x[which(x < y[i])])
  secondtask = min(firsttask[which(firsttask > mean(firsttask))])
  thirdtask = mean(firsttask)
  fourthtask = length(firsttask)
  output = list(firsttask,
                data.frame(average = secondtask,
                           min_over_mean = thirdtask,
                           size = fourthtask))
  return(output)
}

timer(action, 1:1000, 50:100)
# [[1]]
#    user  system elapsed 
#   9.720   0.012   9.737 
# 
# [[2]]
#                 total.time total.pct self.time self.pct
# "system.time"         9.72    100.00      0.07     0.72
# "timer"               9.72    100.00      0.00     0.00
# "action"              9.65     99.28      0.24     2.47
# "data.frame"          8.53     87.76      0.84     8.64
# "as.data.frame"       5.50     56.58      0.44     4.53
# "force"               4.40     45.27      0.11     1.13
```
You can see that very little time is spent outside the call to your `action` function. Now, `for` is a special primitive and is therefore not captured by the profiler, but the total time reported by the profiler is very close to the wall time, so not much time can be hidden from the profiler.
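If you want to sanity-check that claim yourself, here is a minimal sketch of the same workflow (the temp file, sampling interval, and toy workload are all illustrative, not from the code above):

```r
# Profile a toy loop to a temp file, then compare the profiler's
# total sampled time against the wall time from system.time().
prof_file <- tempfile()
Rprof(prof_file, interval = 0.01)  # start sampling every 10 ms
wall <- system.time(
  for (i in 1:200) x <- sort(runif(1e5))  # arbitrary workload
)
Rprof(NULL)                        # stop the profiler

prof <- summaryRprof(prof_file)
prof$sampling.time  # total time the profiler accounted for
wall["elapsed"]     # wall time; the two should be close
```

If the two numbers agree, you know the profile accounts for essentially all of the runtime, even though the `for` loop itself never shows up as a line in `summaryRprof`.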
And the thing that takes the most time in your `action` function is the call to `data.frame`. Remove that, and you get an enormous speedup.
```r
action1 = function(x, y, i) {
  firsttask = cumsum(x[which(x < y[i])])
  secondtask = mean(firsttask)
  thirdtask = min(firsttask[which(firsttask > mean(firsttask))])
  fourthtask = length(firsttask)
  list(task = firsttask, average = secondtask,
       min_over_mean = thirdtask, size = fourthtask)
}

timer(action1, 1:1000, 50:100)
# [[1]]
#    user  system elapsed 
#   1.020   0.000   1.021 
# 
# [[2]]
#               total.time total.pct self.time self.pct
# "system.time"       1.01    100.00      0.06     5.94
# "timer"             1.01    100.00      0.00     0.00
# "action"            0.95     94.06      0.17    16.83
# "which"             0.57     56.44      0.23    22.77
# "mean"              0.25     24.75      0.13    12.87
# "<"                 0.20     19.80      0.20    19.80
```
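To see why `data.frame` is so expensive, here is a minimal sketch (the iteration count and column contents are arbitrary) comparing the per-call cost of building a `data.frame` versus a plain `list`:

```r
n <- 1e4
# data.frame() does argument checking, name repair and method
# dispatch on every call, so it is costly inside a tight loop.
t_df <- system.time(
  for (i in 1:n) df <- data.frame(a = 1, b = 2, c = 3)
)
# list() just bundles its arguments, with almost no overhead.
t_list <- system.time(
  for (i in 1:n) l <- list(a = 1, b = 2, c = 3)
)
t_df["elapsed"]    # typically orders of magnitude larger...
t_list["elapsed"]  # ...than this
```

If you only need a data frame for the final result, build lists inside the loop and convert once at the end.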
Now you can also get rid of one of the calls to `mean` and both calls to `which`.
```r
action2 = function(x, y, i) {
  firsttask = cumsum(x[x < y[i]])
  secondtask = mean(firsttask)
  thirdtask = min(firsttask[firsttask > secondtask])
  fourthtask = length(firsttask)
  list(task = firsttask, average = secondtask,
       min_over_mean = thirdtask, size = fourthtask)
}

timer(action2, 1:1000, 50:100)
# [[1]]
#    user  system elapsed 
#   0.808   0.000   0.808 
# 
# [[2]]
#               total.time total.pct self.time self.pct
# "system.time"       0.80    100.00      0.12    15.00
# "timer"             0.80    100.00      0.00     0.00
# "action"            0.68     85.00      0.24    30.00
# "<"                 0.20     25.00      0.20    25.00
# "mean"              0.13     16.25      0.08    10.00
# ">"                 0.05      6.25      0.05     6.25
```
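Dropping `which` is safe here because logical subsetting gives the same answer when there are no `NA`s, without allocating the intermediate index vector. A quick sketch with toy inputs (not the original data) to confirm the equivalence:

```r
x <- 1:1000
y <- 50:100
i <- 10

# which()-based indexing builds an integer index vector first;
# logical subsetting uses the condition directly.
a <- cumsum(x[which(x < y[i])])
b <- cumsum(x[x < y[i]])
identical(a, b)  # TRUE
```

Note the one caveat: if the condition can contain `NA`, `x[cond]` keeps `NA` entries while `x[which(cond)]` drops them, so check your data before making this swap.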
Now you can see there's a "significant" amount of time spent doing stuff outside your `action` function. I put significant in quotes because it's 15% of the runtime, but only about 120 milliseconds. If your actual code took ~12 hours to run, this new `action` function would finish it in ~1 hour.
The results would be marginally better still if I pre-allocated the `test` list outside of the `for` loop in the `timer` function, but the call to `data.frame` was by far the biggest time-consumer.
A few closing notes from the comments. Replacing the `for` loop with `apply` or `do.call` won't buy you much here: many functions in the `apply` family are just glorified `for` loops, and a plain `do.call` only outputs `action(thingofinterest, 5)` once rather than one result per limit. You could, however, pre-allocate the `test` vector to the correct size upfront using `test = vector("list", length = length(limits))` to avoid growing the object in a loop (which makes it slow!), and in your `action` function you could compute `mean(firsttask)` before the second step and reuse the value stored in a variable.
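The pre-allocation suggestion can be sketched as follows (the squared-limits payload is just a placeholder workload, not the original `action`):

```r
limits <- 50:100

# Grown in the loop: the list starts empty and R must enlarge
# it as elements are assigned past its current length.
test_grown <- vector("list")
for (i in 1:length(limits)) test_grown[[i]] <- limits[i]^2

# Pre-allocated: the full-length list exists before the loop,
# so each assignment just fills an existing slot.
test_prealloc <- vector("list", length = length(limits))
for (i in 1:length(limits)) test_prealloc[[i]] <- limits[i]^2

identical(test_grown, test_prealloc)  # TRUE: same result, less copying
```

The results are identical; the pre-allocated version simply avoids the repeated enlargement work, which matters more as the number of iterations grows.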