70

There are some informative posts on how to create a counter for loops in an R program. However, how do you create a similar function when using the parallelized version with "foreach()"?

3
  • 19
    Do you know how to accept answers on Stack Overflow? If not then please read the FAQ and go back over your previous questions. Commented Mar 24, 2011 at 19:20
  • There is an example of foreach in the ParallelR blog here, and I think it's worth reading :) Commented Sep 26, 2016 at 12:13
  • For a solution that works nowadays, you can check out the doParabar package available on CRAN. There is also a tutorial here. It works with both built-in progress bars, and the progress package as well. As a disclaimer, I'm the author of parabar and doParabar. Commented Jan 23 at 10:11

7 Answers 7

72

Edit: After an update to the doSNOW package, it has become quite simple to display a nice progress bar when using %dopar%, and it works on Linux, Windows and OS X.

doSNOW now officially supports progress bars via the .options.snow argument.

    library(doSNOW)

    cl <- makeCluster(2)
    registerDoSNOW(cl)

    iterations <- 100
    pb <- txtProgressBar(max = iterations, style = 3)
    progress <- function(n) setTxtProgressBar(pb, n)
    opts <- list(progress = progress)

    result <- foreach(i = 1:iterations, .combine = rbind,
                      .options.snow = opts) %dopar% {
      s <- summary(rnorm(1e6))[3]
      return(s)
    }

    close(pb)
    stopCluster(cl)

Yet another way of tracking progress, if you keep in mind the total number of iterations, is to set .verbose = TRUE, as this will print to the console which iterations have finished.

Previous solution for Linux and OS X

On Ubuntu 14.04 (64 bit) and OS X (El Capitan), the progress bar is displayed even when using %dopar% if outfile = "" is set in the makeCluster function. It does not seem to work under Windows. From the help on makeCluster:

outfile: Where to direct the stdout and stderr connection output from the workers. "" indicates no redirection (which may only be useful for workers on the local machine). Defaults to ‘/dev/null’ (‘nul:’ on Windows).

Example code:

    library(foreach)
    library(doSNOW)

    cl <- makeCluster(4, outfile = "") # number of cores. Notice 'outfile'
    registerDoSNOW(cl)

    iterations <- 100
    pb <- txtProgressBar(min = 1, max = iterations, style = 3)

    result <- foreach(i = 1:iterations, .combine = rbind) %dopar% {
      s <- summary(rnorm(1e6))[3]
      setTxtProgressBar(pb, i)
      return(s)
    }

    close(pb)
    stopCluster(cl)

This is what the progress bar looks like. It looks a little odd since a new bar is printed for every progression of the bar and because a worker may lag a bit which causes the progress bar to go back and forth occasionally.


7 Comments

A suggested improvement (I think it's sufficiently close to your idea not to warrant a separate answer): basically, write a newline to a tempfile with cat each iteration, then count the number of newlines (I use wc since I'm on Linux, but there are other solutions for Windows) and use this to update the progress bar. This has the advantage that it is monotonically increasing. Disadvantage is you have to read a file in every iteration -- not sure how slow this is.
Thanks for the suggestion @MichaelChirico, but by now there's an 'official' way of doing this. I've updated the answer.
I can't seem to get this to work from within a function.
The package doSNOW is superseded now.
@epsilone A progress bar will only display using doSNOW and is no more outdated than its successor doParallel.
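The tempfile suggestion in the first comment could be sketched roughly as follows. This is a minimal sketch in base R, not the commenter's actual code: record_tick() and count_ticks() are hypothetical helper names, and a plain for loop stands in for the body of the foreach loop. It counts lines with readLines() instead of wc so that it does not depend on the operating system.

```r
# Hypothetical helpers (not from any package): a file-backed tick counter.
tick_file <- tempfile(fileext = ".log")
file.create(tick_file)

# Append one newline; meant to be called once per finished iteration.
record_tick <- function(path) {
  cat("\n", file = path, append = TRUE)
}

# Completed iterations = number of lines written so far.
count_ticks <- function(path) {
  length(readLines(path, warn = FALSE))
}

iterations <- 20
pb <- txtProgressBar(min = 0, max = iterations, style = 3)
for (i in seq_len(iterations)) {  # stand-in for the parallel loop body
  record_tick(tick_file)
  setTxtProgressBar(pb, count_ticks(tick_file))
}
close(pb)

done <- count_ticks(tick_file)
unlink(tick_file)
```

In a parallel run the workers would call record_tick() and the bar would be updated from the line count, which can only grow. The file path has to be reachable by every worker, and I have not checked how safe concurrent appends are on every platform, so treat that part as an assumption.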
13

You can also get this to work with the progress package.


    # loading parallel and doSNOW package and creating cluster ----------------
    library(parallel)
    library(doSNOW)

    numCores <- detectCores()
    cl <- makeCluster(numCores)
    registerDoSNOW(cl)

    # progress bar ------------------------------------------------------------
    library(progress)

    iterations <- 100 # used for the foreach loop

    pb <- progress_bar$new(
      format = "letter = :letter [:bar] :elapsed | eta: :eta",
      total = iterations, # 100
      width = 60)

    progress_letter <- rep(LETTERS[1:10], 10) # token reported in progress bar

    # allowing progress bar to be used in foreach -----------------------------
    progress <- function(n){
      pb$tick(tokens = list(letter = progress_letter[n]))
    }

    opts <- list(progress = progress)

    # foreach loop ------------------------------------------------------------
    library(foreach)

    foreach(i = 1:iterations, .combine = rbind, .options.snow = opts) %dopar% {
      summary(rnorm(1e6))[3]
    }

    stopCluster(cl)

4 Comments

But I do not know the number of iterations - because there is a nested loop within foreach and I have no clue how to count the iterations. Are these really required?
If you look at the help file for progress_bar, you can set total=NA although you no longer get a progress bar. I'm down to help you figure out a way to determine the number of iterations.
If I change the iterations to 10000 I get "Warning: progress function failed: invalid 'times' argument" how can I solve this?
If you only changed iterations to 10000 (assuming you are running the exact same code as above), the progress_letter variable needs to also be changed.
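Following the last comment, one way to keep the token vector in step with the loop is to derive its length from iterations rather than hard-coding the repeat count. A minimal sketch, assuming the same variable names as the answer's code; rep_len() is base R and recycles the letters to exactly the requested length:

```r
iterations <- 10000

# One token per iteration, recycling A..J as often as needed.
progress_letter <- rep_len(LETTERS[1:10], iterations)

length(progress_letter)  # 10000, one entry for every value of n
```

With this, progress_letter[n] is defined for every n the progress function can receive. I have not traced the exact source of the warning, so this only addresses what the comment points at.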
11

This code is a modified version of the doRedis example, and will make a progress bar even when using %dopar% with a parallel backend:

    # Load libraries
    library(foreach)
    library(utils)
    library(iterators)
    library(doParallel)
    library(snow)

    # Choose number of iterations
    n <- 1000

    # Progress combine function
    f <- function() {
      pb <- txtProgressBar(min = 1, max = n - 1, style = 3)
      count <- 0
      function(...) {
        count <<- count + length(list(...)) - 1
        setTxtProgressBar(pb, count)
        Sys.sleep(0.01)
        flush.console()
        c(...)
      }
    }

    # Start a cluster
    cl <- makeCluster(4, type = 'SOCK')
    registerDoParallel(cl)

    # Run the loop in parallel
    k <- foreach(i = icount(n), .final = sum, .combine = f()) %dopar% {
      log2(i)
    }

    head(k)

    # Stop the cluster
    stopCluster(cl)

You have to know the number of iterations and the combination function ahead of time.

6 Comments

Hmm, this is strange. My function seems to update the progress bar in one shot, after the actual calculations are done...
This method might only work with the doRedis backend. I'll have to investigate how to make it work with the doParallel backend.
It won't work well with doParallel because doParallel only calls the combine function after all of the results have been returned, since it is implemented by calling the parallel clusterApplyLB function. This technique only works well with backends that call the combine function on-the-fly, like doRedis, doMPI, doNWS, and (defunct?) doSMP.
@Steve Weston thank you for the clarification. That makes a lot of sense to me, and now I understand why my function works on doRedis, but not doParallel.
You might try flushing the console... untested.
11

This is now possible with the parallel package. Tested with R 3.2.3 on OSX 10.11, running inside RStudio, using a "PSOCK"-type cluster.

    library(doParallel)

    # default cluster type on my machine is "PSOCK", YMMV with other types
    cl <- parallel::makeCluster(4, outfile = "")
    registerDoParallel(cl)

    n <- 10000
    pb <- txtProgressBar(0, n, style = 3) # style = 3, per the note below

    invisible(foreach(i = icount(n)) %dopar% {
      setTxtProgressBar(pb, i)
    })

    stopCluster(cl)

Strangely, it only displays correctly with style = 3.

5 Comments

R 3.2.2 on Windows 10 doesn't seem to produce any progress bar with this code... Is this specific to >= 3.2.3 ?
@IainS I'd sooner ascribe the difference to operating system inconsistency than the R version.
This seems to occasionally go down. It may not handle the asynchronous nature of the iterations (i = 15 could finish before i = 10).
This also doesn't work on R 4.3.0 and Windows 10, the progress is only displayed after the calculation is done.
Does not work on MacOS Sonoma 14.3
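The comment above about the bar occasionally going down can be illustrated without a cluster. A minimal sketch, assuming a made-up completion order (completion_order is hypothetical, not measured): setting the bar from the loop index follows whatever order the tasks happen to finish in, while setting it from a running count of finished tasks can only increase.

```r
# Hypothetical order in which five tasks finish on the workers.
completion_order <- c(3, 1, 2, 5, 4)

# Bar positions if each task calls setTxtProgressBar(pb, i) with its own index.
by_index <- completion_order

# Bar positions if the bar is set to the number of tasks finished so far.
by_count <- seq_along(completion_order)

is.unsorted(by_index)  # TRUE: the bar would move backwards at some point
is.unsorted(by_count)  # FALSE: the bar only moves forward
```

A count-based update sidesteps the ordering problem, though it needs some place where the count is kept, which the index-based code in this answer does not have.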
6

Save the start time with Sys.time() before the loop. Loop over rows, columns, or anything else whose total you know. Then, inside the loop, you can calculate the time elapsed so far (see difftime), the percentage complete, the speed, and the estimated time left. Each process can print those progress lines with the message function. You'll get output something like

    1/1000 complete @ 1 items/s, ETA: 00:00:45
    2/1000 complete @ 1 items/s, ETA: 00:00:44

Obviously the looping order will greatly affect how well this works. Don't know about foreach but with multicore's mclapply you'd get good results using mc.preschedule=FALSE, which means that items are allocated to processes one-by-one in order as previous items complete.
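The bookkeeping described above could look roughly like this. It is a minimal sketch in base R, assuming the loop index i is a fair proxy for the number of items finished; progress_line() is a hypothetical helper name, and the now argument exists only so the function can be checked with fixed times.

```r
# Hypothetical helper: build one progress line from the loop index.
progress_line <- function(i, total, start_time, now = Sys.time()) {
  elapsed <- as.numeric(difftime(now, start_time, units = "secs"))
  rate <- i / max(elapsed, 1e-6)                      # items per second so far
  eta_secs <- as.integer(round((total - i) / rate))   # remaining items / rate
  eta <- sprintf("%02d:%02d:%02d",
                 eta_secs %/% 3600L,
                 (eta_secs %% 3600L) %/% 60L,
                 eta_secs %% 60L)
  sprintf("%d/%d complete @ %.1f items/s, ETA: %s",
          as.integer(i), as.integer(total), rate, eta)
}

# Usage inside a loop; message() writes the line to stderr.
start_time <- Sys.time()
total <- 5
for (i in seq_len(total)) {
  Sys.sleep(0.01)  # stand-in for real work
  message(progress_line(i, total, start_time))
}
```

When items finish out of order the estimate is only as good as that proxy, which is the caveat the answer raises about looping order.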

3 Comments

are you using some sort of global counter, or are you relying on the index that's being looped over (i)?
@C8H10N4O2: The index looped over. With mclapply it gives good results with mc.preschedule=FALSE, and sometimes wrong, but usually close enough with the default (and usually faster) mc.preschedule=TRUE.
When I use message inside foreach (even when setting .verbose = TRUE), I get nothing in the console.
1

This code implements a progress bar tracking a parallelized foreach loop using the doMC backend, and using the excellent progress package in R. It assumes that all cores, specified by numCores, do an approximately equal amount of work.

    library(foreach)
    library(doMC)
    library(progress)

    iterations <- 100
    numCores <- 8

    registerDoMC(cores = numCores)

    # Tick once every numCores iterations
    pbTracker <- function(pb, i, numCores) {
      if (i %% numCores == 0) {
        pb$tick()
      }
    }

    # Named arguments use '=' ('<-' here would assign in the calling environment)
    pb <- progress_bar$new(
      format = " progress [:bar] :percent eta: :eta",
      total = iterations / numCores,
      clear = FALSE,
      width = 60)

    output <- foreach(i = 1:iterations) %dopar% {
      pbTracker(pb, i, numCores)
      Sys.sleep(1/20)
    }

4 Comments

If you actually register multiple cores, this doesn't work.
The above example seems to work as is on my MacBook Pro 2017, R v. 3.5.1. I believe one of the parallelism-related packages above prevents multiple cores from kicking in if the actual work inside the loop is tiny. Try putting something more laborious inside the loop; it should work.
But the above isn't even registering the cores? I don't think it actually farms out the tasks. To be clear the above works for me, but when I actually register multiple workers, it only returns the completed tracker at the end. try adding registerDoMC(2) before the %dopar% call
@luke.sonnet, thanks for pointing out the missing line. After including registerDoMC(cores=numCores), I'm getting multiple cores firing up when I look at Activity Monitor on my Mac. To give you an idea, progress [====>-----------------------------] 15% eta: 12s, is what I'm seeing in the interim.
-4

The following code will produce a nice progress bar in R for the foreach control structure. It will also work with graphical progress bars by replacing txtProgressBar with the desired progress bar object.

    # Gives us the foreach control structure.
    library(foreach)
    # Gives us icount(); recent foreach versions do not attach it.
    library(iterators)
    # Gives us the progress bar object.
    library(utils)

    # Some number of iterations to process.
    n <- 10000

    # Create the progress bar.
    pb <- txtProgressBar(min = 1, max = n, style = 3)

    # The foreach loop we are monitoring. This foreach loop will log2 all
    # the values from 1 to n and then sum the result.
    k <- foreach(i = icount(n), .final = sum, .combine = c) %do% {
      setTxtProgressBar(pb, i)
      log2(i)
    }

    # Close the progress bar.
    close(pb)

While the code above answers your question in its most basic form, a better and much harder question is whether you can create an R progress bar that monitors the progress of a foreach statement when it is parallelized with %dopar%. Unfortunately, I don't think it is possible to monitor the progress of a parallelized foreach in this way, but I would love for someone to prove me wrong, as it would be a very useful feature.

1 Comment

This answer does not address the OP question in relation to parallelization, %dopar%
