R: Adding Progress Bars in R

Question

I am using the R programming language. I am trying to learn how to add "progress bars" to estimate how much time is remaining while a function is running (https://www.rdocumentation.org/packages/progress/versions/1.2.2/topics/progress_bar).

For example:

library(progress) pb <- progress_bar$new(total = 100) for (i in 1:100) { pb$tick() Sys.sleep(1 / 100) }

Suppose I have a function called "grid_function" and a dataset called "DF_1". I am taking each individual row from "DF_1" and feeding this row into "grid_function". "grid_function" performs some calculations using this row, and stores it into a "list" called "resultdf1". Finally, "resultdf1" is converted into a data frame called "final_output". The "feeding process" can be seen below:

resultdf1 <- apply(DF_1,1, # 1 means rows FUN=function(x){ do.call( # Call Function grid_function with the arguments in # a list grid_function, # force list type for the arguments c(list(train_data_new), as.list( # make the row to a named vector unlist(x) ) )) } ) l = resultdf1 final_output = rbindlist(l, fill = TRUE)

Question: I would like to add a "progress bar" to the above code.

What I tried: I tried to do this as follows:

library(doParallel) library(future) cl <- makePSOCKcluster(6) # 6 cpu cores out of 8 registerDoParallel(cl) pb <- progress_bar$new(total = 100) for (i in 1:100) { resultdf1 <- apply(DF_1,1, # 1 means rows FUN=function(x){ do.call( # Call Function grid_function2 with the arguments in # a list grid_function, # force list type for the arguments c(list(train_data_new), as.list( # make the row to a named vector unlist(x) ) )) } ) l = resultdf1 final_output = rbindlist(l, fill = TRUE) pb$tick() Sys.sleep(1 / 100) } stopCluster(cl)

This appears to be working, but I am not sure if I did everything correctly. Can someone please tell me if I have done this correctly? Is there any chance that adding this "progress bar" will actually result in the function taking more time to run?

Thanks

UPDATE: @Serkan : Here is the full code:

library(dplyr) library(data.table) results_table <- data.frame() grid_function <- function(train_data, random_1, random_2, random_3, random_4, split_1, split_2, split_3) { #bin data according to random criteria train_data <- train_data %>% mutate(cat = ifelse(a1 <= random_1 & b1 <= random_3, "a", ifelse(a1 <= random_2 & b1 <= random_4, "b", "c"))) train_data$cat = as.factor(train_data$cat) #new splits a_table = train_data %>% filter(cat == "a") %>% select(a1, b1, c1, cat) b_table = train_data %>% filter(cat == "b") %>% select(a1, b1, c1, cat) c_table = train_data %>% filter(cat == "c") %>% select(a1, b1, c1, cat) #calculate random quantile ("quant") for each bin table_a = data.frame(a_table%>% group_by(cat) %>% mutate(quant = quantile(c1, prob = split_1))) table_b = data.frame(b_table%>% group_by(cat) %>% mutate(quant = quantile(c1, prob = split_2))) table_c = data.frame(c_table%>% group_by(cat) %>% mutate(quant = quantile(c1, prob = split_3))) #create a new variable ("diff") that measures if the quantile is bigger tha the value of "c1" table_a$diff = ifelse(table_a$quant > table_a$c1,1,0) table_b$diff = ifelse(table_b$quant > table_b$c1,1,0) table_c$diff = ifelse(table_c$quant > table_c$c1,1,0) #group all tables final_table = rbind(table_a, table_b, table_c) #create a table: for each bin, calculate the average of "diff" final_table_2 = data.frame(final_table %>% group_by(cat) %>% summarize( mean = mean(diff) )) #add "total mean" to this table final_table_2 = data.frame(final_table_2 %>% add_row(cat = "total", mean = mean(final_table$diff))) #format this table: add the random criteria to this table for reference final_table_2$random_1 = random_1 final_table_2$random_2 = random_2 final_table_2$random_3 = random_3 final_table_2$random_4 = random_4 final_table_2$split_1 = split_1 final_table_2$split_2 = split_2 final_table_2$split_3 = split_3 results_table <- rbind(results_table, final_table_2) final_results = dcast(setDT(results_table), random_1 + random_2 + random_3 + random_4 + split_1 + split_2 + split_3 ~ cat, value.var = 'mean') } # create some data for this example a1 = rnorm(1000,100,10) b1 = rnorm(1000,100,5) c1 = sample.int(1000, 1000, replace = TRUE) train_data = data.frame(a1,b1,c1) #grid random_1 <- seq(80,100,5) random_2 <- seq(85,120,5) random_3 <- seq(85,120,5) random_4 <- seq(90,120,5) split_1 = seq(0,1,0.1) split_2 = seq(0,1,0.1) split_3 = seq(0,1,0.1) DF_1 <- expand.grid(random_1 , random_2, random_3, random_4, split_1, split_2, split_3) #reduce the size of the grid for this example DF_1 = DF_1[1:100,] colnames(DF_1) <- c("random_1" , "random_2", "random_3", "random_4", "split_1", "split_2", "split_3") train_data_new <- copy(train_data) resultdf1 <- apply(DF_1,1, # 1 means rows FUN=function(x){ do.call( # Call Function grid_function2 with the arguments in # a list grid_function, # force list type for the arguments c(list(train_data_new), as.list( # make the row to a named vector unlist(x) ) )) } ) l = resultdf1 final_output = rbindlist(l, fill = TRUE)

And here is how the final output should look like:

head(final_output) random_1 random_2 random_3 random_4 split_1 split_2 split_3 b c total a 1: 80 85 85 90 0 0 0 0 0 0 NA 2: 85 85 85 90 0 0 0 0 0 0 NA 3: 90 85 85 90 0 0 0 0 0 0 NA 4: 95 85 85 90 0 0 0 0 0 0 NA 5: 100 85 85 90 0 0 0 0 0 0 NA 6: 80 90 85 90 0 0 0 0 0 0 NA

Is there any reason to believe you've done this incorrectly? "Does this look correct" isn't really a specific programming question that's a good fit for Stack Overflow. Either your code does what you want or it does not. If you are concerned with performance, start by profiling your own code: adv-r.hadley.nz/perf-measure.html. There are lots of ways to time code to see how long it's taking. You should be able to answer your own question about speed that way. — MrFlick
– MrFlick, Commented Jul 30, 2021 at 23:29
@ MrFlick : thank you for your reply! I am still learning about programming - many times my code will run without errors, but I am still very skeptical that it might not be running as intended. thanks again! — stats_noob
– stats_noob, Commented Jul 31, 2021 at 3:18

Serkan · Accepted Answer · 2021-07-31 04:25:30Z

3

I would opt for .progress = TRUE in furrr:future_map, an example R Script below;

library(furrr) library(magrittr) plan(multisession, workers = 2) 1:10 %>% future_map( ~ Sys.sleep(2), .progress = TRUE )

It adds a progress-bar automatically,

Progress: ─────────────────────────────────────── 100%

Without your data.frame and specific function, I cant replicate your problem to show you how it is implemented, but the apply-family are easily translated to the map-family of functions.

answered Jul 31, 2021 at 4:25

Serkan

1,9659 silver badges23 bronze badges

Sign up to request clarification or add additional context in comments.

5 Comments

stats_noob Over a year ago

Thank you so much for your answer! I updated my question and included the full code start-to-finish. can you please show me how to implement this answer on my code? I am not sure where I would insert my code in your answer! thank you so much!

Serkan Over a year ago

I dont quite see it? Remember to include your desired output, and only a subset of you data. No need to run your full R Script

stats_noob Over a year ago

thank you for your reply! please check again. thank you so much!

stats_noob Over a year ago

@ Serkan: did you get a chance to look at it?

Serkan Over a year ago

Yes I did, but your code is too long - so I kinda lost track. See this stackoverflow.com/questions/68479928/…. Here I show how for-loop can be replaced by map. If it doesnt answer your question, then reduce your problem and mimick it, then I can show you as well :-)

Collectives™ on Stack Overflow

R: Adding Progress Bars in R

1 Answer 1

5 Comments

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

5 Comments

Related