
This is related to R: use the newly generated data in the previous row

I realized that the actual problem I am facing is a bit more complicated than the example I gave in the thread above: it seems I have to pass three arguments to the recursive calculation to achieve what I want, so accumulate2 or reduce may not work. To avoid confusion, I am opening a new question here.

I have the following dataset grouped by ID:

```r
ID <- c(1, 2, 2, 3, 3, 3)
pw <- c(1:6)
add <- c(1, 2, 3, 5, 7, 8)
x <- c(1, 2, NA, 4, NA, NA)
df <- data.frame(ID, pw, add, x)
df
```

```
  ID pw add  x
1  1  1   1  1
2  2  2   2  2
3  2  3   3 NA
4  3  4   5  4
5  3  5   7 NA
6  3  6   8 NA
```

Within each ID group, I want to keep the first value of column x as it is and fill in the remaining rows by raising the previous row's value of x to the power stored in pw and then adding the value in add. Each newly computed value should become the lagged value for the next row. So I would like to have:

```
  ID pw add x
1  1  1   1 1
2  2  2   2 2
3  2  3   3 2^3 + 3
4  3  4   5 4
5  3  5   7 4^5 + 7
6  3  6   8 (4^5 + 7)^6 + 8
```
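Written as a recurrence over the row index r of the data frame (a restatement of the rule above: it applies to every row r that is not the first row of its ID group), the expected values work out as follows:

```latex
\begin{aligned}
x_r &= x_{r-1}^{\,pw_r} + add_r \quad \text{(row } r \text{ not the first of its ID group)} \\
\text{ID 2:}\quad x_3 &= x_2^{\,pw_3} + add_3 = 2^3 + 3 = 11 \\
\text{ID 3:}\quad x_5 &= x_4^{\,pw_5} + add_5 = 4^5 + 7 = 1031 \\
x_6 &= x_5^{\,pw_6} + add_6 = 1031^6 + 8 \approx 1.201025 \times 10^{18}
\end{aligned}
```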

I have to apply this calculation to a large dataset, so it would be perfect if there were a fast way to do this!

  • do.call(rbind, Reduce(function(x, y) if (is.na(y$x)) modifyList(y, list(x = x$x^y$pw + y$add)) else y, split(df, seq(nrow(df))), accumulate = TRUE)) Commented Apr 19, 2021 at 1:59
  • Though akrun has shown how to solve it correctly here, I have once passed more than two arguments to accumulate; see this question if you want to know how. Nevertheless, it can also be solved with a for loop. Commented Apr 19, 2021 at 6:05
  • @AnilGoyal Thank you very much! It is extremely helpful. Commented Apr 20, 2021 at 6:22
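One general way to get more than two inputs into a two-argument reducer such as Reduce() (not necessarily the approach in the linked question) is to bundle the per-step inputs into a single list element. A minimal sketch for group ID 3 only, assuming its first x value (4) is known and the remaining pw and add values are c(5, 6) and c(7, 8); pw_rest and add_rest are names chosen here for illustration:

```r
# Group ID 3 from the question: x starts at 4, the other rows are NA
x0       <- 4
pw_rest  <- c(5, 6)   # pw of the rows to fill
add_rest <- c(7, 8)   # add of the rows to fill

# One list element per step, holding every input that step needs;
# more inputs would just be more fields in each element
steps <- Map(function(p, a) list(pw = p, add = a), pw_rest, add_rest)

# The reducer still takes two arguments: the accumulated value and one bundle
res <- Reduce(function(acc, s) acc^s$pw + s$add, steps,
              init = x0, accumulate = TRUE)
res
```

The same idea scales to any number of per-row inputs without needing a reducer that takes three or more arguments.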

3 Answers


If we want to use accumulate2, we need to specify the arguments correctly: it takes two input vectors, 'pw' and 'add', plus an initialization argument (.init), which is the first value of 'x'. Because the first value comes from .init, we pass pw[-1] and add[-1]. As the data are grouped by 'ID', do the grouping before calling accumulate2. Inside the lambda, the default arguments ..1, ..2 and ..3 refer, in that order, to the accumulated value, the current 'pw' and the current 'add', so the recursive function can be written as ..1^..2 + ..3.

```r
library(dplyr)
library(purrr)

out <- df %>%
  group_by(ID) %>%
  mutate(x1 = accumulate2(pw[-1], add[-1], ~ ..1^..2 + ..3,
                          .init = first(x)) %>%
           flatten_dbl) %>%
  ungroup

out$x1
#[1] 1 2 11
#[4] 4 1031 1201024845477409792
```
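To make the roles of ..1, ..2 and ..3 concrete, here is a minimal base R sketch of what accumulate2() with .init does for a single group. accumulate2_base() is a hypothetical helper written only for illustration, not part of purrr; it mimics the behaviour relied on above (the result starts with .init, followed by one element per pair of inputs).

```r
# Hypothetical helper: a base R stand-in for accumulate2(.x, .y, .f, .init)
accumulate2_base <- function(.x, .y, .f, .init) {
  out <- vector("list", length(.x) + 1)
  out[[1]] <- .init                 # first element is the .init value
  for (i in seq_along(.x)) {
    # .f receives (..1 = previous result, ..2 = .x[[i]], ..3 = .y[[i]])
    out[[i + 1]] <- .f(out[[i]], .x[[i]], .y[[i]])
  }
  out
}

# Group ID 3: pw[-1] = c(5, 6), add[-1] = c(7, 8), first(x) = 4
res <- accumulate2_base(c(5, 6), c(7, 8),
                        function(acc, p, a) acc^p + a, .init = 4)
unlist(res)
```

unlist() plays the role of flatten_dbl here, turning the list of results into a numeric vector.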

With more than three arguments, a for loop would be better:

```r
# initialize an empty vector
out <- c()
# loop over the unique IDs
for (id in unique(df$ID)) {
  # create a temporary subset of the data for that id
  tmp_df <- subset(df, ID == id)
  # initialize temporary storage for the output
  tmp_out <- numeric(nrow(tmp_df))
  # initialize the first value with the first element of x
  tmp_out[1] <- tmp_df$x[1]
  # if the number of rows is greater than 1
  if (nrow(tmp_df) > 1) {
    # loop over the remaining rows
    for (i in 2:nrow(tmp_df)) {
      # do the recursive calculation and update
      tmp_out[i] <- tmp_out[i - 1]^tmp_df$pw[i] + tmp_df$add[i]
    }
  }
  out <- c(out, tmp_out)
}
out
#[1] 1 2 11
#[4] 4 1031 1201024845477409792
```
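The loop above grows out with c() and assumes the rows of each ID are contiguous, since results are concatenated group by group. A minimal sketch of a variant, assuming the same df, preallocates the result once and loops over the row positions of each group taken from split(seq_len(nrow(df)), df$ID), so every value is written back to its original row:

```r
# The question's data
df <- data.frame(ID  = c(1, 2, 2, 3, 3, 3),
                 pw  = 1:6,
                 add = c(1, 2, 3, 5, 7, 8),
                 x   = c(1, 2, NA, 4, NA, NA))

out <- numeric(nrow(df))                        # preallocate once
for (rows in split(seq_len(nrow(df)), df$ID)) { # row positions of one ID
  out[rows[1]] <- df$x[rows[1]]                 # keep the group's first value
  for (k in seq_along(rows)[-1]) {              # empty for one-row groups
    i <- rows[k]
    # previous row of the same group ^ pw + add
    out[i] <- out[rows[k - 1]]^df$pw[i] + df$add[i]
  }
}
out
```

Growing a vector with c() inside a loop copies it on every iteration, so preallocating becomes more important as the data grow.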

6 Comments

@AnoushiravanR it is because of the way the calculation is carried out. If you check my for loop, tmp_out[i] <- tmp_out[i - 1]^tmp_df$pw[i] + tmp_df$add[i] calculates the power on the previous value of 'x'. Since the first value of 'x' is already initialized, we only use the rows from the second one onward for the recursion.
@AnoushiravanR the concept is a bit tricky to understand with accumulate. I would use a for loop for recursion, as it is much easier to understand and is flexible with any number of inputs.
I understand, as I've just started to see how I could use it for problems like this. Thank you for your explanation; I guess I have to read the documentation first.
@AnoushiravanR I think in coding it is better to make mistakes: they push you to understand the why, and once you have made a mistake and corrected it, you will probably never forget it.
I couldn't agree more. In particular, when you are unable to answer a question, you can check other contributors' code and learn from them, because sometimes the solution would never occur to me, no matter how much time I spent on it.

In base R, we could use the following solution, which works with more than two arguments.

  • In this solution I first subset the original data set by ID value.
  • Then I iterate over the row indices seq_len(nrow(tmp))[-1], omitting the first row since its value is supplied through init.
  • In the anonymous function passed to Reduce, the argument b is the accumulated (previous) value, starting from init, and c is the current element of the vector being reduced, i.e. the row number.
  • So in every iteration the previous value (starting from init) is raised to the power of the current value from pw, and the current value from add is added to the result.
```r
cbind(df[-length(df)],
      unlist(lapply(unique(df$ID), function(a) {
        tmp <- subset(df, df$ID == a)
        Reduce(function(b, c) {
          b ^ tmp$pw[c] + tmp$add[c]
        }, init = tmp$x[1], seq_len(nrow(tmp))[-1], accumulate = TRUE)
      }))) |>
  setNames(c(names(df)))
```

```
  ID pw add            x
1  1  1   1 1.000000e+00
2  2  2   2 2.000000e+00
3  2  3   3 1.100000e+01
4  3  4   5 4.000000e+00
5  3  5   7 1.031000e+03
6  3  6   8 1.201025e+18
```
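To see what b and c hold at each step, here is a small trace of the same Reduce() call applied to the rows of ID 3 only. The tmp data frame is built by hand here instead of with subset(), and the message() line is added purely for illustration:

```r
# Rows of group ID 3 from the question's data
tmp <- data.frame(pw = 4:6, add = c(5, 7, 8), x = c(4, NA, NA))

steps <- Reduce(function(b, c) {
  # b: value carried over from the previous step (starts at init)
  # c: current row index from seq_len(nrow(tmp))[-1]
  message(sprintf("row %d: %g^%d + %g", c, b, tmp$pw[c], tmp$add[c]))
  b^tmp$pw[c] + tmp$add[c]
}, seq_len(nrow(tmp))[-1], init = tmp$x[1], accumulate = TRUE)
# row 2: 4^5 + 7
# row 3: 1031^6 + 8

steps
```

With accumulate = TRUE and an init value, the result starts with init, so it has exactly one value per row of the group.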

Data

```r
structure(list(ID = c(1, 2, 2, 3, 3, 3), pw = 1:6,
               add = c(1, 2, 3, 5, 7, 8),
               x = c(1, 2, NA, 4, NA, NA)),
          class = "data.frame", row.names = c(NA, -6L))
```

1 Comment

fantastic and elegant as well. +1 already

In base R, not using Reduce() but a while() loop instead:

```r
# Split-apply-combine while loop: res => data.frame
res <- do.call(rbind, lapply(with(df, split(df, ID)), function(y) {
  # While there are any NAs in x:
  while (any(is.na(y$x))) {
    # Store the index of the first NA value: idx => integer scalar
    idx <- with(y, head(which(is.na(x)), 1))
    # Calculate x at that index using the business rule provided:
    # x => numeric vector
    y$x[idx] <- with(y, x[(idx - 1)] ** pw[idx] + add[idx])
  }
  # Explicitly define the return object: y => GlobalEnv
  y
}))
```
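The while() loop rescans the whole column with which(is.na(x)) after every fill, so its cost per group grows roughly quadratically with the number of NAs. A minimal sketch of a single forward pass that applies the same rule (fill only the NA cells, keep existing values), assuming the first value of each group is not NA; fill_na_forward() is a hypothetical helper name, and unsplit() puts the results back in the original row order:

```r
# Hypothetical helper: fill NA cells of one group in a single forward pass
fill_na_forward <- function(x, pw, add) {
  # seq_along(x)[-1] is empty for a one-row group, so the loop is skipped
  for (i in seq_along(x)[-1]) {
    # Same rule as the while() loop: previous value ^ pw + add,
    # applied only where x is NA; non-NA values are kept as they are
    if (is.na(x[i])) x[i] <- x[i - 1]^pw[i] + add[i]
  }
  x
}

df <- data.frame(ID  = c(1, 2, 2, 3, 3, 3),
                 pw  = 1:6,
                 add = c(1, 2, 3, 5, 7, 8),
                 x   = c(1, 2, NA, 4, NA, NA))

# Fill each ID group, then restore the original row order
df$x <- unsplit(lapply(split(df, df$ID),
                       function(g) fill_na_forward(g$x, g$pw, g$add)),
                df$ID)
df$x

# A non-NA value in the middle is kept and used as the next lag:
# 2^2 + 1 = 5, then 10 is kept, then 10^2 + 1 = 101
fill_na_forward(c(2, NA, 10, NA), pw = c(1, 2, 3, 2), add = c(0, 1, 1, 1))
```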

Or a recursive function:

```r
# Recursive function: estimation_func => function()
estimation_func <- function(value_vec, exponent_vec, add_vec) {
  # Specify the termination condition; when all elements
  # of value_vec are no longer NA:
  if (all(!(is.na(value_vec)))) {
    # Return value_vec: numeric vector => GlobalEnv
    return(value_vec)
  # Otherwise recursively apply the below:
  } else {
    # Store the index of the first NA value: idx => integer vector
    idx <- Position(is.na, value_vec)
    # Calculate the value of value_vec at that index
    # using the provided business logic: value_vec => numeric vector
    value_vec[idx] <- (value_vec[(idx - 1)] ** exponent_vec[idx]) + add_vec[idx]
    # Recursively apply function: function => Local Env
    return(estimation_func(value_vec, exponent_vec, add_vec))
  }
}

# Split data.frame into a list on ID;
# Overwrite x values, applying recursive function;
# Combine list into a data.frame
# res => data.frame
res <- data.frame(
  do.call(
    rbind,
    Map(function(y) {
      y$x <- estimation_func(y$x, y$pw, y$add)
      y
    }, split(df, df$ID))
  ),
  row.names = NULL
)
```
