2

I have a dataframe with gene expression data by lane (column). What I would like to do is write a loop that takes the sum of each row but progressively adds in another column each time. So each time I loop through I add another column to my dataframe that contains the sums of each row plus another column to the end of the dataframe. In the example below I did this using the apply() function by hand but this is very inefficient and not feasible for a large data set. I messed around with the cumsum() function but couldn't seem to get it to work for this. Very possible I missed something obvious but any guidance would be great!

#Example dataframe

c1 <- c('G1', 'G2', 'G3') c2 <- c(5, 3, 1) c3 <- c(3, 7, 1) c4 <- c(6, 3, 4) c5 <- c(6, 4, 3) df <- data.frame(c1, c2, c3, c4, c5) 
#Cal cumulative sums sum.2.3 <- apply(df[,2:3],1,sum) sum.2.4 <- apply(df[,2:4],1,sum) sum.2.5 <- apply(df[,2:5],1,sum) df <- cbind(df, sum.2.3, sum.2.4, sum.2.5) 

5 Answers 5

2

If the problem is the loop, you use apply inside it.

Code

start_col <- 2 end_col <- ncol(df) for(i in (start_col+1):end_col){ var_name <- paste("sum",start_col,i,sep = ".") df[,var_name] <- apply(df[,start_col:i],1,sum) } 

Output

 c1 c2 c3 c4 c5 sum.2.3 sum.2.4 sum.2.5 1 G1 5 3 6 6 8 14 20 2 G2 3 7 3 4 10 13 17 3 G3 1 1 4 3 2 6 9 
Sign up to request clarification or add additional context in comments.

1 Comment

thanks this worked perfectly! Just adjusted it to the larger dataset and ran without a hitch!
2

You can use Reduce()

Reduce(`+`, df[-1], accumulate = TRUE)[-1] [[1]] [1] 8 10 2 [[2]] [1] 14 13 6 [[3]] [1] 20 17 9 

Assign into the data frame:

df[paste0("sum.2.", 3:5)] <- Reduce(`+`, df[-1], accumulate = TRUE)[-1] 

Gives:

 c1 c2 c3 c4 c5 sum.2.3 sum.2.4 sum.2.5 1 G1 5 3 6 6 8 14 20 2 G2 3 7 3 4 10 13 17 3 G3 1 1 4 3 2 6 9 

Comments

1

No loop needed.

df <- data.frame( c1 = c('G1', 'G2', 'G3'), c2 = c(5, 3, 1), c3 = c(3, 7, 1), c4 = c(6, 3, 4), c5 = c(6, 4, 3)) cbind(df, setNames(as.data.frame(t(apply(df[,-1], 1, cumsum))[,-1]), paste0("sum.2.", 3:5))) #> c1 c2 c3 c4 c5 sum.2.3 sum.2.4 sum.2.5 #> 1 G1 5 3 6 6 8 14 20 #> 2 G2 3 7 3 4 10 13 17 #> 3 G3 1 1 4 3 2 6 9 

1 Comment

An alternative with pipes: cbind(df, df[,-1] |> apply(1, cumsum) |> tail(-1) |> t() |> data.frame() |> setNames(paste0("sum.2.", 3:ncol(df))))
1

Using rowCumsums from matrixStats

library(matrixStats) df[paste0("sum.2.", 3:5)] <- rowCumsums(as.matrix(df[2:5]))[,-1] 

-output

> df c1 c2 c3 c4 c5 sum.2.3 sum.2.4 sum.2.5 1 G1 5 3 6 6 8 14 20 2 G2 3 7 3 4 10 13 17 3 G3 1 1 4 3 2 6 9 

Comments

-1

You can use both the mutate function from the dplyr package and the rowSums base function.

library(dplyr) c1 <- c('G1', 'G2', 'G3') c2 <- c(5, 3, 1) c3 <- c(3, 7, 1) c4 <- c(6, 3, 4) c5 <- c(6, 4, 3) df <- data.frame(c1, c2, c3, c4, c5) df <- df %>% dplyr::mutate(sum.2.3 = rowSums(across(c2:c3)), sum.2.4 = rowSums(across(c2:c4)), sum.2.5 = rowSums(across(c2:c5))) 

Result

 c1 c2 c3 c4 c5 sum.2.3 sum.2.4 sum.2.5 1 G1 5 3 6 6 8 14 20 2 G2 3 7 3 4 10 13 17 3 G3 1 1 4 3 2 6 9 

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.