Rolling sum over multiple columns in R

Question

I am working in R with a dataset that looks like this:

test=data.frame("1991" = c(1,5,3), "1992" = c(4,3,3), "1993" = c(10,5,3), "1994" = c(1,1,1), "1995" = c(2,2,6))
test=plyr::rename(test, c("X1991"="1991", "X1992"="1992", "X1993"="1993", "X1994"="1994", "X1995"="1995"))

I want to create variables called Pre1991, Pre1992, Pre1993, ... that store the cumulative values up to and including that year, e.g.

Pre1991 = test$`1991`
Pre1992 = test$`1991` + test$`1992`
Pre1993 = test$`1991` + test$`1992` + test$`1993`

and so on.

My real dataset has variables for the years 1900-2017, so I can't do this manually. I tried to write a for loop, but it didn't work.

for (i in 1900:2017){
  x = paste0("Pre",i)
  df[[x]] = rowSums(df[,(colnames(df)<=i)])
}

Can someone please help review my code or suggest other ways to do it? Thanks!

Edit 1:

Thanks so much! I'm also wondering whether there is a way to use the cumsum function in the reverse direction. For example, if I am interested in what happened after a particular year:

Post1991 = test$`1992` + test$`1993` + test$`1994` + test$`1995` + ...
Post1992 = test$`1993` + test$`1994` + test$`1995` + ...
Post1993 = test$`1994` + test$`1995` + ...

If any of the answers helped you, it would be fitting to accept an answer or upvote them.
– zacdav, Commented Apr 23, 2018 at 0:57

r2evans · Accepted Answer · 2018-04-21 02:41:45Z

This is a little inefficient in that it is converting from a data.frame to a matrix and back, but ...

as.data.frame(t(apply(as.matrix(test), 1, cumsum)))
#   1991 1992 1993 1994 1995
# 1    1    5   15   16   18
# 2    5    8   13   14   16
# 3    3    6    9   10   16

If your data has other columns that are not year-based, such as

test$quux <- LETTERS[3:5]
test
#   1991 1992 1993 1994 1995 quux
# 1    1    4   10    1    2    C
# 2    5    3    5    1    2    D
# 3    3    3    3    1    6    E

then subset on both sides:

test[1:5] <- as.data.frame(t(apply(as.matrix(test[1:5]), 1, cumsum)))
test
#   1991 1992 1993 1994 1995 quux
# 1    1    5   15   16   18    C
# 2    5    8   13   14   16    D
# 3    3    6    9   10   16    E
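The question asked for new Pre1991, Pre1992, ... variables rather than overwriting the year columns. A minimal sketch of one way to do that with the same apply/cumsum idea follows. The names year_cols, pre and result are illustrative, and it assumes that year columns are exactly the ones whose names are four digits.

```r
# Rebuild the example; check.names = FALSE keeps "1991" etc. as column names
test <- data.frame(
  "1991" = c(1, 5, 3), "1992" = c(4, 3, 3), "1993" = c(10, 5, 3),
  "1994" = c(1, 1, 1), "1995" = c(2, 2, 6),
  check.names = FALSE
)
test$quux <- LETTERS[3:5]  # a non-year column that must be left alone

# Assumption: year columns are those named with exactly four digits
year_cols <- grep("^[0-9]{4}$", names(test), value = TRUE)
year_cols <- year_cols[order(as.integer(year_cols))]  # chronological order

# Row-wise cumulative sums; apply() returns years in rows, so transpose back
pre <- t(apply(as.matrix(test[year_cols]), 1, cumsum))
colnames(pre) <- paste0("Pre", year_cols)

# Keep the original columns and append the new Pre* columns
result <- cbind(test, pre)
```

Selecting columns by name pattern instead of by position (test[1:5]) should carry over to the 1900-2017 data without counting columns by hand.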

EDIT

In reverse, just use repeated rev:

as.data.frame(t(apply(as.matrix(test), 1, function(a) rev(cumsum(rev(a)))-a)))
#   1991 1992 1993 1994 1995
# 1   17   13    3    2    0
# 2   11    8    3    2    0
# 3   13   10    7    6    0
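To see why the repeated rev works, it may help to walk through the function on a single row. The sketch below uses the first row of the example data; the intermediate names (reversed, suffix, post) are only for illustration.

```r
a <- c(1, 4, 10, 1, 2)        # first row of the example: 1991..1995

reversed <- rev(a)            # 2 1 10 4 1   (1995..1991)
running  <- cumsum(reversed)  # 2 3 13 17 18 (sums from the last year backwards)
suffix   <- rev(running)      # 18 17 13 3 2 (each year plus all later years)
post     <- suffix - a        # 17 13 3 2 0  (later years only)
```

Subtracting a at the end removes the year itself, which is what turns "from this year on" into "after this year".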

Thanks very much! Is there a way I can do cumsum in a reverse direction? Please refer to my updated question above. Thanks!
Ok ... odd, but fixed. (This is really not the best format for this data ...)

zacdav · Accepted Answer · 2018-04-21 02:43:19Z

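Using the tidyverse, we can gather the data into long format, calculate the cumulative sum within each row, and then spread it back to wide format. For this to work, the long data needs to be arranged by year within each id before cumsum is applied.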

library(tidyverse)
test <- data.frame("1991" = c(1, 5, 3), "1992" = c(4, 3, 3), "1993" = c(10, 5, 3),
                   "1994" = c(1, 1, 1), "1995" = c(2, 2, 6))
test <- plyr::rename(test, c("X1991" = "1991", "X1992" = "1992", "X1993" = "1993",
                             "X1994" = "1994", "X1995" = "1995"))

Forwards

test %>%
  mutate(id = 1:nrow(.)) %>%          # adding an ID to identify groups
  gather(year, value, -id) %>%        # wide to long format
  arrange(id, year) %>%
  group_by(id) %>%
  mutate(value = cumsum(value)) %>%
  ungroup() %>%
  spread(year, value) %>%             # long to wide format
  select(-id) %>%
  setNames(paste0("pre", names(.)))   # add prefix to columns
## A tibble: 3 x 5
#   pre1991 pre1992 pre1993 pre1994 pre1995
#     <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
# 1      1.      5.     15.     16.     18.
# 2      5.      8.     13.     14.     16.
# 3      3.      6.      9.     10.     16.

Reverse direction

As your definition specifies, it's not strictly the reverse order: it's the reverse order excluding the year itself, which is the cumulative sum of the lagged values.

test %>%
  mutate(id = 1:nrow(.)) %>%
  gather(year, value, -id) %>%
  arrange(id, desc(year)) %>%                          # using desc() to reverse sorting
  group_by(id) %>%
  mutate(value = cumsum(lag(value, default = 0))) %>%  # lag cumsum
  ungroup() %>%
  spread(year, value) %>%
  select(-id) %>%
  setNames(paste0("post", names(.)))
## A tibble: 3 x 5
#   post1991 post1992 post1993 post1994 post1995
#      <dbl>    <dbl>    <dbl>    <dbl>    <dbl>
# 1      17.      13.       3.       2.       0.
# 2      11.       8.       3.       2.       0.
# 3      13.      10.       7.       6.       0.
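In tidyr 1.0.0 and later, gather() and spread() are superseded by pivot_longer() and pivot_wider(). The following is a sketch of how the same two pipelines might look with the newer verbs, assuming dplyr and a recent tidyr are installed. The object names pre and post are illustrative. Unlike spread(), pivot_wider() keeps columns in order of appearance, so the reverse pipeline re-sorts by year before widening.

```r
library(dplyr)
library(tidyr)

test <- data.frame(
  "1991" = c(1, 5, 3), "1992" = c(4, 3, 3), "1993" = c(10, 5, 3),
  "1994" = c(1, 1, 1), "1995" = c(2, 2, 6),
  check.names = FALSE
)

# Forwards: cumulative sum up to and including each year
pre <- test %>%
  mutate(id = row_number()) %>%
  pivot_longer(-id, names_to = "year", values_to = "value") %>%
  arrange(id, year) %>%
  group_by(id) %>%
  mutate(value = cumsum(value)) %>%
  ungroup() %>%
  pivot_wider(names_from = year, values_from = value, names_prefix = "pre") %>%
  select(-id)

# Reverse: sum of the years strictly after each year
post <- test %>%
  mutate(id = row_number()) %>%
  pivot_longer(-id, names_to = "year", values_to = "value") %>%
  arrange(id, desc(year)) %>%
  group_by(id) %>%
  mutate(value = cumsum(dplyr::lag(value, default = 0))) %>%
  ungroup() %>%
  arrange(id, year) %>%  # restore chronological column order before widening
  pivot_wider(names_from = year, values_from = value, names_prefix = "post") %>%
  select(-id)
```

The names_prefix argument takes over the job of the setNames(paste0(...)) step in the original pipelines.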

Thanks very much! Is there a way I can do cumsum in a reverse direction? Please refer to my updated question above. Thanks!

akrun · Accepted Answer · 2018-04-21 03:02:01Z

We can use rowCumsums from matrixStats:

library(matrixStats)
test[] <- rowCumsums(as.matrix(test))
test
#  1991 1992 1993 1994 1995
#1    1    5   15   16   18
#2    5    8   13   14   16
#3    3    6    9   10   16
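For the reverse direction asked about in the question's edit, one possible extension of this approach is to subtract the running sum from each row's total. This is only a sketch: it assumes every column of test is a numeric year column, and the names m and post are illustrative.

```r
library(matrixStats)

test <- data.frame(
  "1991" = c(1, 5, 3), "1992" = c(4, 3, 3), "1993" = c(10, 5, 3),
  "1994" = c(1, 1, 1), "1995" = c(2, 2, 6),
  check.names = FALSE
)

m <- as.matrix(test)

# Row total minus the sum up to and including each year leaves the later years.
# rowSums(m) has one entry per row and is recycled down each column of the matrix.
post <- rowSums(m) - rowCumsums(m)
colnames(post) <- paste0("Post", colnames(m))
```

For the first row this gives 17, 13, 3, 2, 0, matching the Post values in the other answers.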
