
I am working on R with a dataset that looks like this:


test = data.frame("1991" = c(1, 5, 3), "1992" = c(4, 3, 3), "1993" = c(10, 5, 3), "1994" = c(1, 1, 1), "1995" = c(2, 2, 6))
test = plyr::rename(test, c("X1991" = "1991", "X1992" = "1992", "X1993" = "1993", "X1994" = "1994", "X1995" = "1995"))

I want to create variables called Pre1991, Pre1992, Pre1993, ... that store the cumulative values up to and including that year, e.g.

Pre1991 = test$`1991`
Pre1992 = test$`1991` + test$`1992`
Pre1993 = test$`1991` + test$`1992` + test$`1993`

and so on.

My real dataset has variables for the years 1900-2017, so I can't do this manually. I tried to write a for loop, but it didn't work.

for (i in 1900:2017){
  x = paste0("Pre", i)
  df[[x]] = rowSums(df[,(colnames(df)<=i)])
}

Can someone please review my code or suggest other ways to do it? Thanks!

Edit 1:

Thanks so much! Is there a way to use the cumsum function in the reverse direction? For example, if I am interested in what happened after a particular year:

Post1991 = test$`1992` + test$`1993` + test$`1994` + test$`1995` + ...
Post1992 = test$`1993` + test$`1994` + test$`1995` + ...
Post1993 = test$`1994` + test$`1995` + ...
  • Thanks everyone!!!! Commented Apr 21, 2018 at 5:04
  • If any of the answers helped you, it would be fitting to accept an answer or upvote them. Commented Apr 23, 2018 at 0:57

3 Answers

2

This is a little inefficient in that it is converting from a data.frame to a matrix and back, but ...

as.data.frame(t(apply(as.matrix(test), 1, cumsum)))
#   1991 1992 1993 1994 1995
# 1    1    5   15   16   18
# 2    5    8   13   14   16
# 3    3    6    9   10   16
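As for the loop in the question, one likely reason it fails is that `df[, cols]` drops to a plain vector whenever the logical index selects exactly one column, and `rowSums` needs at least two dimensions. A minimal sketch of a repaired loop, assuming the year columns are exactly those with four-digit names (`years` and `keep` are illustrative names, not from the question):

```r
# Same toy data as in the question; check.names = FALSE keeps "1991" etc. as-is
test <- data.frame(`1991` = c(1, 5, 3), `1992` = c(4, 3, 3), `1993` = c(10, 5, 3),
                   `1994` = c(1, 1, 1), `1995` = c(2, 2, 6), check.names = FALSE)

# Fix the set of year columns before the loop, so the new Pre* columns
# are never picked up by later iterations
years <- as.integer(grep("^[0-9]{4}$", colnames(test), value = TRUE))

for (i in years) {
  keep <- as.character(years[years <= i])
  # drop = FALSE keeps a one-column selection as a data.frame for rowSums()
  test[[paste0("Pre", i)]] <- rowSums(test[, keep, drop = FALSE])
}
```

This recomputes each sum from scratch, so on 118 year columns the `cumsum` approach does less work, but the loop stays close to the original attempt.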

If your data has other columns that are not year-based, such as

test$quux <- LETTERS[3:5]
test
#   1991 1992 1993 1994 1995 quux
# 1    1    4   10    1    2    C
# 2    5    3    5    1    2    D
# 3    3    3    3    1    6    E

then subset on both sides:

test[1:5] <- as.data.frame(t(apply(as.matrix(test[1:5]), 1, cumsum)))
test
#   1991 1992 1993 1994 1995 quux
# 1    1    5   15   16   18    C
# 2    5    8   13   14   16    D
# 3    3    6    9   10   16    E
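If the original year columns should be kept and the cumulative sums added as separate Pre* columns (which is what the question literally asks for), the same idea can be combined with `cbind`. A rough sketch, assuming the year columns can be recognised by a four-digit name; `year_cols`, `pre` and `result` are illustrative names:

```r
test <- data.frame(`1991` = c(1, 5, 3), `1992` = c(4, 3, 3), `1993` = c(10, 5, 3),
                   `1994` = c(1, 1, 1), `1995` = c(2, 2, 6), check.names = FALSE)
test$quux <- LETTERS[3:5]

# Select year columns by name pattern instead of by position
year_cols <- grep("^[0-9]{4}$", names(test), value = TRUE)

# Row-wise cumulative sums; apply() returns years in rows, so transpose back
pre <- t(apply(as.matrix(test[year_cols]), 1, cumsum))
colnames(pre) <- paste0("Pre", year_cols)

# Original columns first, then Pre1991 ... Pre1995
result <- cbind(test, pre)
```

Selecting by name pattern rather than `1:5` avoids having to know where the non-year columns sit.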

EDIT

In reverse, wrap cumsum in two calls to rev, then subtract the current year's value so that it is excluded:

as.data.frame(t(apply(as.matrix(test), 1, function(a) rev(cumsum(rev(a)))-a)))
#   1991 1992 1993 1994 1995
# 1   17   13    3    2    0
# 2   11    8    3    2    0
# 3   13   10    7    6    0
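Another way to look at the reverse direction: everything after a given year is the row total minus the cumulative sum up to and including that year. A small sketch under that reading of the question, starting again from the original toy data (`m`, `pre` and `post` are illustrative names):

```r
test <- data.frame(`1991` = c(1, 5, 3), `1992` = c(4, 3, 3), `1993` = c(10, 5, 3),
                   `1994` = c(1, 1, 1), `1995` = c(2, 2, 6), check.names = FALSE)

m   <- as.matrix(test)
pre <- t(apply(m, 1, cumsum))          # cumulative sum up to and including each year

# rowSums(m) has one total per row; it is recycled down each column,
# so row i always has its own total subtracted from
post <- rowSums(m) - pre
colnames(post) <- paste0("Post", colnames(m))

as.data.frame(post)
```

This gives both the Pre and Post values from a single pass of `cumsum`, which may be convenient if both are needed.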

3 Comments

Thanks very much! Is there a way I can do cumsum in a reverse direction? Please refer to my updated question above. Thanks!
OP's reverse definition is excluding the current year.
Ok ... odd, but fixed. (This is really not the best format for this data ...)
2

Using the tidyverse, we can gather the data into long format, compute the cumulative sums, then spread it back to wide format. For this to work, the rows must first be arranged by year within each group.

library(tidyverse)

test <- data.frame("1991" = c(1, 5, 3), "1992" = c(4, 3, 3), "1993" = c(10, 5, 3), "1994" = c(1, 1, 1), "1995" = c(2, 2, 6))
test <- plyr::rename(test, c("X1991" = "1991", "X1992" = "1992", "X1993" = "1993", "X1994" = "1994", "X1995" = "1995"))

Forwards

test %>%
  mutate(id = 1:nrow(.)) %>%           # adding an ID to identify groups
  gather(year, value, -id) %>%         # wide to long format
  arrange(id, year) %>%
  group_by(id) %>%
  mutate(value = cumsum(value)) %>%
  ungroup() %>%
  spread(year, value) %>%              # long to wide format
  select(-id) %>%
  setNames(paste0("pre", names(.)))    # add prefix to columns

## A tibble: 3 x 5
#   pre1991 pre1992 pre1993 pre1994 pre1995
#     <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
# 1      1.      5.     15.     16.     18.
# 2      5.      8.     13.     14.     16.
# 3      3.      6.      9.     10.     16.

Reverse direction

Your definition is not strictly the reverse cumulative sum: it is the reverse cumulative sum excluding the current year, i.e. the cumulative sum of the lagged values.

test %>%
  mutate(id = 1:nrow(.)) %>%
  gather(year, value, -id) %>%
  arrange(id, desc(year)) %>%                            # using desc() to reverse sorting
  group_by(id) %>%
  mutate(value = cumsum(lag(value, default = 0))) %>%    # lag cumsum
  ungroup() %>%
  spread(year, value) %>%
  select(-id) %>%
  setNames(paste0("post", names(.)))

## A tibble: 3 x 5
#   post1991 post1992 post1993 post1994 post1995
#      <dbl>    <dbl>    <dbl>    <dbl>    <dbl>
# 1      17.      13.       3.       2.       0.
# 2      11.       8.       3.       2.       0.
# 3      13.      10.       7.       6.       0.
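In tidyr 1.0.0 and later, `gather()` and `spread()` are superseded by `pivot_longer()` and `pivot_wider()`. A sketch of the forward case with the newer functions, assuming such a tidyr version is installed; it is meant as an equivalent of the pipeline above rather than a replacement for it, and `pre` is an illustrative name:

```r
library(dplyr)
library(tidyr)

test <- data.frame(`1991` = c(1, 5, 3), `1992` = c(4, 3, 3), `1993` = c(10, 5, 3),
                   `1994` = c(1, 1, 1), `1995` = c(2, 2, 6), check.names = FALSE)

pre <- test %>%
  mutate(id = row_number()) %>%                                  # one ID per original row
  pivot_longer(-id, names_to = "year", values_to = "value") %>%  # wide to long
  arrange(id, year) %>%
  group_by(id) %>%
  mutate(value = cumsum(value)) %>%
  ungroup() %>%
  pivot_wider(names_from = year, values_from = value,
              names_prefix = "pre") %>%                          # long to wide, with prefix
  select(-id)
```

`names_prefix` takes care of the renaming step, so the `setNames()` call is no longer needed.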

2 Comments

Thanks very much! Is there a way I can do cumsum in a reverse direction? Please refer to my updated question above. Thanks!
@DataScienceBeginner check my revised answer
1

We can use rowCumsums from the matrixStats package:

library(matrixStats)
test[] <- rowCumsums(as.matrix(test))
test
#  1991 1992 1993 1994 1995
#1    1    5   15   16   18
#2    5    8   13   14   16
#3    3    6    9   10   16
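The reverse direction from the question's edit can probably be handled the same way by reversing the column order before and after `rowCumsums`, then subtracting the current year. A sketch starting from the original (not yet cumulated) data, with `m`, `rev_cols` and `post` as illustrative names:

```r
library(matrixStats)

test <- data.frame(`1991` = c(1, 5, 3), `1992` = c(4, 3, 3), `1993` = c(10, 5, 3),
                   `1994` = c(1, 1, 1), `1995` = c(2, 2, 6), check.names = FALSE)

m        <- as.matrix(test)
rev_cols <- rev(seq_len(ncol(m)))

# Cumulate from the last year backwards, restore the column order,
# then remove each year's own value so only later years remain
post <- rowCumsums(m[, rev_cols])[, rev_cols] - m
colnames(post) <- paste0("Post", colnames(m))

as.data.frame(post)
```

The column names are set explicitly because whether `rowCumsums` keeps them depends on the matrixStats version.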
