
I am trying to compute rolling sums of multiple variables for multiple subjects in a data set. I know how to do this with the plyr package. However, the data set is long, there are many variables, and I need several different rolling sums (2-day, 3-day, 4-day, etc.), so I was wondering whether someone knows a more time-efficient way to do this in dplyr.

My data is similar to this:

```r
# Toy data: 3 subjects x 5 days, three random variables
Subjects <- c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3)
Day <- c(1,2,3,4,5,1,2,3,4,5,1,2,3,4,5)
variable.A <- rnorm(n = length(Day), mean = 20, sd = 5)
variable.B <- rnorm(n = length(Day), mean = 50, sd = 15)
variable.C <- rnorm(n = length(Day), mean = 100, sd = 33)
dat <- data.frame(Subjects, Day, variable.A, variable.B, variable.C)
dat
```

```
   Subjects Day variable.A variable.B variable.C
1         1   1   20.17676   72.44022   56.69915
2         1   2   14.11462   46.28473  117.00864
3         1   3   15.30440   72.43752   93.17489
4         1   4   13.72422   66.76744  101.26422
5         1   5   21.97695   69.50480  102.61979
6         2   1   14.45742   32.69106   82.37268
7         2   2   33.37783   65.06782   97.17744
8         2   3   13.57833   26.37183   89.38218
9         2   4   23.01717   55.83446  147.85362
10        2   5   14.06008   32.00396   48.73060
11        3   1   14.57199   60.29746   87.07977
12        3   2   15.77413   77.04517  132.17910
13        3   3   30.05661   30.62220  171.35998
14        3   4   24.65348   53.96450   74.99875
15        3   5   26.93699   57.06393   36.81901
```

Here is an example of the code I tried:

```r
library(plyr)
library(RcppRoll)

# Within each subject, add a right-aligned 2-day rolling sum for each variable;
# fill = NA pads day 1, which has no complete 2-day window
summarize <- ddply(dat, "Subjects", mutate,
                   Two.Day.Roll.A = roll_sum(variable.A, 2, align = "right", fill = NA),
                   Two.Day.Roll.B = roll_sum(variable.B, 2, align = "right", fill = NA),
                   Two.Day.Roll.C = roll_sum(variable.C, 2, align = "right", fill = NA))
summarize
```

Output (from a different random draw than the table above):

```
   Subjects Day variable.A variable.B variable.C Two.Day.Roll.A Two.Day.Roll.B Two.Day.Roll.C
1         1   1  15.324798   24.83074  137.48853             NA             NA             NA
2         1   2  12.112943   58.86094   86.87454       27.43774       83.69168       224.3631
3         1   3  16.179328   57.95450   68.71333       28.29227      116.81544       155.5879
4         1   4  15.319750   38.13721   79.43194       31.49908       96.09171       148.1453
5         1   5  21.791452   61.99368  134.30205       37.11120      100.13089       213.7340
6         2   1  10.937461   63.83164   95.04865             NA             NA             NA
7         2   2  14.642376   79.12452  107.13699       25.57984      142.95616       202.1856
8         2   3  17.519905   52.75490  100.62811       32.16228      131.87942       207.7651
9         2   4  23.190371   37.56950  179.72763       40.71028       90.32440       280.3557
10        2   5  13.729350   46.95616   72.14179       36.91972       84.52566       251.8694
11        3   1   9.609171   74.51140  130.90005             NA             NA             NA
12        3   2  27.542897   14.36222  133.87630       37.15207       88.87363       264.7763
13        3   3  18.750015   60.46183  130.44314       46.29291       74.82405       264.3194
14        3   4  17.461882   52.65797  176.30620       36.21190      113.11979       306.7493
15        3   5  31.244564   62.41614   78.82916       48.70645      115.07411       255.1354
```

This works well enough, but, as I said, the original data has many more columns, and I want to go on to 3-day sums, 4-day sums, etc. over all of those variables. Also, my original data contains some NAs; is there a way to handle them?

I have played around with the mutate_each() function from the dplyr package but can't seem to get the syntax right.

Thank you.

  • Thanks, which option should I select to wrap it in and I'll fix it? Commented Oct 29, 2015 at 14:38
  • Got it. Thanks. Will fix it. Commented Oct 29, 2015 at 14:40

1 Answer


Here's the dplyr version:

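```r
library(dplyr)
library(RcppRoll)

# Apply the same right-aligned 2-day rolling sum to every non-grouping column
# except Day; note that mutate_each() replaces the original columns in place
dat %>%
  group_by(Subjects) %>%
  mutate_each(funs(roll_sum(., 2, align = "right", fill = NA)), -Subjects, -Day)
```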
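For readers on a current dplyr release: `mutate_each()` and `funs()` have since been deprecated in favor of `across()` (dplyr 1.0.0 and later). Below is a minimal sketch of the same 2-day rolling sum with `across()`, assuming dplyr >= 1.0 and RcppRoll are installed. It rebuilds a toy `dat` like the question's, with an arbitrary `set.seed()` added for reproducibility. The `.names` argument keeps the original columns instead of overwriting them; the `_roll2` suffix is just an illustrative naming choice.

```r
library(dplyr)
library(RcppRoll)

# Toy data shaped like the question's `dat`; seed value is arbitrary
set.seed(1)
dat <- data.frame(
  Subjects   = rep(1:3, each = 5),
  Day        = rep(1:5, times = 3),
  variable.A = rnorm(15, mean = 20,  sd = 5),
  variable.B = rnorm(15, mean = 50,  sd = 15),
  variable.C = rnorm(15, mean = 100, sd = 33)
)

rolled <- dat %>%
  group_by(Subjects) %>%
  arrange(Day, .by_group = TRUE) %>%   # rolling sums assume rows are in day order
  mutate(across(starts_with("variable."),
                ~ roll_sum(.x, 2, align = "right", fill = NA),
                .names = "{.col}_roll2")) %>%   # new columns, originals kept
  ungroup()

rolled
```

Because `across()` runs inside a grouped `mutate()`, each subject's window restarts at its first day, just like the `ddply()` version in the question.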

6 Comments

Two more minor things: you don't need -Subjects (grouping columns are skipped automatically), and this overwrites the old cols, in contrast with the plyr result above.
It looks like the devs are aware of the latter problem but offer no workaround: github.com/hadley/dplyr/issues/712. The best I can think of is `dat %>% group_by(Subjects) %>% mutate_each(funs("." = "(", roll = roll_sum(., 2, align = "right", fill = NA)), -Day)`
That is great. Thanks. If I have NAs in the original data, would I just put the argument na.rm = T within the roll_sum() function?
@Frank Yeah, there is some naming weirdness. If the OP wants to do it for 2, 3, 4, etc. days, they can name those accordingly.
@user3585829 Unfortunately, it is just a hack until the dplyr developers fix the bug. "." is the name and "(" is the function. The "(" function just returns the object, like wrapping in (). I think this may be the best reference, though I haven't read it all: stackoverflow.com/a/27027681/1191259
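On the two follow-up points in these comments (several window lengths, and NAs), here is a sketch that is not from the thread. It assumes dplyr >= 1.0 and RcppRoll, and it loops over window lengths with one `across()` call per length, using `.names` to tag each new column with its window. RcppRoll's `roll_sum()` does take an `na.rm` argument. My understanding is that with `na.rm = TRUE` missing values are left out of each window's sum rather than turning it into NA, but check this on your own data, especially for windows that are entirely NA. The `vars`, `windows`, and `_rollK` names are illustrative choices.

```r
library(dplyr)
library(RcppRoll)

# Toy data shaped like the question's `dat`; seed value is arbitrary
set.seed(42)
dat <- data.frame(
  Subjects   = rep(1:3, each = 5),
  Day        = rep(1:5, times = 3),
  variable.A = rnorm(15, mean = 20,  sd = 5),
  variable.B = rnorm(15, mean = 50,  sd = 15),
  variable.C = rnorm(15, mean = 100, sd = 33)
)
dat$variable.B[3] <- NA  # one missing value, to exercise na.rm

vars    <- grep("^variable\\.", names(dat), value = TRUE)  # columns to roll
windows <- 2:4                                              # window lengths in days

rolled <- dat %>%
  group_by(Subjects) %>%
  arrange(Day, .by_group = TRUE)  # rolling sums assume day order within subject

# One across() call per window length; all_of(vars) keeps later iterations from
# re-rolling the columns created earlier, and .names tags each new column with
# its window length (e.g. variable.A_roll3)
for (k in windows) {
  rolled <- rolled %>%
    mutate(across(all_of(vars),
                  ~ roll_sum(.x, k, align = "right", fill = NA, na.rm = TRUE),
                  .names = paste0("{.col}_roll", k)))
}
rolled <- ungroup(rolled)
```

With `fill = NA` and `align = "right"`, the first `k - 1` days of each subject remain NA for a window of length `k`, whatever `na.rm` is set to, since those windows are incomplete rather than missing.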