I am trying to compute rolling sums of multiple variables for multiple subjects in a data set. I know how to do this using the plyr package; however, because of the length of the data set, the number of variables, and the number of different rolling sums I am trying to compute (2-day, 3-day, 4-day, etc.), I was wondering if someone had a more time-efficient way to complete this task in dplyr.
My data is similar to this:

    Subjects <- c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3)
    Day <- c(1,2,3,4,5,1,2,3,4,5,1,2,3,4,5)
    variable.A <- rnorm(n = Day, mean = 20, sd = 5)
    variable.B <- rnorm(n = Day, mean = 50, sd = 15)
    variable.C <- rnorm(n = Day, mean = 100, sd = 33)
    dat <- data.frame(Subjects, Day, variable.A, variable.B, variable.C)
    dat

       Subjects Day variable.A variable.B variable.C
    1         1   1   20.17676   72.44022   56.69915
    2         1   2   14.11462   46.28473  117.00864
    3         1   3   15.30440   72.43752   93.17489
    4         1   4   13.72422   66.76744  101.26422
    5         1   5   21.97695   69.50480  102.61979
    6         2   1   14.45742   32.69106   82.37268
    7         2   2   33.37783   65.06782   97.17744
    8         2   3   13.57833   26.37183   89.38218
    9         2   4   23.01717   55.83446  147.85362
    10        2   5   14.06008   32.00396   48.73060
    11        3   1   14.57199   60.29746   87.07977
    12        3   2   15.77413   77.04517  132.17910
    13        3   3   30.05661   30.62220  171.35998
    14        3   4   24.65348   53.96450   74.99875
    15        3   5   26.93699   57.06393   36.81901

An example of the code I tried was this:

    library(plyr)
    library(RcppRoll)

    summarize <- ddply(dat, "Subjects", mutate,
                       Two.Day.Roll.A = roll_sum(variable.A, 2, align = "right", fill = NA),
                       Two.Day.Roll.B = roll_sum(variable.B, 2, align = "right", fill = NA),
                       Two.Day.Roll.C = roll_sum(variable.C, 2, align = "right", fill = NA))

       Subjects Day variable.A variable.B variable.C Two.Day.Roll.A Two.Day.Roll.B Two.Day.Roll.C
    1         1   1  15.324798   24.83074  137.48853             NA             NA             NA
    2         1   2  12.112943   58.86094   86.87454       27.43774       83.69168       224.3631
    3         1   3  16.179328   57.95450   68.71333       28.29227      116.81544       155.5879
    4         1   4  15.319750   38.13721   79.43194       31.49908       96.09171       148.1453
    5         1   5  21.791452   61.99368  134.30205       37.11120      100.13089       213.7340
    6         2   1  10.937461   63.83164   95.04865             NA             NA             NA
    7         2   2  14.642376   79.12452  107.13699       25.57984      142.95616       202.1856
    8         2   3  17.519905   52.75490  100.62811       32.16228      131.87942       207.7651
    9         2   4  23.190371   37.56950  179.72763       40.71028       90.32440       280.3557
    10        2   5  13.729350   46.95616   72.14179       36.91972       84.52566       251.8694
    11        3   1   9.609171   74.51140  130.90005             NA             NA             NA
    12        3   2  27.542897   14.36222  133.87630       37.15207       88.87363       264.7763
    13        3   3  18.750015   60.46183  130.44314       46.29291       74.82405       264.3194
    14        3   4  17.461882   52.65797  176.30620       36.21190      113.11979       306.7493
    15        3   5  31.244564   62.41614   78.82916       48.70645      115.07411       255.1354

This works well enough, but as I said, the original data has many more columns, and I want to go on to compute 3-day sums, 4-day sums, etc. over all of those variables. Also, my original data has some NAs in it, so is there a way to handle those?
I have tried using the mutate_each() function from the dplyr package but can't seem to get the syntax right.
Thank you.