I have the following data frame:

       Year       Category TotalSales AverageCount
    1  2013      Beverages  102074.29     22190.06
    2  2013     Condiments   55277.56     14173.73
    3  2013    Confections   36415.75     12138.58
    4  2013 Dairy Products   30337.39     24400.00
    5  2013        Seafood   53019.98     27905.25
    6  2014      Beverages   81338.06     35400.00
    7  2014     Condiments   55948.82     19981.72
    8  2014    Confections   44478.36     24710.00
    9  2014 Dairy Products   84412.36     32466.00
    10 2014        Seafood   65544.19     14565.37

I calculated the cumulative sum of TotalSales, grouped by Year, with the following method:

    dat <- within(dat, {
      RunningTotal <- ave(dat$TotalSales, dat$Year, FUN = cumsum)
    })

and the output is this:

       Year       Category TotalSales AverageCount RunningTotal
    1  2013      Beverages  102074.29     22190.06    102074.29
    2  2013     Condiments   55277.56     14173.73    157351.85
    3  2013    Confections   36415.75     12138.58    193767.60
    4  2013 Dairy Products   30337.39     24400.00    224104.99
    5  2013        Seafood   53019.98     27905.25    277124.97
    6  2014      Beverages   81338.06     35400.00     81338.06
    7  2014     Condiments   55948.82     19981.72    137286.88
    8  2014    Confections   44478.36     24710.00    181765.24
    9  2014 Dairy Products   84412.36     32466.00    266177.60
    10 2014        Seafood   65544.19     14565.37    331721.79

How do I calculate the group-wise ratio of consecutive elements in the column RunningTotal (that is, the ratio between RunningTotal[i+1] and RunningTotal[i])?
I've tried using mutate from dplyr:

    require(dplyr)
    dat <- mutate(dat, Ratio = lag(RunningTotal)/RunningTotal)

and I get an incorrect output (notice the NAs):

       Year       Category TotalSales AverageCount RunningTotal     Ratio
    1  2013      Beverages  102074.29     22190.06    102074.29        NA
    2  2013     Condiments   55277.56     14173.73    157351.85 0.6487009
    3  2013    Confections   36415.75     12138.58    193767.60 0.8120648
    4  2013 Dairy Products   30337.39     24400.00    224104.99 0.8646287
    5  2013        Seafood   53019.98     27905.25    277124.97 0.8086784
    6  2014      Beverages   81338.06     35400.00     81338.06        NA
    7  2014     Condiments   55948.82     19981.72    137286.88 0.5924678
    8  2014    Confections   44478.36     24710.00    181765.24 0.7552978
    9  2014 Dairy Products   84412.36     32466.00    266177.60 0.6828720
    10 2014        Seafood   65544.19     14565.37    331721.79 0.8024122

How do I get the desired output as shown below?

    Year       Category TotalSales AverageCount RunningTotal        Ratio
    2013      Beverages  102074.29     22190.06    102074.29 1.5415424393
    2013     Condiments   55277.56     14173.73    157351.85 1.2314288011
    2013    Confections   36415.75     12138.58     193767.6 1.1565658552
    2013 Dairy Products   30337.39        24400    224104.99 1.2365854504
    2013        Seafood   53019.98     27905.25    277124.97 0.2935067887
    2014      Beverages   81338.06        35400     81338.06 1.6878553533
    2014     Condiments   55948.82     19981.72    137286.88 1.3239811408
    2014    Confections   44478.36        24710    181765.24 1.4644032049
    2014 Dairy Products   84412.36        32466     266177.6 1.2462423209
    2014        Seafood   65544.19     14565.37    331721.79            0

Sample data:

    dat <- structure(list(
      Year = c(2013L, 2013L, 2013L, 2013L, 2013L,
               2014L, 2014L, 2014L, 2014L, 2014L),
      Category = structure(c(1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L),
        .Label = c("Beverages", "Condiments", "Confections",
                   "Dairy Products", "Seafood"),
        class = "factor"),
      TotalSales = c(102074.29, 55277.56, 36415.75, 30337.39, 53019.98,
                     81338.06, 55948.82, 44478.36, 84412.36, 65544.19),
      AverageCount = c(22190.06, 14173.73, 12138.58, 24400, 27905.25,
                       35400, 19981.72, 24710, 32466, 14565.37)),
      .Names = c("Year", "Category", "TotalSales", "AverageCount"),
      class = "data.frame", row.names = c(NA, -10L))

`mutate(dat, Ratio = RunningTotal/lag(RunningTotal))`

`NA`s in between.. `dat$Ratio` gives `NA 1.541542 1.231429 1.156566 1.236585 NA 1.687855 1.323981 1.464403 1.246242`. How do I avoid that? And if I write a function to divide two numbers, please let me know how to pass it appropriately using R's aggregate functions. Thanks in advance.

`NA` because you are using `lag`, but on your data I got it only once, and I had all the values like in your desired output.

`divide(x,y)`, how do I call it using the `within()` function? I get an error saying `object 'FUN' of mode 'function' was not found`
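
One possible way to get a per-Year ratio between RunningTotal[i+1] and RunningTotal[i], and to pass a custom function to `ave()`, is sketched below in base R. The helper name `ratio_to_next` is hypothetical (it is not from the question), and the data frame is cut down to the two columns the calculation needs. Since `FUN` comes after `...` in `ave()`, it has to be passed by name. This is a sketch under those assumptions, not the only way to do it.

```r
# Cut-down copy of the question's data: only the columns needed here.
dat <- data.frame(
  Year = rep(c(2013L, 2014L), each = 5),
  TotalSales = c(102074.29, 55277.56, 36415.75, 30337.39, 53019.98,
                 81338.06, 55948.82, 44478.36, 84412.36, 65544.19)
)

# Hypothetical helper: divide each element's successor by the element itself.
# The last element of a group has no successor, so its ratio is NA.
ratio_to_next <- function(x) c(x[-1], NA) / x

# Cumulative sum within each Year, as in the question.
dat$RunningTotal <- ave(dat$TotalSales, dat$Year, FUN = cumsum)

# Apply the helper within each Year; FUN must be named.
dat$Ratio <- ave(dat$RunningTotal, dat$Year, FUN = ratio_to_next)

print(dat)
```

With this grouping, the last row of each Year comes out as `NA`. That differs from the desired output in the question, where row 5 is divided by the first 2014 value and the final row is 0. If 0 is preferred over `NA`, `dat$Ratio[is.na(dat$Ratio)] <- 0` would replace them afterwards.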