I have the following data-frame

   Year       Category TotalSales AverageCount
1  2013      Beverages  102074.29     22190.06
2  2013     Condiments   55277.56     14173.73
3  2013    Confections   36415.75     12138.58
4  2013 Dairy Products   30337.39     24400.00
5  2013        Seafood   53019.98     27905.25
6  2014      Beverages   81338.06     35400.00
7  2014     Condiments   55948.82     19981.72
8  2014    Confections   44478.36     24710.00
9  2014 Dairy Products   84412.36     32466.00
10 2014        Seafood   65544.19     14565.37

I calculated the cumulative sum for TotalSales, grouped by Year by the following method

dat <-within(dat, { RunningTotal <- ave(dat$TotalSales, dat$Year, FUN = cumsum) }) 

and the output is this,

   Year       Category TotalSales AverageCount RunningTotal
1  2013      Beverages  102074.29     22190.06    102074.29
2  2013     Condiments   55277.56     14173.73    157351.85
3  2013    Confections   36415.75     12138.58    193767.60
4  2013 Dairy Products   30337.39     24400.00    224104.99
5  2013        Seafood   53019.98     27905.25    277124.97
6  2014      Beverages   81338.06     35400.00     81338.06
7  2014     Condiments   55948.82     19981.72    137286.88
8  2014    Confections   44478.36     24710.00    181765.24
9  2014 Dairy Products   84412.36     32466.00    266177.60
10 2014        Seafood   65544.19     14565.37    331721.79

How do I calculate the group-wise ratio of consecutive elements in the column RunningTotal (that is, RunningTotal[i+1] / RunningTotal[i] within each Year)?

I've tried using mutate from dplyr

library(dplyr)
dat <- mutate(dat, Ratio = lag(RunningTotal) / RunningTotal)

and I get an incorrect output (notice the NAs):

   Year       Category TotalSales AverageCount RunningTotal     Ratio
1  2013      Beverages  102074.29     22190.06    102074.29        NA
2  2013     Condiments   55277.56     14173.73    157351.85 0.6487009
3  2013    Confections   36415.75     12138.58    193767.60 0.8120648
4  2013 Dairy Products   30337.39     24400.00    224104.99 0.8646287
5  2013        Seafood   53019.98     27905.25    277124.97 0.8086784
6  2014      Beverages   81338.06     35400.00     81338.06        NA
7  2014     Condiments   55948.82     19981.72    137286.88 0.5924678
8  2014    Confections   44478.36     24710.00    181765.24 0.7552978
9  2014 Dairy Products   84412.36     32466.00    266177.60 0.6828720
10 2014        Seafood   65544.19     14565.37    331721.79 0.8024122

How do I get the desired output as shown below?

Year       Category TotalSales AverageCount RunningTotal        Ratio
2013      Beverages  102074.29     22190.06    102074.29 1.5415424393
2013     Condiments   55277.56     14173.73    157351.85 1.2314288011
2013    Confections   36415.75     12138.58     193767.6 1.1565658552
2013 Dairy Products   30337.39        24400    224104.99 1.2365854504
2013        Seafood   53019.98     27905.25    277124.97 0.2935067887
2014      Beverages   81338.06        35400     81338.06 1.6878553533
2014     Condiments   55948.82     19981.72    137286.88 1.3239811408
2014    Confections   44478.36        24710    181765.24 1.4644032049
2014 Dairy Products   84412.36        32466     266177.6 1.2462423209
2014        Seafood   65544.19     14565.37    331721.79            0

Sample data :

dat <- structure(list(
  Year = c(2013L, 2013L, 2013L, 2013L, 2013L, 2014L, 2014L, 2014L, 2014L, 2014L),
  Category = structure(c(1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L),
    .Label = c("Beverages", "Condiments", "Confections", "Dairy Products", "Seafood"),
    class = "factor"),
  TotalSales = c(102074.29, 55277.56, 36415.75, 30337.39, 53019.98,
    81338.06, 55948.82, 44478.36, 84412.36, 65544.19),
  AverageCount = c(22190.06, 14173.73, 12138.58, 24400, 27905.25,
    35400, 19981.72, 24710, 32466, 14565.37)),
  .Names = c("Year", "Category", "TotalSales", "AverageCount"),
  class = "data.frame", row.names = c(NA, -10L))
  • You've got the correct result, just reversed. In other words, reverse your last line, as in mutate(dat, Ratio = RunningTotal/lag(RunningTotal)) Commented May 7, 2015 at 9:12
  • Well, I'm getting NAs in between.. dat$Ratio gives NA 1.541542 1.231429 1.156566 1.236585 NA 1.687855 1.323981 1.464403 1.246242. How do I avoid that? And if I write a function to divide two numbers, please let me know how do I pass it appropriately, using R's aggregate functions. Thanks in advance. Commented May 7, 2015 at 11:18
  • You should get an NA because you are using lag, but on your data I got it only once, and all the other values matched your desired output. Commented May 7, 2015 at 11:28
  • Well.. if I write a function called divide(x,y), how do I call it using the within() function? I get an error saying object 'FUN' of mode 'function' was not found Commented May 7, 2015 at 11:33

1 Answer

The dplyr way of doing your first operation is:

dat <- dat %>% group_by(Year) %>% mutate(RunningTotal = cumsum(TotalSales)) %>% ungroup 

Then to add the ratios, use

dat %>% mutate(Ratio = c(RunningTotal[-1] / RunningTotal[-n()], 0)) 

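Though I'd be tempted to make that last value NA, not 0, since there is no next row to divide by. The ratio for 2013 Seafood (0.2935067887) doesn't make sense either: it divides the first 2014 running total by the last 2013 one, so it crosses the group boundary. To get rid of that, don't ungroup before computing the ratio. So something like this: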

dat %>%
  group_by(Year) %>%
  mutate(
    RunningTotal = cumsum(TotalSales),
    Ratio = c(RunningTotal[-1] / RunningTotal[-n()], NA)
  )
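
For comparison, the same grouped calculation can be sketched in base R with ave(), which is what the question started from. This also touches on the comment about passing a custom function: ave() takes it through the FUN argument. The helper name next_ratio is made up for this illustration, and the data frame is a trimmed copy of the sample data (Category and AverageCount left out), so treat it as a sketch rather than the only way to do it.

```r
# Trimmed copy of the question's sample data (assumption: only the two
# columns needed for the calculation are kept)
dat <- data.frame(
  Year = rep(c(2013L, 2014L), each = 5),
  TotalSales = c(102074.29, 55277.56, 36415.75, 30337.39, 53019.98,
                 81338.06, 55948.82, 44478.36, 84412.36, 65544.19)
)

# Hypothetical helper: divide each element by the one before it, then
# shift the result up one position so the last element gets NA
next_ratio <- function(x) c(x[-1] / x[-length(x)], NA)

# ave() applies FUN separately within each Year and returns the results
# in the original row order
dat$RunningTotal <- ave(dat$TotalSales, dat$Year, FUN = cumsum)
dat$Ratio <- ave(dat$RunningTotal, dat$Year, FUN = next_ratio)

print(dat)
```

Because next_ratio only ever sees one year's values at a time, the last row of each year gets NA instead of a ratio that mixes 2013 and 2014. Inside a grouped mutate(), dplyr's lead(RunningTotal) / RunningTotal should give the same column.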

3 Comments

Or just slightly modifying OPs code Ratio = c((RunningTotal/lag(RunningTotal))[-1L], NA)
@Richie yeah. pipe-lining! Helps in many ways! Cheers, Sir.. !
@DavidArenburg very well, its effective too. Thanks much, Sir!