dplyr and reusable argument lists

Question

I have played around with dplyr a little and really like it. I am missing something, though. In plyr, I was able to pass a function to ddply and reuse it.

    library('dplyr')
    library('plyr')

    fn = function(df) {
      summarise(df, count = length(id))
    }

    ddply(DF1, 'group', fn)
    ddply(DF2, 'group', fn)

So I can apply a long list of recodings to multiple datasets without replicating all the arguments to summarise. In dplyr, however, I have to do this:

    dplyr::summarise(group_by(DF1, group), count = length(id))
    dplyr::summarise(group_by(DF2, group), count = length(id))

So the arguments to summarise have to be repeated each time. A list of arguments with list('.data'=DF1,'count'=length(id)) and do.call does not work either because length(id) is evaluated when I define the argument list. Are there any solutions for this?
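To make the evaluation-timing problem concrete, here is a minimal base R sketch. The data frame below is a hypothetical stand-in for DF1, and `quote()` is used only to illustrate deferral; it is not necessarily how an argument list should be passed to dplyr.

```r
# Hypothetical data frame standing in for DF1 from the question.
DF1 <- data.frame(id = 1:6, group = rep(c("a", "b"), each = 3))

# Eager: list() evaluates length(id) right away. In a fresh session there
# is no object called `id` outside the data frame, so building the list
# already fails; tryCatch() captures the error message as a string.
eager <- tryCatch(
  list(.data = DF1, count = length(id)),
  error = function(e) conditionMessage(e)
)

# Deferred: quote() stores the expression itself without evaluating it.
deferred <- list(count = quote(length(id)))

# The stored expression can later be evaluated inside any data frame.
count_DF1 <- eval(deferred$count, DF1)
```

Wrapping the call in a function, as `fn` does in the plyr example, defers evaluation in the same way: nothing inside the body runs until the function is called with a data frame.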

Why do you have many small data frames instead of one big data frame?
– hadley, Commented Jan 20, 2014 at 20:44

dickoa · Accepted Answer · 2014-01-20 19:08:22Z

I like @RomanLustrik's answer, so here's a 100% dplyr version of it.

    do(mylist, function(df)
      df %.%
        group_by(b) %.%
        summarise(count = n()))
    ## [[1]]
    ## Source: local data frame [2 x 2]
    ##   b count
    ## 1 b     5
    ## 2 a     5
    ## [[2]]
    ## Source: local data frame [2 x 2]
    ##   b count
    ## 1 b     5
    ## 2 a     5

In this answer I just tried to replicate Roman's approach, but you can also reuse your function (fn):

    fn <- function(df) {
      summarise(df, count = n())
    }

    group_by(df1, b) %.% fn()
    ## Source: local data frame [2 x 2]
    ##   b count
    ## 1 b     5
    ## 2 a     5

    group_by(df2, b) %.% fn()
    ## Source: local data frame [2 x 2]
    ##   b count
    ## 1 b     5
    ## 2 a     5

And you can even wrap it like this:

    do(list(df1, df2), function(df) group_by(df, b) %.% fn())
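The `%.%` operator above comes from early dplyr releases and was later replaced by `%>%`. As a rough sketch of the same idea, assuming a recent dplyr version, the reusable function could look as follows. Here `lapply()` stands in for `do()` on a list (a substitution, not the original call), and the names `res1`, `res2` and `res_list` are made up for illustration.

```r
library(dplyr)

# Same toy data as in Roman's answer.
df1 <- df2 <- data.frame(a = runif(10), b = rep(c("a", "b"), each = 5))

# The summary is defined once; n() is only evaluated when fn() is
# called on a (grouped) data frame.
fn <- function(df) {
  summarise(df, count = n())
}

# Reuse fn() on each data frame ...
res1 <- df1 %>% group_by(b) %>% fn()
res2 <- df2 %>% group_by(b) %>% fn()

# ... or apply it to a list of data frames in one go.
res_list <- lapply(list(df1, df2), function(df) df %>% group_by(b) %>% fn())
```

Any further summaries added inside `fn` are then shared by every call, which is the reuse the question asks for.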

Very nice solution. I guess it's not possible to avoid copying the data.frames to a list and getting lists back? The ddply approach pre-saves the recodings so that they can be applied multiple times.
@user2503795 I edited my answer; check whether it gives the result you were looking for.
This is exactly what I'd recommend. Thanks for saving me some typing :)

Roman Luštrik · Accepted Answer · 2014-01-19 14:34:28Z

Is this what you're after?

    df1 <- df2 <- data.frame(a = runif(10), b = rep(c("a", "b"), each = 5))

    library(dplyr)

    mylist <- list(df1, df2)
    lapply(mylist, FUN = function(x) {
      dplyr::summarise(group_by(x, b), count = length(b))
    })

    [[1]]
    Source: local data frame [2 x 2]
      b count
    1 a     5
    2 b     5

    [[2]]
    Source: local data frame [2 x 2]
      b count
    1 a     5
    2 b     5

This is analogous to the issue of when anonymous/lambda functions get applied, e.g. in Python.
