4

I have played around with dplyr a little and really like it. I am missing something, though. In plyr, I was able to pass a function to ddply and reuse it.

    library('dplyr')
    library('plyr')
    fn = function(df) { summarise(df, count = length(id)) }
    ddply(DF1,'group', fn)
    ddply(DF2,'group', fn)

So I can apply a long list of recodings to multiple datasets without repeating all the arguments to summarise. In dplyr, however, I have to do this:

    dplyr::summarise(group_by(DF1,group), count = length(id))
    dplyr::summarise(group_by(DF2,group), count = length(id))

So the arguments to summarise have to be repeated each time. Building an argument list with list('.data'=DF1,'count'=length(id)) and passing it to do.call does not work either, because length(id) is evaluated as soon as the argument list is defined. Is there a solution for this?
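To illustrate the evaluation-timing problem described above, here is a minimal base R sketch (no dplyr involved). It stores the summary expressions unevaluated with quote() and only evaluates them later against each group of a data frame. The helper summarise_by, the list summary_exprs, and the toy data are hypothetical names made up for this illustration. They are not part of plyr or dplyr.

```r
# Hypothetical toy data standing in for DF1 and DF2
DF1 <- data.frame(id = 1:6, group = rep(c("x", "y"), each = 3))
DF2 <- data.frame(id = 1:4, group = rep(c("x", "y"), times = c(1, 3)))

# quote() stores the expression itself, so length(id) is NOT evaluated here
summary_exprs <- list(count = quote(length(id)))

# Hypothetical helper: evaluate each stored expression within every group
summarise_by <- function(df, group_col, exprs) {
  pieces <- split(df, df[[group_col]])
  rows <- lapply(pieces, function(piece) {
    # eval() looks up `id` in the columns of the current piece
    as.data.frame(lapply(exprs, function(e) eval(e, envir = piece)))
  })
  out <- do.call(rbind, rows)
  data.frame(group = names(pieces), out, row.names = NULL)
}

res1 <- summarise_by(DF1, "group", summary_exprs)
res2 <- summarise_by(DF2, "group", summary_exprs)
```

The same summary_exprs list is reused for both data frames, which is the behaviour the ddply version provides. Whether quoted expressions can be passed through do.call to summarise depends on the dplyr version, so that is not shown here.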

1
  • Why do you have many small data frames instead of one big data frame? Commented Jan 20, 2014 at 20:44

2 Answers

8

I like @RomanLustrik's answer, so here's a 100% dplyr version of it.

    do(mylist, function(df) df %.% group_by(b) %.% summarise(count = n()))
    ## [[1]]
    ## Source: local data frame [2 x 2]
    ##   b count
    ## 1 b     5
    ## 2 a     5
    ## [[2]]
    ## Source: local data frame [2 x 2]
    ##   b count
    ## 1 b     5
    ## 2 a     5

In this answer I just tried to replicate Roman's approach, but you can also reuse your function (fn):

    fn <- function(df) { summarise(df, count = n()) }
    group_by(df1, b) %.% fn()
    ## Source: local data frame [2 x 2]
    ##   b count
    ## 1 b     5
    ## 2 a     5
    group_by(df2, b) %.% fn()
    ## Source: local data frame [2 x 2]
    ##   b count
    ## 1 b     5
    ## 2 a     5

And you can even wrap it like this:

    do(list(df1, df2), function(df) group_by(df, b) %.% fn())
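The code in this answer uses %.%, dplyr's original chaining operator, which later releases replaced with %>%. As a rough sketch of the same idea, assuming a current dplyr installation, the reusable function can be combined with %>% and base lapply. The wrapper name count_by_b is hypothetical and only used for this illustration.

```r
library(dplyr)

# Same toy data as in the answers on this page
df1 <- df2 <- data.frame(a = runif(10), b = rep(c("a", "b"), each = 5))

# Reusable summary step, written once
fn <- function(df) summarise(df, count = n())

# Hypothetical wrapper that bundles grouping and summarising
count_by_b <- function(df) df %>% group_by(b) %>% fn()

# Apply to one data frame directly, or to several via lapply
single  <- count_by_b(df1)
results <- lapply(list(df1, df2), count_by_b)
```

Calling count_by_b on each data frame directly avoids building a list first, if only a handful of data frames are involved.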

3 Comments

Very nice solution. I guess it's not possible to avoid copying the data.frames to a list and getting lists back? The ddply approach pre-saves the recodings so that they can be applied multiple times.
@user2503795 I edited my answer; check whether it is the result you were looking for.
This is exactly what I'd recommend. Thanks for saving me some typing :)
3

Is this what you're after?

    df1 <- df2 <- data.frame(a = runif(10), b = rep(c("a", "b"), each = 5))
    library(dplyr)
    mylist <- list(df1, df2)
    lapply(mylist, FUN = function(x) {
      dplyr::summarise(group_by(x, b), count = length(b))
    })

    [[1]]
    Source: local data frame [2 x 2]
      b count
    1 a     5
    2 b     5

    [[2]]
    Source: local data frame [2 x 2]
      b count
    1 a     5
    2 b     5

1 Comment

This is analogous to the issue of when anonymous/lambda functions get applied, e.g. in Python.
