
I have a list containing data frames as its elements in R.

Example:

df1 <- data.frame("names"=c("John","Sam","Dave"), "age"=c(21,22,25))
df2 <- data.frame("names"=c("John","Sam"), "score"=c(22,25))
df3 <- data.frame("names"=c("John","Sam","Dave"), "country"=c("US","SA","NZ"))
mylist <- list(df1, df2, df3)

Is it possible to merge all the elements of mylist together without using a loop?

My desired output for this example is:

  names age score country
1  John  21    22      US
2   Sam  22    25      SA

The list in this example has only three elements; however, I am looking for a solution that can handle an arbitrary number of elements.


4 Answers


You can use Reduce for a one-liner solution:

> Reduce(merge, mylist)
  names age score country
1  John  21    22      US
2   Sam  22    25      SA
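By default, merge keeps only the names that appear in every data frame (an inner join). If you also want to keep the names that are missing from some of the data frames, you can wrap merge in an anonymous function and pass extra arguments through it; a quick sketch using merge's standard by and all arguments:

> Reduce(function(x, y) merge(x, y, by = "names", all = TRUE), mylist)
  names age score country
1  Dave  25    NA      NZ
2  John  21    22      US
3   Sam  22    25      SA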

Quick and dirty example:

merge(merge(df1, df2), df3)

EDIT - A very similar question is here: Simultaneously merge multiple data.frames in a list

The solution given there:

merged.data.frame = Reduce(function(...) merge(..., all=F), my.list) 

Disclaimer - all I changed from @Charles's answer was to use merge(..., all=F) rather than all=T, so that it gives your desired output.

2 Comments

Thanks @alexwhan. I should have been more specific. I need a solution for a list with an arbitrary number of elements. My input list may have a different number of elements each time instead of the three in this example.
Yes, that's what I wondered

Just to show it could be done another way...

mymerge <- function(mylist) {
  names(mylist) <- sapply(mylist, function(x) names(x)[2])
  ns <- unique(unlist(lapply(mylist, function(x) levels(x$names))))
  as.data.frame(c(list(names=ns), lapply(mylist, function(x) {x[match(ns, x$names), 2]})))
}

> mymerge(mylist)
  names age score country
1  Dave  25    NA      NZ
2  John  21    22      US
3   Sam  22    25      SA

One could easily adapt to remove rows with missing values, or perhaps just remove afterwards with complete.cases.
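For instance, dropping the incomplete rows after the fact:

> out <- mymerge(mylist)
> out[complete.cases(out), ]
  names age score country
2  John  21    22      US
3   Sam  22    25      SA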

To show that it's faster, we'll make up a bigger data set: 100 variables and 25 names.

library(microbenchmark)

set.seed(5)
vs <- paste0("V", 1:100)
mylist <- lapply(vs, function(v) {
  x <- data.frame(names=LETTERS[1:25], round(runif(25, 0, 100)))
  names(x)[2] <- v
  x
})

> microbenchmark(Reduce(merge, mylist), mymerge(mylist))
Unit: milliseconds
                   expr       min        lq    median        uq       max
1       mymerge(mylist)  12.81371  13.19746  13.36571  14.40093  33.90468
2 Reduce(merge, mylist) 199.23714 206.28608 207.30247 208.44939 226.05980

4 Comments

Well, I rarely get downvoted. Not that I don't sometimes deserve it, but a comment would be nice. I thought this was pretty slick, and it will be faster than Reduce when the data gets bigger, as shown in the edit.
+1 for the benchmark! Reduce is really slow!
This solution no longer seems to work without specifying stringsAsFactors = T when using the data.frame() function.
Hi @WillT-E, please go ahead and make the necessary edit. Thanks!
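As the comment above notes, since R 4.0 data.frame() defaults to stringsAsFactors = FALSE, so the names column is character and levels(x$names) returns NULL. A minimal sketch of an adaptation, collecting the names as character strings instead of factor levels:

mymerge <- function(mylist) {
  names(mylist) <- sapply(mylist, function(x) names(x)[2])
  # gather every name as character, so this works whether names is factor or character
  ns <- sort(unique(unlist(lapply(mylist, function(x) as.character(x$names)))))
  as.data.frame(c(list(names=ns), lapply(mylist, function(x) x[match(ns, x$names), 2])))
}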

Have you tried this function?

http://rss.acs.unt.edu/Rdoc/library/gtools/html/smartbind.html

library(gtools)

df1 <- data.frame(A=1:10, B=LETTERS[1:10], C=rnorm(10))
df2 <- data.frame(A=11:20, D=rnorm(10), E=letters[1:10])
df3 <- df1
mylist <- list(df1, df2, df3)

# smartbind row-binds the frames, filling missing columns with NA;
# do.call passes each element of mylist as a separate argument.
out <- do.call(smartbind, mylist)
