Merge and concatenate two data frames in R

Question

I have two data frames, firstDF and secondDF, which look like this:

firstDF:
col1 col2 id
A    1    2
B    5    3
C    6    4

secondDF:
col1 col2 id
A    1    2
E    15   5
F    16   6

Resultant DF:
col1 col2 id
A    1    2
B    5    3
C    6    4
E    15   5
F    16   6

The resultant data frame must contain all the rows from the two data frames. In case there are rows which have the same id, such a row must appear in the resultant data frame only once.

I tried using the rbind function, but it returns all the rows from both data frames, duplicates included. I tried using the merge function with the condition x.id=y.id, but the resultant data frame had multiple columns, namely x.col1, y.col1, x.col2, y.col2 and so on.

Alex A. · Accepted Answer · 2015-03-20 01:22:28Z

5

You can do this with merge().

merge(df1, df2, by=c("col1", "col2", "id"), all.x=T, all.y=T)

This merges by all common variables, keeping all records in either data frame. Alternatively you can omit the by= argument and R will automatically use all common variables.

As @thelatemail mentioned in a comment, rather than individually specifying all.x=T and all.y=T, you can alternatively use all=T.
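As a minimal sketch of the approach described above, the following rebuilds the question's two data frames by hand (the data.frame() construction is an assumption, since the question only shows printed output) and merges them without a by= argument, so all 20 columns would not need to be listed:

```r
# Example data copied from the question; how the frames were originally
# built is not shown there, so this construction is an assumption.
firstDF <- data.frame(col1 = c("A", "B", "C"),
                      col2 = c(1, 5, 6),
                      id   = c(2, 3, 4))
secondDF <- data.frame(col1 = c("A", "E", "F"),
                       col2 = c(1, 15, 16),
                       id   = c(2, 5, 6))

# With no `by`, merge() joins on every column name the two frames share
# (here col1, col2 and id). all = TRUE keeps unmatched rows from both sides.
result <- merge(firstDF, secondDF, all = TRUE)
print(result)
```

Because every column is used as a join key here, a row counts as "the same" only when it matches in all columns, not just in id.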

edited Mar 20, 2015 at 1:22

answered Mar 20, 2015 at 1:02


7 Comments

user1692342 Over a year ago

I will have to write all column names? I have about 20 columns!!

Alex A. Over a year ago

@user1692342: Do all 20 appear in both data frames? I believe the default behavior if you omit a by= argument is to use all common variables. Maybe try that and see what happens. You'll still want all.x and all.y though.

user1692342 Over a year ago

The merge worked; however, rows which have the same "id" are repeated.

Alex A. Over a year ago

@user1692342: That's expected if there are duplicate id values in either data frame, otherwise it doesn't make sense since you're merging on id. You can subset out the duplicated id values if you want. Use subset(df, !duplicated(id)).

thelatemail Over a year ago

@AlexA.- all=TRUE is shorthand for specifying both all.x=TRUE and all.y=TRUE
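The repeated-id situation discussed in these comments can be reproduced with a small sketch. The changed col2 value (99) is a hypothetical variation on the question's data, introduced only to make two rows share an id while differing elsewhere:

```r
firstDF <- data.frame(col1 = c("A", "B", "C"),
                      col2 = c(1, 5, 6),
                      id   = c(2, 3, 4))
# Hypothetical variation: id 2 now carries a different col2 value (99),
# so the two id-2 rows are no longer exact duplicates.
secondDF <- data.frame(col1 = c("A", "E", "F"),
                       col2 = c(99, 15, 16),
                       id   = c(2, 5, 6))

# Merging on all common columns keeps both id-2 rows, because they differ in col2.
merged <- merge(firstDF, secondDF, all = TRUE)

# Keep only the first row seen for each id; later rows with that id are dropped.
deduped <- merged[!duplicated(merged$id), ]
print(deduped)
```

Note that duplicated() keeps whichever row comes first, so the row order after merge() decides which of the conflicting rows survives. If a particular source should win, sort or order the rows accordingly before filtering.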


udondan · Accepted Answer · 2015-03-20 03:36:10Z

You can try the sqldf library. I'm not sure which kind of join you need, but it would go something like this:

Result <- sqldf("select a.col1, a.col2, a.id from firstDF as a join secondDF as b on a.id=b.id")

Or

X <- rbind(firstDF, secondDF)

Then filter out duplicates using the unique function.

G. Grothendieck · Accepted Answer · 2015-03-20 03:43:05Z

Using sqldf:

library(sqldf)
sqldf("select * from firstDF union select * from secondDF")

Note that union automatically removes duplicates.

Carson Moore · Accepted Answer · 2015-03-20 17:51:06Z

This may not be the most performant answer, but a quick and easy way to do it -- assuming that any duplicate rows are in fact exact duplicates (i.e., for any row in df1 where col_1 = X, if there exists a row in df2 where col_1 = X, all other columns are also identical between those two rows) -- would be to rbind them and get the unique results:

> df1
  col_1 col_2 id
1     A     1  2
2     B     5  3
3     C     6  4
> df2
  col_1 col_2 id
1     A     1  2
2     E    15  5
3     F    16  6
> unique(rbind(df1, df2))
  col_1 col_2 id
1     A     1  2
2     B     5  3
3     C     6  4
5     E    15  5
6     F    16  6
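In the output above the row names skip from 3 to 5, because unique() drops the duplicate but keeps the original row names from rbind(). A small sketch, assuming the same df1 and df2 rebuilt by hand, shows one way to renumber them if that matters downstream:

```r
# Same data as in the console output above, reconstructed with data.frame()
# (the construction itself is an assumption; only the printed frames are given).
df1 <- data.frame(col_1 = c("A", "B", "C"),
                  col_2 = c(1, 5, 6),
                  id    = c(2, 3, 4))
df2 <- data.frame(col_1 = c("A", "E", "F"),
                  col_2 = c(1, 15, 16),
                  id    = c(2, 5, 6))

# Stack the frames, then drop rows that are identical in every column.
combined <- unique(rbind(df1, df2))

# Setting row names to NULL resets them to the default sequence 1..n.
rownames(combined) <- NULL
print(combined)
```

This only tidies the labels. The rows themselves are unchanged.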
