
I'm a newbie to pandas, and suspect this has a simple answer, but it's stumping me right now. I want to select the rows from multiple dataframes (with the same columns) where a value in one column occurs in all of the dataframes.

So if I have the following:

import numpy as np
import pandas as pd

df1 = pd.DataFrame({'Col1': ['Adams','Baker','Cash','Jones','Thomas'],
                    'Col2': ['A','B','C','D','E'],
                    'Col3': [10,11,12,13,14]})
df2 = pd.DataFrame({'Col1': ['Adams','Cash','Fox','Jones','Johnson'],
                    'Col2': ['D','E','F','G','H'],
                    'Col3': [40,50,60,70,80]})
df3 = pd.DataFrame({'Col1': ['Adams','Barns','Jones','Smith','Thomas'],
                    'Col2': ['F','G','J','L','M'],
                    'Col3': [21,21,22,23,24]})
print(df1)
print(df2)
print(df3)

Giving:

     Col1 Col2  Col3
0   Adams    A    10
1   Baker    B    11
2    Cash    C    12
3   Jones    D    13
4  Thomas    E    14

      Col1 Col2  Col3
0    Adams    D    40
1     Cash    E    50
2      Fox    F    60
3    Jones    G    70
4  Johnson    H    80

     Col1 Col2  Col3
0   Adams    F    21
1   Barns    G    21
2   Jones    J    22
3   Smith    L    23
4  Thomas    M    24

I want to end up with:

Adams   A   10
Adams   D   40
Adams   F   21
Jones   D   13
Jones   G   70
Jones   J   22

Is there a succinct way to do this?

3 Answers

You can first find the common values:

common = set.intersection(set(df1.Col1), set(df2.Col1), set(df3.Col1))

Then concatenate the rows whose values are within the set of common values:

pd.concat([df1[df1.Col1.isin(common)],
           df2[df2.Col1.isin(common)],
           df3[df3.Col1.isin(common)]]).sort_values(by='Col1')
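If the list of dataframes later grows, the same intersection can presumably be built from a sequence instead of spelling out each set; this is a small variation on the snippet above, not something the answer itself showed:

frames = [df1, df2, df3]
# Unpack one set of names per dataframe into set.intersection
common = set.intersection(*(set(f.Col1) for f in frames))
pd.concat([f[f.Col1.isin(common)] for f in frames]).sort_values(by='Col1')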

One solution would be to first get the values of Col1 that exist in all of your DataFrames, then restrict each of the DataFrames to these common values and then concatenate the restricted DataFrames:

common_values = list(set(df1.Col1) & set(df2.Col1) & set(df3.Col1))
df = pd.concat([df1[df1.Col1.isin(common_values)],
                df2[df2.Col1.isin(common_values)],
                df3[df3.Col1.isin(common_values)]],
               ignore_index=True)

This will give you the following result:

print(df)

    Col1 Col2  Col3
0  Adams    A    10
1  Jones    D    13
2  Adams    D    40
3  Jones    G    70
4  Adams    F    21
5  Jones    J    22

I also noticed that your desired DataFrame is sorted; if that is the case, you can sort it with the following command:

df = df.sort_values(['Col1', 'Col2', 'Col3'])

This will give you:

print(df)

    Col1 Col2  Col3
0  Adams    A    10
2  Adams    D    40
4  Adams    F    21
1  Jones    D    13
3  Jones    G    70
5  Jones    J    22


First, create a container to hold your dataframes (dfs). You can use reduce on the set intersection of names in Col1 to find names in all of the dataframes. Lastly, you can use query together with concat to join the relevant rows.

from functools import reduce  # reduce is a builtin in Python 2 but must be imported in Python 3

dfs = (df1, df2, df3)
common_names = reduce(
    lambda s, df: s.intersection(set(df.Col1)),  # running set intersection of the names
    dfs[1:],            # remaining dataframes to fold over
    set(dfs[0].Col1))   # seed reduce with the names from the first dataframe

>>> common_names
{'Adams', 'Jones'}

>>> pd.concat((df.query('Col1 in @common_names') for df in dfs),
...           ignore_index=True).sort_values('Col1')
    Col1 Col2  Col3
0  Adams    A    10
2  Adams    D    40
4  Adams    F    21
1  Jones    D    13
3  Jones    G    70
5  Jones    J    22

This solution scales nicely, as you only need to change the contents of dfs should the number of dataframes change.
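To illustrate that, here is a rough sketch of how the same steps could be wrapped in a reusable helper; the function name rows_common_to_all is made up for this example and is not part of pandas:

from functools import reduce

def rows_common_to_all(dfs, col='Col1'):
    # Names that appear in the given column of every dataframe.
    names = reduce(lambda s, df: s.intersection(set(df[col])),
                   dfs[1:], set(dfs[0][col]))
    # Keep only the matching rows from each dataframe, then stitch them together.
    return pd.concat([df[df[col].isin(names)] for df in dfs],
                     ignore_index=True).sort_values(col)

result = rows_common_to_all((df1, df2, df3))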

P.S. After re-reading the question, I realized you're a Pandas newbie. Welcome! Although Pandas is incredibly powerful, there is no simple solution to your question. Basically, you want to filter down your dataframes to get a list of common names. Then you want to filter each dataframe for those names, and concatenate the results.
