Outer join in python Pandas

Question

I have two data sets, as follows:

A      B
IDs    IDs
1      1
2      2
3      5
4      7

How can I apply a join in Pandas or NumPy that gives me all the data from B that is not present in A? Something like the following:

B
IDs
5
7

I know it can be done with a for loop, but I don't want that, since my real data runs to millions of rows. I am really not sure how to use Pandas or NumPy here; I was thinking of something like the following:

pd.merge(A, B, on='ids', how='right')

Thanks

What is the expected output? The column names seem to be A and B and not IDs ... this is misleading.
– Colonel Beauvel, Commented Jun 7, 2016 at 13:08

Divakar · Accepted Answer · 2016-06-07 13:26:53Z


You can use NumPy's setdiff1d, like so -

np.setdiff1d(B['IDs'],A['IDs'])

Also, np.in1d could be used for the same effect, like so -

B[~np.in1d(B['IDs'],A['IDs'])]

Please note that np.setdiff1d returns a sorted NumPy array of unique values, whereas the np.in1d mask keeps the rows of B in their original order.

Sample run -

>>> A = pd.DataFrame([1,2,3,4],columns=['IDs'])
>>> B = pd.DataFrame([1,7,5,2],columns=['IDs'])
>>> np.setdiff1d(B['IDs'],A['IDs'])
array([5, 7])
>>> B[~np.in1d(B['IDs'],A['IDs'])]
   IDs
1    7
2    5
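As far as I know, np.in1d is deprecated in recent NumPy releases (2.0 onwards) in favour of np.isin, which has been available since NumPy 1.13. A minimal sketch of the same mask written with np.isin, reusing the sample frames from the run above; the names `mask` and `only_in_B` are just for illustration and not part of the original answer:

```python
import numpy as np
import pandas as pd

A = pd.DataFrame([1, 2, 3, 4], columns=['IDs'])
B = pd.DataFrame([1, 7, 5, 2], columns=['IDs'])

# True where a value of B['IDs'] also occurs in A['IDs']
mask = np.isin(B['IDs'], A['IDs'])

# Keep only the rows of B whose ID is absent from A (B's row order is kept)
only_in_B = B[~mask]
print(only_in_B)
```

Unlike np.setdiff1d, this keeps the whole rows of B and does not sort or de-duplicate them.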

edited Jun 7, 2016 at 13:26

answered Jun 7, 2016 at 13:15


3 Comments

Manu Sharma Over a year ago

Thank you so much! But despite several attempts, I am receiving the error "List indices must be integers, not lists".

Divakar Over a year ago

@manusharma So, do you have anything else apart from integers in that column of IDs, like strings maybe or integers as strings?

Manu Sharma Over a year ago

I have two large lists/DataFrames; some of the values are longs, some are integers. I tried to use map(int, dataset) to convert them all to one type, but I still get the same error: "List indices must be integers, not lists".
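The asker's code is not shown, so this is only a guess: that error message is what a plain Python list raises when it is indexed with a list instead of an integer, which suggests the data may be lists and not DataFrames or arrays. A sketch of converting the lists to NumPy arrays first; `ids_a` and `ids_b` are hypothetical names standing in for the asker's data:

```python
import numpy as np

# Hypothetical plain Python lists standing in for the real data
ids_a = [1, 2, 3, 4]
ids_b = [1, 7, 5, 2]

# Convert to arrays so that boolean-mask indexing is available
arr_a = np.asarray(ids_a)
arr_b = np.asarray(ids_b)

# Sorted unique values of B that are not in A
missing_sorted = np.setdiff1d(arr_b, arr_a)

# Same values, but in the order they appear in B
missing_in_order = arr_b[~np.isin(arr_b, arr_a)]

print(missing_sorted, missing_in_order)
```

If the data already live in DataFrames, the column has to be selected by name (as in B['IDs']) before it is passed to these functions.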

jezrael · Accepted Answer · 2016-06-07 13:08:55Z

You can use merge with the indicator parameter and then boolean indexing. Finally, you can drop the _merge column:

A = pd.DataFrame({'IDs':[1,2,3,4], 'B':[4,5,6,7], 'C':[1,8,9,4]})
print (A)
   B  C  IDs
0  4  1    1
1  5  8    2
2  6  9    3
3  7  4    4

B = pd.DataFrame({'IDs':[1,2,5,7], 'A':[1,8,3,7], 'D':[1,8,9,4]})
print (B)
   A  D  IDs
0  1  1    1
1  8  8    2
2  3  9    5
3  7  4    7

df = (pd.merge(A, B, on='IDs', how='outer', indicator=True))
df = df[df._merge == 'right_only']
df = df.drop('_merge', axis=1)
print (df)
    B   C  IDs    A    D
4 NaN NaN  5.0  3.0  9.0
5 NaN NaN  7.0  7.0  4.0
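The outer merge above also carries A's columns along as NaN and turns the integer columns into floats. If only B's own columns are wanted, one possible variant is to left-merge B against just the key column of A. This is a sketch, not the answer's method, and the helper name `rows_only_in_right` is made up for illustration:

```python
import pandas as pd

def rows_only_in_right(left, right, key):
    """Return the rows of `right` whose `key` value never appears in `left`."""
    # Only the key column of `left` is needed; dropping duplicates avoids
    # multiplying rows of `right` during the merge.
    left_keys = left[[key]].drop_duplicates()
    # Here `right` is the left-hand operand of the merge, so its unmatched
    # rows are labelled 'left_only' in the indicator column.
    merged = right.merge(left_keys, on=key, how='left', indicator=True)
    return merged.loc[merged['_merge'] == 'left_only', list(right.columns)]

A = pd.DataFrame({'IDs': [1, 2, 3, 4], 'B': [4, 5, 6, 7], 'C': [1, 8, 9, 4]})
B = pd.DataFrame({'IDs': [1, 2, 5, 7], 'A': [1, 8, 3, 7], 'D': [1, 8, 9, 4]})

result = rows_only_in_right(A, B, 'IDs')
print(result)
```

Because no columns from A are brought in, the result has only B's columns and no NaN padding.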

Kurt Peek · Accepted Answer · 2016-06-07 13:28:39Z

You could convert the data series to sets and take the difference:

import pandas as pd
df = pd.DataFrame({'A' : [1,2,3,4], 'B' : [1,2,5,7]})
A = set(df['A'])
B = set(df['B'])
C = pd.DataFrame({'C' : list(B-A)})  # Take difference and convert back to DataFrame

The variable "C" then yields

   C
0  5
1  7
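Python sets make no promise about iteration order, so the row order of C is not guaranteed in general. A small sketch of the same idea with the difference sorted before it goes back into a DataFrame; the name `only_in_B` is an assumption for illustration:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [1, 2, 5, 7]})

# Set difference drops duplicates; sorted() makes the row order deterministic
only_in_B = sorted(set(df['B']) - set(df['A']))

C = pd.DataFrame({'C': only_in_B})
print(C)
```

Note that, like any set-based approach, this returns unique values only and discards the other columns of the original rows.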

Alex Petralia · Accepted Answer · 2016-06-07 14:23:28Z

You can simply use pandas' .isin() method:

df = pd.DataFrame({'A' : [1,2,3,4], 'B' : [1,2,5,7]})
df[~df['B'].isin(df['A'])]

If these are separate DataFrames:

a = pd.DataFrame({'IDs' : [1,2,3,4]})
b = pd.DataFrame({'IDs' : [1,2,5,7]})
b[~b['IDs'].isin(a['IDs'])]

Output:

   IDs
2    5
3    7
