
I have two dataframes that I need to join on one column, keeping only the rows from the first dataframe whose id appears in the same column of the second dataframe:

df1:

 id    a   b
  2    1   1
  3  0.5   1
  4    1   2
  5    2   1

df2:

 id   c   d
  2  fs   a
  5  fa   f

Desired output:

df:

 id   a   b
  2   1   1
  5   2   1

I have tried df1.join(df2("id"), "left"), but it gives me the error: 'DataFrame' object is not callable.

2 Answers


df2("id") is not a valid python syntax for selecting columns, you'd either need df2[["id"]] or use select df2.select("id"); For your example, you can do:

df1.join(df2.select("id"), "id").show() +---+---+---+ | id| a| b| +---+---+---+ | 5|2.0| 1| | 2|1.0| 1| +---+---+---+ 

or:

df1.join(df2[["id"]], "id").show() +---+---+---+ | id| a| b| +---+---+---+ | 5|2.0| 1| | 2|1.0| 1| +---+---+---+ 

1 Comment

Sorry, I copied my actual syntax but made up different dataframes here. Imsi is an ID.

If you only need to check whether id exists in df2 and don't need any columns from df2 in your output, then isin() is a more efficient solution (this is similar to EXISTS and IN in SQL; see the SQL sketch at the end of this answer).

df1 = spark.createDataFrame([(2,1,1), (3,5,1), (4,1,2), (5,2,1)], "id: Int, a: Int, b: Int")
df2 = spark.createDataFrame([(2,'fs','a'), (5,'fa','f')], ['id','c','d'])
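The snippet above assumes an active Spark session named spark (as in a shell or notebook); if you run it as a standalone script, a minimal sketch to create one (the app name is just illustrative):

from pyspark.sql import SparkSession

# Create (or reuse) a SparkSession for the examples below.
spark = SparkSession.builder.appName("semi-join-example").getOrCreate()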

Create a list from df2.id and pass it to isin() on df1:

from pyspark.sql.functions import col

df2_list = df2.select('id').rdd.map(lambda row: row[0]).collect()
df1.where(col('id').isin(df2_list)).show()
#+---+---+---+
#| id|  a|  b|
#+---+---+---+
#|  2|  1|  1|
#|  5|  2|  1|
#+---+---+---+

It is recommended to use isin() if:

  • You don't need to return data from the reference dataframe/table

  • You have duplicates in the reference dataframe/table (a JOIN can produce duplicate rows if values are repeated)

  • You just want to check for the existence of a particular value
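For comparison, here is the SQL sketch mentioned above: the same existence check written with a correlated EXISTS in Spark SQL, assuming the df1/df2 created earlier (the view names t1 and t2 are illustrative). Spark executes this as a left semi join.

# Register temporary views so the dataframes can be queried with SQL.
df1.createOrReplaceTempView("t1")
df2.createOrReplaceTempView("t2")

# Correlated EXISTS subquery, equivalent to filtering df1 by ids present in df2.
spark.sql("""
    SELECT *
    FROM t1
    WHERE EXISTS (SELECT 1 FROM t2 WHERE t2.id = t1.id)
""").show()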

