I have created two data frames in PySpark with the commands below. I want to join them so that the resulting data frame contains no duplicate rows.

    df1 = sc.parallelize([
        ("a", 1, 1), ("b", 2, 2), ("d", 4, 2), ("e", 4, 1), ("c", 3, 4)
    ]).toDF(['SID', 'SSection', 'SRank'])
    df1.show()

    +---+--------+-----+
    |SID|SSection|SRank|
    +---+--------+-----+
    |  a|       1|    1|
    |  b|       2|    2|
    |  d|       4|    2|
    |  e|       4|    1|
    |  c|       3|    4|
    +---+--------+-----+

The second data frame, df2, is:

    df2 = sc.parallelize([
        ("a", 2, 1), ("b", 2, 3), ("f", 4, 2), ("e", 4, 1), ("c", 3, 4)
    ]).toDF(['SID', 'SSection', 'SRank'])
    df2.show()

    +---+--------+-----+
    |SID|SSection|SRank|
    +---+--------+-----+
    |  a|       2|    1|
    |  b|       2|    3|
    |  f|       4|    2|
    |  e|       4|    1|
    |  c|       3|    4|
    +---+--------+-----+

I want to join the two tables above to get the result below.

    +---+--------+----------+----------+
    |SID|SSection|test1SRank|test2SRank|
    +---+--------+----------+----------+
    |  f|       4|         0|         2|
    |  e|       4|         1|         1|
    |  d|       4|         2|         0|
    |  c|       3|         4|         4|
    |  b|       2|         2|         3|
    |  a|       1|         1|         0|
    |  a|       2|         0|         1|
    +---+--------+----------+----------+

2 Answers


Doesn't look like something that can be achieved with a single join. Here's a solution involving multiple joins:

    from pyspark.sql.functions import col

    # All distinct (SID, SSection) keys that appear in either data frame
    d1 = df1.unionAll(df2).select("SID", "SSection").distinct()

    # Attach each data frame's SRank to the key set; unmatched keys get null
    t1 = d1.join(df1, ["SID", "SSection"], "leftOuter").select(d1.SID, d1.SSection, col("SRank").alias("test1Srank"))
    t2 = d1.join(df2, ["SID", "SSection"], "leftOuter").select(d1.SID, d1.SSection, col("SRank").alias("test2Srank"))

    # Combine both rank columns and replace nulls with 0
    t1.join(t2, ["SID", "SSection"]).na.fill(0).show()

    +---+--------+----------+----------+
    |SID|SSection|test1Srank|test2Srank|
    +---+--------+----------+----------+
    |  b|       2|         2|         3|
    |  c|       3|         4|         4|
    |  d|       4|         2|         0|
    |  e|       4|         1|         1|
    |  f|       4|         0|         2|
    |  a|       1|         1|         0|
    |  a|       2|         0|         1|
    +---+--------+----------+----------+


You can simply rename the SRank column in each data frame, do an outer join on SID and SSection, and then replace the resulting nulls with na.fill:

    df1.withColumnRenamed("SRank", "test1SRank").join(
        df2.withColumnRenamed("SRank", "test2SRank"),
        ["SID", "SSection"],
        "outer"
    ).na.fill(0)
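To make it clearer what the outer join followed by na.fill(0) produces, here is a minimal plain-Python sketch of the same logic on the question's data. It does not use Spark; the helper name `full_outer_join` and the variables `df1_rows` and `df2_rows` are made up for illustration, and the sketch assumes each (SID, SSection) pair appears at most once per data frame, as it does here.

```python
# Plain-Python illustration of a full outer join on (SID, SSection),
# with missing ranks replaced by 0. No Spark required.
df1_rows = [("a", 1, 1), ("b", 2, 2), ("d", 4, 2), ("e", 4, 1), ("c", 3, 4)]
df2_rows = [("a", 2, 1), ("b", 2, 3), ("f", 4, 2), ("e", 4, 1), ("c", 3, 4)]


def full_outer_join(left, right, fill=0):
    """Join two lists of (SID, SSection, SRank) rows on (SID, SSection).

    Hypothetical helper: assumes keys are unique within each input list.
    """
    left_rank = {(sid, sec): rank for sid, sec, rank in left}
    right_rank = {(sid, sec): rank for sid, sec, rank in right}
    # The union of both key sets is what makes this a *full* outer join.
    keys = sorted(set(left_rank) | set(right_rank))
    return [
        (sid, sec, left_rank.get((sid, sec), fill), right_rank.get((sid, sec), fill))
        for sid, sec in keys
    ]


joined = full_outer_join(df1_rows, df2_rows)
for row in joined:
    print(row)
```

Each key appears exactly once in the result, so ("a", 1) and ("a", 2) stay as separate rows, and a key present on only one side gets 0 for the other side's rank. In Spark the row order of the joined data frame is not guaranteed; the sort here only makes the sketch's output stable.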
