0

DF1

Id Name Desc etc
A Name1 desc1 etc1
B name2 desc2 etc2

DF2

Id Name Desc etc
A Name2 desc2 etc2
C name2 desc2 etc2

I want to union records from DF2 into DF1 where the ID is equal and include all records from DF1.

Result DF

Id Name Desc etc
A Name1 desc1 etc1
B name2 desc2 etc2
A name2 desc2 etc2

What is the best way to do it? Any help Is appreciated.

3
  • just filter df2 for ids that are present in df1 and then append the dataframes. Commented May 27, 2022 at 14:17
  • Is it more effective to have join or filter? Commented May 27, 2022 at 15:53
  • Can you provide a code example Commented May 27, 2022 at 16:02

1 Answer 1

1

You can do so by doing a semi join keeping only the ids in df2 which exists in df1, followed by a union with df1.

d1 = [ ('A', 'Name1', 'desc1', 'etc1'), ('B', 'name2', 'desc2', 'etc2'), ] d2 = [ ('A', 'Name2', 'desc2', 'etc2'), ('C', 'name2', 'desc2', 'etc2'), ] df1 = spark.createDataFrame(d1, ['Id', 'Name', 'Desc', 'etc']) df2 = spark.createDataFrame(d2, ['Id', 'Name', 'Desc', 'etc']) df2.join(df1, on='Id', how='semi').union(df1).show() +---+-----+-----+----+ | Id| Name| Desc| etc| +---+-----+-----+----+ | A|Name2|desc2|etc2| | A|Name1|desc1|etc1| | B|name2|desc2|etc2| +---+-----+-----+----+ 
Sign up to request clarification or add additional context in comments.

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.