I have two Spark DataFrames, mainDF and deltaDF, both with a matching schema.
The content of mainDF is as follows:
id | name | age
1  | abc  | 23
2  | xyz  | 34
3  | pqr  | 45

The content of deltaDF is as follows:
id | name | age
1  | lmn  | 56
4  | efg  | 37

I want to merge deltaDF into mainDF based on the value of id. If an id already exists in mainDF, that record should be updated; if it doesn't exist, a new record should be added. So the resulting DataFrame should look like this:
id | name | age
1  | lmn  | 56
2  | xyz  | 34
3  | pqr  | 45
4  | efg  | 37

This is my current code, and it works:
val updatedDF = mainDF.as("main")
  .join(deltaDF.as("delta"), $"main.id" === $"delta.id", "inner")
  .select($"main.id", $"main.name", $"main.age")

mainDF = mainDF.except(updatedDF).unionAll(deltaDF)

However, here I need to explicitly list the columns again in the select call, which feels like unnecessary overhead. Is there a better/cleaner approach to achieve the same result?
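One possible cleaner approach, sketched below under the assumption of Spark 2.x (where union replaces the deprecated unionAll): a left_anti join keeps only the mainDF rows whose id is absent from deltaDF, so no column list is needed, and appending deltaDF then yields the upserted result.

```scala
import org.apache.spark.sql.SparkSession

object UpsertSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("upsert-sketch").master("local[*]").getOrCreate()
    import spark.implicits._

    val mainDF  = Seq((1, "abc", 23), (2, "xyz", 34), (3, "pqr", 45)).toDF("id", "name", "age")
    val deltaDF = Seq((1, "lmn", 56), (4, "efg", 37)).toDF("id", "name", "age")

    // left_anti keeps the mainDF rows whose id does NOT appear in deltaDF,
    // preserving mainDF's full schema without an explicit select.
    // Appending deltaDF then gives updated rows for existing ids plus new rows.
    val merged = mainDF.join(deltaDF, Seq("id"), "left_anti").union(deltaDF)

    merged.orderBy("id").show()
    spark.stop()
  }
}
```

Because the join uses `Seq("id")` (a usingColumns join), the id column is not duplicated in the output, and the result carries mainDF's column order, so the subsequent union lines up without renaming.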