
I need to merge rows in the same DataFrame based on a key column "id". In the sample DataFrame, one row has data for id, name and age, and the other row has id, name and city. Rows with the same key "id" have to be merged into a single record in the final DataFrame. If an id has just one record, it should still appear in the output, with null for the missing value (Smith and Jake in the example below).

The computation needs to happen on real-time data, so a solution based on native Spark functions would be ideal. I have tried filtering the records on the age and city columns into separate DataFrames and then performing a left join on id, but it is not very efficient. I'm looking for alternative suggestions. Thanks in advance!
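For comparison, here is a sketch of the filter-and-join approach described above. It is written as a Spark script and assumes spark-sql is on the classpath (for example, pasted into spark-shell); the names ages, cities and joined are illustrative. Note that a left join from the age rows would drop id 105 (Jake), which has no age row, so a full outer join is needed to keep single-record ids. The approach also reads the input twice and then joins the two halves, whereas a single groupBy does the merge in one aggregation.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Local session for the sketch; getOrCreate() reuses an existing one (e.g. in spark-shell).
val spark = SparkSession.builder().master("local[*]").appName("merge-rows-join").getOrCreate()
import spark.implicits._

// Same sample data as in the question.
val inputDF = Seq[(String, String, Option[Int], Option[String])](
  ("100", "John", Some(35), None),
  ("100", "John", None, Some("Georgia")),
  ("101", "Mike", Some(25), None),
  ("101", "Mike", None, Some("New York")),
  ("103", "Mary", Some(22), None),
  ("103", "Mary", None, Some("Texas")),
  ("104", "Smith", Some(25), None),
  ("105", "Jake", None, Some("Florida"))
).toDF("id", "name", "age", "city")

// Split into the rows that carry an age and the rows that carry a city.
val ages = inputDF.filter(col("age").isNotNull).select("id", "name", "age")
val cities = inputDF.filter(col("city").isNotNull).select("id", "name", "city")

// Full outer join on the key: 104 (age only) and 105 (city only) survive with a null.
val joined = ages.join(cities, Seq("id", "name"), "full_outer").orderBy("id")
joined.show()

// Collect as typed tuples (mapped by column position) so the result can be checked.
val result = joined.as[(String, String, Option[Int], Option[String])].collect().toSeq
```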

Sample Dataframe

```scala
// spark-shell predefines `spark`; elsewhere this import is what enables toDF on a Seq.
import spark.implicits._

val inputDF = Seq(
  ("100", "John",  Some(35), None),
  ("100", "John",  None,     Some("Georgia")),
  ("101", "Mike",  Some(25), None),
  ("101", "Mike",  None,     Some("New York")),
  ("103", "Mary",  Some(22), None),
  ("103", "Mary",  None,     Some("Texas")),
  ("104", "Smith", Some(25), None),
  ("105", "Jake",  None,     Some("Florida"))
).toDF("id", "name", "age", "city")
```

Input Dataframe

```
+---+-----+----+--------+
|id |name |age |city    |
+---+-----+----+--------+
|100|John |35  |null    |
|100|John |null|Georgia |
|101|Mike |25  |null    |
|101|Mike |null|New York|
|103|Mary |22  |null    |
|103|Mary |null|Texas   |
|104|Smith|25  |null    |
|105|Jake |null|Florida |
+---+-----+----+--------+
```

Expected Output Dataframe

```
+---+-----+----+---------+
| id| name| age|     city|
+---+-----+----+---------+
|100| John|  35|  Georgia|
|101| Mike|  25| New York|
|103| Mary|  22|    Texas|
|104|Smith|  25|     null|
|105| Jake|null|  Florida|
+---+-----+----+---------+
```

Answer

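Use the first or last standard functions with the ignoreNulls flag set to true. Within each (id, name) group, the aggregate then skips the nulls and returns the single non-null value of each column, so the two partial rows collapse into one; a group with no non-null value for a column (Smith's city, Jake's age) gets null. Spark documents first and last as non-deterministic, because row order after a shuffle is not guaranteed, but that does not matter here since each group has at most one non-null value per column.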

first standard function

```scala
import org.apache.spark.sql.functions.first

val q = inputDF
  .groupBy("id", "name")
  .agg(
    first("age", ignoreNulls = true) as "age",
    first("city", ignoreNulls = true) as "city")
  .orderBy("id")
```
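If the real data has many value columns, the aggregation list can be built from the DataFrame's column names instead of writing one first(...) per column. This is a minimal sketch, not part of the original answer, assuming id and name are the only key columns and spark-sql is on the classpath (for example, pasted into spark-shell); keyCols, valueCols, aggs and merged are illustrative names.

```scala
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions.first

// Local session for the sketch; getOrCreate() reuses an existing one (e.g. in spark-shell).
val spark = SparkSession.builder().master("local[*]").appName("merge-rows-first").getOrCreate()
import spark.implicits._

// Same sample data as in the question.
val inputDF = Seq[(String, String, Option[Int], Option[String])](
  ("100", "John", Some(35), None),
  ("100", "John", None, Some("Georgia")),
  ("101", "Mike", Some(25), None),
  ("101", "Mike", None, Some("New York")),
  ("103", "Mary", Some(22), None),
  ("103", "Mary", None, Some("Texas")),
  ("104", "Smith", Some(25), None),
  ("105", "Jake", None, Some("Florida"))
).toDF("id", "name", "age", "city")

// Assumed grouping key; every other column is treated as a value column.
val keyCols = Seq("id", "name")
val valueCols = inputDF.columns.toSeq.filterNot(c => keyCols.contains(c))

// One first(..., ignoreNulls = true) per value column, keeping the original column name.
val aggs: Seq[Column] = valueCols.map(c => first(c, ignoreNulls = true).as(c))

val merged = inputDF
  .groupBy(keyCols.head, keyCols.tail: _*)
  .agg(aggs.head, aggs.tail: _*)
  .orderBy("id")
merged.show()

// Collect as typed tuples (mapped by column position) so the result can be checked.
val result = merged.as[(String, String, Option[Int], Option[String])].collect().toSeq
```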

last standard function

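```scala
import org.apache.spark.sql.functions.last

val q = inputDF
  .groupBy("id", "name")
  .agg(
    last("age", ignoreNulls = true) as "age",
    // ignoreNulls is needed here too, otherwise the null city row can win
    last("city", ignoreNulls = true) as "city")
  .orderBy("id")
```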
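Not part of the original answer, but a deterministic alternative for this data shape: plain max (or min) also skips nulls, so with at most one non-null value per column in each group it returns that value regardless of row order. Unlike first/last, if a group ever held two different non-null values, max would pick the larger one rather than an arbitrary one. A sketch, assuming spark-sql is on the classpath (for example, pasted into spark-shell):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.max

// Local session for the sketch; getOrCreate() reuses an existing one (e.g. in spark-shell).
val spark = SparkSession.builder().master("local[*]").appName("merge-rows-max").getOrCreate()
import spark.implicits._

// Same sample data as in the question.
val inputDF = Seq[(String, String, Option[Int], Option[String])](
  ("100", "John", Some(35), None),
  ("100", "John", None, Some("Georgia")),
  ("101", "Mike", Some(25), None),
  ("101", "Mike", None, Some("New York")),
  ("103", "Mary", Some(22), None),
  ("103", "Mary", None, Some("Texas")),
  ("104", "Smith", Some(25), None),
  ("105", "Jake", None, Some("Florida"))
).toDF("id", "name", "age", "city")

// max ignores nulls and returns null only when every value in the group is null.
val q = inputDF
  .groupBy("id", "name")
  .agg(max("age") as "age", max("city") as "city")
  .orderBy("id")
q.show()

// Collect as typed tuples (mapped by column position) so the result can be checked.
val result = q.as[(String, String, Option[Int], Option[String])].collect().toSeq
```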