I have a dataframe which looks like the one given below. All the values for a given id are the same except for the mappingcol field.

    +--------------------+----------------+--------------------+-------+
    |misc                |fruit           |mappingcol          |id     |
    +--------------------+----------------+--------------------+-------+
    |ddd                 |apple           |Map("name"->"Sameer"| 1     |
    |ref                 |banana          |Map("name"->"Riyazi"| 2     |
    |ref                 |banana          |Map("lname"->"Nikki"| 2     |
    |ddd                 |apple           |Map("lname"->"tenka"| 1     |
    +--------------------+----------------+--------------------+-------+

I want to merge the rows with the same id in such a way that I get exactly one row per id, with the mappingcol values of those rows merged into a single map. The output should look like:

    +--------------------+----------------+--------------------+-------+
    |misc                |fruit           |mappingcol          |id     |
    +--------------------+----------------+--------------------+-------+
    |ddd                 |apple           |Map("name"->"Sameer"| 1     |
    |ref                 |banana          |Map("name"->"Riyazi"| 2     |
    +--------------------+----------------+--------------------+-------+

The value of mappingcol for id = 1 would be:

    Map(
      "name" -> "Sameer",
      "lname" -> "tenka"
    )

I know that maps can be merged using the ++ operator, so that's not what I'm worried about. I just can't understand how to merge the rows, because if I use a groupBy, I have nothing to aggregate the rows on.
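
To make the intent concrete, here is a minimal sketch of the merge I am after, written against plain Scala collections rather than the DataFrame. The `MergeSketch` object, the `Rec` case class and `mergeById` are hypothetical names made up for illustration, and the sketch assumes the non-map columns really are identical within each id. What I am missing is the equivalent aggregation on the DataFrame itself.

```scala
object MergeSketch {
  // Hypothetical record type mirroring the columns of the table above.
  final case class Rec(misc: String, fruit: String, mappingcol: Map[String, String], id: Int)

  // The four sample rows from the input table.
  val rows: Seq[Rec] = Seq(
    Rec("ddd", "apple", Map("name" -> "Sameer"), 1),
    Rec("ref", "banana", Map("name" -> "Riyazi"), 2),
    Rec("ref", "banana", Map("lname" -> "Nikki"), 2),
    Rec("ddd", "apple", Map("lname" -> "tenka"), 1)
  )

  // Group by id, keep the shared columns from the first row of each group,
  // and combine the group's maps with ++ (on a key clash the later map wins).
  def mergeById(rows: Seq[Rec]): Seq[Rec] =
    rows
      .groupBy(_.id)
      .values
      .map { group =>
        group.head.copy(mappingcol = group.map(_.mappingcol).reduce(_ ++ _))
      }
      .toSeq
      .sortBy(_.id)

  def main(args: Array[String]): Unit =
    mergeById(rows).foreach(println)
}
```

Here `mergeById(rows)` yields one record per id, and the record for id = 1 carries both the "name" and "lname" entries.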