You can achieve this by defining a UDF and passing it the collected struct columns. The UDF sorts the structs by timestamp and fills null (or empty) fields of the latest record with the first non-null value found in older records. (Comments are provided in the code for explanation.)
    import org.apache.spark.sql.Row
    import org.apache.spark.sql.functions._

    // udf function definition
    def sortAndAggUdf = udf((structs: Seq[Row]) => {
      // sorting the collected list by timestamp in descending order
      val sortedStruct = structs.sortBy(str => str.getAs[Long]("UpdatedtimeStamp"))(Ordering[Long].reverse)
      // selecting the first (latest) struct and converting it to the out case class
      val first = out(
        sortedStruct(0).getAs[String]("Name"),
        sortedStruct(0).getAs[String]("Passport"),
        sortedStruct(0).getAs[String]("Country"),
        sortedStruct(0).getAs[String]("License"),
        sortedStruct(0).getAs[Long]("UpdatedtimeStamp"))
      // aggregation for checking nulls and populating the first not null value
      sortedStruct.foldLeft(first)((x, y) => {
        out(
          if (x.Name == null || x.Name.isEmpty) y.getAs[String]("Name") else x.Name,
          if (x.Passport == null || x.Passport.isEmpty) y.getAs[String]("Passport") else x.Passport,
          if (x.Country == null || x.Country.isEmpty) y.getAs[String]("Country") else x.Country,
          if (x.License == null || x.License.isEmpty) y.getAs[String]("License") else x.License,
          x.UpdatedtimeStamp)
      })
    })

    // making the rest of the columns as one column and changing the UpdatedtimeStamp column to long for sorting in udf
    df.select(col("ID"),
        struct(col("Name"), col("Passport"), col("Country"), col("License"),
          unix_timestamp(col("UpdatedtimeStamp"), "MM-dd-yyyy").as("UpdatedtimeStamp")).as("struct"))
      // grouping and collecting the structs and passing to udf function for manipulation
      .groupBy("ID").agg(sortAndAggUdf(collect_list("struct")).as("struct"))
      // separating the aggregated columns to separate columns
      .select(col("ID"), col("struct.*"))
      // getting the date in correct format
      .withColumn("UpdatedtimeStamp", date_format(col("UpdatedtimeStamp").cast("timestamp"), "MM-dd-yyyy"))
      .show(false)
which should give you:
    +---+----+--------+-------+-------+----------------+
    |ID |Name|Passport|Country|License|UpdatedtimeStamp|
    +---+----+--------+-------+-------+----------------+
    |1  |Shah|12345   |null   |ABC    |12-02-2018      |
    |2  |PJ  |null    |ANB    |a      |10-02-2018      |
    +---+----+--------+-------+-------+----------------+
And of course the `out` case class used by the UDF is needed; define it before the UDF:
    case class out(Name: String, Passport: String, Country: String, License: String, UpdatedtimeStamp: Long)
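If you want to check the sort-and-fill logic without a Spark session, the same idea can be sketched in plain Scala. This is only an illustration, not part of the Spark job: `Rec`, `MergeSketch`, `pick`, and `merge` are hypothetical names, and the sample values in `main` are made up. It assumes at least one record per group, which is what `collect_list` after `groupBy` gives you here.

```scala
// Hypothetical stand-in for the struct rows handled by the UDF.
case class Rec(name: String, passport: String, country: String, license: String, updated: Long)

object MergeSketch {
  // Treat both null and empty strings as missing, as the UDF does.
  private def pick(current: String, candidate: String): String =
    if (current == null || current.isEmpty) candidate else current

  // Sort newest first, then fill each missing field of the newest record
  // from the next-newest record that has a value for it.
  // Assumes records is non-empty (sorted.head would throw otherwise).
  def merge(records: Seq[Rec]): Rec = {
    val sorted = records.sortBy(_.updated)(Ordering[Long].reverse)
    sorted.tail.foldLeft(sorted.head) { (acc, next) =>
      Rec(
        pick(acc.name, next.name),
        pick(acc.passport, next.passport),
        pick(acc.country, next.country),
        pick(acc.license, next.license),
        acc.updated) // keep the newest timestamp
    }
  }

  def main(args: Array[String]): Unit = {
    // Made-up sample data: the newest record is missing Passport and Country.
    val merged = merge(Seq(
      Rec("Shah", null, null, "ABC", 3L),
      Rec(null, "12345", null, null, 2L),
      Rec("Old", "999", "US", "XYZ", 1L)))
    println(merged) // prints Rec(Shah,12345,US,ABC,3)
  }
}
```

The fold starts from the newest record, so a field is only taken from an older record when every newer record had it null or empty. That matches the behaviour of the `foldLeft` in the UDF.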