You can compute row_number() over a window partitioned by A, B and ordered by Snap in descending order; the first row in each group is the latest (active) snapshot:
df.selectExpr(
    '*',
    'row_number() over (partition by A, B order by Snap desc) = 1 as activity'
).show()

+---+---+---+----------+--------+
|  A|  B|  C|      Snap|activity|
+---+---+---+----------+--------+
|  1|  2|  4|2019-12-31|    true|
|  1|  2|  3|2019-12-29|   false|
+---+---+---+----------+--------+
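If you prefer the DataFrame API to a SQL expression, the same flag can be built with a Window spec. This is a minimal sketch assuming the same df as above; the name df_flagged is mine, not from the original answer:

from pyspark.sql import functions as f
from pyspark.sql.window import Window

# Window over each (A, B) group, latest Snap first
w = Window.partitionBy('A', 'B').orderBy(f.col('Snap').desc())

# row_number() == 1 flags the most recent snapshot per group
df_flagged = df.withColumn('activity', f.row_number().over(w) == 1)
df_flagged.show()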
Edit: to get the end date for each group, use the max window function on Snap:

import pyspark.sql.functions as f

df.withColumn(
    'activity',
    f.expr('row_number() over (partition by A, B order by Snap desc) = 1')
).withColumn(
    'end',
    f.expr('case when activity then null else max(date_add(to_date(Snap), -1)) over (partition by A, B) end')
).show()

+---+---+---+----------+--------+----------+
|  A|  B|  C|      Snap|activity|       end|
+---+---+---+----------+--------+----------+
|  1|  2|  4|2019-12-31|    true|      null|
|  1|  2|  3|2019-12-29|   false|2019-12-30|
+---+---+---+----------+--------+----------+
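For completeness, here is a self-contained sketch that rebuilds the two sample rows and reproduces both columns; the SparkSession setup and the literal sample data are assumptions added only to make the snippet runnable:

from pyspark.sql import SparkSession
import pyspark.sql.functions as f

spark = SparkSession.builder.getOrCreate()

# Hypothetical sample data matching the output shown above
df = spark.createDataFrame(
    [(1, 2, 4, '2019-12-31'), (1, 2, 3, '2019-12-29')],
    ['A', 'B', 'C', 'Snap']
)

result = (
    df
    .withColumn(
        'activity',
        f.expr('row_number() over (partition by A, B order by Snap desc) = 1')
    )
    # end = day before the latest Snap in the group; null for the active row
    .withColumn(
        'end',
        f.expr('case when activity then null '
               'else max(date_add(to_date(Snap), -1)) over (partition by A, B) end')
    )
)
result.show()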