I have the data below, and final_column is the exact output I am trying to get. I am trying to compute a cumulative sum of flag that resets whenever flag is 0: on those rows the value should be 0, and the sum starts again from the next row, as shown here:
cola  date        flag  final_column
a     2021-10-01  0     0
a     2021-10-02  1     1
a     2021-10-03  1     2
a     2021-10-04  0     0
a     2021-10-05  0     0
a     2021-10-06  0     0
a     2021-10-07  1     1
a     2021-10-08  1     2
a     2021-10-09  1     3
a     2021-10-10  0     0
b     2021-10-01  0     0
b     2021-10-02  1     1
b     2021-10-03  1     2
b     2021-10-04  0     0
b     2021-10-05  0     0
b     2021-10-06  1     1
b     2021-10-07  1     2
b     2021-10-08  1     3
b     2021-10-09  1     4
b     2021-10-10  0     0

I have tried the following:
    import org.apache.spark.sql.functions._

    df.withColumn("final_column", expr("sum(flag) over (partition by cola order by date asc)"))

This gives a running total per cola, but it never resets to 0. I also tried putting a condition such as case when flag = 0 then 0 else 1 end inside the sum function, but that did not work either.
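A condition inside a single sum cannot produce a reset, because a running sum only ever accumulates. One common workaround, sketched below, uses two windows: first a running count of rows where flag = 0 gives every "run" its own id (it increments at each reset), then a running sum of flag partitioned by cola and that id restarts at every 0. This is a minimal sketch under assumptions: spark-sql is on the classpath, date is stored as a yyyy-MM-dd string (which sorts correctly), and dates are unique within each cola. The helper column name grp is hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

// Local session for trying the idea out; assumes spark-sql is available.
val spark = SparkSession.builder().master("local[*]").appName("reset-cumsum").getOrCreate()
import spark.implicits._

// Sample data from the question (date kept as a yyyy-MM-dd string).
val df = Seq(
  ("a", "2021-10-01", 0), ("a", "2021-10-02", 1), ("a", "2021-10-03", 1),
  ("a", "2021-10-04", 0), ("a", "2021-10-05", 0), ("a", "2021-10-06", 0),
  ("a", "2021-10-07", 1), ("a", "2021-10-08", 1), ("a", "2021-10-09", 1),
  ("a", "2021-10-10", 0),
  ("b", "2021-10-01", 0), ("b", "2021-10-02", 1), ("b", "2021-10-03", 1),
  ("b", "2021-10-04", 0), ("b", "2021-10-05", 0), ("b", "2021-10-06", 1),
  ("b", "2021-10-07", 1), ("b", "2021-10-08", 1), ("b", "2021-10-09", 1),
  ("b", "2021-10-10", 0)
).toDF("cola", "date", "flag")

// Step 1: running count of zero flags per cola. Every row with flag = 0
// bumps the count, so each reset starts a new group id (hypothetical column grp).
val byDate = Window.partitionBy("cola").orderBy("date")
val grouped = df.withColumn("grp", sum(when($"flag" === 0, 1).otherwise(0)).over(byDate))

// Step 2: running sum of flag inside each (cola, grp) group.
// The group starts at a 0 row, so the sum restarts from 0 there.
val byGroup = Window.partitionBy("cola", "grp").orderBy("date")
val result = grouped
  .withColumn("final_column", sum($"flag").over(byGroup))
  .drop("grp")

result.orderBy("cola", "date").show()
```

Because both windows have an orderBy, Spark uses its default frame (RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW). That matches a row-by-row running sum only while dates are unique per cola; if duplicates are possible, adding `.rowsBetween(Window.unboundedPreceding, Window.currentRow)` plus a tiebreaker in orderBy would be safer. Note that sum over an integer column returns a bigint, so final_column comes back as Long.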