I have a dataset
col_id col_2 col_3 col_id_b ABC111 shfhs 34775 null ABC112 shfhe 34775 DEF345 ABC112 shfhs 34775 GFR563 ABC112 shfgh 34756 TRS572 ABC113 shfdh 34795 null ABC114 shfhs 34770 null I am trying to create a new column that is identical to col_id_b, except that the nulls take the value of the corresponding col_id from that row. So:
col_id col_2 col_3 col_id_b col_new ABC111 shfhs 34775 null ABC111 ABC112 shfhe 34775 DEF345 DEF345 ABC112 shfhs 34775 GFR563 GFR563 ABC112 shfgh 34756 TRS572 TRS572 ABC113 shfdh 34795 null ABC113 ABC114 shfhs 34770 null ABC114 I know about:
df.select(coalesce(df["col_id"], df["col_id_b"])).show() But in my case there are my rows where both are not-null. How do I introduce this condition?