Fill nulls with values from another column in PySpark

Question

I have a dataset

col_id  col_2  col_3  col_id_b
ABC111  shfhs  34775  null
ABC112  shfhe  34775  DEF345
ABC112  shfhs  34775  GFR563
ABC112  shfgh  34756  TRS572
ABC113  shfdh  34795  null
ABC114  shfhs  34770  null

I am trying to create a new column that is identical to col_id_b, except that the nulls take the value of the corresponding col_id from that row. So:

col_id  col_2  col_3  col_id_b  col_new
ABC111  shfhs  34775  null      ABC111
ABC112  shfhe  34775  DEF345    DEF345
ABC112  shfhs  34775  GFR563    GFR563
ABC112  shfgh  34756  TRS572    TRS572
ABC113  shfdh  34795  null      ABC113
ABC114  shfhs  34770  null      ABC114

I know about:

df.select(coalesce(df["col_id"], df["col_id_b"])).show()

But in my case there are many rows where both columns are non-null. How do I introduce this condition?

Ric S · Accepted Answer · 2022-11-15 12:05:31Z

Just invert the order of the columns:

df.select(coalesce(col('col_id_b'), col('col_id')))

coalesce returns, for each row, the first of its arguments that is not null; so if you put col_id_b first, you get col_id_b whenever it is not null, and col_id otherwise.

edited Nov 15, 2022 at 12:05

answered Nov 15, 2022 at 10:32


4 Comments

johnnydoe Over a year ago

Thank you! It says column name coalesce('col_id_b', 'col_id') contains invalid characters and I need an alias to rename it, but I don't see why.

Ric S Over a year ago

Me neither honestly... What is the exact error stacktrace?

johnnydoe Over a year ago

I solved it with: df = df.withColumn("new_col", coalesce(col("col_id_b"),col("col_id")))

Ric S Over a year ago

Ok, in my tests the col function was not needed, but of course your script settings and previous code were different from mine. I'll edit my answer with your change so that future users do not bump into the same issue. Glad to have helped!
