Let's say I have a PySpark DataFrame:

| Column A | Column B |
| -------- | -------- |
| val1     | val1B    |
| null     | val2B    |
| val2     | null     |
| val3     | val3B    |

Can someone help me with replacing any null value in any column (for the whole df) with the value right below it? So the final table should look like this:

| Column A | Column B |
| -------- | -------- |
| val1     | val1B    |
| val2     | val2B    |
| val2     | val3B    |
| val3     | val3B    |

How could this be done? Can I get a code demo if possible? Thank you!

All I've really gotten through is numbering the rows and creating a condition to find the rows containing null values. So I'm left with a table like this:

| Column A | Column B | row_num |
| -------- | -------- | ------- |
| null     | val2B    | 2       |
| val2     | null     | 3       |

But I don't think this step is needed. I'm stuck as to what to do.
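(For reference, a sketch of that intermediate step, assuming the DataFrame is named `df` and the columns are named as shown:)

```python
from pyspark.sql.functions import col, monotonically_increasing_id, row_number
from pyspark.sql.window import Window

# Number the rows, then keep only those with a null in any column
w = Window.partitionBy().orderBy(monotonically_increasing_id())
numbered = df.withColumn("row_num", row_number().over(w))
numbered.where(col("Column A").isNull() | col("Column B").isNull()).show()
```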

1 Answer

Use a list comprehension to coalesce each column with the lead window function. Code below:

```python
from pyspark.sql.functions import coalesce, col, lead, monotonically_increasing_id
from pyspark.sql.window import Window

# For each column, take the value from the next row whenever the current one is null
w = Window.partitionBy().orderBy(monotonically_increasing_id())
df.select(
    *[coalesce(col(x), lead(x).over(w)).alias(x) for x in df.columns]
).show()
```

```
+--------+--------+
|Column A|Column B|
+--------+--------+
|    val1|   val1B|
|    val2|   val2B|
|    val2|   val3B|
|    val3|   val3B|
+--------+--------+
```
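If you want to reproduce this end to end, here is a minimal, self-contained sketch; the sample data comes from the question, while the SparkSession setup is boilerplate I'm assuming:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import coalesce, col, lead, monotonically_increasing_id
from pyspark.sql.window import Window

spark = SparkSession.builder.getOrCreate()  # assumed local session

# Sample data from the question; Python None becomes a Spark null
df = spark.createDataFrame(
    [("val1", "val1B"), (None, "val2B"), ("val2", None), ("val3", "val3B")],
    ["Column A", "Column B"],
)

# An empty partitionBy() puts every row in a single window partition,
# so this works for small data but won't scale to large DataFrames.
w = Window.partitionBy().orderBy(monotonically_increasing_id())
df.select(
    *[coalesce(col(x), lead(x).over(w)).alias(x) for x in df.columns]
).show()
```

One caveat: `monotonically_increasing_id()` only reflects the DataFrame's current row order, which Spark does not guarantee after shuffles. If your data has an explicit ordering column, prefer that in `orderBy`.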