PySpark: How to replace all null values in all columns of a DataFrame?

Question

I have the DataFrame below, with null values in some columns.

Now I need to replace those null values with 'NA'.

+-------+------+-----+------+----+
|Product|Canada|China|Mexico| USA|
+-------+------+-----+------+----+
| Orange|  null| 4000|  null|4000|
|  Beans|  null| 1500|  2000|1600|
| Banana|  2000|  400|  null|1000|
|Carrots|  2000| 1200|  null|1500|
+-------+------+-----+------+----+

I found the 'fillna' method for replacing null values. However, I need to replace the nulls in every column that has them, so something like this, or a better way:

replaced = df.fillna({col: 'NA' for col in df.columns})

I would appreciate any help finding the right approach.

Thanks

What is the data type of these columns (other than Product)? Can you add the schema?
– ernest_k, Commented Nov 5, 2020 at 10:27

dsk · Accepted Answer · 2020-11-05 11:11:06Z

Use the subset parameter of fillna() and pass the names of the columns whose null values you want to fill:

df = df.fillna(0, subset=['Canada', 'China', 'Mexico', 'USA'])

Or, if you want to fill several columns with a single fillna() call, pass a dictionary that maps each column name to the replacement value of your choice:

df = df.fillna({'Canada': '4', 'China': '5', 'Mexico': '6', 'USA': '7'})

Or, to fill the null values in all columns at once, pass a single value. Note that a string value is applied only to string columns; non-string columns are ignored.

df = df.fillna("a_value")

Hi @dsk, I tried all the approaches you suggested, but none of them gave the expected output; the result still contained null values.
Can you try converting the columns to StringType, then filling the nulls, and check?
Please let me know where I am supposed to convert to a string.
df = df.withColumn("Canada", F.col("Canada").cast(T.StringType())) - try this
