I have a PySpark DataFrame.
DataFrame example:
```
id | column_1 | column_2 | column_3
---|----------|----------|----------
1  | ["12"]   | [""]     | ["67"]
2  | [""]     | ["78"]   | ["90"]
3  | [""]     | ["93"]   | ["56"]
4  | ["100"]  | ["78"]   | ["90"]
```

I want to convert every [""] value in the columns column_1, column_2, and column_3 to null. Each of these three columns has array type.
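For reproducibility, this is roughly how the DataFrame can be built (a minimal sketch, assuming the three columns are arrays of strings):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Toy data matching the example above; each column holds an array of strings,
# and [""] marks the values that should become null.
df = spark.createDataFrame(
    [
        (1, ["12"], [""], ["67"]),
        (2, [""], ["78"], ["90"]),
        (3, [""], ["93"], ["56"]),
        (4, ["100"], ["78"], ["90"]),
    ],
    ["id", "column_1", "column_2", "column_3"],
)
```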
Expected result:
```
id | column_1 | column_2 | column_3
---|----------|----------|----------
1  | ["12"]   | null     | ["67"]
2  | null     | ["78"]   | ["90"]
3  | null     | ["93"]   | ["56"]
4  | ["100"]  | ["78"]   | ["90"]
```

I tried the solution below:
```python
from pyspark.sql import functions as F

df = (
    df.withColumn(
        "column_1",
        F.when(F.size(F.col("column_1")) == "", F.lit(None)).otherwise(F.col("column_1")),
    )
    .withColumn(
        "column_2",
        F.when(F.size(F.col("column_2")) == "", F.lit(None)).otherwise(F.col("column_2")),
    )
    .withColumn(
        "column_3",
        F.when(F.size(F.col("column_3")) == "", F.lit(None)).otherwise(F.col("column_3")),
    )
)
```

But it converts everything to null. How can I test for an array containing a single empty string, i.e. [""] rather than []? Thank you.
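One direction I was wondering about (untested, just an assumption on my part): compare each column against an array literal built with F.array(F.lit("")), since Spark supports equality comparison on array columns. Something like:

```python
from pyspark.sql import functions as F

# Untested sketch: F.array(F.lit("")) builds the array literal [""],
# and the when/otherwise nulls out a column wherever it equals that literal,
# leaving other values (including the empty array []) untouched.
for c in ["column_1", "column_2", "column_3"]:
    df = df.withColumn(
        c,
        F.when(F.col(c) == F.array(F.lit("")), F.lit(None)).otherwise(F.col(c)),
    )
```

Is this the right way to express the test, or is there a more idiomatic approach?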