Spark 2.2.1, PySpark
    df = sqlContext.createDataFrame([
        ("dog", "1", "2", "3"),
        ("cat", "4", "5", "6"),
        ("dog", "7", "8", "9"),
        ("cat", "10", "11", "12"),
        ("dog", "13", "14", "15"),
        ("parrot", "16", "17", "18"),
        ("goldfish", "19", "20", "21"),
    ], ["pet", "dog_30", "cat_30", "parrot_30"])

I also have a list of the values I care about from the "pet" column:
    dfvalues = ["dog", "cat", "parrot"]

I want to write code that will give me the value from dog_30, cat_30, or parrot_30 that corresponds to the value in "pet". For example, in the first row the value of the pet column is dog, so we take the value of dog_30, which is 1.
I tried the code below, but it just gives me nulls for the stats column. I also haven't figured out how to handle the goldfish case; I want to set that to 0.
    stats = [F.when(F.col("pet") == p + "_30", p) for p in dfvalues]
    df = df.withColumn("newCol2", F.coalesce(*stats))
    df.show()

Desired output:
    +--------+------+------+---------+-----+
    |     pet|dog_30|cat_30|parrot_30|stats|
    +--------+------+------+---------+-----+
    |     dog|     1|     2|        3|    1|
    |     cat|     4|     5|        6|    5|
    |     dog|     7|     8|        9|    7|
    |     cat|    10|    11|       12|   11|
    |     dog|    13|    14|       15|   13|
    |  parrot|    16|    17|       18|   18|
    |goldfish|    19|    20|       21|    0|
    +--------+------+------+---------+-----+