Spark 2.2.1, PySpark
    df = sqlContext.createDataFrame([
        ("dog", "1", "2", "3"),
        ("cat", "4", "5", "6"),
        ("dog", "7", "8", "9"),
        ("cat", "10", "11", "12"),
        ("dog", "13", "14", "15"),
        ("parrot", "16", "17", "18"),
        ("goldfish", "19", "20", "21"),
    ], ["pet", "dog_30", "cat_30", "parrot_30"])

I also have a list of the values I care about from the "pet" column:
    dfvalues = ["dog", "cat", "parrot"]

I want to write code that will give me the value from dog_30, cat_30, or parrot_30 that corresponds to the value in "pet". For example, in the first row the value of the pet column is dog, so we take the value of dog_30, which is 1.
I tried the code below, but it just gives me nulls for the stats column. I also haven't figured out how to handle the goldfish case; I want to set that to 0.
    stats = [F.when(F.col("pet") == p + "_30", p) for p in dfvalues]
    df = df.withColumn("newCol2", F.coalesce(*stats))
    df.show()

Desired output:
    +--------+------+------+---------+-----+
    |     pet|dog_30|cat_30|parrot_30|stats|
    +--------+------+------+---------+-----+
    |     dog|     1|     2|        3|    1|
    |     cat|     4|     5|        6|    5|
    |     dog|     7|     8|        9|    7|
    |     cat|    10|    11|       12|   11|
    |     dog|    13|    14|       15|   13|
    |  parrot|    16|    17|       18|   18|
    |goldfish|    19|    20|       21|    0|
    +--------+------+------+---------+-----+