I'm new to Python and PySpark. I have a dataframe in PySpark like the following:
```
## +---+---+------+
## | x1| x2|    x3|
## +---+---+------+
## |  0|  a|  13.0|
## |  2|  B| -33.0|
## |  1|  B| -63.0|
## +---+---+------+
```

I have an array:

```python
arr = [10, 12, 13]
```
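For reproducibility, this is roughly how the dataframe above can be built (a minimal sketch; a standard SparkSession is assumed):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Recreate the example dataframe shown above
df = spark.createDataFrame(
    [(0, "a", 13.0), (2, "B", -33.0), (1, "B", -63.0)],
    ["x1", "x2", "x3"],
)
```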
I want to add a column x4 to the dataframe that takes its values from the array, using the values of x1 as indices. The final dataset should look like this:
```
## +---+---+------+---+
## | x1| x2|    x3| x4|
## +---+---+------+---+
## |  0|  a|  13.0| 10|
## |  2|  B| -33.0| 13|
## |  1|  B| -63.0| 12|
## +---+---+------+---+
```

I have tried the following code to achieve this:
```python
from pyspark.sql.functions import col, lit

df.withColumn("x4", lit(arr[col('x1')])).show()
```
However, I am getting an error:

```
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
```

I suspect this is because col('x1') is a Column expression rather than a plain integer, so it cannot be used to index the array. Is there any way I can achieve this efficiently?
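For what it's worth, one idea I had was to turn the array into a Spark array of literal columns and index that with x1 directly, but I am not sure whether this is correct or efficient (a sketch, using the same df and arr as above):

```python
from pyspark.sql.functions import array, col, lit

# Build a Spark array column from the literal values of arr,
# then index it with x1 (indexing with [] is 0-based).
lookup = array(*[lit(v) for v in arr])
df.withColumn("x4", lookup[col("x1")]).show()
```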