I am quite new to Spark and can't get this to work. Hopefully there is an easy way of doing it. What I am trying to do is best described by the following table (I need to compute the "required" column):
```
  colA  colB  colC  ref   required
1 a1    b1    c1    colA  a1
2 a2    b2    c2    colA  a2
3 a3    b3    c3    colB  b3
4 a4    b4    c4    colB  b4
5 a5    b5    c5    colC  c5
6 a6    b6    c6    colC  c6
```

The above is just an example - in the real data I have more than 50 columns, so writing out an explicit condition for each column is not going to work.
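For reference, here is a minimal sketch of the example data as a PySpark DataFrame (the names `spark` and `df` are just placeholders for this illustration):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Example data from the table above; "required" is the column I want to compute.
df = spark.createDataFrame(
    [
        ("a1", "b1", "c1", "colA"),
        ("a2", "b2", "c2", "colA"),
        ("a3", "b3", "c3", "colB"),
        ("a4", "b4", "c4", "colB"),
        ("a5", "b5", "c5", "colC"),
        ("a6", "b6", "c6", "colC"),
    ],
    ["colA", "colB", "colC", "ref"],
)
```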
I know this can be easily done in pandas using something like:
```python
df['required'] = df.apply(lambda x: x.loc[x.ref], axis=1)
```

or

```python
df['required'] = df.lookup(df.index, df.ref)
```

Any suggestions on how to do this in PySpark?
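For what it's worth, the only direction I can think of so far is generating the per-column conditions programmatically rather than by hand, roughly as in the sketch below (assuming every value in `ref` is a valid column name). I'm not sure whether this is the idiomatic way or whether it scales well to 50+ columns, hence the question:

```python
from pyspark.sql import functions as F

# Build one when() per candidate column and coalesce them, so that for each
# row the value from the column named in "ref" is picked.
lookup_cols = [c for c in df.columns if c != "ref"]
df_with_required = df.withColumn(
    "required",
    F.coalesce(*[F.when(F.col("ref") == c, F.col(c)) for c in lookup_cols]),
)
df_with_required.show()
```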