
In PySpark, I have a DataFrame in this format:

CODE | TITLE | POSITION
A    | per   | 1
A    | eis   | 3
A    | fon   | 4
A    | dat   | 5
B    | jem   | 2
B    | neu   | 3
B    | tri   | 5
B    | nok   | 6

and I want to get this:

CODE | TITLE | POSITION
A    | per   | 1
A    | eis   | 2
A    | fon   | 3
A    | dat   | 4
B    | jem   | 1
B    | neu   | 2
B    | tri   | 3
B    | nok   | 4

The idea is that the POSITION column should start at 1 within each CODE and increase with no gaps. For example, for CODE A, it starts at 1 but position 2 is missing, so I need to map 3 => 2, 4 => 3, and 5 => 4.

How can I do that in PySpark?

Thank you for your help.


1 Answer


With a slightly simpler DataFrame:

df.show()
+----+-----+--------+
|CODE|TITLE|POSITION|
+----+-----+--------+
|   A|   AA|       1|
|   A|   BB|       3|
|   A|   CC|       4|
|   A|   DD|       5|
|   B|   EE|       2|
|   B|   FF|       3|
|   B|   GG|       5|
|   B|   HH|       6|
+----+-----+--------+

from pyspark.sql.functions import row_number
from pyspark.sql.window import Window

df.withColumn('POSITION', row_number().over(Window.partitionBy('CODE').orderBy('POSITION'))).show()
+----+-----+--------+
|CODE|TITLE|POSITION|
+----+-----+--------+
|   B|   EE|       1|
|   B|   FF|       2|
|   B|   GG|       3|
|   B|   HH|       4|
|   A|   AA|       1|
|   A|   BB|       2|
|   A|   CC|       3|
|   A|   DD|       4|
+----+-----+--------+
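For reference, the same dense renumbering can be sketched in plain Python, without a Spark session. This is only an illustration of what `row_number().over(Window.partitionBy('CODE').orderBy('POSITION'))` computes: sort by CODE and POSITION, then number the rows of each CODE group from 1 upward. The sample data here is the one from the question.

```python
from itertools import groupby
from operator import itemgetter

rows = [
    ("A", "per", 1), ("A", "eis", 3), ("A", "fon", 4), ("A", "dat", 5),
    ("B", "jem", 2), ("B", "neu", 3), ("B", "tri", 5), ("B", "nok", 6),
]

# Sort by (CODE, POSITION), then renumber each CODE group starting at 1,
# mirroring row_number() over a window partitioned by CODE, ordered by POSITION.
result = []
for code, group in groupby(sorted(rows, key=itemgetter(0, 2)), key=itemgetter(0)):
    for new_pos, (c, title, _old_pos) in enumerate(group, start=1):
        result.append((c, title, new_pos))

print(result)
# [('A', 'per', 1), ('A', 'eis', 2), ('A', 'fon', 3), ('A', 'dat', 4),
#  ('B', 'jem', 1), ('B', 'neu', 2), ('B', 'tri', 3), ('B', 'nok', 4)]
```

Note that, like `row_number`, this assigns distinct consecutive numbers even when two rows tie on POSITION; if ties should share a number, `dense_rank` would be the Spark equivalent instead.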
