PySpark create new column from existing column with a list of values

Question

I've got a DataFrame like this:

    from pyspark.sql import SparkSession
    from pyspark import Row

    spark = SparkSession.builder \
        .appName('DataFrame') \
        .master('local[*]') \
        .getOrCreate()

    df = spark.createDataFrame([Row(a=1, b='', c=['0', '1'], d='foo'),
                                Row(a=2, b='', c=['0', '1'], d='bar'),
                                Row(a=3, b='', c=['0', '1'], d='foo')])

    +---+---+------+---+
    |  a|  b|     c|  d|
    +---+---+------+---+
    |  1|   |[0, 1]|foo|
    |  2|   |[0, 1]|bar|
    |  3|   |[0, 1]|foo|
    +---+---+------+---+

I would like to create a column "e" with the first element of column "c" and a column "f" with the second element of column "c", so that the result looks like this:

    +---+---+------+---+---+---+
    |a  |b  |c     |d  |e  |f  |
    +---+---+------+---+---+---+
    |1  |   |[0, 1]|foo|0  |1  |
    |2  |   |[0, 1]|bar|0  |1  |
    |3  |   |[0, 1]|foo|0  |1  |
    +---+---+------+---+---+---+

Possible duplicate of How to extract an element from a array in pyspark
– pault, Commented Aug 22, 2019 at 14:22

Pierre Gourseaud · Accepted Answer · 2019-08-22 08:48:16Z

    df = spark.createDataFrame([Row(a=1, b='', c=['0', '1'], d='foo'),
                                Row(a=2, b='', c=['0', '1'], d='bar'),
                                Row(a=3, b='', c=['0', '1'], d='foo')])

    df2 = df.withColumn('e', df['c'][0]).withColumn('f', df['c'][1])
    df2.show()

    +---+---+------+---+---+---+
    |a  |b  |c     |d  |e  |f  |
    +---+---+------+---+---+---+
    |1  |   |[0, 1]|foo|0  |1  |
    |2  |   |[0, 1]|bar|0  |1  |
    |3  |   |[0, 1]|foo|0  |1  |
    +---+---+------+---+---+---+
