
I have a DataFrame which I want to modify so that each value is prefixed with its column name. For example:

FirstName LastName
Jhon      Doe
David     Lue

should become the following:

(FirstName=Jhon,LastName=Doe) (FirstName=David,LastName=Lue) 

I managed to do it for a df with 2 columns:

val x = df.map { row => (names(0) + "=" + row(0), names(1) + "=" + row(1)) }

but how can I do it with a loop for any number of columns?

Thanks

1 Answer


One option is to use foldLeft on the column names:

import org.apache.spark.sql.functions._
import org.apache.spark.sql.DataFrame
import sqlContext.implicits._

val df = Seq(
  ("John", "Doe"),
  ("David", "Lue")
).toDF("first_name", "last_name")

val x = df.columns.foldLeft(df) { (acc: DataFrame, colName: String) =>
  acc.withColumn(colName, concat(lit(colName + "="), col(colName)))
}

x.show()

Resulting in:

+----------------+-------------+
|      first_name|    last_name|
+----------------+-------------+
| first_name=John|last_name=Doe|
|first_name=David|last_name=Lue|
+----------------+-------------+
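The key idea is that foldLeft threads an accumulator (here the DataFrame) through the column names, applying one withColumn per step. The same accumulation pattern in plain Scala, with a Map standing in for the DataFrame so the sketch runs without Spark (the names are just illustrative):

```scala
// foldLeft threads an accumulator through a collection: here a Map
// plays the role of the DataFrame, and each step "replaces a column"
// by prefixing its value with the column name.
val row = Map("first_name" -> "John", "last_name" -> "Doe")

val prefixed = row.keys.foldLeft(row) { (acc, colName) =>
  acc.updated(colName, colName + "=" + acc(colName))
}
// prefixed("first_name") == "first_name=John"
// prefixed("last_name")  == "last_name=Doe"
```

Each iteration returns a new accumulator, exactly as each withColumn call returns a new DataFrame.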

If you then want to convert it to an RDD of tuples, you can call a map on it:

x.rdd.map(r => (r.getString(0), r.getString(1))) 

or even with Spark SQL's typed API:

x.as[(String, String)].rdd 
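And if the target is the exact `(FirstName=Jhon,LastName=Doe)` strings from the question, the per-row formatting generalizes to any number of columns by zipping the column names with the row values. A plain-Scala sketch of that step (Spark omitted so it runs standalone; inside Spark the same expression would go in a `map` over the rows, with `row.toSeq` supplying the values):

```scala
// Zip column names with the values of one row, render each pair
// as "name=value", and join them inside parentheses.
val names  = Seq("FirstName", "LastName")
val values = Seq("Jhon", "Doe")

val rendered = names.zip(values)
  .map { case (n, v) => s"$n=$v" }
  .mkString("(", ",", ")")
// rendered == "(FirstName=Jhon,LastName=Doe)"
```

Because zip pairs positionally, this works unchanged for 2 columns or 20.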

1 Comment

Thanks a lot! It worked like a charm. Since I'm a new user, my marking it as the accepted answer is counted but not displayed. Thanks again!
