
What's the easy way to combine two columns in SparkR? Consider the following Spark DataFrame:

salary_from  salary_to  position
1500         null       a
null         1300       b
800          1000       c

I would like to get a combined salary column with the following logic: from salary_from and salary_to take the one that is not null, and if both are present, take the value in the middle (their average).

salary_from  salary_to  position  salary
1500         null       a         1500
null         1300       b         1300
800          1000       c         900

Is there a way to walk through every row and apply my logic, like I would with the apply function in R?
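For reference, a minimal sketch of how this example frame could be built in SparkR (the Spark 2.x-style session call is an assumption; on 1.x you would pass a sqlContext to createDataFrame):

library(SparkR)
sparkR.session()  # assumption: Spark 2.x-style entry point

# NA in a local data.frame becomes null once the data is in Spark
local_df <- data.frame(
  salary_from = c(1500, NA, 800),
  salary_to   = c(NA, 1300, 1000),
  position    = c("a", "b", "c")
)
sdf <- createDataFrame(local_df)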

  • I heard about a package combining SparkR and dplyr, SparkRext, but I didn't use it yet: github.com/hoxo-m/SparkRext. Maybe it could help you. Commented Apr 4, 2016 at 15:18

1 Answer


You can use the coalesce function:

withColumn(
  sdf, "salary",
  expr("coalesce((salary_from + salary_to) / 2, salary_from, salary_to)")
)

which returns the first non-null expression.
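Applied to the example frame from the question (sdf as sketched there; the expected values follow from the question's target table):

sdf <- withColumn(
  sdf, "salary",
  expr("coalesce((salary_from + salary_to) / 2, salary_from, salary_to)")
)
head(sdf)
#   salary_from salary_to position salary
#          1500        NA        a   1500
#            NA      1300        b   1300
#           800      1000        c    900

When either operand of + is null, the sum (and hence the average) is null, so coalesce falls through to the single non-null column.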


Comments

I can't find coalesce in the SparkR API reference spark.apache.org/docs/latest/api/R/index.html. Could you please point me to where I can find more about it?
You cannot, because it is not there yet. That's why you need expr. Otherwise it is just plain SQL coalesce, so any SQL reference will do. For example w3schools.com/sql/sql_isnull.asp. Or the PySpark docstrings: github.com/apache/spark/blob/master/python/pyspark/sql/…
Is there a way to loop through the rows of a Spark DataFrame?
Other than fetching the complete structure to the driver? No (see the sketch below).
This all sounds to me like SparkR is incomplete and inefficient. Don't you think so? Or maybe I just misunderstand its intended use.
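For completeness, a sketch of the "fetch to the driver" route mentioned above, the only way to get true row-by-row apply semantics (feasible only when the data fits in driver memory; sdf is the example frame from the question):

local_df <- collect(sdf)  # pull everything to the driver as a plain R data.frame
local_df$salary <- apply(
  local_df[, c("salary_from", "salary_to")], 1,
  function(row) {
    present <- row[!is.na(row)]
    if (length(present) == 0) NA else mean(present)  # midpoint if both, else the one value
  }
)

The distributed coalesce/expr approach in the answer is preferable for anything large, since collect() moves the entire dataset through the driver.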
