
I have a Spark DataFrame and I want to select a few rows/records from it based on a matching value for a particular column. I guess I can do this using a filter operation, or a select operation inside a map transformation.

But I also want to update a status column for those rows/records which have not been selected by the filter.

On applying the filter operation, I get back a new DataFrame consisting of the matching records.

So, how do I identify and update the column value of the rows which are not selected?

1 Answer


On applying the filter operation, you get a new DataFrame consisting of the matching records.

Then, you can use the except function in Scala to get the non-matching records from the input DataFrame.

scala> val inputDF = Seq(("a", 1), ("b", 2), ("c", 3), ("d", 4), ("e", 5)).toDF("id", "count")
inputDF: org.apache.spark.sql.DataFrame = [id: string, count: int]

scala> val filterDF = inputDF.filter($"count" > 3)
filterDF: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [id: string, count: int]

scala> filterDF.show()
+---+-----+
| id|count|
+---+-----+
|  d|    4|
|  e|    5|
+---+-----+

scala> val unmatchDF = inputDF.except(filterDF)
unmatchDF: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [id: string, count: int]

scala> unmatchDF.show()
+---+-----+
| id|count|
+---+-----+
|  b|    2|
|  a|    1|
|  c|    3|
+---+-----+
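From there, one way to actually update a status column on the non-selected rows is to tag each DataFrame with withColumn and union them back together. A minimal sketch continuing from the REPL session above; the "status" column name and the "selected"/"not_selected" values are illustrative assumptions, not part of the original answer:

import org.apache.spark.sql.functions.lit

// Tag the matching and non-matching rows with an assumed status value
val flaggedMatch   = filterDF.withColumn("status", lit("selected"))
val flaggedUnmatch = unmatchDF.withColumn("status", lit("not_selected"))

// Recombine into a single DataFrame covering all original rows
val resultDF = flaggedMatch.union(flaggedUnmatch)
resultDF.show()

Since DataFrames are immutable, "updating" here means producing a new DataFrame with the extra column rather than modifying rows in place.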

In PySpark you can achieve the same with the subtract function.
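For completeness, a minimal PySpark sketch of the same step, rebuilding the example data from above (the variable names are illustrative):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Same data as the Scala example above
input_df = spark.createDataFrame(
    [("a", 1), ("b", 2), ("c", 3), ("d", 4), ("e", 5)], ["id", "count"]
)
filter_df = input_df.filter(input_df["count"] > 3)

# subtract plays the role of Scala's except:
# rows of input_df that do not appear in filter_df
unmatch_df = input_df.subtract(filter_df)
unmatch_df.show()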

