I have a simple example:

    val arrayStructureData = Seq(
      Row("test1|value1"),
      Row("test2|value2"),
      Row("test3|value3")
    )
    val arrayStructureSchema = new StructType()
      .add("name", StringType)
    val df = spark.createDataFrame(
      spark.sparkContext.parallelize(arrayStructureData), arrayStructureSchema)
    import spark.implicits._
    val distPhens = df.flatMap(row => row.getString(0).split("\\|"))
      .filter(x => x.like("test[0-9]+"))
      .toDF("distinct_phens")

where I'm trying to run filter after running flatMap. The desired output is:

    value1
    value2
    value3

If I understand correctly, like expects a column but I am not sure how to "refer" to the column after flatMap has been executed.

I need this filter operation to run after flatMap.

Thanks in advance.

1 Answer

After flatMap the result is a Dataset[String], and its single column is named value by default. You can refer to that column using col("value") and do an rlike filter:

    import org.apache.spark.sql.functions.col

    val result = df.flatMap(row => row.getString(0).split("\\|"))
      .filter(col("value").rlike("test[0-9]+"))
    result.show

    +-----+
    |value|
    +-----+
    |test1|
    |test2|
    |test3|
    +-----+
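As an alternative to the column-based filter, the lambda form from the question can probably be kept: each element after flatMap is a plain String, which has no like method but does have String.matches. The sketch below is only an illustration and uses an ordinary Scala Seq as a stand-in for the Dataset so that it runs without a Spark session. The object name FilterAfterFlatMap and the val names are hypothetical, not from the original post.

```scala
object FilterAfterFlatMap {
  // Same input strings as in the question, held in a plain Seq
  // (an assumed stand-in for the Dataset, so no Spark session is needed).
  val rows: Seq[String] = Seq("test1|value1", "test2|value2", "test3|value3")

  // Split each row on a literal pipe, as the flatMap in the question does.
  val tokens: Seq[String] = rows.flatMap(_.split("\\|"))

  // Each element is a plain String, so String.matches applies here.
  // Note: matches requires the whole string to match the regex.
  val matched: Seq[String] = tokens.filter(_.matches("test[0-9]+"))

  def main(args: Array[String]): Unit =
    matched.foreach(println) // prints test1, test2, test3 on separate lines
}
```

On the real Dataset, the same lambda should work with spark.implicits._ in scope, for example df.flatMap(row => row.getString(0).split("\\|")).filter(_.matches("test[0-9]+")). One difference to keep in mind: rlike matches if the pattern is found anywhere in the value, while String.matches requires the entire string to match. To keep the value tokens instead, a pattern such as "value[0-9]+" could be used.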