I have a simple example:

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StringType, StructType}

val arrayStructureData = Seq(
  Row("test1|value1"),
  Row("test2|value2"),
  Row("test3|value3")
)

val arrayStructureSchema = new StructType()
  .add("name", StringType)

val df = spark.createDataFrame(
  spark.sparkContext.parallelize(arrayStructureData), arrayStructureSchema)

import spark.implicits._

val distPhens = df.flatMap(row => row.getString(0).split("\\|"))
  .filter(x => x.like("test[0-9]+"))
  .toDF("distinct_phens")
```

where I'm trying to run `filter` after running `flatMap`. The desired output is:

```
value1
value2
value3
```

If I understand correctly, `like` expects a `Column`, but I am not sure how to refer to the column after `flatMap` has been executed.
I need this filter operation to run after flatMap.
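A minimal sketch of two ways this could work, assuming Spark 3.x running locally; the object name `FilterAfterFlatMap` and the method `run` are hypothetical, added only for illustration. After `flatMap`, the result is a `Dataset[String]`, so inside a lambda `x` is a plain Scala `String` (which has no `like` method), while the untyped view exposes its single column under the name `value`. Note also that `Column.like` uses SQL wildcards (`%`, `_`), not regular expressions; `rlike` is the regex variant. Because the desired output keeps `value1` to `value3` rather than the `test` tokens, both variants negate the match.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.functions.not
import org.apache.spark.sql.types.{StringType, StructType}

// Hypothetical wrapper object, used only to make the sketch runnable
object FilterAfterFlatMap {
  // Builds the sample data from the question and returns the result of both filter variants
  def run(spark: SparkSession): (Seq[String], Seq[String]) = {
    import spark.implicits._

    val df = spark.createDataFrame(
      spark.sparkContext.parallelize(
        Seq(Row("test1|value1"), Row("test2|value2"), Row("test3|value3"))),
      new StructType().add("name", StringType))

    // flatMap on a DataFrame yields a Dataset[String]; its single column is named "value"
    val tokens = df.flatMap(row => row.getString(0).split("\\|"))

    // Variant 1: typed filter; x is a String, so use java.lang.String.matches (whole-string regex)
    val typed = tokens
      .filter(x => !x.matches("test[0-9]+"))
      .toDF("distinct_phens")

    // Variant 2: untyped filter on the "value" column; rlike is anchored here to match the whole string
    val untyped = tokens
      .filter(not($"value".rlike("^test[0-9]+$")))
      .toDF("distinct_phens")

    (typed.as[String].collect().toSeq, untyped.as[String].collect().toSeq)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("filter-after-flatMap")
      .getOrCreate()
    try {
      val (typed, untyped) = run(spark)
      println(typed.mkString(" "))
      println(untyped.mkString(" "))
    } finally {
      spark.stop()
    }
  }
}
```

The typed variant keeps the logic in plain Scala, while the `Column` variant lets Spark's optimizer see the predicate; either way the filter still runs after `flatMap`.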
Thanks in advance.