I'm writing a filter function for a complex JSON dataset with lots of inner structures. Passing individual columns is too cumbersome.
So I declared the following UDF:
    import org.apache.spark.sql.{DataFrame, Row}

    val records: DataFrame = sqlContext.jsonFile("...")
    def myFilterFunction(r: Row): Boolean = ???
    sqlContext.udf.register("myFilter", (r: Row) => myFilterFunction(r))

Intuitively, I'd expect it to work like this:

    records.filter("myFilter(*)=true")

What is the actual syntax?
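One common way to hand every column to a single Row-typed UDF is to pack them into one struct column with `struct` and call the UDF through the DataFrame API rather than a SQL string. The following is a minimal sketch, assuming Spark 2.x or later (a `SparkSession` instead of the question's `sqlContext`/`jsonFile`). The two-column dataset and the age rule standing in for `myFilterFunction` are hypothetical and used for illustration only.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.functions.{col, struct, udf}

object RowFilterSketch {
  // Hypothetical rule standing in for the question's myFilterFunction:
  // keep rows whose "age" field is at least 18.
  def myFilterFunction(r: Row): Boolean =
    r.getAs[Long]("age") >= 18L

  def run(spark: SparkSession): DataFrame = {
    import spark.implicits._

    // Hypothetical data; in the question this would be loaded from JSON.
    val records: DataFrame =
      Seq(("alice", 30L), ("bob", 12L)).toDF("name", "age")

    // Wrap the Scala function as a UDF whose single argument is a Row.
    val myFilter = udf((r: Row) => myFilterFunction(r))

    // struct(...) packs every top-level column into one struct column,
    // which arrives in the UDF as a Row carrying the same field names.
    records.filter(myFilter(struct(records.columns.map(col): _*)))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("row-filter-sketch")
      .getOrCreate()
    try run(spark).show()
    finally spark.stop()
  }
}
```

Because `records.columns` is read at call time, the same line keeps working if the JSON schema gains new top-level fields; the filter function simply sees a wider Row.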
Row throws away a lot of the optimizations a DataFrame does for you.
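Where the filter logic can be written with built-in Column operations, including dot paths into nested structs, Spark's optimizer can inspect the predicate. A UDF body, by contrast, is opaque to it, so a UDF predicate cannot, for example, be pushed down to the data source. The following is a minimal sketch, assuming Spark 2.x or later and a hypothetical nested `user` struct with `name`, `age` and `country` fields.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, struct}

object ColumnFilterSketch {
  def run(spark: SparkSession): DataFrame = {
    import spark.implicits._

    // Hypothetical nested data: each record holds a single "user" struct.
    val records: DataFrame =
      Seq(("alice", 30L, "NL"), ("bob", 12L, "US"))
        .toDF("name", "age", "country")
        .select(struct(col("name"), col("age"), col("country")).as("user"))

    // Built-in expressions reach into nested fields with dot paths;
    // the optimizer can analyze this predicate, unlike a UDF's body.
    records.filter(col("user.age") >= 18 && col("user.country") === "NL")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("column-filter-sketch")
      .getOrCreate()
    try {
      val filtered = run(spark)
      filtered.explain() // prints the plan the optimizer chose for the filter
      filtered.show()
    } finally spark.stop()
  }
}
```

The trade-off is that each condition must be spelled out as a Column expression, which is exactly the per-column work the question wants to avoid. The UDF route is more convenient, while this route keeps the optimizations.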