I want to use flatMap to get the effect of filter() + map(), as in the following code. There are three if statements that decide whether to output one Tuple2; otherwise the code outputs an empty Array[Tuple2].
Is there a more elegant way to implement this?
    rddx.flatMap { case (arr: Array[String]) =>
      val url_parts = arr(1).split("/")
      if (url_parts.length > 7) {
        val pid = url_parts(4)
        val lid = url_parts(7).split("_")
        if (lid.length == 2) {
          val sid = lid(0)
          val eid = lid(1)
          // eid(0) is a Char, so compare it with the Char literal 'h', not the String "h"
          if (eid.length > 0 && eid(0) == 'h') {
            Array((pid, 1))
          } else new Array[(String, Int)](0)
        } else Array((pid, 1))
      } else new Array[(String, Int)](0)
    }
Is `collect` available on an `RDD`? If so, that should be what you are looking for. The `collect` function in Scala is equivalent to `map` + `filter`.
There are two versions of `collect`: one that takes no args and one that takes a `PartialFunction`. I was referring to the latter of the two. Looking at the Scala doc, that method seems to do what it would do on a normal Scala collection. Are you sure `collect` is not what you want?
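The `collect` suggestion could look roughly like the sketch below. It runs on a plain Scala `Seq` so that it needs no Spark setup. The names `extractPid`, `Pid`, `rows`, and `pairs`, and the sample URLs, are hypothetical and only for illustration. The branching mirrors the question's code, assuming the intent was to test whether `eid` starts with the character `h`.

```scala
// Hypothetical helper: Some(pid) when the URL passes every check, None otherwise.
def extractPid(url: String): Option[String] = {
  val parts = url.split("/")
  if (parts.length <= 7) None
  else {
    val pid = parts(4)
    parts(7).split("_") match {
      // Exactly two parts: keep the row only if eid starts with 'h'.
      case Array(_, eid) => if (eid.startsWith("h")) Some(pid) else None
      // Any other shape: keep the row, as in the original else branch.
      case _ => Some(pid)
    }
  }
}

// Hypothetical extractor so the check can be used inside a pattern match.
object Pid {
  def unapply(arr: Array[String]): Option[String] = extractPid(arr(1))
}

// Made-up sample rows; the URL is in column 1, as in the question.
val rows: Seq[Array[String]] = Seq(
  Array("r1", "http://host/x/PID1/y/z/s_h1"),  // eid "h1" starts with 'h': kept
  Array("r2", "http://host/x/PID2/y/z/s_e1"),  // eid "e1" does not: dropped
  Array("r3", "http://host/x/PID3/y/z/plain"), // no "_" split: kept
  Array("r4", "http://host/short")             // too few path segments: dropped
)

// collect with a PartialFunction: rows where the pattern does not match are skipped.
val pairs: Seq[(String, Int)] = rows.collect { case Pid(pid) => (pid, 1) }
```

Spark's `RDD` also has a `collect` overload that takes a `PartialFunction` and returns an `RDD`, so the last line should carry over to `rddx.collect { case Pid(pid) => (pid, 1) }`. If the extractor object feels heavy, `rows.flatMap(arr => extractPid(arr(1)).map(pid => (pid, 1)))` gives the same result, because `flatMap` accepts an `Option` and drops `None`.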