The Scala method below returns the k nearest neighbours of a given label from an Array of pairwise distances:
```scala
def getNearestNeighbours(distances: Array[((String, String), Double)],
                         k: Int,
                         label: String) = {
  distances
    .filter(v => v._1._1.equals(label) || v._1._2.equals(label))
    .sortBy(_._2)
    .take(k)
}
```

I want to run this function in parallel. I could convert the Array to an RDD, but the RDD type does not seem to support the `.sortBy(_._2).take(k)` part of the chain. Is there a way to emulate this method in Spark/Scala?
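To make the attempt concrete, this is a minimal sketch of what I mean, assuming an existing SparkContext named `sc` and the same `distances`, `k` and `label` as above:

```scala
import org.apache.spark.rdd.RDD

// `sc` is assumed to be an already-created SparkContext.
val distancesRDD: RDD[((String, String), Double)] = sc.parallelize(distances)

val neighbours: Array[((String, String), Double)] =
  distancesRDD
    .filter(v => v._1._1.equals(label) || v._1._2.equals(label))
    .sortBy(_._2) // this is the step my RDD does not seem to support
    .take(k)
```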
A possible solution is to modify the method so that the RDD is converted to an Array every time the method is called, but I suspect this is computationally expensive for large RDDs:
```scala
import org.apache.spark.rdd.RDD

def getNearestNeighbours(distances: RDD[((String, String), Double)],
                         k: Int,
                         label: String) = {
  distances
    .collect()
    .filter(v => v._1._1.equals(label) || v._1._2.equals(label))
    .sortBy(_._2)
    .take(k)
}
```
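For context, this is roughly how I would call it; the data, `label` and `k` are made up for illustration, and `sc` is again an assumed SparkContext:

```scala
// Toy distance data for illustration only.
val distances: RDD[((String, String), Double)] = sc.parallelize(Seq(
  (("a", "b"), 1.5),
  (("a", "c"), 0.5),
  (("b", "c"), 2.0)
))

// Expect the two pairs involving "a" with the smallest distances.
val result = getNearestNeighbours(distances, k = 2, label = "a")
```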