I am trying to code the computation of the KDN complexity measure efficiently. It loops over every row of a distance matrix and, for each instance, finds its k nearest neighbours and computes the fraction of them whose class label differs from the instance's own.
I am trying to parallelize this code with foreach and %dopar%, but I am not getting any reduction in running time. I am aware that some parallel computations are inefficient because of memory management, but I don't know whether that is what is happening here or whether I am doing something wrong.
Here is a reproducible example with the digits data from the rsvd package.
First, I load the necessary packages, read the digits data and extract some basic information.
#######################
### NEEDED PACKAGES ###
#######################

library(dplyr)
library(parallelDist)
# for parallel processing
library(doParallel)
library(foreach)
# for digits data
library(rsvd)

#############
### DATA ###
#############

data(digits)
data = as.data.frame(digits)

# Dividing into X variables and Y target
dataX = data %>% dplyr::select(-label)
dataY = data %>% mutate(label = factor(label)) %>% pull(label)

## number of data points
n = dim(dataX)[1]

Then, I do some computations that are needed before the KDN loop I want to parallelize efficiently.
##############################
### PREVIOUS COMPUTATIONS ###
##############################

## number of available data points in each class
n_data_classes = table(factor(dataY))

## number of data points to be considered as neighbours in each class
k = 0.05
k_neighbours_classes = ceiling(n_data_classes * k)

## DISTANCE MATRIX COMPUTATION
# this is time consuming, but I am not concerned about this part
distance_matrix = as.matrix(parDist(scale(dataX)))
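Just to make the neighbour counts concrete, here is a toy illustration of what ceiling(n_data_classes * k) produces, using made-up class sizes (not the digits data):

# toy illustration with made-up class sizes: three classes of 40, 100 and 60 points
toy_y <- factor(rep(c("a", "b", "c"), times = c(40, 100, 60)))
ceiling(table(toy_y) * 0.05)
# per-class neighbour counts: a = 2, b = 5, c = 3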
The KDN computation without parallelization is the following; it takes about 12 seconds.

#########################################
### COMPUTING KDN: NO PARALLELIZATION ###
#########################################

## KDN instance-level computation

# initialization of a vector to store the KDN instance-level values
kdn_instance = numeric(n)

system.time(
  for (ix in 1:n) {
    ## Getting the class of the ix-th data point
    class_ix = dataY[ix]
    ## number of data points to be considered as neighbours in this class
    k_value = k_neighbours_classes[class_ix]

    # we get the k_value nearest neighbours of ix
    distances_ix = distance_matrix[ix, ]
    distances_ix_ordered = order(distances_ix, decreasing = F)
    knn_set_ix = distances_ix_ordered[2:(k_value + 1)]

    # Y values of knn_set_ix
    Y_value_knn_set_ix = dataY[knn_set_ix]
    # Y value of the ix-th data point
    Y_value_ix = dataY[ix]

    # number of points in knn_set_ix with a Y value different from that of ix
    knn_set_ix_different_Y_value = length(Y_value_knn_set_ix[Y_value_knn_set_ix != Y_value_ix])

    kdn_instance[ix] = knn_set_ix_different_Y_value / k_value
  }
)
#   user  system elapsed
#  12.29    0.37   12.67  secs
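For reference, the body of that loop can also be pulled into a standalone helper function; kdn_one_row below is just a name I made up, and it is a compressed rewrite of the same steps, assuming the same distance_matrix, dataY and k_neighbours_classes objects defined above.

# Hypothetical helper: KDN for a single row index, repeating the loop body above.
# Assumes distance_matrix, dataY and k_neighbours_classes exist as defined earlier.
kdn_one_row <- function(ix) {
  class_ix <- dataY[ix]                       # class of the ix-th data point
  k_value  <- k_neighbours_classes[class_ix]  # neighbours to use for this class

  # k_value nearest neighbours of ix (position 1 is ix itself, distance 0)
  distances_ix_ordered <- order(distance_matrix[ix, ], decreasing = FALSE)
  knn_set_ix <- distances_ix_ordered[2:(k_value + 1)]

  # fraction of those neighbours whose class differs from that of ix
  sum(dataY[knn_set_ix] != class_ix) / k_value
}

# the serial loop above is then equivalent to:
# kdn_instance <- sapply(1:n, kdn_one_row)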
My attempt to parallelize that loop, using foreach and %dopar%, is the following; it takes about 35 seconds.

######################################
### COMPUTING KDN: PARALLELIZATION ###
######################################

## Preparing for the parallelization

# number of cores to use
n.cores <- parallel::detectCores() - 1

# we define the cluster and register it so it can be used by %dopar%
my.cluster <- parallel::makeCluster(n.cores, type = "PSOCK")
# register it to be used by %dopar%
doParallel::registerDoParallel(cl = my.cluster)

## KDN instance-level computation
kdn_instance = NULL

# iterator over the rows of the distance matrix
itx <- iter(distance_matrix, by = 'row')

system.time(
  kdn_instance <- foreach(
    ix = itx,
    .combine = 'c'
  ) %dopar% {
    ## Getting the class of the ix data point
    class_ix = dataY[ix]
    ## number of data points to be considered as neighbours in this class
    k_value = k_neighbours_classes[class_ix]

    # we get the k_value nearest neighbours of ix
    distances_ix = distance_matrix[ix, ]
    distances_ix_ordered = order(distances_ix, decreasing = F)
    knn_set_ix = distances_ix_ordered[2:(k_value + 1)]

    # Y values of knn_set_ix
    Y_value_knn_set_ix = dataY[knn_set_ix]
    # Y value of the ix data point
    Y_value_ix = dataY[ix]

    # number of points in knn_set_ix with a Y value different from that of ix
    knn_set_ix_different_Y_value = length(Y_value_knn_set_ix[Y_value_knn_set_ix != Y_value_ix])

    knn_set_ix_different_Y_value / k_value
  }
)

parallel::stopCluster(cl = my.cluster)
#   user  system elapsed
#  12.38    4.64   35.14  secs

As can be seen, the parallel computation takes more time than the non-parallelized one.
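For context, an index-based variant with parallel::parSapply would look something like the sketch below (reusing the hypothetical kdn_one_row helper from above). As far as I can tell, the whole distance_matrix still has to be shipped to every worker, which may be related to the memory issue I mentioned.

# Sketch of an index-based alternative (assumes the kdn_one_row sketch above).
# Every worker needs its own copy of the big objects, hence the clusterExport call.
my.cluster <- parallel::makeCluster(parallel::detectCores() - 1, type = "PSOCK")
parallel::clusterExport(
  my.cluster,
  varlist = c("distance_matrix", "dataY", "k_neighbours_classes", "kdn_one_row")
)

kdn_instance_par <- parallel::parSapply(my.cluster, 1:n, kdn_one_row)

parallel::stopCluster(my.cluster)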
My question is: is there something wrong with my parallel processing code? Is there a better way to do it? Perhaps it should be done with another package.