I plot an elbow curve to find the appropriate number of KMeans clusters when I'm working with Python and sklearn. I want to do the same when I'm working in PySpark. I am aware that PySpark has limited functionality due to Spark's distributed nature, but is there a way to get this number?
I am using the following code to plot the elbow:

    # Using the Elbow method to find the optimal number of clusters
    from sklearn.cluster import KMeans
    import matplotlib.pyplot as plt

    wcss = []
    for i in range(1, 11):
        kmeans = KMeans(n_clusters=i, init='k-means++', max_iter=300, n_init=10, random_state=0)
        kmeans.fit(X)  # X is the feature matrix
        wcss.append(kmeans.inertia_)  # within-cluster sum of squares

    plt.plot(range(1, 11), wcss)
    plt.title('The Elbow Method')
    plt.xlabel('Number of clusters')
    plt.ylabel('WCSS')
    plt.show()
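For context, here is a rough sketch of what I imagine the PySpark equivalent might look like, using pyspark.ml.clustering.KMeans and the fitted model's summary.trainingCost (the within-cluster sum of squared distances, available in recent Spark versions) as a stand-in for sklearn's inertia_. It assumes a DataFrame df with an assembled 'features' vector column; I don't know whether this is the idiomatic approach:

    from pyspark.ml.clustering import KMeans
    import matplotlib.pyplot as plt

    # df is assumed to be a DataFrame with a 'features' vector column,
    # e.g. built with VectorAssembler
    wcss = []
    for i in range(2, 11):  # Spark's KMeans requires k > 1
        kmeans = KMeans(featuresCol='features', k=i, seed=0)
        model = kmeans.fit(df)
        # trainingCost: within-cluster sum of squared distances on the training data
        wcss.append(model.summary.trainingCost)

    plt.plot(range(2, 11), wcss)
    plt.title('The Elbow Method (PySpark)')
    plt.xlabel('Number of clusters')
    plt.ylabel('WCSS')
    plt.show()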