What is the right approach and clustering algorithm for geolocation clustering?
I'm using the following code to cluster geolocation coordinates:

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.cluster.vq import kmeans2, whiten

    coordinates = np.array([
        [lat, long],
        [lat, long],
        ...
        [lat, long]
    ])

    # x holds the centroids, y the cluster label of each point
    x, y = kmeans2(whiten(coordinates), 3, iter=20)
    plt.scatter(coordinates[:,0], coordinates[:,1], c=y)
    plt.show()

Is it right to use K-means for geolocation clustering, given that it uses Euclidean distance rather than the Haversine formula as its distance function?
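For reference, this is roughly the Haversine distance I have in mind. It is only a minimal sketch: the function name `haversine_km` and the constant `EARTH_RADIUS_KM` are my own, and it assumes a spherical Earth with a mean radius of 6371 km and inputs in decimal degrees.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # assumption: mean Earth radius, spherical model


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between points given in decimal degrees."""
    # Convert degrees to radians before applying the formula
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    # Haversine term: sin^2(dlat/2) + cos(lat1) * cos(lat2) * sin^2(dlon/2)
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


# One degree of longitude at the equator versus at 60 degrees latitude
print(haversine_km(0, 0, 0, 1))
print(haversine_km(60, 0, 60, 1))
```

The two printed distances differ by about a factor of two, even though the Euclidean distance between the raw degree values is the same in both cases. That difference is the reason I am unsure about running K-means directly on latitude/longitude pairs.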
