I'm trying to perform hierarchical clustering to aggregate "zones" (neighborhoods) of a city, based on the language used most in each zone.
To do so, I have a dataset provided by Twitter that gives the coordinates (longitude and latitude) where each tweet was posted and the language in which it was written.
So I need an operational definition of distance between two languages!
Any ideas?
EDIT: I actually found this:
https://alternativetransport.wordpress.com/2015/05/05/34/
It just doesn't come with a numerical table of distances, but I'll try to ask for one :)
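
In the meantime, here is a minimal sketch of how the pieces could fit together once a distance table exists. Everything in it is an assumption for illustration: the tweets are made up, the zones are simple 0.05-degree grid cells, the values in `LANG_DISTANCE` are placeholders (not real linguistic distances), and the average-linkage merge loop is just one possible choice of hierarchical clustering. It uses only the Python standard library.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical tweets: (longitude, latitude, language code).
tweets = [
    (9.11, 45.46, "it"), (9.12, 45.47, "it"), (9.13, 45.46, "en"),
    (9.16, 45.46, "es"), (9.17, 45.47, "es"),
    (9.21, 45.51, "zh"), (9.22, 45.52, "zh"), (9.23, 45.51, "it"),
    (9.26, 45.51, "en"), (9.27, 45.52, "en"),
]

# Placeholder distances between languages (invented values, symmetric).
LANG_DISTANCE = {
    frozenset(("it", "es")): 0.2,
    frozenset(("it", "en")): 0.6,
    frozenset(("es", "en")): 0.7,
    frozenset(("en", "zh")): 0.9,
    frozenset(("it", "zh")): 1.0,
    frozenset(("es", "zh")): 1.0,
}


def lang_distance(a, b):
    """Distance between two language codes; identical languages are at 0."""
    return 0.0 if a == b else LANG_DISTANCE[frozenset((a, b))]


def dominant_language_by_zone(tweets, cell=5):
    """Bin tweets into grid cells of `cell` hundredths of a degree and
    return the most frequent language in each cell."""
    counts = defaultdict(Counter)
    for lon, lat, lang in tweets:
        zone = (round(lon * 100) // cell, round(lat * 100) // cell)
        counts[zone][lang] += 1
    return {zone: c.most_common(1)[0][0] for zone, c in counts.items()}


def linkage(a, b, zone_lang):
    """Average-linkage distance between two clusters of zones."""
    total = sum(lang_distance(zone_lang[x], zone_lang[y]) for x in a for y in b)
    return total / (len(a) * len(b))


def agglomerate(zone_lang):
    """Repeatedly merge the two closest clusters; return the merge history
    as a list of (cluster_a, cluster_b, distance) tuples."""
    clusters = [(zone,) for zone in sorted(zone_lang)]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: linkage(pair[0], pair[1], zone_lang))
        merges.append((a, b, linkage(a, b, zone_lang)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


zone_lang = dominant_language_by_zone(tweets)
for a, b, d in agglomerate(zone_lang):
    print(f"merge {a} + {b} at distance {d:.3f}")
```

Note that this sketch clusters zones purely by language distance and ignores geography, so two zones on opposite sides of the city can end up merged; restricting merges to adjacent zones would be a separate design decision.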