
I am trying to cluster latitudes and longitudes from a dataset containing locations in Germany (n=52612, about 9000 without duplicates). I am running the DBSCAN algorithm, but none of the parameter values I have tried produces any clusters: the cluster ID is 0 for all rows (I have tried both all coordinates and only the unique ones). I did a similar exercise with DBSCAN in Python's sklearn and got clusters there for a broad range of parameter values. What is wrong here?

eps = 20000 corresponds to 20 km, is that correct? In any case, I get the same result irrespective of the eps value.

    -- add geometry
    ALTER TABLE table ADD COLUMN geom GEOMETRY;
    UPDATE table SET geom = ST_SetSRID(ST_MakePoint(lng, lat), 4326);
    CREATE INDEX gix ON table USING GIST (geom);

    -- cluster
    SELECT *,
           ST_ClusterDBSCAN(geom::geometry, eps := 20000, minpoints := 100) OVER () AS cluster_id
    FROM table
    WHERE year = 2010 AND country = 'Germany';

1 Answer


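The eps of 20000 is most probably being interpreted in degrees, because the geometry is in SRID 4326 and ST_ClusterDBSCAN measures distance in the units of the geometry's coordinate system. That is why all of your geometries end up in cluster 0 (if they were failing to cluster, their cluster_id would be NULL). You need to convert your data to metres by reprojecting it into a local projection (SRID). For example, EPSG:5243 would work, so something like: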

    ALTER TABLE table ADD COLUMN geom_m GEOMETRY;
    UPDATE table SET geom_m = ST_Transform(geom, 5243);
    CREATE INDEX gix_m ON table USING GIST (geom_m);
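The clustering query from the question can then be pointed at the reprojected column. The following is a minimal sketch, assuming the `geom_m` column created above and the question's placeholder table name `table` (replace it with your real table name, since `table` is a reserved word in PostgreSQL). The `eps` and `minpoints` values are simply the ones from the question, not tuned recommendations.

```sql
-- Sketch only: "table" is the placeholder name used in the question;
-- substitute the real table name before running.
-- geom_m is in EPSG:5243, whose unit is the metre, so eps := 20000 means 20 km.
SELECT *,
       ST_ClusterDBSCAN(geom_m, eps := 20000, minpoints := 100) OVER () AS cluster_id
FROM table
WHERE year = 2010
  AND country = 'Germany';
```

Rows that do not belong to any cluster (noise points) come back with a NULL `cluster_id`, so a mix of numbered clusters and NULLs is the expected shape of the result.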
  • I am quite a GIS noob, but is there any projection I can transform to so that I get a proper result for points that are spread all over the world? I use the geography data type for my distance calculations (ST_Distance), which works fine. But as ST_ClusterDBSCAN does not accept geographies, how can I achieve that for a world-wide dataset? Commented Feb 21, 2022 at 11:21
