I have a very large dataset containing over 700 million points, and a polygon dataset that serves as a buffer zone.
My task is to extract all points inside the buffer zone and create a new table.
Below is my code. I tested it with a small point dataset and it works fine.

create table schema1.result as
select point.*
from schema1.site as point, schema2.buffer as poly
where ST_Intersects(point.geo_loc, poly.wkb_geometry);

Unfortunately, the query ran for a full day and showed no sign of finishing.
Is there any advice on optimising my code to speed up the query?
Update: The output of EXPLAIN:
"Nested Loop (cost=0.41..17773703.88 rows=6789472 width=208)"
" -> Seq Scan on buffer poly (cost=0.00..18.50 rows=850 width=32)"
" -> Index Scan using idx_site on site point (cost=0.41..20902.23 rows=799 width=208)"
" Index Cond: (geo_loc && poly.wkb_geometry)"
" Filter: st_intersects(geo_loc, poly.wkb_geometry)"
"JIT:"
" Functions: 6"
" Options: Inlining true, Optimization true, Expressions true, Deforming true"
Comments:

[...] point.geo_loc and/or poly.wkb_geometry? Could you show the output of EXPLAIN with the query?

[...] ANALYZEd the tables after creating the indices? Also, someone else may correct me, but you could add AND ST_DWITHIN(point.geo_loc, poly.wkb_geometry, 1) to the WHERE clause, and it may use the spatial indices more efficiently, or at all.

[...] ST_NPoints of your polygons? How many are there? Do they all have a regular shape?

[...] JOIN will fetch it for each match; you'd need to add a DISTINCT, which will be significantly slower. In that case, an EXISTS may indeed be the better choice. If, however, a point always intersects with only one polygon, a JOIN is likely the better plan.
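The ANALYZE and EXISTS suggestions from the comments could look roughly like this, using the table and column names from the question. This is only a sketch, not a tested fix: the output table name schema1.result_exists is made up for illustration, and whether EXISTS actually beats the join depends on how often a point falls inside more than one buffer polygon.

```sql
-- Refresh planner statistics on both tables
-- (the spatial index idx_site from the EXPLAIN output is assumed to exist already).
ANALYZE schema1.site;
ANALYZE schema2.buffer;

-- Semi-join variant: each point is returned at most once,
-- even if it lies inside several buffer polygons, so no DISTINCT is needed.
-- schema1.result_exists is a hypothetical table name.
CREATE TABLE schema1.result_exists AS
SELECT point.*
FROM schema1.site AS point
WHERE EXISTS (
    SELECT 1
    FROM schema2.buffer AS poly
    WHERE ST_Intersects(point.geo_loc, poly.wkb_geometry)
);
```

It may be worth running EXPLAIN on the SELECT part first to see which plan the planner picks before committing to the full run.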