I have a GeoPackage with two polygon layers (WGS84). The first layer is a Sentinel 2 tile grid with about 800 polygons. The other layer contains about 300 000 small polygons (AOIs). Each layer has an rtree spatial index.

Tables in the gpkg:

[('gpkg_spatial_ref_sys',), ('gpkg_contents',), ('gpkg_ogr_contents',), ('gpkg_geometry_columns',), ('gpkg_tile_matrix_set',), ('gpkg_tile_matrix',), ('aois',), ('gpkg_extensions',), ('rtree_aois_geom',), ('rtree_aois_geom_rowid',), ('rtree_aois_geom_node',), ('rtree_aois_geom_parent',), ('sentinel2_tiles',), ('rtree_sentinel2_tiles_geom',), ('rtree_sentinel2_tiles_geom_rowid',), ('rtree_sentinel2_tiles_geom_node',), ('rtree_sentinel2_tiles_geom_parent',)] 

I would like to check which AOIs are within which Sentinel 2 grid tile (I know about the overlaps between Sentinel 2 tiles, but that doesn't matter in this case as long as each AOI is assigned a tile ID). The problem is that the query takes a very long time to run. Geometry queries on a single layer are fairly quick (2 seconds or less on average).

Currently this is the query:

SELECT a.fid, b.grid FROM aois a, sentinel2_tiles b WHERE ST_Within(ST_envelope(a.geom), ST_envelope(b.geom)); 

How can I speed up/optimize the query? Alternatively, would it be better/faster to use the OGR Within() function?

1 Answer

For the moment, getting the unique grid tile names first and then multiprocessing a within query for each grid tile seems to work. It reduced the query time to under a minute. The first query returns the unique grid tile IDs (stored in a list called results):

"""SELECT DISTINCT b.grid FROM sentinel2_tiles b;""" 

Second query (list of queries which is then multiprocessed):

query_list = [f"""SELECT a.fid, b.grid FROM aois a, sentinel2_tiles b WHERE b.grid = '{tile}' AND ST_Within(ST_envelope(a.geom), ST_envelope(b.geom));""" for tile in results] 
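
Since the queries above only compare envelopes, and the GeoPackage rtree tables already store one bounding box per feature (columns id, minx, maxx, miny, maxy), it might also be worth trying a join directly on the rtree tables. This is a different technique from the multiprocessing approach, not something I have timed on the real data. The sketch below assumes that the primary key of sentinel2_tiles is called fid (the OGR default) and that grid holds the tile ID. build_demo_db is a hypothetical helper that builds a tiny in-memory stand-in so the query can be run without the real file.

```python
import sqlite3

# Envelope-containment join done entirely on the rtree tables.
# Assumption: sentinel2_tiles has primary key `fid` and a `grid` column.
RTREE_JOIN = """
SELECT a.id AS fid, t.grid
FROM sentinel2_tiles AS t
JOIN rtree_sentinel2_tiles_geom AS rt ON rt.id = t.fid
JOIN rtree_aois_geom AS a
  ON a.minx >= rt.minx AND a.maxx <= rt.maxx
 AND a.miny >= rt.miny AND a.maxy <= rt.maxy
ORDER BY a.id, t.grid;
"""


def build_demo_db():
    """Hypothetical stand-in for the real GeoPackage (made-up boxes)."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE sentinel2_tiles (fid INTEGER PRIMARY KEY, grid TEXT);
        CREATE VIRTUAL TABLE rtree_sentinel2_tiles_geom
            USING rtree(id, minx, maxx, miny, maxy);
        CREATE VIRTUAL TABLE rtree_aois_geom
            USING rtree(id, minx, maxx, miny, maxy);
    """)
    con.executemany(
        "INSERT INTO sentinel2_tiles VALUES (?, ?)",
        [(1, "T1"), (2, "T2")],
    )
    # Two tiles whose boxes overlap between x = 9 and x = 10.
    con.executemany(
        "INSERT INTO rtree_sentinel2_tiles_geom VALUES (?, ?, ?, ?, ?)",
        [(1, 0, 10, 0, 10), (2, 9, 19, 0, 10)],
    )
    con.executemany(
        "INSERT INTO rtree_aois_geom VALUES (?, ?, ?, ?, ?)",
        [
            (1, 1, 2, 1, 2),          # inside the first tile only
            (2, 9.25, 9.75, 5, 6),    # inside the overlap of both tiles
            (3, 15, 16, 3, 4),        # inside the second tile only
            (4, 30, 31, 3, 4),        # outside every tile
        ],
    )
    return con


if __name__ == "__main__":
    for fid, grid in build_demo_db().execute(RTREE_JOIN):
        print(fid, grid)
```

Against the real file the same RTREE_JOIN string would be executed on a connection opened with sqlite3.connect on the .gpkg path, and it should not need SpatiaLite because it never touches the geometry blobs. Two caveats: an AOI in a tile overlap comes back once per tile, and the rtree stores coordinates as 32-bit floats, so the boxes can be very slightly larger than the true envelopes.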
