Maximizing Code Performance for Shapely

Question

I've written this code in order to calculate the percent cover of grassland within a national park, within 20km^2 of points of recorded occurrences for a species. It is designed to only consider the areas within the park, and not those outside. It works, except for one huge issue... it's incredibly slow. I think I've tracked it down to the avail_lc_type = pt_buffer.intersection(gl).area. The gl polygon has > 24,000 polygons in it. It is a raster to polygon transformation (it has been dissolved), so it has a lot of little polygons within. Right now it's not a huge problem since I'm running it on ~300 points (still takes > 1 hr), but I plan to run it on a few million later, so I need to improve it.

Any ideas?

import numpy as np import pprint import shapely from shapely.geometry import* import fiona from fiona import collection import math traps = fiona.open('some_points.shp', 'r') #point file: focal points around which statistics are being derived study_area = fiona.open('available_areas.shp', 'r') #polygon file: represents area available for analysis for i in study_area: #for every record in 'study_area' sa = shape(i['geometry']) #make a variable called 'sa' that is a polygon grassland = fiona.open('land_cover_type_of_interest.shp', 'r') #polygon file: want to calculate percent cover of this lc type within study_area, and within areaKM2 (next variable) of each focal point pol = grassland.next() gl = MultiPolygon([shape(pol['geometry']) for pol in grassland]) areaKM2 = 20 #hyp home range size of specie of interest with traps as input: #calculate initial area in meters, set radius areaM2 = areaKM2 * 1000000 r = (math.sqrt(areaM2/math.pi)) #begin buffering and calculating available area (i.e. within study area) for each point for point in input: pt_buffer = shape(point['geometry']).buffer(r) avail_area = pt_buffer.intersection(sa).area #check and adjust radius of buffer until it covers desired available area within study area while avail_area < areaM2: r += 300 pt_buffer = shape(point['geometry']).buffer(r) avail_area = pt_buffer.intersection(sa).area #then, calulate percent cover of land cover type of interest within adjusted buffer area #print to check avail_lc_type = pt_buffer.intersection(gl).area perc_cov = (avail_lc_type/areaM2) * 100 print perc_cov

Using the answer from @MWrenn I was able to profile my code and came up with this:

55.3555590078 415 function calls (365 primitive calls) in 48.633 seconds Ordered by: standard name ncalls tottime percall cumtime percall filename:lineno(function) 1 0.001 0.001 48.632 48.632 <module1>:15(neighb_func) 1 0.000 0.000 48.633 48.633 <string>:1(<module>) 2 0.000 0.000 0.001 0.000 <string>:531(write) 1 0.000 0.000 0.000 0.000 __init__.py:1118(debug) 1 0.000 0.000 0.000 0.000 __init__.py:1318(getEffectiveLevel) 1 0.000 0.000 0.000 0.000 __init__.py:1332(isEnabledFor) 1 0.000 0.000 0.000 0.000 _abcoll.py:483(update) 3 0.000 0.000 0.000 0.000 _weakrefset.py:68(__contains__) 1 0.000 0.000 0.000 0.000 abc.py:128(__instancecheck__) 1 0.000 0.000 0.000 0.000 abc.py:148(__subclasscheck__) 11 0.000 0.000 0.000 0.000 base.py:195(_is_empty) 11 0.003 0.000 0.003 0.000 base.py:202(empty) 7 0.000 0.000 0.003 0.000 base.py:212(__del__) 25 0.000 0.000 0.000 0.000 base.py:231(_geom) 2 0.000 0.000 0.000 0.000 base.py:235(_geom) 3 0.000 0.000 0.000 0.000 base.py:313(geometryType) 3 0.000 0.000 0.000 0.000 base.py:316(type) 3 0.000 0.000 0.001 0.000 base.py:383(area) 2 0.000 0.000 0.001 0.000 base.py:443(buffer) 8 0.000 0.000 0.000 0.000 base.py:46(geometry_type_name) 5 0.000 0.000 0.000 0.000 base.py:52(geom_factory) 3 0.000 0.000 48.626 16.209 base.py:529(intersection) 10 0.000 0.000 0.000 0.000 brine.py:106(_dump_int) 2 0.000 0.000 0.000 0.000 brine.py:150(_dump_str) 12/2 0.000 0.000 0.000 0.000 brine.py:179(_dump_tuple) 24/2 0.000 0.000 0.000 0.000 brine.py:202(_dump) 2 0.000 0.000 0.000 0.000 brine.py:332(dump) 10/2 0.000 0.000 0.000 0.000 brine.py:360(dumpable) 14/8 0.000 0.000 0.000 0.000 brine.py:369(<genexpr>) 2 0.000 0.000 0.000 0.000 channel.py:56(send) 1 0.000 0.000 0.000 0.000 collection.py:186(filter) 1 0.000 0.000 0.000 0.000 collection.py:274(__iter__) 1 0.000 0.000 0.000 0.000 collection.py:364(closed) 1 0.000 0.000 0.000 0.000 collections.py:37(__init__) 25 0.000 0.000 0.000 0.000 collections.py:53(__setitem__) 4 0.000 0.000 0.000 0.000 compat.py:17(BYTES_LITERAL) 2 0.000 0.000 0.000 0.000 coords.py:20(required) 2 0.000 0.000 0.000 0.000 geo.py:20(shape) 5 0.000 0.000 0.000 0.000 geos.py:484(errcheck_predicate) 8 0.000 0.000 0.000 0.000 impl.py:43(__getitem__) 2 0.000 0.000 0.000 0.000 point.py:124(_set_coords) 2 0.000 0.000 0.000 0.000 point.py:188(geos_point_from_py) 2 0.000 0.000 0.000 0.000 point.py:37(__init__) 2 0.000 0.000 0.001 0.000 protocol.py:220(_send) 2 0.000 0.000 0.001 0.000 protocol.py:227(_send_request) 2 0.000 0.000 0.000 0.000 protocol.py:241(_box) 2 0.000 0.000 0.001 0.000 protocol.py:438(_async_request) 2 0.000 0.000 0.000 0.000 stream.py:173(write) 11 0.000 0.000 0.000 0.000 topology.py:14(_validate) 3 0.000 0.000 0.000 0.000 topology.py:33(__call__) 3 48.626 16.209 48.626 16.209 topology.py:40(__call__) 2 0.001 0.000 0.001 0.000 topology.py:57(__call__) 5 0.000 0.000 0.000 0.000 utf_8.py:15(decode) 5 0.000 0.000 0.000 0.000 {__import__} 5 0.000 0.000 0.000 0.000 {_codecs.utf_8_decode} 3 0.000 0.000 0.000 0.000 {_ctypes.byref} 6/2 0.000 0.000 0.000 0.000 {all} 6 0.000 0.000 0.000 0.000 {getattr} 5 0.000 0.000 0.000 0.000 {globals} 10 0.000 0.000 0.000 0.000 {hasattr} 5 0.000 0.000 0.000 0.000 {isinstance} 31 0.000 0.000 0.000 0.000 {len} 5 0.000 0.000 0.000 0.000 {locals} 1 0.000 0.000 0.000 0.000 {math.sqrt} 2 0.000 0.000 0.000 0.000 {method 'acquire' of 'thread.lock' objects} 24 0.000 0.000 0.000 0.000 {method 'append' of 'list' objects} 1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects} 28 0.000 0.000 0.000 0.000 {method 'get' of 'dict' objects} 1 0.000 0.000 0.000 0.000 {method 'items' of 'dict' objects} 2 0.000 0.000 0.000 0.000 {method 'join' of 'str' objects} 2 0.000 0.000 0.000 0.000 {method 'lower' of 'str' objects} 5 0.000 0.000 0.000 0.000 {method 'pack' of 'Struct' objects} 2 0.000 0.000 0.000 0.000 {method 'release' of 'thread.lock' objects} 2 0.000 0.000 0.000 0.000 {method 'send' of '_socket.socket' objects} 2 0.000 0.000 0.000 0.000 {next}

I'm still trying to figure out exactly what it means, but I think it's telling me that the intersection function is taking ~16 seconds each time it's called. So, per @gene 's suggestion, I'm working on using a spatial index. I just have to figure out how to get libspatialindex going so that I can use Rtrees.

Community · Accepted Answer · 2017-04-13 12:33:52Z

You need to use a spatial index. Without an spatial index, you must iterate through all the geometries. With a bounding spatial index, you iterate only through the geometries which have a chance to intersect the other geometries.

Popular bounding spatial indexes in Python:

R-tree index (Python modules Rtree or pyrtree)
Quadtree index (Quadtree module)

Examples:

In your script:

Why
- from fiona import collectionif you import fiona and use only fiona?
- import numpy as np if you are not using Numpy but math?
- import shapely if you specify from shapely.geometry import *?
If sa is a list, then use sa = [shape(i['geometry']) for i in for i in study_area]. If not, you need a condition; otherwise sa will be the last feature of study_area.
Why pol = grassland.next() if you read the shapefile with MultiPolygon([shape(pol['geometry']) for pol in grassland])?

Some suggestions:

You use the grassland and trap variables once, so you don't need them

gl = MultiPolygon([shape(pol['geometry']) for pol in fiona.open('land_cover_type_of_interest.shp')])

and

for point in fiona.open('some_points.shp'):

I'm new to Python, and as a result my ignorance shows. A lot of this script is repurposed from other online sources. Clearly my understanding of python leaves something to be desired. As for 1 - I was using collection in an earlier version of the script and left that in there. Will remove - ditto for numpy. I imported shapely because I planned to use parts other than geometry later on in the same module. 3&4 - I'm still learning exactly how fiona reads in files, so where I came across something that worked online, I left well enough alone. Lastly, I'll look into your suggestions. — CSB
– CSB, Commented Oct 27, 2014 at 20:30
I'm having trouble understanding the use of spatial indexes. If I create a spatial index for a polygons file, will it be "attached" to said file in some way, or will the index stand alone? That is, after creating a spatial index for my polygon file, can I just run Shapely's intersect() function on the polygon file again, and get better results? Or do I need to run it on the index, then relate those results back to my original polygon file? I've looked at the documentation, but it's not helping me here. — CSB
– CSB, Commented Nov 4, 2014 at 22:20

Community · Accepted Answer · 2017-05-23 12:39:39Z

When it comes to optimizing code, don't guess - profile https://stackoverflow.com/questions/582336/how-can-you-profile-a-python-script.

Just looking at your code, a similar statements :

avail_area = pt_buffer.intersection(sa).area

seems like it would get called even more than the statement you identified, since it's nested in yet another loop.

Also, you might look into prepared geometry with shapely - http://toblerity.org/shapely/manual.html#prepared-geometry-operations It supports limited geometric operations, so it may not be a drop in replacement for what you're looking to do.

Stack Exchange Network

Maximizing Code Performance for Shapely

2 Answers 2

Linked

Hot Network Questions

Maximizing Code Performance for Shapely

2 Answers 2

Linked

Related

Hot Network Questions