9

I've written this code in order to calculate the percent cover of grassland within a national park, within 20km^2 of points of recorded occurrences for a species. It is designed to only consider the areas within the park, and not those outside. It works, except for one huge issue... it's incredibly slow. I think I've tracked it down to the avail_lc_type = pt_buffer.intersection(gl).area. The gl polygon has > 24,000 polygons in it. It is a raster to polygon transformation (it has been dissolved), so it has a lot of little polygons within. Right now it's not a huge problem since I'm running it on ~300 points (still takes > 1 hr), but I plan to run it on a few million later, so I need to improve it.

Any ideas?

import numpy as np import pprint import shapely from shapely.geometry import* import fiona from fiona import collection import math traps = fiona.open('some_points.shp', 'r') #point file: focal points around which statistics are being derived study_area = fiona.open('available_areas.shp', 'r') #polygon file: represents area available for analysis for i in study_area: #for every record in 'study_area' sa = shape(i['geometry']) #make a variable called 'sa' that is a polygon grassland = fiona.open('land_cover_type_of_interest.shp', 'r') #polygon file: want to calculate percent cover of this lc type within study_area, and within areaKM2 (next variable) of each focal point pol = grassland.next() gl = MultiPolygon([shape(pol['geometry']) for pol in grassland]) areaKM2 = 20 #hyp home range size of specie of interest with traps as input: #calculate initial area in meters, set radius areaM2 = areaKM2 * 1000000 r = (math.sqrt(areaM2/math.pi)) #begin buffering and calculating available area (i.e. within study area) for each point for point in input: pt_buffer = shape(point['geometry']).buffer(r) avail_area = pt_buffer.intersection(sa).area #check and adjust radius of buffer until it covers desired available area within study area while avail_area < areaM2: r += 300 pt_buffer = shape(point['geometry']).buffer(r) avail_area = pt_buffer.intersection(sa).area #then, calulate percent cover of land cover type of interest within adjusted buffer area #print to check avail_lc_type = pt_buffer.intersection(gl).area perc_cov = (avail_lc_type/areaM2) * 100 print perc_cov 

Using the answer from @MWrenn I was able to profile my code and came up with this:

55.3555590078 415 function calls (365 primitive calls) in 48.633 seconds Ordered by: standard name ncalls tottime percall cumtime percall filename:lineno(function) 1 0.001 0.001 48.632 48.632 <module1>:15(neighb_func) 1 0.000 0.000 48.633 48.633 <string>:1(<module>) 2 0.000 0.000 0.001 0.000 <string>:531(write) 1 0.000 0.000 0.000 0.000 __init__.py:1118(debug) 1 0.000 0.000 0.000 0.000 __init__.py:1318(getEffectiveLevel) 1 0.000 0.000 0.000 0.000 __init__.py:1332(isEnabledFor) 1 0.000 0.000 0.000 0.000 _abcoll.py:483(update) 3 0.000 0.000 0.000 0.000 _weakrefset.py:68(__contains__) 1 0.000 0.000 0.000 0.000 abc.py:128(__instancecheck__) 1 0.000 0.000 0.000 0.000 abc.py:148(__subclasscheck__) 11 0.000 0.000 0.000 0.000 base.py:195(_is_empty) 11 0.003 0.000 0.003 0.000 base.py:202(empty) 7 0.000 0.000 0.003 0.000 base.py:212(__del__) 25 0.000 0.000 0.000 0.000 base.py:231(_geom) 2 0.000 0.000 0.000 0.000 base.py:235(_geom) 3 0.000 0.000 0.000 0.000 base.py:313(geometryType) 3 0.000 0.000 0.000 0.000 base.py:316(type) 3 0.000 0.000 0.001 0.000 base.py:383(area) 2 0.000 0.000 0.001 0.000 base.py:443(buffer) 8 0.000 0.000 0.000 0.000 base.py:46(geometry_type_name) 5 0.000 0.000 0.000 0.000 base.py:52(geom_factory) 3 0.000 0.000 48.626 16.209 base.py:529(intersection) 10 0.000 0.000 0.000 0.000 brine.py:106(_dump_int) 2 0.000 0.000 0.000 0.000 brine.py:150(_dump_str) 12/2 0.000 0.000 0.000 0.000 brine.py:179(_dump_tuple) 24/2 0.000 0.000 0.000 0.000 brine.py:202(_dump) 2 0.000 0.000 0.000 0.000 brine.py:332(dump) 10/2 0.000 0.000 0.000 0.000 brine.py:360(dumpable) 14/8 0.000 0.000 0.000 0.000 brine.py:369(<genexpr>) 2 0.000 0.000 0.000 0.000 channel.py:56(send) 1 0.000 0.000 0.000 0.000 collection.py:186(filter) 1 0.000 0.000 0.000 0.000 collection.py:274(__iter__) 1 0.000 0.000 0.000 0.000 collection.py:364(closed) 1 0.000 0.000 0.000 0.000 collections.py:37(__init__) 25 0.000 0.000 0.000 0.000 collections.py:53(__setitem__) 4 0.000 0.000 0.000 0.000 compat.py:17(BYTES_LITERAL) 2 0.000 0.000 0.000 0.000 coords.py:20(required) 2 0.000 0.000 0.000 0.000 geo.py:20(shape) 5 0.000 0.000 0.000 0.000 geos.py:484(errcheck_predicate) 8 0.000 0.000 0.000 0.000 impl.py:43(__getitem__) 2 0.000 0.000 0.000 0.000 point.py:124(_set_coords) 2 0.000 0.000 0.000 0.000 point.py:188(geos_point_from_py) 2 0.000 0.000 0.000 0.000 point.py:37(__init__) 2 0.000 0.000 0.001 0.000 protocol.py:220(_send) 2 0.000 0.000 0.001 0.000 protocol.py:227(_send_request) 2 0.000 0.000 0.000 0.000 protocol.py:241(_box) 2 0.000 0.000 0.001 0.000 protocol.py:438(_async_request) 2 0.000 0.000 0.000 0.000 stream.py:173(write) 11 0.000 0.000 0.000 0.000 topology.py:14(_validate) 3 0.000 0.000 0.000 0.000 topology.py:33(__call__) 3 48.626 16.209 48.626 16.209 topology.py:40(__call__) 2 0.001 0.000 0.001 0.000 topology.py:57(__call__) 5 0.000 0.000 0.000 0.000 utf_8.py:15(decode) 5 0.000 0.000 0.000 0.000 {__import__} 5 0.000 0.000 0.000 0.000 {_codecs.utf_8_decode} 3 0.000 0.000 0.000 0.000 {_ctypes.byref} 6/2 0.000 0.000 0.000 0.000 {all} 6 0.000 0.000 0.000 0.000 {getattr} 5 0.000 0.000 0.000 0.000 {globals} 10 0.000 0.000 0.000 0.000 {hasattr} 5 0.000 0.000 0.000 0.000 {isinstance} 31 0.000 0.000 0.000 0.000 {len} 5 0.000 0.000 0.000 0.000 {locals} 1 0.000 0.000 0.000 0.000 {math.sqrt} 2 0.000 0.000 0.000 0.000 {method 'acquire' of 'thread.lock' objects} 24 0.000 0.000 0.000 0.000 {method 'append' of 'list' objects} 1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects} 28 0.000 0.000 0.000 0.000 {method 'get' of 'dict' objects} 1 0.000 0.000 0.000 0.000 {method 'items' of 'dict' objects} 2 0.000 0.000 0.000 0.000 {method 'join' of 'str' objects} 2 0.000 0.000 0.000 0.000 {method 'lower' of 'str' objects} 5 0.000 0.000 0.000 0.000 {method 'pack' of 'Struct' objects} 2 0.000 0.000 0.000 0.000 {method 'release' of 'thread.lock' objects} 2 0.000 0.000 0.000 0.000 {method 'send' of '_socket.socket' objects} 2 0.000 0.000 0.000 0.000 {next} 

I'm still trying to figure out exactly what it means, but I think it's telling me that the intersection function is taking ~16 seconds each time it's called. So, per @gene 's suggestion, I'm working on using a spatial index. I just have to figure out how to get libspatialindex going so that I can use Rtrees.

2 Answers 2

12

You need to use a spatial index. Without an spatial index, you must iterate through all the geometries. With a bounding spatial index, you iterate only through the geometries which have a chance to intersect the other geometries.

Popular bounding spatial indexes in Python:

Examples:

In your script:

  1. Why
    • from fiona import collectionif you import fiona and use only fiona?
    • import numpy as np if you are not using Numpy but math?
    • import shapely if you specify from shapely.geometry import *?
  2. If sa is a list, then use sa = [shape(i['geometry']) for i in for i in study_area]. If not, you need a condition; otherwise sa will be the last feature of study_area.
  3. Why pol = grassland.next() if you read the shapefile with MultiPolygon([shape(pol['geometry']) for pol in grassland])?

Some suggestions:

You use the grassland and trap variables once, so you don't need them

gl = MultiPolygon([shape(pol['geometry']) for pol in fiona.open('land_cover_type_of_interest.shp')]) 

and

for point in fiona.open('some_points.shp'): 
2
  • I'm new to Python, and as a result my ignorance shows. A lot of this script is repurposed from other online sources. Clearly my understanding of python leaves something to be desired. As for 1 - I was using collection in an earlier version of the script and left that in there. Will remove - ditto for numpy. I imported shapely because I planned to use parts other than geometry later on in the same module. 3&4 - I'm still learning exactly how fiona reads in files, so where I came across something that worked online, I left well enough alone. Lastly, I'll look into your suggestions. Commented Oct 27, 2014 at 20:30
  • I'm having trouble understanding the use of spatial indexes. If I create a spatial index for a polygons file, will it be "attached" to said file in some way, or will the index stand alone? That is, after creating a spatial index for my polygon file, can I just run Shapely's intersect() function on the polygon file again, and get better results? Or do I need to run it on the index, then relate those results back to my original polygon file? I've looked at the documentation, but it's not helping me here. Commented Nov 4, 2014 at 22:20
5

When it comes to optimizing code, don't guess - profile https://stackoverflow.com/questions/582336/how-can-you-profile-a-python-script.

Just looking at your code, a similar statements :

avail_area = pt_buffer.intersection(sa).area 

seems like it would get called even more than the statement you identified, since it's nested in yet another loop.

Also, you might look into prepared geometry with shapely - http://toblerity.org/shapely/manual.html#prepared-geometry-operations It supports limited geometric operations, so it may not be a drop in replacement for what you're looking to do.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.