I have a dataframe with coordinates and other attributes, and a shp file (the whole package with shx and dbf as well) with many polygons of neighborhoods. I need to match each point to which polygon it belongs to, and it would be for quite a large dataset.
What I did find and why it didn't fit
I've found some suggested solutions online for it, and understood the best ways should be by using Fiona and Shapely, or GeoPandas, but:
1. Existing solutions (that I found) were all assuming I'm working with an shp file for the points as well, but I'm working with a dataframe. (I may be able to find out how to convert it to an shp file but I assume that will be redundant and not the most efficient).
2. Most solutions assumed the shp file has only one attribute, while mine has many attributes; or aimed at checking if points are in 1 polygon or not. I assume there must be a better way to assign points to the corresponding polygon than looping through this.
Solutions I tried
I tried suggestions from the following pages:
- all of the solutions in the extensive answer here: Check if a point falls within a multipolygon with Python and these:
- Find csv lat and long points in a shapefile polygon with geopandas spatial index
- Fastest way to join many points to many polygons in python
- More Efficient Spatial join in Python without QGIS, ArcGIS, PostGIS, etc
- https://automating-gis-processes.github.io/2016/Lesson3-spatial-join.html
Attempt 1: Fiona + Shapely
from How to map a point to a polygon label from a shapefile:
import fiona from shapely.geometry import Point, shape NY_nbr_shpfile = 'taxi_zones.shp' def coor_to_nbr(longit, lat, shape_file): mypoint = Point(longit, lat) with fiona.open(NY_nbr_shpfile) as shp: polygons = [poly for poly in shp] poly_idx = [i for i, poly in enumerate(polygons) if mypoint.within(shape(poly['geometry']))] if poly_idx: print poly_idx if not poly_idx: return None else: # Take first polygon that overlaps since may overlap with several if on border match = polygons[poly_idx[0]] return match['properties'] ### TRYING TO MATCH one-by-one, that didn't work: print coor_to_nbr(-73.9868805930018, 40.7697167683218, NY_nbr_shpfile) print coor_to_nbr( 40.722249, -73.997673, NY_nbr_shpfile) . ### just checking it's not opposite print coor_to_nbr( -73.977673,40.722249, NY_nbr_shpfile)` This printed 'None' to everything I tried, but I know these points are inside one of the polygons.
Attempt 2: GeoPandas1
This is taken from the following url; I left the original points there just to see if it would even work, and it seems to not work for me - yielding at ValueError: need at least one array to concatenate.
#### https://gis.stackexchange.com/questions/190903/assign-a-point-to-polygon-using-pandas-and-shapely #### ny_nbr_shpfile = '/Users/tomer/Dropbox (Via)/Mapping data/New York/TLC Taxi Zones/taxi_zones (1)/taxi_zones.shp' import pandas import geopandas import geopandas.tools from shapely.geometry import Point #### POLYGONS # Load the polygons polys = geopandas.GeoDataFrame.from_file(ny_nbr_shpfile) #### POINTS # Create a DataFrame with some cities, including their location raw_data = [ ("London", 51.5, -0.1), ("Paris", 48.9, 2.4), ("San Francisco", 37.8, -122.4), ] points = pandas.DataFrame(raw_data, columns=["name", "latitude", "longitude"]) ###print points # Create the geometry column from the coordinates # Remember that longitude is east-west (i.e. X) and latitude is north-south (i.e. Y) points["geometry"] = points.apply(lambda row: Point(row["longitude"], row["latitude"]), axis=1) del(points["latitude"], points["longitude"]) ###print points # Convert to a GeoDataFrame points = geopandas.GeoDataFrame(points, geometry="geometry") # Declare the coordinate system for the points GeoDataFrame # GeoPandas doesn't do any transformations automatically when performing # the spatial join. The layers are already in the same CRS (WGS84) so no # transformation is needed. points.crs = polys.crs ###print points.crs #print nbrhoods # Drop all columns except the name and polygon geometry #boroughs = nbrhoods[["borough", "geometry"]] # Perform the spatial join result = geopandas.tools.sjoin(points, nbrhoods, how="left") # Print the results... print(result.head()) result: ValueError: need at least one array to concatenate. From researching this error, it seems that it happenned commonly when spatial join yields no results but it was supposed to be fixed by this version of GeoPandas. I assume it is fixed and it is another problem here.
Attempt 3: GeoPandas 2 : (similar)
import pandas as pd from shapely.geometry import Point import geopandas as gp from geopandas.tools import sjoin FRC1 = gp.read_file(ny_nbr_shpfile) #trip = pd.read_csv('TripRecordsReporttrips.csv', sep=',',header=None, usecols=[0, 4, 8, 9, 10, 11],names=['TripID', 'Date', 'StartLat', 'StartLon','EndLat','EndLon']) geometry = [Point(-73.9868805930018, 40.7697167683218), Point(-73.977673,40.722249)] #geometry = [Point(xy) for xy in zip(trip['StartLon'], trip['StartLat'])] #geometry2 = [Point(xy) for xy in zip(trip['EndLon'], trip['EndLat'])] #trip = trip.drop(['StartLon', 'StartLat','EndLon','EndLat'], axis=1) trip = ['broadway 150', 'crosby 10'] crs = {'init' :'epsg:4326'} starts = gp.GeoDataFrame(trip, crs=FRC1.crs , geometry=geometry) #ends = gp.GeoDataFrame(trip, crs=crs, geometry=geometry2) #starts.head() ###print pointInPolys = sjoin(starts, FRC1, how='left',op="within") Reproduction Materials
Shapefiles for neighborhoods is the same as here: https://geo.nyu.edu/catalog/nyu_2451_36743
Example input for points:
Start Lng................... Start Lat.................. End Lat ................... End Long................
-73.9446545764804 40.7796840845172 40.7693589967192 -73.9857926219702 -73.9574825763702 40.8116657733964 40.7834276940059 -73.9789965748787 -73.9574825763702 40.8116657733964 40.7839981278416 -73.9762181416154 -73.9938855171204 40.7355130703031 40.7112190324095 -73.9506489783525 -73.9634789898992 40.7588519932381 40.7544933950398 -73.9975795894861 -73.9907560497522 40.7364947075543 40.804403082633 -73.9372211694717 -73.9907560497522 40.7364947075543 40.804403082633 -73.9372211694717 -73.9939170330763 40.7464936766106 40.7191759823283 -73.9889881387353 -73.9907560497522 40.7364947075543 40.804403082633 -73.9372211694717 -73.9831633865833 40.718557218786 40.6432838990509 -73.7899511307478 -73.9887668564916 40.7186952022837 40.7759590522127 -73.9807621389627
*Points put into an array just for experiments - even through half of them are lat/long and half are long/lat, I'm trying both to see if the problem might be that the functions take it as one way and not the other.
[(-73.9446545764804, 40.7796840845172) (40.7693589967192 ,-73.9857926219702) (-73.9574825763702, 40.8116657733964 ) (40.7834276940059 ,-73.9789965748787 ) (-73.9574825763702, 40.8116657733964 ) (40.7839981278416 ,-73.9762181416154 ) (-73.9938855171204, 40.7355130703031 ) (40.7112190324095 ,-73.9506489783525 ) (-73.9634789898992, 40.7588519932381 ) (40.7544933950398 ,-73.9975795894861 ) (-73.9907560497522, 40.7364947075543 ) (40.804403082633 ,-73.9372211694717 ) (-73.9907560497522, 40.7364947075543 ) (40.804403082633 ,-73.9372211694717 ) (-73.9939170330763, 40.7464936766106 ) (40.7191759823283 ,-73.9889881387353 ) (-73.9907560497522, 40.7364947075543 ) (40.804403082633 ,-73.9372211694717 ) (-73.9831633865833, 40.718557218786 ) (40.6432838990509 ,-73.7899511307478 ) (-73.9887668564916, 40.7186952022837 ) (40.7759590522127, -73.9807621389627)]