I am trying to debug a problem and am not sure if it is my code or if there's a bug with shapely.
I have two geodataframes: one with points and one with lines. For each point in the points geodataframe, I am trying to find the closest line to this point, and then get the closest projected point interpolated along the closest line.
In other words, give me the closest point on the closest line to my point, but not necessarily an existing vertex on the line.
points = #.. geodataframe of Points lines = #.. geodataframe of LineStrings nearest_points = [] for point in points['geometry']: nearest_line = lines.loc[lines.distance(point).idxmin()]['geometry'] nearest_point = nearest_line.interpolate(nearest_line.project(point)) nearest_points.append(nearest_point) gdf = gpd.GeoDataFrame({'geometry': nearest_points}, crs=points.crs) I would imagine that taking the intersection of the resulting closest point geodataframe against the lines geodataframe would yield the closest points created in the previous step since they are supposed to be points interpolated along the closest line.
However:
len(geopandas.overlay(gdf, lines, how='intersection', keep_geom_type = False)) does NOT EQUAL:
len(lines) When I inspect the data in QGIS, I can see that the points do not fall exactly on the lines.
Is this a shapely bug?
UPDATE:
Here's a picture. Basically, the little orange points should lie exactly on the line since, but they are offset by a tiny fraction of an amount.
What I am trying to do is add nodes to a network of lines so the points have to be exactly on the line. The next step will be to split the lines at the points, but I cannot progress with the points as they are and snapping won't work because shapely snaps to the nearest vertex.
UPDATE 2:
@BERA's solution seems like it should have worked, but intersecting the resulting lines with the original lines should have picked up all the end points and did not.
I wrote something that does work, but it required snapping the closest line to the nearest point on that line and updating the line's geometry to actually work.
Finally, I tried comparing the nearest_point and spatial join solution output geometries with the geometries of the output from the snapped solution and there seem to be unequal geometries, but when they are printed, they look identical. I'm pretty stumped.
def create_connecting_lines(pointsdf, linesdf): connection_lines = [] def _nearest_line(point, lines): return lines.loc[lines.geometry.distance(point).idxmin()] def _segments(curve): segs = list(map(LineString, zip(curve.coords[:-1], curve.coords[1:]))) return geopandas.GeoDataFrame(segs, columns=["geometry"], crs=linesdf.crs) def _nearest_point(point, line): return line.interpolate(line.project(point)) def _snap_line_to_point(line, point): return snap(line, point, 0.0001) for index, point in pointsdf.iterrows(): # get the closest line line = _nearest_line(point.geometry, linesdf) if len(line.geometry.coords) > 2: # split into individual segments segs = _segments(line.geometry) # get the nearest segment _line = _nearest_line(point.geometry, segs) # get the nearest point on segment nearest_point = _nearest_point(point.geometry, _line.geometry) # snap the line to the point _line = _snap_line_to_point(line.geometry, nearest_point) # reset the line geometry to the snapped geometry linesdf.at[line.name, "geometry"] = _line # create connecting line connection_line = LineString([point.geometry, nearest_point]) connection_lines.append(connection_line) else: nearest_point = _nearest_point(point.geometry, line.geometry) # snap the line to the point _line = _snap_line_to_point(line.geometry, nearest_point) # reset the line geometry to the snapped geometry linesdf.at[line.name, "geometry"] = _line # create connecting line connection_line = LineString([point.geometry, nearest_point]) connection_lines.append(connection_line) the_lines = gpd.GeoDataFrame({"geometry": connection_lines}, crs=linesdf.crs) the_lines["length"] = the_lines.geometry.length return the_lines SOLUTION 1: nearest_points
linedf["linegeom"] = linedf.geometry sj = gpd.sjoin_nearest(pointdf, linedf[["linegeom", "geometry"]], how="left") sj["nearest_points"] = sj.apply(lambda row: nearest_points(g1=row["geometry"], g2=row["linegeom"]), axis=1) sj["shortest_line"] = sj.apply(lambda row: LineString(row["nearest_points"]), axis=1) sj = sj.set_geometry("shortest_line").set_crs(linedf.crs).drop(columns="geometry").rename_geometry("geometry") SOLUTION 2: snapping
the_lines = create_connecting_lines(pointdf, linedf) COMPARISON:
print(all(the_lines.geometry == sj.geometry)) # false print(len(pointdf)) # 757 print(len(sj) == len(pointdf)) # true print(len(gpd.overlay(the_lines, linedf, 'intersection', keep_geom_type=False))) # there are 757 intersecting points - this is correct print(len(gpd.overlay(sj, linedf, 'intersection', keep_geom_type=False))) # there are 747 intersecting points, not correct # get the index where geometries don't match idx = the_lines.loc[the_lines.geometry != sj.geometry].index print(the_lines.loc[the_lines.geometry != sj.geometry][:1].geometry.values) <GeometryArray> [<LINESTRING (3043280.994 1619771.3, 3043307.474 1619792.514)>] Length: 1, dtype: geometry print(sj.loc[idx][:1].geometry.values) <GeometryArray> [<LINESTRING (3043280.994 1619771.3, 3043307.474 1619792.514)>] Length: 1, dtype: geometry What is going on here?

