Only read specific rows of a shapefile with GeoPandas / Fiona

Question

I would like to read in only certain rows of a shapefile using GeoPandas. I've read an approach for limiting which columns of a shapefile are read in, which I may use as well (Only read specific attribute columns of a shapefile with Geopandas / Fiona), but I'm looking for help adapting that approach to limit the rows that are read in.

I think the answer is a simple modification of GeoDataFrame.from_features and the Fiona-based function from the linked question on limiting the columns read into GeoPandas.

Community · Accepted Answer · 2017-05-23 12:39:42Z

If you don't want to read the whole file, another solution is to use a generator, as in "Only read specific attribute columns of a shapefile with Geopandas / Fiona".

As a reminder (from What does the “yield” keyword do?):

Everything you can use "for... in..." on is an iterable: lists, strings, files... but you store all the values in memory
Generators are iterables, but you can only read them once. It's because they do not store all the values in memory, they generate the values on the fly
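
To make the distinction concrete, here is a small sketch in plain Python (the variable names are purely illustrative and do not come from the linked answer). A list can be iterated as often as you like, while a generator is exhausted after a single pass:

```python
# A list stores all of its values and can be iterated as often as needed.
squares_list = [x * x for x in range(4)]
first_pass_list = list(squares_list)
second_pass_list = list(squares_list)

# A generator expression produces its values on the fly, for one pass only.
squares_gen = (x * x for x in range(4))
first_pass_gen = list(squares_gen)
second_pass_gen = list(squares_gen)  # already exhausted, so nothing is left
```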

When you open a shapefile with Fiona, the result behaves like a generator rather than a simple list: records are read as you iterate. So with the list of rows [0, 4, 7], we don't need to read every record of the shapefile, only those up to the largest index in the list.

The generator

    import fiona
    import geopandas as gpd

    def records(filename, rows):
        rows = sorted(rows)  # in case the indexes are not sorted
        with fiona.open(filename) as source:
            # only read up to the largest requested index
            for i, feature in enumerate(source[:max(rows) + 1]):
                if i in rows:
                    yield feature

    gpd.GeoDataFrame.from_features(records("test.shp", [4, 0, 7]))
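
As a hedged alternative to the generator above, the same "stop after the last wanted row" idea can be written with itertools.islice, which works on any iterable and does not depend on slicing the collection. The names `select_rows` and `fake_source` are hypothetical; `fake_source` only stands in for a Fiona collection so that the sketch runs without a shapefile and shows how many records are actually read.

```python
from itertools import islice

def select_rows(features, indexes):
    """Yield the items of `features` at the given positions, in source order.

    Iteration stops right after the largest requested index.
    """
    wanted = set(indexes)
    if not wanted:
        return
    for i, feature in enumerate(islice(features, max(wanted) + 1)):
        if i in wanted:
            yield feature

# Stand-in for a Fiona collection: a generator that logs every record it reads.
read_log = []

def fake_source(n):
    for i in range(n):
        read_log.append(i)
        yield {"id": i}

picked = list(select_rows(fake_source(100), [4, 0, 7]))
```

With Fiona, the call would presumably be gpd.GeoDataFrame.from_features(select_rows(src, [4, 0, 7])) inside a with fiona.open("test.shp") as src: block, assuming the collection is iterated from its first record. A contiguous range could be taken the same way with islice(src, 8, 13).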

It is also possible to adapt rick debbout's solution by converting the list comprehension (which builds the entire list in memory first) into a generator expression (which produces the items on the fly):

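    def getRows2(fn, idxList):
        # the generator expression is consumed inside from_features,
        # so the file can be closed as soon as the call returns
        with fiona.open(fn) as reader:
            return gpd.GeoDataFrame.from_features(reader[x] for x in idxList)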

And if you want to extract a contiguous slice, rows 8 to 12 for example, it is even simpler:

    c = fiona.open('test.shp')
    gpd.GeoDataFrame.from_features(c[8:13])

rickD · Accepted Answer · 2016-12-03 05:28:22Z

Using the row indexes, you could do:

    import fiona
    import geopandas as gpd

    def getRows(fn, idxList):
        # the file is closed once the requested features are loaded
        with fiona.open(fn) as reader:
            return gpd.GeoDataFrame.from_features([reader[x] for x in idxList])

    keepIndexes = [0, 4, 7]  # list of indexes from shp file
    filename = './path/to/filename'
    outDF = getRows(filename, keepIndexes)

I think that if you wanted to select rows by the values in a column, as in the question you linked to, you would have to read through the whole file to find them anyway, so there would be no difference in read-in time.
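
A minimal sketch of that value-based case, assuming GeoJSON-like feature dicts with a "properties" mapping (the shape Fiona yields). The function name `records_where`, the `STATE` column, and the sample features are all made up for illustration. Every feature has to be visited, but only the matches are kept:

```python
def records_where(features, column, allowed):
    """Yield the features whose `column` property is one of `allowed`."""
    allowed = set(allowed)
    for feature in features:  # the whole source is scanned
        if feature["properties"][column] in allowed:
            yield feature

# Made-up features standing in for the records of a shapefile.
sample = [
    {"type": "Feature", "properties": {"STATE": "OR"}, "geometry": None},
    {"type": "Feature", "properties": {"STATE": "CA"}, "geometry": None},
    {"type": "Feature", "properties": {"STATE": "WA"}, "geometry": None},
]

matches = list(records_where(sample, "STATE", {"OR", "WA"}))
```

The generator could be passed to gpd.GeoDataFrame.from_features in the same way as the index-based versions; the saving would be in memory rather than in read time.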
