2

I would like to read in only certain rows of a shapefile using GeoPandas. I've read an approach for limiting the columns of a shapefile that are read in, which I may use as well (Only read specific attribute columns of a shapefile with Geopandas / Fiona), but I'm looking for help adapting that approach to limit the rows being read in.

I think the answer is a simple modification of GeoDataFrame.from_features and the Fiona-based function from the linked question on limiting which columns of a shapefile are read into GeoPandas.

2 Answers

7

If you don't want to read the whole file, another solution is to use a generator, as in "Only read specific attribute columns of a shapefile with Geopandas / Fiona".

As a reminder (from What does the “yield” keyword do?):

  • Everything you can use "for... in..." on is an iterable: lists, strings, files... but you store all the values in memory
  • Generators are iterables, but you can only read them once. It's because they do not store all the values in memory, they generate the values on the fly
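
To make the two points above concrete, here is a minimal sketch in plain Python (no Fiona involved). The function name `squares` is only an illustration, not something from the linked answer.

```python
def squares(n):
    """Yield the squares 0, 1, 4, ... one at a time instead of building a list."""
    for i in range(n):
        yield i * i

as_list = [i * i for i in range(5)]  # all five values are stored in memory
as_gen = squares(5)                  # nothing has been computed yet

first_pass = list(as_gen)   # consumes the generator
second_pass = list(as_gen)  # the generator is already exhausted

print(first_pass)   # [0, 1, 4, 9, 16]
print(second_pass)  # []
```

The list can be iterated as often as you like; the generator gives its values once and is then empty.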

When you open a shapefile with Fiona, you get an object that yields its records lazily as you iterate over it, not a plain list. So for the rows [0, 4, 7], we don't need to read every record of the shapefile, only those up to the last index in the list.
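
The early-stopping idea can be checked without a shapefile. The sketch below uses a hypothetical stand-in, `fake_source`, that logs every record it produces; `pick` is likewise an illustrative name. It assumes the index list is not empty.

```python
def fake_source(n, log):
    """Hypothetical stand-in for a lazily read file: logs each index it produces."""
    for i in range(n):
        log.append(i)
        yield {"id": i}

def pick(source, indexes):
    """Yield the records at the given positions, stopping after the largest one."""
    wanted = set(indexes)
    last = max(wanted)
    for i, record in enumerate(source):
        if i in wanted:
            yield record
        if i >= last:
            break  # nothing further is needed, so stop reading

read_log = []
picked = list(pick(fake_source(1000, read_log), [4, 0, 7]))

print([r["id"] for r in picked])  # [0, 4, 7]
print(len(read_log))              # 8
```

Only the first 8 of the 1000 records are ever produced, and they come back in file order rather than in the order of the index list.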

The generator

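    import fiona
    import geopandas as gpd

    def records(filename, indexes):
        """Yield only the features at the given row positions."""
        wanted = set(indexes)
        last = max(wanted)
        with fiona.open(filename) as source:
            # Iterate lazily and stop once the last requested row has been read
            for i, feature in enumerate(source):
                if i in wanted:
                    yield feature
                if i >= last:
                    break

    gdf = gpd.GeoDataFrame.from_features(records("test.shp", [4, 0, 7]))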


It is also possible to adapt rick debbout's solution by converting the list comprehension (which creates the entire list in memory first) into a generator expression (which creates the items on the fly):

    def getRows2(fn, idxList):
        # Keep the file open while from_features consumes the generator expression
        with fiona.open(fn) as reader:
            return gpd.GeoDataFrame.from_features(reader[x] for x in idxList)

And if you want to extract a contiguous slice, rows 8 to 12 for example, it is easier:

    with fiona.open('test.shp') as c:
        gdf = gpd.GeoDataFrame.from_features(c[8:13])

4

Using the row indexes, you could do this:

    import fiona
    import geopandas as gpd

    def getRows(fn, idxList):
        # Open the file and fetch each requested row by its index
        with fiona.open(fn) as reader:
            return gpd.GeoDataFrame.from_features([reader[x] for x in idxList])

    keepIndexes = [0, 4, 7]  # list of indexes from shp file
    filename = './path/to/filename'
    outDF = getRows(filename, keepIndexes)

I think that if you wanted to select rows by the values in a column, as in the question you linked to, you would have to read through the whole file to get those values anyway, so there would be no difference in read-in time.
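
A rough illustration of that last point, again in plain Python: `fake_features` is a hypothetical stand-in for a layer of GeoJSON-like features, and the `zone` attribute is invented for the example. Filtering on an attribute value has to look at every record, even though only the matches are kept.

```python
def fake_features(n, log):
    """Hypothetical stand-in for a lazily read layer of GeoJSON-like features."""
    for i in range(n):
        log.append(i)
        yield {"properties": {"zone": "A" if i % 3 == 0 else "B"}}

def where(features, key, value):
    """Yield only the features whose attribute `key` equals `value`."""
    for feature in features:
        if feature["properties"][key] == value:
            yield feature

seen = []
zone_a = list(where(fake_features(10, seen), "zone", "A"))

print(len(zone_a))  # 4
print(len(seen))    # 10
```

All 10 records are visited to find the 4 matches, so the saving is in memory, not in the amount of the file that is read.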
