1

I'm looking for a method that iterates over the rows but applies some function only to every 20th or 30th row, something like:

UPDATED CODE

    for index, row in df.iterrows(), index=+20:
        location = geolocator.reverse("%s, %s" % (row['lat'], row['long']), timeout=None)
        row['location'] = location.address
        time.sleep(3)
    return df

Actually, I'm trying to minimize the number of requests, because otherwise I run into a timeout. That's why I tried to iterate over the rows and make the request only for every 20th or 60th row (I have 7000 rows). The goal is not to speed up the process, which is why I also call time.sleep.

3 Answers

4

Try this:

    # iterrows() yields (index, row) pairs; enumerate() adds a running position i
    for i, (index, row) in enumerate(df.iterrows()):
        if i % 20 == 0:
            # do something

Comments

2

Just use enumerate and the modulus operator:

    for index, row in enumerate(df.iterrows()):
        if not index%20:
            row['C']=some_function()
    return df

I took the return out of the loop so that the loop wouldn't end after one iteration.

2 Comments

seems good, but it throws TypeError: tuple indices must be integers, not str
That means row is a tuple rather than the dictionary-like structure (indexed with string keys) you're expecting. I don't know pandas; sorry.
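The TypeError arises because `df.iterrows()` yields `(index, row)` tuples, so `enumerate` wraps each of them in a further tuple. A minimal sketch of the unpacking, assuming a default integer index; `some_function` and the column names are hypothetical stand-ins, and the write goes through `df.at` because the `row` handed out by `iterrows()` is a copy:

```python
import pandas as pd

def some_function(row):
    # Hypothetical placeholder for the real per-row work (e.g. a geocoding request).
    return "value for row %d" % row["A"]

df = pd.DataFrame({"A": range(60)})
df["C"] = None  # column to fill on every 20th row

# enumerate() yields (position, (index_label, row)), so unpack the inner tuple.
for i, (idx, row) in enumerate(df.iterrows()):
    if i % 20 == 0:
        # row is a copy, so write to the DataFrame itself.
        df.at[idx, "C"] = some_function(row)
```

Only rows 0, 20 and 40 are touched; every other row keeps its initial value in `C`.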
0

Why not just slice the df using iloc and a step param:

Example:

    In [120]:
    df = pd.DataFrame({'c':np.random.randn(30)})
    df

    Out[120]:
                c
    0   -0.737805
    1    1.158012
    2   -0.348384
    3    0.044989
    4    0.962584
    5    2.041479
    6    1.376785
    7    0.208565
    8   -1.535244
    9    0.389831
    10   0.049862
    11  -0.142717
    12  -0.794087
    13   1.316492
    14   0.182952
    15   0.850953
    16   0.015589
    17   0.062692
    18  -1.551303
    19   0.937899
    20   0.583003
    21  -0.612411
    22   0.762307
    23  -0.682298
    24  -0.897314
    25  -0.101144
    26  -0.617573
    27  -2.168498
    28   0.631021
    29  -1.592888

    In [121]:
    df['c'].iloc[::5] = 0
    df

    Out[121]:
                c
    0    0.000000
    1    1.158012
    2   -0.348384
    3    0.044989
    4    0.962584
    5    0.000000
    6    1.376785
    7    0.208565
    8   -1.535244
    9    0.389831
    10   0.000000
    11  -0.142717
    12  -0.794087
    13   1.316492
    14   0.182952
    15   0.000000
    16   0.015589
    17   0.062692
    18  -1.551303
    19   0.937899
    20   0.000000
    21  -0.612411
    22   0.762307
    23  -0.682298
    24  -0.897314
    25   0.000000
    26  -0.617573
    27  -2.168498
    28   0.631021
    29  -1.592888

This will be much faster than iterating over every row

So in your case:

df['C'].iloc[::20] = some_function() 

should work
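One caveat: `df['C'].iloc[::20] = ...` is a chained assignment, which can trigger a SettingWithCopyWarning and, with pandas copy-on-write enabled, may leave `df` unchanged. A hedged sketch of the same stepped assignment through a single `.loc` call; the frame and the constant `0.0` are made up for illustration:

```python
import numpy as np
import pandas as pd

# Illustrative data: 100 rows with values 1.0 .. 100.0.
df = pd.DataFrame({"C": np.arange(1, 101, dtype=float)})

# Pick every 20th index label, then assign in one .loc call
# so the write lands on df itself rather than on an intermediate object.
every_20th = df.index[::20]
df.loc[every_20th, "C"] = 0.0
```

Rows 0, 20, 40, 60 and 80 are set to zero; all other rows are untouched.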

7 Comments

Seems good! But I don't want to make the process faster. I updated the question; maybe you know how to deal with this issue.
Well, I don't know why you get the timeout issue. I would look at that as a separate issue, since my answer is vectorised and will be much faster than iterating row-wise.
Well, I suppose the problem is that the server limits the number of requests per period of time; that's why I put time.sleep at the end of every iteration.
Following the logic of your example, I applied this method as location= geolocator.reverse("%s, %s" % (df['lat'].iloc[::20], df['long'].iloc[::20]),timeout=None) df['location'].iloc[::20] =location.address which throws ``` ValueError: Must be a coordinate pair or Point```
That is an issue with your func not understanding the Series; it needs one coordinate pair per call, so you'd have to do df.iloc[::20].apply(lambda x: geolocator.reverse("%s, %s" % (x['lat'], x['long']), timeout=None), axis=1)
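Putting the comment thread together, here is a rough sketch of geocoding only every 20th row with a pause between requests. `reverse_geocode` is a hypothetical stand-in for `geolocator.reverse(...).address` so the example runs offline, and the coordinates are invented; the `pause` value is an assumption to tune against the real service's limits:

```python
import time
import pandas as pd

def reverse_geocode(query):
    # Hypothetical stand-in for geolocator.reverse(query, timeout=None).address;
    # it makes no network request.
    return "address near %s" % query

def lookup(row, pause=0.0):
    # Build a single "lat, long" string for this one row.
    address = reverse_geocode("%s, %s" % (row["lat"], row["long"]))
    time.sleep(pause)  # wait between requests, e.g. pause=3 against a real service
    return address

df = pd.DataFrame({"lat": [50.0 + i / 100 for i in range(60)],
                   "long": [10.0 + i / 100 for i in range(60)]})

# apply() runs only on the sliced rows; the assignment aligns on the index,
# so all other rows get NaN in the new column.
df["location"] = df.iloc[::20].apply(lookup, axis=1)
```

With 7000 rows and a step of 20 this would make 350 calls instead of 7000, each handling exactly one coordinate pair.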