46

I have an indexed pandas dataframe. By searching through its index, I find a row of interest. How do I find out the iloc of this row?

Example:

    import numpy as np
    import pandas as pd

    dates = pd.date_range('1/1/2000', periods=8)
    df = pd.DataFrame(np.random.randn(8, 4), index=dates, columns=['A', 'B', 'C', 'D'])
    df

                       A         B         C         D
    2000-01-01 -0.077564  0.310565  1.112333  1.023472
    2000-01-02 -0.377221 -0.303613 -1.593735  1.354357
    2000-01-03  1.023574 -0.139773  0.736999  1.417595
    2000-01-04 -0.191934  0.319612  0.606402  0.392500
    2000-01-05 -0.281087 -0.273864  0.154266  0.374022
    2000-01-06 -1.953963  1.429507  1.730493  0.109981
    2000-01-07  0.894756 -0.315175 -0.028260 -1.232693
    2000-01-08 -0.032872 -0.237807  0.705088  0.978011

    window_stop_row = df[df.index < '2000-01-04'].iloc[-1]
    window_stop_row.name

    Timestamp('2000-01-03 00:00:00', offset='D')

    # what is the iloc of window_stop_row?
6
  • What do you mean exactly by "searching through its index"? Commented Jan 20, 2016 at 10:09
  • What did you find? Number of position or label of index? Commented Jan 20, 2016 at 10:10
  • @filmor: something like window_start_row = values[values.Timestamp < row.Timestamp - window_length][-1]. I need the iloc of window_start_row. Commented Jan 20, 2016 at 10:10
  • @anton: I found the whole row. I need some values from its cells, but I also need its iloc. Commented Jan 20, 2016 at 10:11
  • 2
    df.index.get_loc(window_stop_row.name) Commented Jan 20, 2016 at 10:33

4 Answers 4

49

Generally speaking, pass the named index value to index.get_loc:

df.index.get_loc(row_of_interest_named_index) 

Since you’re dealing with dates it may be more convenient to retrieve the index value with .name:

    In [131]: dates = pd.date_range('1/1/2000', periods=8)
              df = pd.DataFrame(np.random.randn(8, 4), index=dates, columns=['A', 'B', 'C', 'D'])
              df
    Out[131]:
                       A         B         C         D
    2000-01-01  0.095234 -1.000863  0.899732 -1.742152
    2000-01-02 -0.517544 -1.274137  1.734024 -1.369487
    2000-01-03  0.134112  1.964386 -0.120282  0.573676
    2000-01-04 -0.737499 -0.581444  0.528500 -0.737697
    2000-01-05 -1.777800  0.795093  0.120681  0.524045
    2000-01-06 -0.048432 -0.751365 -0.760417 -0.181658
    2000-01-07 -0.570800  0.248608 -1.428998 -0.662014
    2000-01-08 -0.147326  0.717392  3.138620  1.208639

    In [133]: window_stop_row = df[df.index < '2000-01-04'].iloc[-1]
              window_stop_row.name
    Out[133]: Timestamp('2000-01-03 00:00:00', offset='D')

    In [134]: df.index.get_loc(window_stop_row.name)
    Out[134]: 2

get_loc returns the ordinal (integer) position of the label in your index, which is what you want:

    In [135]: df.iloc[df.index.get_loc(window_stop_row.name)]
    Out[135]:
    A    0.134112
    B    1.964386
    C   -0.120282
    D    0.573676
    Name: 2000-01-03 00:00:00, dtype: float64
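As a self-contained sketch of the same idea with deterministic data (the integer values and the small letter indexes are made up for illustration), the following shows the round trip from row to label to position. It also shows that get_loc only returns a single integer when the label is unique in the index:

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2000-01-01", periods=8)
# Illustrative data: integers instead of random values so the result is reproducible.
df = pd.DataFrame(np.arange(32).reshape(8, 4), index=dates, columns=list("ABCD"))

# .iloc[-1] on the filtered frame returns a Series; its .name is the index label.
window_stop_row = df[df.index < pd.Timestamp("2000-01-04")].iloc[-1]
pos = df.index.get_loc(window_stop_row.name)
print(pos)  # 2

# With duplicate labels, get_loc no longer returns a single integer.
dup_sorted = pd.Index(list("abbc")).get_loc("b")    # slice(1, 3, None)
dup_unsorted = pd.Index(list("abcb")).get_loc("b")  # boolean mask
print(dup_sorted, dup_unsorted)
```

Note that .name exists on a single row (a Series). If the selection returns a DataFrame, for example via .iloc[[-1]], there is no .name attribute to pass to get_loc.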

If you just want to search the index, then as long as it is sorted you can use searchsorted:

    In [142]: df.index.searchsorted('2000-01-04') - 1
    Out[142]: 2
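The "- 1" depends on which side of the cutoff searchsorted lands on. A minimal sketch, assuming a sorted DatetimeIndex and using a pd.Timestamp cutoff rather than a string (the variable names are illustrative):

```python
import pandas as pd

dates = pd.date_range("2000-01-01", periods=8)
cutoff = pd.Timestamp("2000-01-04")

# side="left" (the default) gives the first position whose label is >= cutoff,
# so subtracting 1 yields the last label strictly before the cutoff.
last_before = dates.searchsorted(cutoff, side="left") - 1

# side="right" gives the first position whose label is > cutoff,
# so subtracting 1 yields the last label at or before the cutoff.
last_at_or_before = dates.searchsorted(cutoff, side="right") - 1

print(last_before, last_at_or_before)  # 2 3
```

If the cutoff is earlier than every label, the result is -1, which iloc would interpret as the last row, so that case is worth checking explicitly.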

6 Comments

@lmsasu which bit? the .name attribute is not so obvious but the rest is reasonable to me
@lmsasu if you want to search the index, then so long as the index is sorted then this would give you what you want: In [138]: df.index.searchsorted('2000-01-04') - 1 Out[138]: 2
I would have loved to use this, but I get AttributeError: 'DataFrame' object has no attribute 'name'. stackoverflow.com/questions/56214275/…
IMO, the method ought to be named get_iloc, as it's the integer location that it's returning. The name get_loc seems incorrect.
@sdbbs Use .name on a row, not on the dataframe.
9

While pandas.Index.get_loc() only works for a single key, the following pattern also works for getting the iloc of multiple elements:

np.argwhere(condition).flatten() # array of all iloc where condition is True 

In your case, picking the latest element where df.index < '2000-01-04':

np.argwhere(df.index < '2000-01-04').flatten()[-1] # returns 2 
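A related sketch, not part of the answer above: np.flatnonzero gives the same positions as argwhere(...).flatten() for a one-dimensional mask, and Index.get_indexer maps several labels to positions at once when the index is unique. The labels in wanted are made up for illustration.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2000-01-01", periods=8)

# Positions where a condition on the index holds.
mask = dates < pd.Timestamp("2000-01-04")
positions = np.flatnonzero(mask)
print(positions)      # [0 1 2]
print(positions[-1])  # 2

# Positions of several specific labels; labels not in the index map to -1.
wanted = pd.to_datetime(["2000-01-02", "2000-01-07", "2001-01-01"])
idx = dates.get_indexer(wanted)
print(idx)            # [ 1  6 -1]
```

Filter out the -1 entries before passing the result to iloc, since -1 would otherwise select the last row.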

Comments

1

If I understand correctly, you could take the index of the filtered frame in your case:

    In [53]: df[df.index < '2000-01-04'].index[-1]
    Out[53]: Timestamp('2000-01-03 00:00:00', offset='D')

EDIT

I think @EdChum's answer is what you want. Alternatively, you could compare the dataframe against the values of the row you found, use all to find the row where every value matches, and then pass that mask to the index:

    In [67]: df == window_stop_row
    Out[67]:
                    A      B      C      D
    2000-01-01  False  False  False  False
    2000-01-02  False  False  False  False
    2000-01-03   True   True   True   True
    2000-01-04  False  False  False  False
    2000-01-05  False  False  False  False
    2000-01-06  False  False  False  False
    2000-01-07  False  False  False  False
    2000-01-08  False  False  False  False

    In [68]: (df == window_stop_row).all(axis=1)
    Out[68]:
    2000-01-01    False
    2000-01-02    False
    2000-01-03     True
    2000-01-04    False
    2000-01-05    False
    2000-01-06    False
    2000-01-07    False
    2000-01-08    False
    Freq: D, dtype: bool

    In [69]: df.index[(df == window_stop_row).all(axis=1)]
    Out[69]: DatetimeIndex(['2000-01-03'], dtype='datetime64[ns]', freq='D')

3 Comments

Thanks. However, I would like to get the answer "2", the iloc of this line. This is the question for me: how do I get this number?
After Anton's edit: it would be nice to make use of the index and search based on it. It is expected that index-based searching works faster.
Use @EdChum's answer for that.
-1

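You could try looping through each row in the dataframe. Note that iterrows yields the index label rather than the position, so enumerate is used here to count rows:

    for position, (label, row) in enumerate(dataframe.iterrows()):
        if row['column_header'] == YourValue:
            print(position)

This gives you the integer position to pass to iloc.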

1 Comment

This will be incredibly slow on large frames
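The slowness noted in the comment can usually be avoided by comparing the whole column at once. A minimal sketch, where the frame, the column name column_header and the value 20 are placeholders for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a non-integer index, so labels and positions differ.
dataframe = pd.DataFrame({"column_header": [10, 20, 30, 20]}, index=list("wxyz"))
your_value = 20

# Vectorized comparison; flatnonzero returns the integer positions of the matches.
matches = np.flatnonzero(dataframe["column_header"].to_numpy() == your_value)
print(matches)                   # [1 3]
print(dataframe.iloc[matches])   # rows labelled 'x' and 'z'
```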
