106

I have a pandas series with boolean entries. I would like to get a list of indices where the values are True.

For example the input pd.Series([True, False, True, True, False, False, False, True])

should yield the output [0,2,3,7].

I can do it with a list comprehension, but is there something cleaner or faster?

2 Comments
  • A better test case is s = pd.Series([True, False, True, True, False, False, False, True], index=list('ABCDEFGH')), with expected output Index(['A', 'C', 'D', 'H'], ...), since some solutions (esp. all the np functions) drop the index and use the autonumber index. Commented Apr 21, 2021 at 22:42
  • ...if we have a named index, it's usually very undesirable to drop it. Commented Apr 21, 2021 at 22:57
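To make the point in these comments concrete, here is a minimal sketch using the labeled test case suggested above. It assumes only that pandas and NumPy are installed; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series(
    [True, False, True, True, False, False, False, True],
    index=list("ABCDEFGH"),
)

# Index-aware approaches return the labels of the True entries.
labels = s[s].index
labels_alt = s.index[s]

# NumPy only sees the underlying array, so it returns integer positions.
positions = np.flatnonzero(s.to_numpy())

print(labels.tolist())     # ['A', 'C', 'D', 'H']
print(positions.tolist())  # [0, 2, 3, 7]
```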

4 Answers

189

Using boolean indexing

>>> s = pd.Series([True, False, True, True, False, False, False, True])
>>> s[s].index
Int64Index([0, 2, 3, 7], dtype='int64')

If you need a np.array object, get the .values

>>> s[s].index.values
array([0, 2, 3, 7])
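Since the question asks for a plain Python list, a small sketch of that last step may help. It relies on Index.tolist() and Index.to_numpy(), both of which exist in current pandas; the variable names are illustrative.

```python
import pandas as pd

s = pd.Series([True, False, True, True, False, False, False, True])

# Index.tolist() converts the selected index into a plain Python list.
true_idx = s[s].index.tolist()

# Index.to_numpy() is the newer spelling of .values for a NumPy array.
true_idx_array = s[s].index.to_numpy()

print(true_idx)  # [0, 2, 3, 7]
```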

Using np.nonzero

>>> np.nonzero(s)
(array([0, 2, 3, 7]),)

Using np.flatnonzero

>>> np.flatnonzero(s)
array([0, 2, 3, 7])

Using np.where

>>> np.where(s)[0]
array([0, 2, 3, 7])

Using np.argwhere

>>> np.argwhere(s).ravel()
array([0, 2, 3, 7])

Using pd.Series.index

>>> s.index[s]
Int64Index([0, 2, 3, 7], dtype='int64')

Using Python's built-in filter

>>> [*filter(s.get, s.index)]
[0, 2, 3, 7]

Using list comprehension

>>> [i for i in s.index if s[i]]
[0, 2, 3, 7]

6 Comments

What if the series index has labels instead of a range index?
@pyd then you can use the options referred to in the answer as boolean indexing, pd.Series.index, filter, and list comprehension (basically NOT the numpy ones).
@Dahn I did not understand your answer. Can you explain further?
@MattS If the series has an index other than a range index, then any methods listed in rafaelc's answer that are based on numpy won't work, as numpy will forget the index upon conversion. I therefore listed the methods that do still work in that case. Does that work for you?
I think we should also mention here .where() method. Check here: pandas.pydata.org/pandas-docs/stable/reference/api/…
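If the speed of the numpy functions is still wanted on a labeled series, one possible approach (a sketch, not something proposed in the answer) is to compute positions with numpy and then look them up in the index:

```python
import numpy as np
import pandas as pd

s = pd.Series(
    [True, False, True, True, False, False, False, True],
    index=list("ABCDEFGH"),
)

# NumPy gives integer positions of the True entries...
positions = np.flatnonzero(s.to_numpy())

# ...and indexing the Index with those positions recovers the labels.
labels = s.index[positions]

print(labels.tolist())  # ['A', 'C', 'D', 'H']
```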
33

As an addition to rafaelc's answer, here are the corresponding timings (from quickest to slowest) for the following setup

import numpy as np
import pandas as pd
s = pd.Series([x > 0.5 for x in np.random.random(size=1000)])

Using np.where

>>> timeit np.where(s)[0]
12.7 µs ± 77.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Using np.flatnonzero

>>> timeit np.flatnonzero(s)
18 µs ± 508 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Using pd.Series.index

The time difference compared to boolean indexing was really surprising to me, since boolean indexing is the more commonly used approach.

>>> timeit s.index[s]
82.2 µs ± 38.9 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Using boolean indexing

>>> timeit s[s].index
1.75 ms ± 2.16 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

If you need a np.array object, get the .values

>>> timeit s[s].index.values
1.76 ms ± 3.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

If you need a slightly easier to read version <-- not in original answer

>>> timeit s[s==True].index
1.89 ms ± 3.52 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Using pd.Series.where <-- not in original answer

>>> timeit s.where(s).dropna().index
2.22 ms ± 3.32 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
>>> timeit s.where(s == True).dropna().index
2.37 ms ± 2.19 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

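Using pd.Series.mask <-- not in original answer

Note that mask replaces the entries where the condition is True, so the calls timed here return the index of the False values; s.mask(~s) would select the True ones.

>>> timeit s.mask(s).dropna().index
2.29 ms ± 1.43 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
>>> timeit s.mask(s == True).dropna().index
2.44 ms ± 5.82 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)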
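A quick check of which entries where and mask each keep, using the example series from the question rather than the random benchmark data (variable names are illustrative):

```python
import pandas as pd

s = pd.Series([True, False, True, True, False, False, False, True])

# where(cond) keeps values where cond is True and sets the rest to NaN.
kept_by_where = s.where(s).dropna().index.tolist()

# mask(cond) does the opposite: it sets values to NaN where cond is True.
kept_by_mask = s.mask(s).dropna().index.tolist()

# Inverting the condition makes mask select the True entries.
kept_by_inverted_mask = s.mask(~s).dropna().index.tolist()

print(kept_by_where)          # [0, 2, 3, 7]
print(kept_by_mask)           # [1, 4, 5, 6]
print(kept_by_inverted_mask)  # [0, 2, 3, 7]
```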

Using list comprehension

>>> timeit [i for i in s.index if s[i]]
13.7 ms ± 40.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Using Python's built-in filter

>>> timeit [*filter(s.get, s.index)]
14.2 ms ± 28.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Using np.nonzero <-- did not work out of the box for me

>>> timeit np.nonzero(s)
ValueError: Length of passed values is 1, index implies 1000.

Using np.argwhere <-- did not work out of the box for me

>>> timeit np.argwhere(s).ravel()
ValueError: Length of passed values is 1, index implies 1000.
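One workaround that sidesteps these two errors is to hand NumPy the underlying array instead of the Series. This is only a sketch: it assumes the ValueError comes from NumPy trying to wrap its result back into a Series, which the answer does not confirm, and it is not timed here.

```python
import numpy as np
import pandas as pd

s = pd.Series([True, False, True, True, False, False, False, True])

# Convert to a plain ndarray first so NumPy never touches the Series wrapper.
arr = s.to_numpy()

via_nonzero = np.nonzero(arr)[0]
via_argwhere = np.argwhere(arr).ravel()

print(via_nonzero.tolist())   # [0, 2, 3, 7]
print(via_argwhere.tolist())  # [0, 2, 3, 7]
```

As with the other numpy-based options, the result holds integer positions, not index labels.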


5

Also works: s.where(lambda x: x).dropna().index, and it has the advantage of being easy to chain: if your series is being computed on the fly, you don't need to assign it to a variable.

Note that if s is computed from r as s = cond(r), then you can also use: r.where(lambda x: cond(x)).dropna().index.
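As an illustration of the chaining described here, the following sketch applies a condition to an intermediate result without naming it. The series r, the doubling step, and the threshold are all made up for the example.

```python
import pandas as pd

r = pd.Series([0.2, 0.9, 0.7, 0.1], index=list("abcd"))

# The intermediate result r.mul(2) is never assigned to a variable;
# the lambda receives it and returns the boolean condition.
idx = r.mul(2).where(lambda x: x > 1).dropna().index

print(idx.tolist())  # ['b', 'c']
```

One thing to keep in mind with this pattern is that dropna() also removes entries that were already NaN before where was applied.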

1 Comment

"it has the advantage of being easy to chain" -- You can pass a function as an indexer, so this works: s[lambda x: x].index
2

You can use pipe or loc to chain the operation; this is helpful when s is an intermediate result and you don't want to name it.

s = pd.Series([True, False, True, True, False, False, False, True], index=list('ABCDEFGH'))
out = s.pipe(lambda s_: s_[s_].index)
# or
out = s.pipe(lambda s_: s_[s_]).index
# or
out = s.loc[lambda s_: s_].index
print(out)
Index(['A', 'C', 'D', 'H'], dtype='object')

2 Comments

Regular indexing works: s[lambda s_: s_].index
With MultiIndex, one can also use an extra step to convert to slice-able array: out = np.array(s.loc[lambda s_: s_].index.to_list())
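A small sketch of the MultiIndex case from the last comment, using a made-up two-level index whose levels are both strings so that the resulting array has a single dtype:

```python
import numpy as np
import pandas as pd

idx = pd.MultiIndex.from_product([["x", "y"], ["a", "b"]])
s = pd.Series([True, False, False, True], index=idx)

# The selected MultiIndex becomes a list of tuples, then a 2-D array
# with one row per True entry and one column per index level.
out = np.array(s.loc[lambda s_: s_].index.to_list())

print(out.shape)           # (2, 2)
print(out[:, 0].tolist())  # ['x', 'y']
```

If the levels have different types (for example strings and integers), np.array will coerce them to a common dtype, so this conversion fits best when the levels are homogeneous.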
