Getting a list of indices where pandas boolean series is True

Question

I have a pandas series with boolean entries. I would like to get a list of indices where the values are True.

For example, the input `pd.Series([True, False, True, True, False, False, False, True])` should yield the output `[0, 2, 3, 7]`.

I can do it with a list comprehension, but is there something cleaner or faster?

A better test case is `s = pd.Series([True, False, True, True, False, False, False, True], index=list('ABCDEFGH'))`, with expected output `Index(['A', 'C', 'D', 'H'], ...)`, because some solutions (especially all the `np` functions) drop the index and use default integer positions instead. – smci, Commented Apr 21, 2021 at 22:42

...if we have a named index, it's usually very undesirable to drop it. – smci, Commented Apr 21, 2021 at 22:57

wjandrea · Accepted Answer · 2023-06-18 22:37:46Z

Using boolean indexing

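    >>> import numpy as np
    >>> import pandas as pd
    >>> s = pd.Series([True, False, True, True, False, False, False, True])
    >>> s[s].index
    Int64Index([0, 2, 3, 7], dtype='int64')

In pandas 2.0 and later, `Int64Index` was removed, so the same result displays as `Index([0, 2, 3, 7], dtype='int64')`.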

If you need a NumPy array, use `.values`:

    >>> s[s].index.values
    array([0, 2, 3, 7])
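Since the question asks for a list, here is a minimal sketch of turning the boolean-indexing result into a plain Python list or a NumPy array. `.tolist()` and `.to_numpy()` are standard `Index` methods; the pandas documentation generally suggests `.to_numpy()` over `.values`.

```python
import pandas as pd

s = pd.Series([True, False, True, True, False, False, False, True])

idx = s[s].index           # pandas Index of the labels where s is True
as_list = idx.tolist()     # plain Python list of those labels
as_array = idx.to_numpy()  # NumPy array of those labels

print(as_list)   # [0, 2, 3, 7]
print(as_array)  # [0 2 3 7]
```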

Using `np.nonzero`

    >>> np.nonzero(s)
    (array([0, 2, 3, 7]),)

Using `np.flatnonzero`

    >>> np.flatnonzero(s)
    array([0, 2, 3, 7])

Using `np.where`

    >>> np.where(s)[0]
    array([0, 2, 3, 7])

Using `np.argwhere`

    >>> np.argwhere(s).ravel()
    array([0, 2, 3, 7])
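A short sketch comparing the four NumPy options on the lettered index from the comments under the question. They all return integer positions rather than labels, and `s.index[positions]` maps them back. Converting with `s.to_numpy()` first is an added precaution, not part of the answer, so that NumPy only ever sees a plain array.

```python
import numpy as np
import pandas as pd

# Lettered index from the comment under the question.
s = pd.Series([True, False, True, True, False, False, False, True],
              index=list('ABCDEFGH'))
arr = s.to_numpy()  # plain boolean ndarray

# np.nonzero returns a tuple with one index array per dimension.
(via_nonzero,) = np.nonzero(arr)

# np.flatnonzero is the 1-D shortcut and returns the array directly.
via_flatnonzero = np.flatnonzero(arr)

# With only a condition, np.where(cond) is equivalent to np.nonzero(cond).
via_where = np.where(arr)[0]

# np.argwhere returns an (n, ndim) array, here shape (4, 1), so ravel it.
via_argwhere = np.argwhere(arr).ravel()

for result in (via_nonzero, via_flatnonzero, via_where, via_argwhere):
    print(result.tolist())  # [0, 2, 3, 7] in every case

# The positions ignore the labels; look them up in the index to recover them.
print(s.index[via_flatnonzero].tolist())  # ['A', 'C', 'D', 'H']
```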

Using `pd.Series.index`

    >>> s.index[s]
    Int64Index([0, 2, 3, 7], dtype='int64')

Using Python's built-in `filter`

    >>> [*filter(s.get, s.index)]
    [0, 2, 3, 7]

Using list comprehension

    >>> [i for i in s.index if s[i]]
    [0, 2, 3, 7]
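A sketch checking that the label-preserving options agree on smci's lettered test case from the comments under the question; each entry should be the same list of labels.

```python
import pandas as pd

s = pd.Series([True, False, True, True, False, False, False, True],
              index=list('ABCDEFGH'))

candidates = {
    "boolean indexing": s[s].index.tolist(),
    "s.index[s]": s.index[s].tolist(),
    # s.get(label) returns the value stored under that label.
    "filter": [*filter(s.get, s.index)],
    # s[i] is a label lookup here because the index holds strings.
    "list comprehension": [i for i in s.index if s[i]],
}

for name, result in candidates.items():
    print(f"{name}: {result}")  # each prints ['A', 'C', 'D', 'H']
```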

@pyd then you can use the options listed in the answer as boolean indexing, `pd.Series.index`, `filter` and list comprehension, basically NOT the NumPy ones.
@Dahn I did not understand your answer. Can you explain further?
@MattS If the series has an index other than a range index, then the NumPy-based methods listed in rafaelc's answer won't return the labels, as NumPy forgets the index upon conversion. I therefore listed the methods that still work in that case. Does that work for you?
I think we should also mention the `.where()` method here. See: pandas.pydata.org/pandas-docs/stable/reference/api/…

wjandrea · Accepted Answer · 2023-06-18 22:45:17Z

As an addition to rafaelc's answer, here are the corresponding timings (from quickest to slowest) for the following setup:

    import numpy as np
    import pandas as pd
    s = pd.Series([x > 0.5 for x in np.random.random(size=1000)])
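`%timeit` is an IPython magic, so the timings in this answer need IPython or Jupyter. As a hedged alternative for plain Python, here is a sketch using the standard-library `timeit.repeat`; the fixed seed, the hypothetical `best_time` helper, and the `number`/`repeat` values are illustrative choices, not part of the original answer. Absolute numbers will vary with the machine and the NumPy/pandas versions.

```python
import timeit

import numpy as np
import pandas as pd

# Setup similar to the answer's; the seed is an assumption for repeatability.
rng = np.random.default_rng(0)
s = pd.Series(rng.random(1000) > 0.5)

# Each candidate computes the positions/labels where s is True.
candidates = {
    "np.where": lambda: np.where(s)[0],
    "np.flatnonzero": lambda: np.flatnonzero(s),
    "s.index[s]": lambda: s.index[s],
    "s[s].index": lambda: s[s].index,
    "list comprehension": lambda: [i for i in s.index if s[i]],
}

# Sanity check: every candidate selects the same entries.
expected = np.flatnonzero(s.to_numpy()).tolist()
for name, func in candidates.items():
    assert np.asarray(func()).tolist() == expected, name


def best_time(func, number=10, repeat=3):
    """Best-of-`repeat` average seconds per call (hypothetical helper)."""
    return min(timeit.repeat(func, number=number, repeat=repeat)) / number


for name, func in candidates.items():
    print(f"{name:>20}: {best_time(func) * 1e6:10.1f} µs per call")
```

Taking the minimum of the repeats follows the `timeit` documentation's advice that the lowest value is the most informative.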

Using `np.where`

    %timeit np.where(s)[0]
    12.7 µs ± 77.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Using `np.flatnonzero`

    %timeit np.flatnonzero(s)
    18 µs ± 508 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Using `pd.Series.index`

The time difference compared with boolean indexing was really surprising to me, since boolean indexing is the more commonly used idiom.

    %timeit s.index[s]
    82.2 µs ± 38.9 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Using boolean indexing

    %timeit s[s].index
    1.75 ms ± 2.16 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

If you need a NumPy array, use `.values`:

    %timeit s[s].index.values
    1.76 ms ± 3.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

If you need a slightly easier-to-read version <-- not in original answer

    %timeit s[s == True].index
    1.89 ms ± 3.52 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Using `pd.Series.where` <-- not in original answer

    %timeit s.where(s).dropna().index
    2.22 ms ± 3.32 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    %timeit s.where(s == True).dropna().index
    2.37 ms ± 2.19 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Using `pd.Series.mask` <-- not in original answer

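    %timeit s.mask(s).dropna().index
    2.29 ms ± 1.43 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    %timeit s.mask(s == True).dropna().index
    2.44 ms ± 5.82 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Note that `mask` is the inverse of `where`: it replaces values where the condition is True, so both commands above actually select the False entries. To get the True entries with `mask`, negate the condition: `s.mask(~s).dropna().index`.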
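A quick sketch of the difference between `where` and `mask`, using the lettered index from the question's comments: `where` keeps the entries where the condition holds and sets the rest to NaN, while `mask` does the opposite, so `mask` needs the negated condition to select the True entries.

```python
import pandas as pd

s = pd.Series([True, False, True, True, False, False, False, True],
              index=list('ABCDEFGH'))

# where() keeps values where the condition is True, NaN elsewhere.
via_where = s.where(s).dropna().index.tolist()

# mask() sets values to NaN where the condition is True,
# so negate the condition to keep the True entries.
via_mask = s.mask(~s).dropna().index.tolist()

# Without the negation, mask() keeps the False entries instead.
mask_unnegated = s.mask(s).dropna().index.tolist()

print(via_where)       # ['A', 'C', 'D', 'H']
print(via_mask)        # ['A', 'C', 'D', 'H']
print(mask_unnegated)  # ['B', 'E', 'F', 'G']
```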

Using list comprehension

    %timeit [i for i in s.index if s[i]]
    13.7 ms ± 40.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Using Python's built-in `filter`

    %timeit [*filter(s.get, s.index)]
    14.2 ms ± 28.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Using `np.nonzero` <-- did not work out of the box for me

    %timeit np.nonzero(s)
    ValueError: Length of passed values is 1, index implies 1000.

Using `np.argwhere` <-- did not work out of the box for me

    %timeit np.argwhere(s).ravel()
    ValueError: Length of passed values is 1, index implies 1000.
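The error appears to come from NumPy wrapping the tuple returned by `nonzero` back into a Series the length of `s`; whether it occurs seems to depend on the NumPy and pandas versions. A sketch that sidesteps it, assuming it is acceptable to hand NumPy the underlying ndarray (the seed is an illustrative assumption):

```python
import numpy as np
import pandas as pd

# Same kind of setup as the timings above; the seed is an assumption.
rng = np.random.default_rng(0)
s = pd.Series(rng.random(1000) > 0.5)

# With a plain ndarray there is no Series for NumPy to re-wrap the result into.
arr = s.to_numpy()
via_nonzero = np.nonzero(arr)[0]
via_argwhere = np.argwhere(arr).ravel()

print(np.array_equal(via_nonzero, via_argwhere))         # True
print(np.array_equal(via_nonzero, np.flatnonzero(arr)))  # True
```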

tsvikas · Accepted Answer · 2021-11-20 22:20:50Z

Also works: `s.where(lambda x: x).dropna().index`, and it has the advantage of being easy to chain: if your series is computed on the fly, you don't need to assign it to a variable.

Note that if `s` is computed from `r` as `s = cond(r)`, then you can also use `r.where(lambda x: cond(x)).dropna().index`.

"it has the advantage of being easy to chain": you can pass a function as an indexer, so this works: `s[lambda x: x].index`

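A minimal sketch of the chaining styles from this answer and the comment, on a hypothetical numeric series `r` whose boolean condition (`r > 0`) is computed inline rather than stored in a variable:

```python
import pandas as pd

# Hypothetical input; the boolean series is derived from it inline.
r = pd.Series([3, -1, 4, -1, 5], index=list('abcde'))

# Callable indexer (from the comment): the lambda receives the boolean Series.
via_callable = (r > 0)[lambda x: x].index.tolist()

# where() with a callable, then drop the entries it replaced with NaN.
via_where = (r > 0).where(lambda x: x).dropna().index.tolist()

# Variant from the answer: apply the condition to r directly.
via_r_where = r.where(lambda x: x > 0).dropna().index.tolist()

print(via_callable)  # ['a', 'c', 'e']
print(via_where)     # ['a', 'c', 'e']
print(via_r_where)   # ['a', 'c', 'e']
```

With the `r.where(...)` variant, any NaN already in `r` is also removed by `dropna`; for a condition like `x > 0` that makes no difference, since NaN fails the comparison anyway.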
Ynjxsjmh · Accepted Answer · 2022-11-09 11:07:18Z

You can use pipe or loc to chain the operation, this is helpful when s is an intermediate result and you don't want to name it.

    import pandas as pd

    s = pd.Series([True, False, True, True, False, False, False, True], index=list('ABCDEFGH'))
    out = s.pipe(lambda s_: s_[s_].index)
    # or
    out = s.pipe(lambda s_: s_[s_]).index
    # or
    out = s.loc[lambda s_: s_].index

    print(out)  # Index(['A', 'C', 'D', 'H'], dtype='object')

With a MultiIndex, one can also add an extra step to convert the result to a sliceable array: `out = np.array(s.loc[lambda s_: s_].index.to_list())`
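A sketch of the MultiIndex case on a hypothetical two-level index: the selected labels come back as tuples, and `np.array` of that list gives one column per level. NumPy coerces mixed level types to a common dtype (strings here), so `get_level_values` may be more convenient when only one level is needed.

```python
import numpy as np
import pandas as pd

# Hypothetical two-level index for illustration.
idx = pd.MultiIndex.from_product([['x', 'y'], [1, 2]], names=['letter', 'num'])
s = pd.Series([True, False, False, True], index=idx)

out = s.loc[lambda s_: s_].index
labels = out.to_list()  # list of (letter, num) tuples

# One row per selected entry, one column per level.
arr = np.array(labels)
print(arr.shape)  # (2, 2)
print(arr[:, 1])  # ['1' '2'] (the ints were coerced to strings)

# Per-level values keep their original type.
print(out.get_level_values('num').tolist())  # [1, 2]
```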
