Find the index of a string value in a pandas DataFrame

Question

How can I identify which column(s) in my DataFrame contain a specific string 'foo'?

Sample DataFrame:

>>> import pandas as pd
>>> df = pd.DataFrame({'A':[10,20,42], 'B':['foo','bar','blah'],'C':[3,4,5], 'D':['some','foo','thing']})

I want to find B and D here.

I can search for numbers:

If I'm looking for a number (e.g. 42) instead of a string, I can generate a boolean mask like this:

>>> ~(df.where(df==42)).isnull().all()
A     True
B    False
C    False
D    False
dtype: bool

but not strings:

>>> ~(df.where(df=='foo')).isnull().all()
TypeError: Could not compare ['foo'] with block values

I don't want to iterate over each column and row if possible (my actual data is much larger than this example). It feels like there should be a simple and efficient way.

How can I do this?

Divakar · Accepted Answer · 2017-09-27 16:56:24Z

One way with underlying array data -

df.columns[(df.values=='foo').any(0)].tolist()

Sample run -

In [209]: df
Out[209]:
    A     B  C      D
0  10   foo  3   some
1  20   bar  4    foo
2  42  blah  5  thing

In [210]: df.columns[(df.values=='foo').any(0)].tolist()
Out[210]: ['B', 'D']

If you are looking for just the column-mask -

In [205]: (df.values=='foo').any(0)
Out[205]: array([False,  True, False,  True], dtype=bool)

@sheldonzy Yes, .any(0) means: find whether ANY element matches along the first axis (axis=0), which is per column.
Just to mention, np.argwhere(df.values=='foo') finds the (row, column) pairs of the matching cells.
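
To illustrate the np.argwhere remark above, here is a minimal sketch that rebuilds the question's sample DataFrame and turns the integer positions into index/column labels. The names mask, positions and matches are illustrative only and do not come from the answer.

```python
import numpy as np
import pandas as pd

# Sample DataFrame from the question.
df = pd.DataFrame({'A': [10, 20, 42],
                   'B': ['foo', 'bar', 'blah'],
                   'C': [3, 4, 5],
                   'D': ['some', 'foo', 'thing']})

# Element-wise comparison on the underlying (object-dtype) array.
mask = df.values == 'foo'

# Each row of `positions` is a (row, column) pair of integer positions.
positions = np.argwhere(mask)

# Translate integer positions into index and column labels.
matches = [(df.index[r], df.columns[c]) for r, c in positions]
print(matches)
```

Because the comparison runs on a single object array, this stays free of explicit Python loops over cells; only the final label lookup iterates, and only over the matches.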

BENY · Accepted Answer · 2017-09-27 16:56:30Z

Option 1 df.values

~(df.where(df.values=='foo')).isnull().all()
Out[91]:
A    False
B     True
C    False
D     True
dtype: bool

Option 2 isin

~(df.where(df.isin(['foo']))).isnull().all()
Out[94]:
A    False
B     True
C    False
D     True
dtype: bool
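
As a possible shortening of Option 2 (a sketch, not part of the original answer): isin already returns a boolean DataFrame, so calling .any() on it should give the same per-column mask without the where/isnull round trip. The names col_mask and cols are illustrative.

```python
import pandas as pd

# Sample DataFrame from the question.
df = pd.DataFrame({'A': [10, 20, 42],
                   'B': ['foo', 'bar', 'blah'],
                   'C': [3, 4, 5],
                   'D': ['some', 'foo', 'thing']})

# isin() yields a boolean DataFrame; any() reduces it per column (axis=0).
col_mask = df.isin(['foo']).any()

# Keep only the labels of columns where at least one cell matched.
cols = col_mask[col_mask].index.tolist()
print(cols)
```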

rko · Accepted Answer · 2017-09-27 17:10:40Z

Unfortunately, it won't index a str through the syntax you gave. It has to be run as a series of type string to compare it with string, unless I am missing something.

try this

~df101.where(df101.isin(['foo'])).isnull().all()
A    False
B     True
C    False
D     True
dtype: bool
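
One way to act on the point about comparing strings only against string data is to drop the numeric columns first and then compare. This is a sketch under the assumption that every non-numeric column in the frame holds text; select_dtypes is not used in the answer above, and the names text, found and hits are illustrative.

```python
import pandas as pd

# Sample DataFrame from the question.
df = pd.DataFrame({'A': [10, 20, 42],
                   'B': ['foo', 'bar', 'blah'],
                   'C': [3, 4, 5],
                   'D': ['some', 'foo', 'thing']})

# Assumption: all non-numeric columns contain strings.
text = df.select_dtypes(exclude='number')

# Compare string columns with the string and reduce per column.
found = (text == 'foo').any()

# Labels of the columns that contain at least one match.
hits = found[found].index.tolist()
print(hits)
```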
