Filter dataframe rows if value in column is in a set list of values [duplicate]

Question

I have a Python pandas DataFrame rpt:

    rpt
    <class 'pandas.core.frame.DataFrame'>
    MultiIndex: 47518 entries, ('000002', '20120331') to ('603366', '20091231')
    Data columns:
    STK_ID      47518 non-null values
    STK_Name    47518 non-null values
    RPT_Date    47518 non-null values
    sales       47518 non-null values

I can filter the rows whose stock id is '600809' like this: rpt[rpt['STK_ID'] == '600809']

    <class 'pandas.core.frame.DataFrame'>
    MultiIndex: 25 entries, ('600809', '20120331') to ('600809', '20060331')
    Data columns:
    STK_ID      25 non-null values
    STK_Name    25 non-null values
    RPT_Date    25 non-null values
    sales       25 non-null values

and I want to get all the rows of some stocks together, such as ['600809','600141','600329']. That means I want a syntax like this:

    stk_list = ['600809','600141','600329']
    rst = rpt[rpt['STK_ID'] in stk_list]  # this does not work in pandas

Since pandas does not accept the above command, how can I achieve this?

stk_list = ['600809','600141','600329']; result = filter(lambda item: item in stk_list, df['STK_ID']). You can use filter to get an iterable of the matching items.
– ListenSoftware Louise Ai Agent, Commented Sep 22, 2020 at 16:57

Hrvoje · Accepted Answer · 2020-02-25 00:45:48Z

883

Use the isin method:

rpt[rpt['STK_ID'].isin(stk_list)]
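A minimal, self-contained sketch of the isin approach. The small frame below is invented sample data standing in for rpt (the sales figures are assumptions, not the asker's data):

```python
import pandas as pd

# Hypothetical stand-in for the asker's `rpt` frame; the values are invented.
rpt = pd.DataFrame({
    "STK_ID": ["600809", "600141", "000002", "600329", "600809"],
    "sales": [10, 20, 30, 40, 50],
})

stk_list = ["600809", "600141", "600329"]

# isin returns a boolean Series that is True where STK_ID is in stk_list.
mask = rpt["STK_ID"].isin(stk_list)

# Indexing with the mask keeps only the matching rows.
rst = rpt[mask]
print(rst)
```

The values in stk_list have to be the same type as the column: a list of integers would match nothing in a column of strings.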

edited Feb 25, 2020 at 0:45

Hrvoje


answered Aug 22, 2012 at 3:21

BrenBarn


13 Comments

stites Over a year ago

What about the negation of this? What would be the correct way of going about a !isin()?

BrenBarn Over a year ago

@dbyte: You just use the ~ operator: rpt[~rpt['STK_ID'].isin(stk_list)]
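A short sketch of the negated filter on invented sample data (the frame contents are assumptions):

```python
import pandas as pd

# Hypothetical sample data.
rpt = pd.DataFrame({"STK_ID": ["600809", "600141", "000002", "600329"]})
stk_list = ["600809", "600141", "600329"]

# ~ inverts the boolean mask element-wise, so this keeps rows NOT in stk_list.
not_in_list = rpt[~rpt["STK_ID"].isin(stk_list)]
print(not_in_list)
```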

user1669710 Over a year ago

Related to what @mathtick asked: is there a way to do this on an index in general (it needn't necessarily be a MultiIndex)?

BrenBarn Over a year ago

@user1669710: Indexes also have an isin method.

howMuchCheeseIsTooMuchCheese Over a year ago

In case anyone needs the syntax for an index: df[df.index.isin(ls)] where ls is your list
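For a MultiIndex like the one in the question, one possible way is to pull out a single level and call isin on it. This is a sketch on invented data; the level names are assumed to be STK_ID and RPT_Date:

```python
import pandas as pd

# Hypothetical frame indexed like the asker's: (STK_ID, RPT_Date).
index = pd.MultiIndex.from_tuples(
    [("600809", "20120331"), ("000002", "20120331"), ("600141", "20111231")],
    names=["STK_ID", "RPT_Date"],
)
rpt = pd.DataFrame({"sales": [1, 2, 3]}, index=index)
stk_list = ["600809", "600141"]

# Index.isin returns a NumPy boolean array, which also works as a row mask.
level_mask = rpt.index.get_level_values("STK_ID").isin(stk_list)
by_level = rpt[level_mask]
print(by_level)
```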


Alex Riley · Accepted Answer · 2015-04-12 18:29:27Z

isin() is ideal if you have a list of exact matches, but if you have a list of partial matches or substrings to look for, you can filter using the str.contains method and regular expressions.

For example, to return a DataFrame of all rows whose stock ID begins with '600' and is followed by any three digits:

    >>> rpt[rpt['STK_ID'].str.contains(r'^600[0-9]{3}$')]  # ^ means start of string
    ...   STK_ID   ...                                     # [0-9]{3} means any three digits
    ...  '600809'  ...                                     # $ means end of string
    ...  '600141'  ...
    ...  '600329'  ...
    ...    ...     ...

Suppose now we have a list of strings which we want the values in 'STK_ID' to end with, e.g.

endstrings = ['01$', '02$', '05$']

We can join these strings with the regex 'or' character | and pass the string to str.contains to filter the DataFrame:

    >>> rpt[rpt['STK_ID'].str.contains('|'.join(endstrings))]
    ...   STK_ID   ...
    ...  '155905'  ...
    ...  '633101'  ...
    ...  '210302'  ...
    ...    ...     ...

Finally, contains can ignore case (by setting case=False), allowing you to be more general when specifying the strings you want to match.

For example,

str.contains('pandas', case=False)

would match PANDAS, PanDAs, paNdAs123, and so on.
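The three str.contains variants above can be tried together on a small invented frame (the sample IDs and the names Series are assumptions for illustration):

```python
import pandas as pd

# Hypothetical sample data.
rpt = pd.DataFrame({"STK_ID": ["600809", "155905", "633101", "210302", "600141"]})

# Anchored pattern: '600' followed by exactly three digits.
starts_600 = rpt[rpt["STK_ID"].str.contains(r"^600[0-9]{3}$")]

# Alternation built from a list of suffix patterns joined with '|'.
endstrings = ["01$", "02$", "05$"]
ends_with = rpt[rpt["STK_ID"].str.contains("|".join(endstrings))]

# case=False makes the match case-insensitive.
names = pd.Series(["PANDAS", "paNdAs123", "numpy"])
is_pandas = names.str.contains("pandas", case=False)

print(starts_600, ends_with, is_pandas, sep="\n\n")
```

If the strings being joined are literal text rather than patterns, escaping each one with re.escape before joining avoids surprises from regex metacharacters.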

Thanks for this. Regex search would be very helpful. Even though isin only works for exact matches, it accepts DataFrames, Series, Index, etc. @jakevdp provided a great solution here, which works to extract the matching values of df1, given another dataframe df2: stackoverflow.com/a/33282617/4752883. In my case, I have a df2, but the values in df2 won't be exact matches, so I am wondering if there is a way to use regex in isin (or another function), similar to what you pointed out here?

Sanjay T. Sharma · Accepted Answer · 2015-10-27 15:39:51Z

48

You can also directly query your DataFrame for this information.

rpt.query('STK_ID in (600809,600141,600329)')

Or similarly search for ranges:

rpt.query('60000 < STK_ID < 70000')

edited Oct 27, 2015 at 15:39

Sanjay T. Sharma


answered Mar 17, 2015 at 20:12

bscan


1 Comment

Gourneau Over a year ago

Or, to query by a list named my_list: rpt.query('STK_ID in @my_list')
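A small sketch of both query forms on invented data. In the question the STK_ID column holds strings, so the values are quoted here; the unquoted numbers in the answer would only match a numeric column. The frame contents and the name my_list are assumptions:

```python
import pandas as pd

# Hypothetical sample data with string IDs, as in the question.
rpt = pd.DataFrame({
    "STK_ID": ["600809", "600141", "000002", "600329"],
    "sales": [10, 20, 30, 40],
})
my_list = ["600809", "600329"]

# @my_list refers to the local Python variable from inside the query string.
from_variable = rpt.query("STK_ID in @my_list")

# Literal values can be written inline; string IDs need their own quotes.
from_literal = rpt.query("STK_ID in ['600809', '600329']")

print(from_variable)
```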

yemu · Accepted Answer · 2013-10-10 12:26:29Z

47

You can also filter on ranges by using:

b = df[(df['a'] > 1) & (df['a'] < 5)]
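A runnable version of the range filter, alongside Series.between as one alternative spelling. The column values are invented, and the string form of the inclusive argument assumes pandas 1.3 or later:

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"a": [0, 1, 2, 3, 4, 5, 6]})

# Each comparison needs its own parentheses because & binds tighter than > and <.
b = df[(df["a"] > 1) & (df["a"] < 5)]

# Series.between expresses the same range; inclusive="neither" excludes both ends
# (assumes pandas >= 1.3, where inclusive accepts a string).
b_between = df[df["a"].between(1, 5, inclusive="neither")]

print(b)
```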

answered Oct 10, 2013 at 12:26

yemu


Community · Accepted Answer · 2020-06-20 09:12:55Z

Slicing data with pandas

Given a dataframe like this:

         RPT_Date  STK_ID STK_Name  sales
    0  1980-01-01       0   Arthur      0
    1  1980-01-02       1    Beate      4
    2  1980-01-03       2    Cecil      2
    3  1980-01-04       3     Dana      8
    4  1980-01-05       4     Eric      4
    5  1980-01-06       5    Fidel      5
    6  1980-01-07       6   George      4
    7  1980-01-08       7     Hans      7
    8  1980-01-09       8   Ingrid      7
    9  1980-01-10       9    Jones      4

There are multiple ways of selecting or slicing the data.

Using .isin

The most obvious is the .isin method. You can create a mask, a Series of True/False values, which can be applied to a dataframe like this:

    mask = df['STK_ID'].isin([4, 2, 6])

    mask
    0    False
    1    False
    2     True
    3    False
    4     True
    5    False
    6     True
    7    False
    8    False
    9    False
    Name: STK_ID, dtype: bool

    df[mask]
         RPT_Date  STK_ID STK_Name  sales
    2  1980-01-03       2    Cecil      2
    4  1980-01-05       4     Eric      4
    6  1980-01-07       6   George      4

Masking is the ad-hoc solution to the problem, but does not always perform well in terms of speed and memory.

With indexing

By setting the index to the STK_ID column, we can use the pandas built-in indexer .loc:

    df.set_index('STK_ID', inplace=True)

              RPT_Date STK_Name  sales
    STK_ID
    0       1980-01-01   Arthur      0
    1       1980-01-02    Beate      4
    2       1980-01-03    Cecil      2
    3       1980-01-04     Dana      8
    4       1980-01-05     Eric      4
    5       1980-01-06    Fidel      5
    6       1980-01-07   George      4
    7       1980-01-08     Hans      7
    8       1980-01-09   Ingrid      7
    9       1980-01-10    Jones      4

    df.loc[[4, 2, 6]]

              RPT_Date STK_Name  sales
    STK_ID
    4       1980-01-05     Eric      4
    2       1980-01-03    Cecil      2
    6       1980-01-07   George      4

This is the fast way of doing it. Setting the index can take a little while, but it saves time if you want to run multiple queries like this.
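One caveat worth sketching: .loc with a list raises a KeyError if any requested label is missing from the index, so it can help to keep only the labels that are present first. The frame and the wanted list below are invented for illustration:

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "STK_ID": [0, 1, 2, 3, 4, 5, 6],
    "STK_Name": ["Arthur", "Beate", "Cecil", "Dana", "Eric", "Fidel", "George"],
})

indexed = df.set_index("STK_ID")
wanted = [4, 2, 6, 99]  # 99 is deliberately absent from the index

# Keep only the labels that exist; the order of `wanted` is preserved.
present = [k for k in wanted if k in indexed.index]
subset = indexed.loc[present]

print(subset)
```

Unlike a boolean mask, this returns the rows in the order the labels were requested.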

Merging dataframes

This can also be done by merging dataframes. This would fit more for a scenario where you have a lot more data than in these examples.

    stkid_df = pd.DataFrame({"STK_ID": [4,2,6]})
    df.merge(stkid_df, on='STK_ID')

       STK_ID    RPT_Date STK_Name  sales
    0       2  1980-01-03    Cecil      2
    1       4  1980-01-05     Eric      4
    2       6  1980-01-07   George      4
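A self-contained sketch of the merge approach on invented data, including a repeated STK_ID to show that every matching row is kept:

```python
import pandas as pd

# Hypothetical sample data; STK_ID 4 appears twice on purpose.
df = pd.DataFrame({
    "STK_ID": [2, 4, 4, 5, 6],
    "sales": [2, 4, 9, 5, 4],
})
stkid_df = pd.DataFrame({"STK_ID": [4, 2, 6]})

# The default inner join keeps only rows whose STK_ID appears in both frames.
merged = df.merge(stkid_df, on="STK_ID", how="inner")

print(merged)
```

Note that the result gets a fresh integer index, and that duplicate IDs in stkid_df would duplicate the matching rows, so dropping duplicates from the lookup frame first may be wise.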

Note

All the above methods work even if there are multiple rows with the same 'STK_ID'

What if you need to check two columns of a dataframe? Let's say we want to check whether the values of the list are in either 'STK_ID' or 'sales'.
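One possible way to check two columns at once, sketched on invented data, is DataFrame.isin combined with any(axis=1). This is an illustration, not something proposed in the answer itself:

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "STK_ID": [0, 1, 2, 3, 4],
    "sales": [0, 4, 2, 8, 4],
})
values = [4, 8]

# DataFrame.isin tests every cell; any(axis=1) is True if either column matches.
either = df[df[["STK_ID", "sales"]].isin(values).any(axis=1)]

print(either)
```

Replacing any with all would instead require both columns to hold a listed value.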

akuriako · Accepted Answer · 2017-09-28 03:04:39Z

You can also achieve similar results by using 'query' and @:

eg:

    df = pd.DataFrame({'A': [5, 6, 3, 4], 'B': [1, 2, 3, 5]})
    list_of_values = [3, 6]
    result = df.query("A in @list_of_values")

    result
       A  B
    1  6  2
    2  3  3

This syntax is elegant, and this answer deserves more upvotes.

Pedro Lobito · Accepted Answer · 2017-04-26 20:09:12Z

7

You can use query, i.e.:

b = df.query('a > 1 & a < 5')

answered Apr 26, 2017 at 20:09

Pedro Lobito
