I'm trying to create a subset of a pandas DataFrame based on values in a list. However, I need to match on parts of each string (string indexing), not just on whole values. I'll demonstrate with an example:

Here is my dataframe:

import pandas as pd
df = pd.DataFrame({'A': ['1-2', '2', '3', '3-8', '4']})

Here is what it looks like:

     A
0  1-2
1    2
2    3
3  3-8
4    4

I have a list of values I want to use to select rows from my dataframe.

list1 = ['2', '3'] 

I can use the .isin() function to select rows from my dataframe using my list items.

subset = df[df['A'].isin(list1)]
print(subset)

   A
1  2
2  3

However, I want any value that includes '2' or '3'. This is my desired output:

     A
0  1-2
1    2
2    3
3  3-8

Can I use string indexing in my .isin() function? I am struggling to come up with another workaround.

2 Answers

Use str.split together with isin and any:

# axis must be passed by keyword: .any(1) raises a TypeError in pandas 2.0+
Newdf = df[df.A.str.split('-', expand=True).isin(['2', '3']).any(axis=1)].copy()
Newdf
Out[189]:
     A
0  1-2
1    2
2    3
3  3-8
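
To make the one-liner easier to follow, here is a minimal sketch that breaks it into its three steps. The intermediate names `parts`, `hits`, and `mask` are introduced here for illustration only and are not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'A': ['1-2', '2', '3', '3-8', '4']})
list1 = ['2', '3']

# Step 1: split each string on '-' into separate columns.
# Rows with fewer parts are padded with missing values.
parts = df['A'].str.split('-', expand=True)

# Step 2: element-wise membership test against the list.
hits = parts.isin(list1)

# Step 3: collapse each row to one boolean.
# axis=1 means "look across the columns of each row".
mask = hits.any(axis=1)

subset = df[mask].copy()
print(subset)
```

Because the split happens before the membership test, only whole tokens between dashes are compared, so a value such as '23' would not be selected by this approach.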

2 Comments

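What does .any() do? More specifically, what is the argument (1) in .any(1)?
It returns True for each row that contains at least one True value; the 1 is the axis argument (axis=1, i.e. across the columns of each row). @ErichPurpur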
You can try a regular expression:

import re

# Build ".*((2)|(3))": any prefix followed by one of the list values
pattern = re.compile(".*((" + (")|(").join(list1) + "))")
print(df.loc[df['A'].apply(lambda x: bool(pattern.match(x)))])

Output:

     A
0  1-2
1    2
2    3
3  3-8
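
Note that the pattern above matches any string that contains one of the list values anywhere, so a value like '23' would also be selected. As a hedged alternative, not taken from the answer itself, the same idea can be written with pandas' `Series.str.contains`. The sketch below shows both a plain substring match and a stricter whole-token match. The extra row '23' and the names `alternatives`, `substring_mask`, `token_pattern`, and `token_mask` are hypothetical, added only to show where the two differ.

```python
import re
import pandas as pd

# '23' is an extra, hypothetical row added to show the difference.
df = pd.DataFrame({'A': ['1-2', '2', '3', '3-8', '4', '23']})
list1 = ['2', '3']

# Escape each value so regex metacharacters are treated literally.
alternatives = '|'.join(re.escape(v) for v in list1)

# Substring match: the value contains '2' or '3' anywhere.
substring_mask = df['A'].str.contains(alternatives, regex=True)

# Whole-token match: the value must sit between '-' separators
# or at the start/end of the string. Non-capturing groups avoid
# the pandas warning about match groups in str.contains.
token_pattern = rf'(?:^|-)(?:{alternatives})(?:-|$)'
token_mask = df['A'].str.contains(token_pattern, regex=True)

print(df[substring_mask])
print(df[token_mask])
```

Which mask is appropriate depends on whether '23' should count as "including 2 or 3"; the question's sample data does not settle this either way.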
