I have two dataframes,

df1,

    Names
    one two three
    Sri is a good player
    Ravi is a mentor
    Kumar is a cricketer

df2,

    values
    sri
    NaN
    sri, is
    kumar,cricketer

I am trying to get the row in df1 that contains all the items in each row of df2.

My expected output is,

    values           Names
    sri              Sri is a good player
    NaN
    sri, is          Sri is a good player
    kumar,cricketer  Kumar is a cricketer

I tried

    df1["Names"].str.contains("|".join(df2["values"].values.tolist()))

but I cannot get my expected output because the values contain commas (","). Please help.

  • It should be a match; the order does not matter. Commented Nov 16, 2017 at 4:34

2 Answers

Using sets

    import numpy as np
    import pandas as pd

    # Turn each name into a set of lowercase words
    s1 = df1.Names.dropna()
    s1.loc[:] = [set(x.lower().split()) for x in s1.values.tolist()]
    a1 = s1.values

    # Turn each comma-separated value into a set of lowercase items
    s2 = df2['values'].dropna()
    s2.loc[:] = [set(x.replace(' ', '').lower().split(',')) for x in s2.values.tolist()]
    a2 = s2.values

    # a1 >= a2[:, None] tests, for every pair, whether the name is a superset of the value.
    # The extra all-True column makes argmax fall through to the NaN slot when nothing matches.
    i = np.column_stack([a1 >= a2[:, None], [True] * len(a2)]).argmax(1)

    df2.assign(Names=pd.Series(
        np.append(df1.Names.values, np.nan)[i], s2.index
    ))

                values                 Names
    0              sri  Sri is a good player
    1              NaN                   NaN
    2          sri, is  Sri is a good player
    3  kumar,cricketer  Kumar is a cricketer
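For readers who find the broadcasting step hard to follow, the same superset idea can be sketched with an explicit loop. This is only an illustrative variant, not the answer's code: the helper name `first_match` is made up here, and the data frames are rebuilt from the question's sample data.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question (assumed to match the asker's frames).
df1 = pd.DataFrame({"Names": ["one two three", "Sri is a good player",
                              "Ravi is a mentor", "Kumar is a cricketer"]})
df2 = pd.DataFrame({"values": ["sri", np.nan, "sri, is", "kumar,cricketer"]})


def first_match(value, names):
    """Hypothetical helper: return the first name whose words include
    every comma-separated item of `value`, or NaN if there is none."""
    if pd.isna(value):
        return np.nan
    wanted = {item.strip().lower() for item in value.split(",")}
    for name in names:
        # Subset test on sets, so word order does not matter.
        if wanted <= set(name.lower().split()):
            return name
    return np.nan


names = df1["Names"].dropna().tolist()
df2["Names"] = [first_match(v, names) for v in df2["values"]]
print(df2)
```

The loop is slower than the vectorised comparison on large frames, but it makes the matching rule explicit: every item must appear as a whole word in the name.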

7 Comments

I don't want the output df separately. I want to add it to my df2.
Then assign the result back to df2. Or just assign to a new column directly instead of using assign, like df2.loc[:, 'Names'] = pd.Series(np.append(df1.Names.values, np.nan)[i], s2.index)
It worked. Can you suggest the best resource for learning pandas easily?
Nothing worthwhile is easy!
1. Start here to get an idea of what pandas can do.
2. Give yourself a data analysis task and figure it out using pandas. Ask questions if needed.
3. Answer other people's questions. Even if you don't post answers, read the questions and figure them out. Read other people's answers to the question you just tried to answer.
4. Practice!

    import pandas as pd

    names = [
        'one two three',
        'Sri is a good player',
        'Ravi is a mentor',
        'Kumar is a cricketer'
    ]
    values = [
        'sri',
        'NaN',
        'sri, is',
        'kumar,cricketer',
    ]

    names = pd.Series(names)
    values = pd.DataFrame(values, columns=['values'])

    def foo(words):
        names_copy = names.copy()
        for word in words.split(','):
            names_copy = names_copy[names_copy.str.contains(word, case=False)]
        return names_copy.values

    values['names'] = values['values'].map(foo)
    values

                values                   names
    0              sri  [Sri is a good player]
    1              NaN                      []
    2          sri, is  [Sri is a good player]
    3  kumar,cricketer  [Kumar is a cricketer]
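One thing to be aware of with this approach is that `str.contains` does substring matching, so a short item can match inside a longer word. The sketch below is an assumed variant, not part of the answer: `matches_all` and its `use_word_boundaries` flag are hypothetical names. It shows the difference when each item is wrapped in `\b` word boundaries.

```python
import re

import pandas as pd

names = pd.Series([
    "one two three",
    "Sri is a good player",
    "Ravi is a mentor",
    "Kumar is a cricketer",
])


def matches_all(words, use_word_boundaries):
    """Hypothetical helper: names containing every comma-separated item."""
    result = names
    for word in words.split(","):
        # Escape the item so it is matched literally, not as a regex.
        pattern = re.escape(word.strip())
        if use_word_boundaries:
            pattern = r"\b" + pattern + r"\b"
        result = result[result.str.contains(pattern, case=False, regex=True)]
    return result.tolist()


# "ri" is a substring of "Sri" and "cricketer", but not a whole word anywhere.
loose = matches_all("ri", use_word_boundaries=False)
strict = matches_all("ri", use_word_boundaries=True)
print(loose)
print(strict)
print(matches_all("sri, is", use_word_boundaries=True))
```

Whether substring or whole-word matching is wanted depends on the data; the question's examples work with either.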
