As stated in the title, I have a dataframe (let's call it df1) that looks like this:

Col A Desc
00001 Dog
00002 dogs
00003 cat
00004 cats
00005 hooman

I have a list of keywords I want to search, in a second dataframe, df2:

Keyword
dog
cats
bird

How do I identify all records in df1 that have at least one keyword match from df2, so that the final outcome is a dataframe (new, or df1 with an added column) that lists all the columns in df1 plus the matched keyword? On top of that, the match should ideally be case insensitive, and the keyword list entry "dog" should also find "dogs" in df1.

Sample Expected Output:

Col A Desc Matched Keyword
00001 Dog dog
00002 dogs dog
00003 cat
00004 cats cats
00005 hooman

I've searched this site for some time. Here are a few other questions I have tried to follow, but none of them actually worked; I always get nothing matched.

search dataframe for a keyword in any column and get the rows
value matching between two DataFrames using pandas in python
searching if anyone of word is present in the another column of a dataframe or in another data frame using python
How to search for a keyword in different pandas dataframe and update or create a new column with matching keyword in parent DF

Any help would be great, thanks!

  • please provide the expected output for clarity Commented Jul 8, 2022 at 17:19

1 Answer

    import pandas as pd
    from typing import List

    df1 = pd.DataFrame({'col1': ["0001", "0002", "0003", "0004", "0005"],
                        'values': ["dogs", "cat", "Dog", "cats", "hooman"]})
    df2 = pd.DataFrame({"Keywords": ['dog', 'cat', 'bird']})

    def find_string_in_substring(value: str, list_of_strings: List[str]):
        # Return the first keyword that contains, or is contained in, value
        # (case-insensitive); return the boolean False when nothing matches.
        for sub_value in list_of_strings:
            if value.lower() in sub_value.lower() or sub_value.lower() in value.lower():
                return sub_value
        return False

    df1["keyword_from_df2"] = df1["values"].apply(
        lambda x: find_string_in_substring(x, df2['Keywords'].tolist()))
    df1

The logic is pretty straightforward. I hope it is good enough; if not, I will try to help further!
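For comparison, here is a minimal sketch of one possible alternative that uses the question's own sample data. It builds a single regular expression from the keyword list and lets `Series.str.extract` pull out the first keyword found in each description, ignoring case. Unlike the function above, it only checks whether a keyword occurs inside the description (so the keyword "cats" does not match "cat"), and unmatched rows end up as NaN rather than False. The names `pattern`, `lookup` and `matched` are made up for this illustration, and sorting the keywords longest-first is an assumption about which keyword should win when several overlap.

```python
import re

import pandas as pd

df1 = pd.DataFrame({
    "Col A": ["00001", "00002", "00003", "00004", "00005"],
    "Desc": ["Dog", "dogs", "cat", "cats", "hooman"],
})
df2 = pd.DataFrame({"Keyword": ["dog", "cats", "bird"]})

# Longest keywords first, so a longer keyword is tried before a shorter one
# (an assumption about the desired tie-breaking, not a requirement).
keywords = sorted(df2["Keyword"], key=len, reverse=True)

# One capture group containing every keyword, escaped so that special
# regex characters in a keyword are treated literally.
pattern = "(" + "|".join(re.escape(k) for k in keywords) + ")"

# The text that matched in each row, or NaN when no keyword occurs.
matched = df1["Desc"].str.extract(pattern, flags=re.IGNORECASE, expand=False)

# Map the matched text back to the keyword as spelled in df2.
lookup = {k.lower(): k for k in keywords}
df1["Matched Keyword"] = matched.str.lower().map(lookup)

print(df1)
```

Rows with a match can then be selected with `df1[df1["Matched Keyword"].notna()]`.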


5 Comments

Thanks @eldar-shua! I just ran it but have a question: if nothing is found, is the column filled with the string "False" or with a boolean? I'm trying to filter and validate the results. When I filter on False as a string with df1[df1['Matched Keywords'] != "False"], the unmatched records are not excluded; the result still contains all the False rows. If I treat it as a boolean, nothing is returned. Or did I do something wrong? I am not too familiar with Python (yet).
You are welcome! The value is the boolean False, so use df1[df1["keyword_from_df2"] != False] and not "False".
It would be good if you would mark my answer as correct, thank you
Hmm... now it actually seems that no match was found, but I made sure the data does have a match. Maybe there is something else I am missing.
Also @eldar-shua, I am not familiar with the lambda thing. What is x referring to?
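To illustrate the two points raised in these comments, here is a small sketch with a cut-down dataset. In `apply(lambda x: ...)`, the lambda is an unnamed one-argument function that pandas calls once per cell, and `x` is the value of that cell. Because the helper returns the boolean `False` (not the string "False") when nothing matches, the filter has to compare against the boolean. The helper `first_keyword` and the list `keywords` are hypothetical names used only for this example.

```python
import pandas as pd

df1 = pd.DataFrame({
    "col1": ["0001", "0002", "0003"],
    "values": ["dogs", "cat", "hooman"],
})
keywords = ["dog", "bird"]  # hypothetical keyword list for this sketch


def first_keyword(value):
    """Return the first keyword found in value (case-insensitive), else False."""
    for kw in keywords:
        if kw.lower() in value.lower():
            return kw
    return False


# apply() calls the function once per cell; x is that cell's value.
with_lambda = df1["values"].apply(lambda x: first_keyword(x))
# Passing the function directly is equivalent here.
without_lambda = df1["values"].apply(first_keyword)

df1["keyword_from_df2"] = with_lambda

# Unmatched cells hold the boolean False, so compare against False.
matched_rows = df1[df1["keyword_from_df2"] != False]
# Comparing against the string "False" excludes nothing.
string_filter = df1[df1["keyword_from_df2"] != "False"]

print(matched_rows)
```

If the column was created under a different name, the filter has to use that same name, otherwise pandas raises a KeyError.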
