I'm looking for an efficient way to solve this problem.
Let's say we want to find a list of words in a string, ignoring case, but instead of storing the matched text we want each word with the same casing as in the original list.
For example:

    words_to_match = ['heLLo', 'jumP', 'TEST', 'RESEARCH stuff']
    text = 'hello this is jUmp test jump and research stuff'
    # Result should be {'TEST', 'heLLo', 'jumP', 'RESEARCH stuff'}

Here is my current approach:

    words_to_match = ['heLLo', 'jumP', 'TEST', 'RESEARCH stuff']

I convert this to the following regex:

    import re

    regex = re.compile(r'\bheLLo\b|\bjumP\b|\bTEST\b|\bRESEARCH stuff\b', re.IGNORECASE)

Then:

    word_founds = re.findall(regex, 'hello this is jUmp test jump and research stuff')
    normalization_dict = {w.lower(): w for w in words_to_match}
    # normalization_dict: {'hello': 'heLLo', 'jump': 'jumP', 'test': 'TEST', 'research stuff': 'RESEARCH stuff'}
    final_list = [normalization_dict[w.lower()] for w in word_founds]
    # final_list: ['heLLo', 'jumP', 'TEST', 'jumP', 'RESEARCH stuff']
    final_result = set(final_list)
    # final_result: {'TEST', 'heLLo', 'jumP', 'RESEARCH stuff'}

This gives my expected result. I just want to know if there is a faster or more elegant way to solve this problem.
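
For reference, the whole approach can be condensed into one helper. This is only a sketch under some assumptions: the function name `find_with_original_case` is hypothetical, the words are assumed to start and end with word characters (so the `\b` anchors behave as expected), and no two words are assumed to differ only by case. The pattern is built from the list with `re.escape` rather than written by hand, and the set is built directly from `finditer` without the intermediate list.

```python
import re

def find_with_original_case(words, text):
    """Return the subset of `words` that occur in `text`, ignoring case.

    Hypothetical helper; assumes each word starts and ends with a word character.
    """
    # Map each lowercased word back to its original spelling.
    canonical = {w.lower(): w for w in words}
    # Try longer entries first so a multi-word entry is preferred over a
    # shorter entry that is a prefix of it.
    ordered = sorted(words, key=len, reverse=True)
    # Escape each word so regex metacharacters are matched literally.
    pattern = re.compile(
        r'\b(?:' + '|'.join(re.escape(w) for w in ordered) + r')\b',
        re.IGNORECASE,
    )
    # Collect matches directly into a set, normalised to the original casing.
    return {canonical[m.group(0).lower()] for m in pattern.finditer(text)}

words_to_match = ['heLLo', 'jumP', 'TEST', 'RESEARCH stuff']
text = 'hello this is jUmp test jump and research stuff'
result = find_with_original_case(words_to_match, text)
```

With the example input, `result` should contain the same four entries as `final_result`. Wrapping the alternatives in a single `\b(?:...)\b` group is equivalent to repeating `\b` around every word, just shorter to generate.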