I was looking at a Stack Overflow question and got somewhat carried away with improving the solution, well beyond the scope of that question.
In summary, we have a string such as

    ADJECTIVE panda walked to the NOUN and then VERB. A nearby NOUN was unaffected by these events.

We are required to replace each NOUN with successive entries from a list of noun phrases, and similarly for the other uppercase words.
It seems natural to me to provide the lists using a dict, like this:

    replacements = {
        'NOUN': ["pool", "giraffe"],
        'VERB': ["smiled", "waved"],
        'ADJECTIVE': ["happy"],
    }

My function to make the substitutions is then

    import copy
    import re

    def replace_in_string(string, replacements):
        '''
        Replace each key of 'replacements' by successive elements from its value list.

        >>> replace_in_string('foobar', {'bar': ['baz']})
        'foobaz'
        >>> replace_in_string('foobar/fiebar', {'fie': ['foo'], 'bar': ['baz', 'bor']})
        'foobaz/foobor'

        Each replacement list must contain enough elements to substitute all the matches:
        >>> replace_in_string('foobar/fiebar', {'bar': ['baz']})
        Traceback (most recent call last):
            ...
        IndexError: pop from empty list

        It's fine to provide more replacements than needed:
        >>> replace_in_string('foobar/fiebar', {'fie': ['foo', 'fo']})
        'foobar/foobar'
        '''
        regex = re.compile('|'.join(map(re.escape, replacements.keys())))
        # Work on a copy so that the caller's lists are not consumed.
        terms = copy.deepcopy(replacements)
        def replace_func(m):
            return terms[m.group()].pop(0)
        return regex.sub(replace_func, string)

    if __name__ == '__main__':
        import doctest
        doctest.testmod()

Here's a small demo, using examples taken from the SO question:

    replacements = {
        'NOUN': ["pool", "giraffe"],
        'VERB': ["smiled", "waved"],
        'ADJECTIVE': ["happy"],
    }

    TextFileContent = 'ADJECTIVE panda walked to the NOUN and then VERB. A nearby NOUN was unaffected by these events.'

    print(replace_in_string(TextFileContent, replacements))

A possible alternative is to leave excess matches unreplaced:

    def replace_func(m):
        items = terms[m.group()]
        return items.pop(0) if items else m.group()

Would that be more useful? Is there scope to provide both behaviours?
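One way both behaviours might coexist, as a minimal sketch: a keyword argument selects what happens when a list runs out. The parameter name `strict` and its default are assumptions made for illustration; they are not part of the function above.

```python
import re


def replace_in_string(string, replacements, strict=True):
    """Replace each key of 'replacements' by successive elements of its list.

    'strict' is a hypothetical flag: when true, running out of replacements
    raises IndexError (the original behaviour); when false, excess matches
    are left unreplaced.
    """
    regex = re.compile('|'.join(map(re.escape, replacements)))
    # Copy each list so that the caller's dict is not consumed.
    terms = {key: list(values) for key, values in replacements.items()}

    def replace_func(m):
        items = terms[m.group()]
        if items or strict:
            return items.pop(0)   # raises IndexError when empty and strict
        return m.group()          # leave the excess match as it is

    return regex.sub(replace_func, string)


print(replace_in_string('foobar/fiebar', {'bar': ['baz']}, strict=False))
# prints: foobaz/fiebar
```

Keeping `strict=True` as the default preserves the existing doctests, so callers opt in to the lenient behaviour explicitly.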
terms = {k: iter(vs) for k, vs in replacements.items()} and next(terms[m.group()]).
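The iterator suggestion above could be sketched as follows. It avoids both the deep copy and the repeated `pop(0)`. The function name `replace_in_string_iter` is hypothetical, chosen only to keep it apart from the original. Note one difference in behaviour: an exhausted iterator raises `StopIteration` rather than `IndexError`, so the doctest for a too-short list would need adjusting.

```python
import re


def replace_in_string_iter(string, replacements):
    """Iterator-based variant of replace_in_string (hypothetical name)."""
    regex = re.compile('|'.join(map(re.escape, replacements)))
    # One independent iterator per key; the caller's lists are never modified.
    terms = {k: iter(vs) for k, vs in replacements.items()}

    def replace_func(m):
        # Raises StopIteration once the list for this key is used up.
        # next(terms[m.group()], m.group()) would instead leave the
        # excess match unreplaced.
        return next(terms[m.group()])

    return regex.sub(replace_func, string)


replacements = {
    'NOUN': ["pool", "giraffe"],
    'VERB': ["smiled", "waved"],
    'ADJECTIVE': ["happy"],
}
text = ('ADJECTIVE panda walked to the NOUN and then VERB. '
        'A nearby NOUN was unaffected by these events.')
print(replace_in_string_iter(text, replacements))
```

Because each call builds fresh iterators, calling the function twice with the same dict gives the same result both times.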