How can I map multiple characters in a string to single characters more efficiently?

Question

I am looking for an efficient method to map groups of characters to single characters.

Currently, my code looks similar to the following:

    example = 'Accomodation'
    VOWELS = 'aeiou'
    CONSONANTS = 'bcdfghjklmnpqrstvwxyz'
    output = ''
    for char in example:
        if char in VOWELS:
            output += 'v'
        elif char in VOWELS.upper():
            output += 'V'
        elif char in CONSONANTS:
            ....

Eventually it will return, in the case of the example, Vccvcvcvcvvc.

I would like to make this part more efficient:

    for char in example:
        if char in VOWELS:
            output += 'v'
        elif char in VOWELS.upper():
            output += 'V'
        elif char in CONSONANTS:
            ....

Ideally, the solution would allow for a dictionary of characters to map to as the key, with their values being a list of options. E.g.

replace_dict = {'v': VOWELS, 'V': VOWELS.upper(), 'c': CONSONANTS, ...

I am not too familiar with map, but I'd expect the solution would utilise it somehow.

Research

I found a similar problem here: python replace multiple characters in a string

The solution to that problem indicates I would need something like:

    target = 'Accomodation'
    charset = 'aeioubcdfghjklmnpqrstvwxyzAEIOUBCDFGHJKLMNPQRSTVWXYZ'
    key = 'vvvvvcccccccccccccccccccccVVVVVCCCCCCCCCCCCCCCCCCCCC'

However, I don't think the assignments look particularly clear, even though they save a block of if/else statements. Additionally, if I wanted to add more character sets (e.g. for different foreign alphabets), the assignments would be even less readable.
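One way to keep the translate-style idea while avoiding two long parallel strings is to build the translation table from a dict of groups. The following is a minimal sketch, not the linked answer's code; `GROUPS`, `build_table` and `classify` are hypothetical names chosen for illustration.

```python
VOWELS = 'aeiou'
CONSONANTS = 'bcdfghjklmnpqrstvwxyz'

# Hypothetical grouping: replacement character -> characters it stands for.
GROUPS = {
    'v': VOWELS,
    'V': VOWELS.upper(),
    'c': CONSONANTS,
    'C': CONSONANTS.upper(),
}


def build_table(groups):
    """Flatten {replacement: characters} into a table for str.translate."""
    mapping = {}
    for replacement, chars in groups.items():
        for char in chars:
            mapping[char] = replacement
    # str.maketrans accepts a dict whose keys are single characters.
    return str.maketrans(mapping)


TABLE = build_table(GROUPS)


def classify(text):
    """Replace every mapped character; unmapped characters pass through."""
    return text.translate(TABLE)


print(classify('Accomodation'))  # Vccvcvcvcvvc
```

Adding another character set then only means adding one more entry to `GROUPS`.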

Can anyone, perhaps with better knowledge on built-in functions, produce an example that works more efficiently/cleanly than the above two examples?

I am also open to other ideas that do not require the use of a dictionary.

The solution should be in python3.

There is nothing wrong with your original approach. Seems pretty efficient and moreover readable.
– Austin, Commented Mar 6, 2019 at 8:58
@Austin I think I just wanted something more scalable and clean, as opposed to writing lots of elif statements! But thank you.
– Adam, Commented Mar 6, 2019 at 18:48

sanyassh · Accepted Answer · 2019-03-06 09:02:42Z

There is a more efficient way: build a dict that maps each character to its replacement:

    example = 'Accomodation'
    VOWELS = 'aeiou'
    CONSONANTS = 'bcdfghjklmnpqrstvwxyz'
    replace_dict = {
        **{v: 'v' for v in VOWELS},
        **{V: 'V' for V in VOWELS.upper()},
        **{c: 'c' for c in CONSONANTS}
    }
    print(''.join(replace_dict[s] for s in example))
    # Vccvcvcvcvvc

Really nice! Clean, concise and scalable for other character sets.

AKX · Accepted Answer · 2019-03-06 09:06:30Z

Your replace_dict idea is close, but it's better to "flip" the dict "inside-out", i.e. turn it from {'v': 'aei', 'c': 'bc'} into {'a': 'v', 'e': 'v', 'b': 'c', ...}.

    def get_replace_map_from_dict(replace_dict):
        replace_map = {}
        for cls, chars in replace_dict.items():
            replace_map.update(dict.fromkeys(chars, cls))
        return replace_map


    def replace_with_map(s, replace_map):
        return "".join(replace_map.get(c, c) for c in s)


    VOWELS = "aeiou"
    CONSONANTS = "bcdfghjklmnpqrstvwxyz"

    replace_map = get_replace_map_from_dict(
        {"v": VOWELS, "V": VOWELS.upper(), "c": CONSONANTS}
    )
    print(replace_with_map("Accommodation, thanks!", replace_map))

The replace_with_map function above retains all unmapped characters (but you can change that with the second parameter to .get() there), so the output is

Vccvccvcvcvvc, ccvccc!
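As a sketch of the remark about the second parameter to .get(): passing a different default changes what happens to unmapped characters. The `default` parameter and the name `replace_with_default` are assumptions added for illustration and are not part of the answer's code.

```python
VOWELS = "aeiou"
CONSONANTS = "bcdfghjklmnpqrstvwxyz"

# Flattened map: each character points at its class label.
replace_map = {}
for label, chars in {"v": VOWELS, "V": VOWELS.upper(), "c": CONSONANTS}.items():
    replace_map.update(dict.fromkeys(chars, label))


def replace_with_default(s, replace_map, default=None):
    """Map each character; unmapped ones become `default`, or stay as-is if it is None."""
    return "".join(
        replace_map.get(c, c if default is None else default) for c in s
    )


text = "Accommodation, thanks!"
print(replace_with_default(text, replace_map))       # Vccvccvcvcvvc, ccvccc!
print(replace_with_default(text, replace_map, ""))   # Vccvccvcvcvvcccvccc
print(replace_with_default(text, replace_map, "?"))  # Vccvccvcvcvvc??ccvccc?
```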

Rakesh · Accepted Answer · 2019-03-06 08:56:43Z

This is one approach using a dict.

Ex:

    example = 'Accomodation'
    VOWELS = 'aeiou'
    CONSONANTS = 'bcdfghjklmnpqrstvwxyz'
    replace_dict = {'v': VOWELS, "V": VOWELS.upper(), "c": CONSONANTS}
    print("".join(k for i in example for k, v in replace_dict.items() if i in v))

Output:

Vccvcvcvcvvc

The dict should be the other way around though if you want dict-like performance.
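To illustrate that remark: with the groups stored as values, every character is checked against every group, whereas an inverted dict answers with one hash lookup per character. A minimal sketch, where `inverted`, `scanned` and `looked_up` are hypothetical names:

```python
example = 'Accomodation'
VOWELS = 'aeiou'
CONSONANTS = 'bcdfghjklmnpqrstvwxyz'
replace_dict = {'v': VOWELS, 'V': VOWELS.upper(), 'c': CONSONANTS}

# Group-keyed form: each character is tested against each group's string.
scanned = ''.join(
    label for ch in example for label, chars in replace_dict.items() if ch in chars
)

# Inverted form: one dictionary lookup per character.
inverted = {ch: label for label, chars in replace_dict.items() for ch in chars}
looked_up = ''.join(inverted[ch] for ch in example)

print(scanned == looked_up)  # True
print(looked_up)             # Vccvcvcvcvvc
```

Note that `inverted[ch]` raises KeyError for a character outside the groups, while the scanning version silently skips it.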

Mortz · Accepted Answer · 2019-03-06 09:08:38Z

How about a reverse lookup to what you are doing - should be scalable

    VOWELS = 'aeiou'
    CONSONANTS = 'bcdfghjklmnpqrstvwxyz'
    example = "Accomodation"
    lookup_dict = {k: "v" for k in VOWELS}
    lookup_dict.update({k: "c" for k in CONSONANTS})
    lookup_dict.update({k: "V" for k in VOWELS.upper()})
    lookup_dict.update({k: "C" for k in CONSONANTS.upper()})
    ''.join([lookup_dict[i] for i in example])

Alex Lopatin · Accepted Answer · 2019-03-06 10:04:59Z

Try this one. No need for CONSONANTS and works not only with English, but with Russian letters as well (I was surprised):

    example = 'AccomodatioNеёэыуюяЕЁЭЫуюяРаботает'
    VOWELS = 'aeiouуаоиеёэыуюя'
    output = ''
    for char in example:
        if char.isalpha():
            x = 'v' if char.lower() in VOWELS else 'c'
            output += x if char.islower() else x.upper()
    print(output)

VccvcvcvcvvCvvvvvvvVVVVvvvCvcvcvvc

The only issue with this is that it looks different to scale with more character sets - but appreciating the input!
Sure, it will scale differently with more character sets, but with only 6 vowels in English and 9 in Russian it could still outperform the dictionary solution. Put the more frequent letters first in the VOWELS string and test against the dictionaries :-)
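The suggested test can be sketched with the standard `timeit` module. This is only an illustrative micro-benchmark under assumed inputs: `SAMPLE` is an arbitrary string picked for the sketch, and `VOWEL_LOOKUP`, `count_with_string` and `count_with_dict` are hypothetical names. Timings depend on the machine and Python version, so no result is claimed here.

```python
import timeit

VOWELS = 'eaiou'  # assumed ordering, more frequent English vowels first
VOWEL_LOOKUP = dict.fromkeys(VOWELS, 'v')
SAMPLE = 'the quick brown fox jumps over the lazy dog' * 100  # arbitrary test text


def count_with_string(text):
    """Membership test against a short string (linear scan of VOWELS)."""
    return sum(1 for ch in text if ch in VOWELS)


def count_with_dict(text):
    """Membership test against a dict (hash lookup)."""
    return sum(1 for ch in text if ch in VOWEL_LOOKUP)


string_time = timeit.timeit(lambda: count_with_string(SAMPLE), number=200)
dict_time = timeit.timeit(lambda: count_with_dict(SAMPLE), number=200)
print(f'string membership: {string_time:.4f}s')
print(f'dict membership:   {dict_time:.4f}s')
```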

Alex Lopatin · Accepted Answer · 2019-03-07 13:19:08Z

I am new to Python and having a lot of fun playing with it. Let's see how good these dictionaries are. Four algorithms were suggested here:

Alex (myself) - C runtime library style
Adam - matching with four strings
Sanyash, Rakesh, Mortz - dictionary (look up tables)
AKX - replace with map

I made small corrections to the proposed code so that all four work consistently. I also wanted to keep the combined code under 100 lines, but it came to 127 with the four functions under test and the extra blank lines needed to satisfy PyCharm. Here are the results of the first race:

    Place  Name     Time    Total
    1.     AKX      0.6777  16.5018  The winner of Gold medal!!!
    2.     Sanyash  0.8874  21.5725  Slower by 31%
    3.     Alex     0.9573  23.2569  Slower by 41%
    4.     Adam     0.9584  23.2210  Slower by 41%

Then I made small improvements to my code:

    VOWELS_UP = VOWELS.upper()


    def vowels_consonants0(example):
        output = ''
        for char in example:
            if char.isalpha():
                if char.islower():
                    output += 'v' if char in VOWELS else 'c'
                else:
                    output += 'V' if char in VOWELS_UP else 'C'
        return output

That got me the second place:

    Place  Name     Time    Total
    1.     AKX      0.6825  16.5331  The winner of Gold medal!!!
    2.     Alex     0.7026  17.1036  Slower by 3%
    3.     Sanyash  0.8557  20.8817  Slower by 25%
    4.     Adam     0.9631  23.3327  Slower by 41%

Now I need to shave off this 3% and take first place. I tested with the text of Leo Tolstoy's novel War and Peace.

Original source code:

    import time
    import itertools

    VOWELS = 'eaiouу'  # in order of letter frequency
    CONSONANTS = 'bcdfghjklmnpqrstvwxyz'


    def vowels_consonants0(example):
        output = ''
        for char in example:
            if char.isalpha():
                x = 'v' if char.lower() in VOWELS else 'c'
                output += x if char.islower() else x.upper()
        return output


    def vowels_consonants1(example):
        output = ''
        for char in example:
            if char in VOWELS:
                output += 'v'
            elif char in VOWELS.upper():
                output += 'V'
            elif char in CONSONANTS:
                output += 'c'
            elif char in CONSONANTS.upper():
                output += 'C'
        return output


    def vowels_consonants2(example):
        replace_dict = {
            **{v: 'v' for v in VOWELS},
            **{V: 'V' for V in VOWELS.upper()},
            **{c: 'c' for c in CONSONANTS},
            **{c: 'c' for c in CONSONANTS.upper()}
        }
        return ''.join(replace_dict[s] if s in replace_dict else '' for s in example)


    def get_replace_map_from_dict(replace_dict):
        replace_map = {}
        for cls, chars in replace_dict.items():
            replace_map.update(dict.fromkeys(chars, cls))
        return replace_map


    def replace_with_map(s, replace_map):
        return "".join(replace_map.get(c, c) for c in s)


    replace_map = get_replace_map_from_dict(
        {"v": VOWELS, "V": VOWELS.upper(), "c": CONSONANTS, "C": CONSONANTS.upper()}
    )


    def vowels_consonants3(example):
        output = ''
        for char in example:
            if char in replace_map:
                output += char
        output = replace_with_map(output, replace_map)
        return output


    def test(function, name):
        text = open(name, encoding='utf-8')
        t0 = time.perf_counter()
        line_number = 0
        char_number = 0
        vc_number = 0  # vowels and consonants
        while True:
            line_number += 1
            line = text.readline()
            if not line:
                break
            char_number += len(line)
            vc_line = function(line)
            vc_number += len(vc_line)
        t0 = time.perf_counter() - t0
        text.close()
        return t0, line_number, char_number, vc_number


    tests = [vowels_consonants0, vowels_consonants1, vowels_consonants2, vowels_consonants3]
    names = ["Alex", "Adam", "Sanyash", "AKX"]
    best_time = float('inf')
    run_times = [best_time for _ in tests]
    sum_times = [0.0 for _ in tests]
    show_result = [True for _ in tests]

    print("\n!!! Start the race by permutation with no repetitions now ...\n")
    print(" * - best time in race so far")
    print(" + - personal best time\n")
    print("Note Name Time (Permutation)")
    products = itertools.permutations([0, 1, 2, 3])
    for p in list(products):
        print(p)
        for n in p:
            clock, lines, chars, vcs = test(tests[n], 'war_peace.txt')
            sum_times[n] += clock
            note = " "
            if clock < run_times[n]:
                run_times[n] = clock
                note = "+"  # Improved personal best time
            if clock < best_time:
                best_time = clock
                note = "*"  # Improved total best time
            print("%s %8s %6.4f" % (note, names[n], clock), end="")
            if show_result[n]:
                show_result[n] = False
                print(" Lines:", lines, "Characters:", chars, "Letters:", vcs)
            else:
                print()

    print("\n!!! Finish !!! and the winner by the best run time is ...\n")
    print("Place Name Time Total")
    i = 0
    for n in sorted(range(len(run_times)), key=run_times.__getitem__):
        i += 1
        t = run_times[n]
        print("%d. %8s %.4f %.4f " % (i, names[n], t, sum_times[n]), end="")
        if i == 1:
            print("The winner of Gold medal!!!")
        else:
            print("Slower by %2d%%" % (round(100.0 * (t - best_time)/best_time)))
