Regex substitution reversal?

Question

Consider this example text:
input_test = "أكتب الدر_س و إحفضه ثم إقرأ القصـــــــــــــــيـــــــــــدة"

I managed to clean this text using these functions:

import re
import string

arabic_punctuations = '''`÷×؛<>_()*&^%][ـ،/:"؟.,'{}~¦+|!”…“–ـ'''
english_punctuations = string.punctuation
punctuations_list = arabic_punctuations + english_punctuations

# re.VERBOSE ignores whitespace and treats "#" as the start of a comment that
# runs to the end of the line, so each alternative must stay on its own line.
arabic_diacritics = re.compile("""
                             ّ    | # Tashdid
                             َ    | # Fatha
                             ً    | # Tanwin Fath
                             ُ    | # Damma
                             ٌ    | # Tanwin Damm
                             ِ    | # Kasra
                             ٍ    | # Tanwin Kasr
                             ْ    | # Sukun
                             ـ     # Tatwil/Kashida
                         """, re.VERBOSE)

def normalize_arabic(text):
    text = re.sub("[إأآا]", "ا", text)
    return text

def remove_diacritics(text):
    text = re.sub(arabic_diacritics, '', text)
    return text

def remove_punctuations(text):
    translator = str.maketrans('', '', punctuations_list)
    return text.translate(translator)

def remove_repeating_char(text):
    return re.sub(r'(.)\1+', r'\1', text)

Which gives me this text as the result:

result = "اكتب الدرس و احفضه ثم اقرا القصيدة"
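The question does not say in which order the four functions are applied. The sketch below assumes normalize_arabic, remove_diacritics, remove_punctuations, remove_repeating_char, in that order, which reproduces the result above. `clean` is a hypothetical helper name, and the diacritics are written as \u escapes instead of the VERBOSE pattern.

```python
import re
import string

# Same character sets as in the question.
arabic_punctuations = '''`÷×؛<>_()*&^%][ـ،/:"؟.,'{}~¦+|!”…“–ـ'''
punctuations_list = arabic_punctuations + string.punctuation
# Tashdid, Fatha, Tanwin Fath, Damma, Tanwin Damm, Kasra, Tanwin Kasr, Sukun, Tatwil
arabic_diacritics = re.compile("[\u0651\u064E\u064B\u064F\u064C\u0650\u064D\u0652\u0640]")


def clean(text):
    """Apply the question's four steps in the (assumed) order."""
    text = re.sub("[إأآا]", "ا", text)                               # normalize_arabic
    text = re.sub(arabic_diacritics, "", text)                        # remove_diacritics
    text = text.translate(str.maketrans("", "", punctuations_list))   # remove_punctuations
    return re.sub(r"(.)\1+", r"\1", text)                             # remove_repeating_char
```

Several different inputs clean to the same string: clean("إقرأ") and clean("اقـــرا") both give "اقرا", and clean("Hello") gives "Helo". The cleaned text alone therefore cannot tell you which spelling appeared in the original.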

Given this result, how can I find where the word "اقرا" ("read") occurs in the original input_test?

The input text can also be in English. I'm thinking of using regex, but I don't know where to start.

I don't think it's feasible to do what you want, because the functions cause a loss of information.
– martineau, Commented Jan 29, 2022 at 1:56
Is there any way to check whether this word appears in the input text?
– mohanad almowahid, Commented Jan 29, 2022 at 10:14
Generally speaking, no. Because of the substitutions being done, there's no way to tell whether the word (presumably without any substitutions having been done to it) was in the original text; that's what I meant about an information loss. If you're sure the word being sought would not have been affected by any of the substitutions, then you could simply search for it in the original string via input_test.find("اقرا").
– martineau, Commented Jan 29, 2022 at 12:09
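One way around the information loss (not suggested in the thread, only a sketch) is to clean character by character and record, for every character that survives, its index in the original string. A match found in the cleaned text can then be mapped back to a slice of the original. This assumes the same character sets and step order as the question's functions; `clean_with_offsets` and `find_in_original` are hypothetical names, and the searched word must already be in cleaned form.

```python
import string

# Character sets from the question; diacritics (including tatweel U+0640) as escapes.
ALEF_FORMS = "إأآا"
DIACRITICS = set("\u0651\u064E\u064B\u064F\u064C\u0650\u064D\u0652\u0640")
PUNCTUATION = set('''`÷×؛<>_()*&^%][ـ،/:"؟.,'{}~¦+|!”…“–ـ''') | set(string.punctuation)


def clean_with_offsets(text):
    """Clean `text` like the question's pipeline and return (cleaned, offsets),
    where offsets[k] is the index in `text` of cleaned[k]."""
    cleaned, offsets = [], []
    for i, ch in enumerate(text):
        if ch in DIACRITICS or ch in PUNCTUATION:
            continue  # remove_diacritics / remove_punctuations
        if ch in ALEF_FORMS:
            ch = "ا"  # normalize_arabic
        # remove_repeating_char keeps the first of a run; its regex "." never
        # matches a newline, so runs of newlines are left alone.
        if cleaned and cleaned[-1] == ch and ch != "\n":
            continue
        cleaned.append(ch)
        offsets.append(i)
    return "".join(cleaned), offsets


def find_in_original(text, word):
    """Return the slice of `text` that cleans to `word`, or None if absent."""
    if not word:
        return None
    cleaned, offsets = clean_with_offsets(text)
    k = cleaned.find(word)
    if k == -1:
        return None
    start = offsets[k]
    end = offsets[k + len(word) - 1] + 1  # one past the last matched character
    return text[start:end]
```

For input_test, find_in_original(input_test, "اقرا") returns "إقرأ", while input_test.find("اقرا") returns -1. Marks or repeated letters that follow the last matched character are not included in the returned slice.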
How about generating a list of words in which each "ا" in "اقرا" is replaced with each character in [إأآا], and then searching for those words in input_test?
– mohanad almowahid, Commented Jan 29, 2022 at 13:25
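The idea in the comment above can be sketched with itertools.product: every "ا" in the search word is expanded into the four forms that normalize_arabic collapses, and each resulting spelling is searched for in the original text. `alef_variants` and `find_variant` are hypothetical names. This only undoes the alef normalization; it will not find words whose original spelling also contained tatweel, diacritics, punctuation, or repeated letters.

```python
import itertools

ALEF_FORMS = "إأآا"  # the characters normalize_arabic maps to "ا"


def alef_variants(word):
    """Yield every spelling of `word` with each "ا" replaced by one of ALEF_FORMS."""
    # For each position, the characters that could have been there originally.
    choices = [ALEF_FORMS if ch == "ا" else ch for ch in word]
    for combo in itertools.product(*choices):
        yield "".join(combo)


def find_variant(text, word):
    """Return (index, spelling) of the earliest variant found in `text`, else None."""
    hits = [(text.find(v), v) for v in alef_variants(word) if v in text]
    return min(hits) if hits else None
```

For example, find_variant(input_test, "اقرا") finds "إقرأ", but find_variant(input_test, "الدرس") returns None because the original spelling is "الدر_س". A word with n alefs produces 4**n spellings, so the list grows quickly.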
No, that wouldn't be appropriate for comments. Instead, I think you should ask a new question specifically on that topic (and show your own attempt to do it, of course). Hint: note that the repl argument (the second one) passed to re.sub() can be a function.
– martineau, Commented Jan 29, 2022 at 15:01
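Following the hint about re.sub(), a callable repl can turn the cleaned search word into a pattern that tolerates what the cleaning removed: each "ا" becomes the class [إأآا], every other character is escaped, and each character may be followed by any run of tatweel, the eight diacritics, or "_". This is one possible reading of the hint, not martineau's code; `tolerant_pattern` is a hypothetical name, and the set of tolerated characters is an assumption (other punctuation and repeated letters are not handled).

```python
import re

ALEF_CLASS = "[إأآا]"  # every form normalize_arabic maps to "ا"
# Assumption: only tatweel (U+0640), the diacritics U+064B..U+0652 and "_"
# may appear between letters of the original word.
FILLER = "[\u0640\u064B-\u0652_]*"


def _char_to_regex(match):
    """repl callback: re.sub calls it once per character of the word."""
    ch = match.group()
    base = ALEF_CLASS if ch == "ا" else re.escape(ch)
    return base + FILLER  # a callable's return value is inserted literally


def tolerant_pattern(word):
    """Build a regex matching `word` as it might have looked before cleaning."""
    return re.sub(".", _char_to_regex, word)
```

With this, re.search(tolerant_pattern("الدرس"), input_test).group() returns "الدر_س", the spelling actually used in the original.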
