    import regex

    st = """
    <!-- Start of page -->
    <HTML>
    <!-- Start of head -->
    <HEAD>
    <TITLE>My Title</TITLE>
    <!-- Page title -->
    </HEAD>
    <!-- Body -->
    <BODY>
    """
    pat = regex.compile(r"<!-{2,}(.*?)-{2,}>")
    st2 = pat.sub(r'\U\1\E', st)
    print(st2)

In the code above, I am trying to implement a case-conversion operation using the "regex" module (I tried the "re" module too). I want to convert all text inside HTML comments to upper case, for example <!-- Start of page --> to <!-- START OF PAGE -->. The code is syntactically correct and, as far as I can tell, should work, but when I run it I get this error:

    Traceback (most recent call last):
      File "C:/Users/m.m/PycharmProjects/untitled9/source.py", line 13, in <module>
        st2 = pat.sub(r'\U\1\E', st)
      File "C:\Users\m.m\.virtualenvs\untitled5\lib\site-packages\regex\regex.py", line 676, in _compile_replacement_helper
        is_group, items = _compile_replacement(source, pattern, is_unicode)
      File "C:\Users\m.m\.virtualenvs\untitled5\lib\site-packages\regex\_regex_core.py", line 1696, in _compile_replacement
        return False, [parse_repl_hex_escape(source, HEX_ESCAPES[ch], ch)]
      File "C:\Users\m.m\.virtualenvs\untitled5\lib\site-packages\regex\_regex_core.py", line 1764, in parse_repl_hex_escape
        source.string, source.pos)
    regex._regex_core.error: incomplete escape \U at position 3

It seems that it does not know the purpose of \U and \L, and it raises an "incomplete escape" error.

I am currently using Python 3.7, and I have tried this with the "re" module too, but it does not work either.
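For reference, the failure with the standard-library "re" module can be reproduced with a small sketch like the one below. The `sample` string is a shortened, hypothetical stand-in for the HTML above, and the exact error text may differ between Python versions, so the sketch only records that an error is raised.

```python
import re

# Shortened, hypothetical stand-in for the HTML string in the question.
sample = "<!-- Start of page -->"

try:
    # \U and \E are not recognised escapes in an `re` replacement template.
    re.sub(r"<!-{2,}(.*?)-{2,}>", r"\U\1\E", sample)
    outcome = "no error"
except re.error as exc:
    # On Python 3.7 and later, unknown letter escapes in the replacement raise re.error.
    outcome = f"re.error: {exc}"

print(outcome)
```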

I want to know what the problem is. I have seen many books do case conversion with regex, but why this does not work has been an enigma for me.

Is the problem in the syntax, or does it originate from Python's regex implementation itself, which does not support this kind of case conversion?

In this question, I am trying to convert the text using \E etc., or more formally, "using case conversion in regex".

  • 2
    Are there no HTML parsers/libraries that allow you to modify comments? Using RegEx for this seems horrible. Commented Dec 1, 2019 at 7:30
  • 1
    Also, you state it does not know what is the purpose of \U and \L yet your code contains \U and \E. Which is it? Commented Dec 1, 2019 at 7:34
  • 1
    @MohVahedi I’m asking whether you are using \U and \L or \U and \E. I’m not certain that the error is caused by the re module’s inability to handle certain valid expressions. Commented Dec 1, 2019 at 7:39
  • 1
    I’m having trouble finding information on \E, do you know of any good resources for either one? Commented Dec 1, 2019 at 7:40
  • 1
    @MohVahedi I would be interested in seeing that book. In any case, here is a similar question right here on Stack Overflow: stackoverflow.com/q/28588603/11301900. Commented Dec 1, 2019 at 7:42

1 Answer

    import re

    print(re.sub(r"<!-{2,}(.*?)-{2,}>", lambda x: "<!--" + x.group(1).upper() + "-->", st))

This uses the re module, with st being the string from the question. The second argument of sub can be either a string or a callable. If it is a callable, it is called with the match object for each match and its return value is used as the replacement, so you can apply ordinary string operations to the matched text.

This gives

    <!-- START OF PAGE -->
    <HTML>
    <!-- START OF HEAD -->
    <HEAD>
    <TITLE>My Title</TITLE>
    <!-- PAGE TITLE -->
    </HEAD>
    <!-- BODY -->
    <BODY>
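The callable-replacement approach described above can also be sketched with a named function that keeps each comment's delimiters exactly as written, rather than rebuilding them as "<!--" and "-->". The names `COMMENT`, `upper_comment` and `uppercase_comments` are hypothetical, and `re.DOTALL` is an added assumption so that comments spanning several lines are also matched.

```python
import re

# Capture the opening delimiter, the comment text, and the closing delimiter
# separately. re.DOTALL is an assumption beyond the original pattern: it lets
# "." match newlines so multi-line comments are handled too.
COMMENT = re.compile(r"(<!-{2,})(.*?)(-{2,}>)", re.DOTALL)


def upper_comment(match):
    """Return the matched comment with only its inner text upper-cased."""
    opener, body, closer = match.groups()
    return opener + body.upper() + closer


def uppercase_comments(html):
    """Upper-case the text of every HTML comment in `html`."""
    return COMMENT.sub(upper_comment, html)


print(uppercase_comments("<!-- Start of page --> <HTML> <!--- Body --->"))
# prints: <!-- START OF PAGE --> <HTML> <!--- BODY --->
```

Because the delimiters are captured rather than hard-coded, a comment written with three dashes keeps its three dashes in the result.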

4 Comments

Would you state why using "\U" and "\E" does not suffice?
Why split and map the group result?
I am sorry: case conversion the way I stated it is not supported in Python, and it raises an error, according to this resource, which is indeed a reliable one.
@TheNamesAlc Don’t forget to actually edit your answer once the realization has hit ;)
