In many programming languages, the following
find foo([a-z]+)bar and replace with GOO\U\1GAR
will result in the entire match being made uppercase. I can't seem to find the equivalent in Python; does it exist?
You can pass a function to re.sub() that will allow you to do this. Here is an example:

    def upper_repl(match):
        return 'GOO' + match.group(1).upper() + 'GAR'

And an example of using it:

    >>> import re
    >>> re.sub(r'foo([a-z]+)bar', upper_repl, 'foobazbar')
    'GOOBAZGAR'

Unfortunately, this \U\1 syntax could never work in Python, because \U in a string literal indicates the beginning of a 32-bit hex escape sequence. For example, "\U0001f4a9" == "💩".
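A quick way to check both points, as a rough sketch assuming Python 3.7 or later (where unknown letter escapes in a replacement template are rejected). The variable name `outcome` is just for illustration:

```python
import re

# In an ordinary (non-raw) string literal, \U starts an 8-hex-digit escape.
assert "\U0001f4a9" == "💩"

# Even when written as a raw string, re.sub() does not understand \U in a
# replacement template; on Python 3.7+ this raises re.error.
try:
    re.sub(r'foo([a-z]+)bar', r'GOO\U\1GAR', 'foobazbar')
except re.error as exc:
    outcome = f're.error: {exc}'
else:
    outcome = 'no error'

print(outcome)
```

So the template route is closed either way, which is why the callable form is the practical option.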
However, there are easy alternatives to Perl's case conversion escapes available by using a replacement function. In re.sub(pattern, repl, string, count=0, flags=0), the replacement repl is usually a string, but it can also be a callable. If it is a callable, it is passed the Match object and must return the replacement string to be used.
So, for the example given in the question, this is possible:

    >>> import re
    >>> string = "fooquuxbar"
    >>> pattern = r"foo([a-z]+)bar"
    >>> re.sub(pattern, lambda m: f"GOO{m.group(1).upper()}GAR", string)
    'GOOQUUXGAR'

Here is a table of other string methods which might be useful for similar case conversions.

| Modifier | Description | Example | Python callable to use |
|---|---|---|---|
| \U | Uppercase | foo BAR --> FOO BAR | str.upper |
| \L | Lowercase | foo BAR --> foo bar | str.lower or str.casefold |
| \I | Initial capital | foo BAR --> Foo Bar | str.title |
| \F | First capital | foo BAR --> Foo bar | str.capitalize |
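
To make the table concrete, here is a minimal sketch that plugs each listed callable into re.sub(). The helper `convert_group`, the `<...>` pattern, and the sample text are made up for illustration and are not part of the re module:

```python
import re

def convert_group(pattern, func, text):
    """Replace each match with func applied to its first captured group."""
    return re.sub(pattern, lambda m: func(m.group(1)), text)

# Hypothetical pattern: convert whatever sits between angle brackets.
pattern = r'<(.*?)>'
text = 'say <foo BAR>'

for func in (str.upper, str.lower, str.title, str.capitalize):
    print(func.__name__, '->', convert_group(pattern, func, text))
# upper -> say FOO BAR
# lower -> say foo bar
# title -> say Foo Bar
# capitalize -> say Foo bar
```

Because the unbound methods (str.upper and friends) take the string as their only argument, they can be passed around as plain functions like this.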
If you already have a replacement string (template), you may not be keen on swapping it out with the verbosity of m.group(1)+...+m.group(2)+...+m.group(3)... Sometimes it's nice to have a tidy little string.
You can use the Match object's expand() method to evaluate a template for the match in the same manner as sub(), allowing you to retain as much of your original template as possible. You can then call upper() on the relevant pieces.

    re.sub(r'foo([a-z]+)bar', lambda m: 'GOO' + m.expand(r'\1GAR').upper(), 'foobazbar')

While this is not particularly useful in the example above, and does not help in complex circumstances, it may be more convenient for longer expressions with a greater number of captured groups, such as a MAC address censoring regex, where you just want to ensure the full replacement is capitalized or not.
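
As a rough sketch of that MAC address case: the pattern, the template, and the function name `censor_mac` below are assumptions chosen for illustration, not taken from any particular tool. The idea is that the template stays a tidy string and the case conversion is applied once to the expanded result:

```python
import re

# Hypothetical pattern: capture the first three octets, match (but do not
# capture) the last three.
MAC_RE = re.compile(
    r'([0-9a-f]{2}):([0-9a-f]{2}):([0-9a-f]{2})(?::[0-9a-f]{2}){3}',
    re.IGNORECASE,
)

# Hypothetical template. It is a raw string so that \1, \2, \3 reach
# expand() as group references rather than as control characters.
TEMPLATE = r'\1:\2:\3:xx:xx:xx'

def censor_mac(text):
    """Mask the last three octets and uppercase the whole replacement."""
    return MAC_RE.sub(lambda m: m.expand(TEMPLATE).upper(), text)

print(censor_mac('eth0 at 00:1a:2b:3c:4d:5e up'))
# eth0 at 00:1A:2B:XX:XX:XX up
```

Swapping .upper() for .lower() flips the casing of the entire replacement without touching the template.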
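Note that m.expand(r'\1') works, while m.expand('\1') is treated as ASCII 001 (at least on 3.7.2), because in a non-raw string literal '\1' is an octal escape for the character with code 1.

For those coming across this on Google...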
You can also use re.sub to match repeating patterns. For example, you can convert a string with spaces to camelCase:

    import re

    def to_camelcase(string):
        string = string[:1].lower() + string[1:]  # lowercase the first character
        return re.sub(
            r'\s+(?P<first>[a-z])',              # whitespace followed by a lowercase letter
            lambda m: m.group('first').upper(),  # drop the whitespace, uppercase the letter
            string)

    to_camelcase('String to convert')  # --> 'stringToConvert'