
Could anybody please answer this? I'm trying to learn the regular expression (re) module and I can't get my head around this one. I'm trying to come up with a regex that catches all 3 file name formats:

Python 3.4.3

>>> re.findall("file[\_-]1","file-1 file_1, file\1")
['file-1', 'file_1']

Why isn't it catching file\1? I tried two other patterns, and neither worked :(

1. re.findall("file[\\_-]1","file-1 file_1, file\1")
2. re.findall(r"file[\_-]1","file-1 file_1, file\1")

Thanks, Sagar
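A quick check (a sketch run on a current Python 3, not the 3.4.3 from the question) shows why both attempts behave like the first pattern: they hand the regex engine the same pattern text. The original non-raw "file[\_-]1" ends up identical too, because Python leaves the unknown string escape \_ alone (newer Python versions warn about it). In that pattern text, \_ is just an escaped underscore.

```python
import re

# Attempt 1 uses a normal literal with a doubled backslash; attempt 2 uses a
# raw literal. Both produce the same 10-character pattern text: file[\_-]1
attempt1 = "file[\\_-]1"
attempt2 = r"file[\_-]1"
assert attempt1 == attempt2
print(len(attempt1))  # 10

# In the regex, "\_" is an escaped underscore, so the class is just {_, -}.
# The subject is written as a raw string so it really contains a backslash.
subject = r"file-1 file_1, file\1"
print(re.findall(attempt1, subject))  # ['file-1', 'file_1']
```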

  • Backslashes have a special meaning in Python strings and regular expressions... see e.g. docs.python.org/3/howto/regex.html#the-backslash-plague Commented Jul 21, 2015 at 9:43
  • \1 in "file\1" is a control character, \u0001. If you really plan to capture it, use print(re.findall("file[\u0001_-]1?","file-1 file_1, file\1")), but I doubt you need it. Commented Jul 21, 2015 at 9:52
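To see the second comment's point about the subject string, compare each literal with its escapes spelled out. In a normal (non-raw) Python string, \1 is an octal escape, so "file\1" never contains a backslash at all. A minimal sketch:

```python
import re

plain = "file\1"    # \1 is an octal escape: the single character chr(1)
raw = r"file\1"     # raw literal: a real backslash followed by "1"

print(plain == "file\x01", len(plain))  # True 5
print(raw == "file\\1", len(raw))       # True 6

# The comment's pattern puts chr(1) itself in the class and makes the trailing
# "1" optional, so it also matches the control-character version.
found = re.findall("file[\u0001_-]1?", "file-1 file_1, file\1")
print(found)  # ['file-1', 'file_1', 'file\x01']
```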

1 Answer


Backslashes have a special meaning in regular expressions, too: \_ just means a literal underscore, not "either an underscore or a backslash". Instead, you need r'...' (a raw Python string) and \\ (a literal backslash in the regex). Note that the string you're searching in should also be a raw literal or have a doubled backslash:

>>> "file-1 file_1, file\1"
'file-1 file_1, file\x01'  # probably not what you expected...
>>> r"file-1 file_1, file\1"
'file-1 file_1, file\\1'
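The two escaping layers can be checked separately: first what the Python literal produces, then what the regex engine does with that text. A minimal sketch:

```python
import re

# Layer 1: the Python literal. Both spellings give the two characters \\,
# which is what the regex engine needs to see for "one literal backslash".
regex_text_raw = r"\\"
regex_text_plain = "\\\\"
assert regex_text_raw == regex_text_plain and len(regex_text_raw) == 2

# Layer 2: the regex. That two-character text matches exactly one backslash.
print(re.fullmatch(regex_text_raw, "\\") is not None)  # True

# By contrast, \_ in a regex is only an escaped underscore: it never matches "\".
print(re.findall(r"[\_]", r"\_"))  # ['_']
```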

Therefore you can use:

>>> re.findall(r"file[\\_-]1", r"file-1 file_1, file\1")  # note both r prefixes and the \\
['file-1', 'file_1', 'file\\1']
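If you'd rather not use raw strings, the same search works with every backslash doubled by hand: four backslashes in the pattern literal (two for the regex, each doubled again for Python) and two in the subject literal. A sketch showing the spellings are equivalent:

```python
import re

# Raw-string version from the answer.
raw_pattern = r"file[\\_-]1"
raw_subject = r"file-1 file_1, file\1"

# Same text without raw strings: every backslash doubled for the Python literal.
plain_pattern = "file[\\\\_-]1"
plain_subject = "file-1 file_1, file\\1"

assert raw_pattern == plain_pattern
assert raw_subject == plain_subject

matches = re.findall(plain_pattern, plain_subject)
print(matches)  # ['file-1', 'file_1', 'file\\1']
```

Raw strings are usually preferred for regexes precisely because they remove this second round of doubling.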

3 Comments

I was reading the Python doc you referred to. It was all good until this line: "... However, to express this as a Python string literal, both backslashes must be escaped again." Also, when you use a backslash inside a character set [ ], its special meaning should go away, right? So my regex can safely be: re.findall("file[\ ]1", "file-1 file_1, file\1") ... Of course this doesn't work. Secondly, if I'm using a raw string for the regex, then the special meaning of characters is removed automatically, correct? So it would be: re.findall(r"file[]1", "file-1 file_1, file\1") Why raw string + extra \
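@SagarKarale inside the square brackets most characters lose their regex special meaning, but the backslash is still a regex escape, so the regex itself needs \\ to match one literal backslash. On top of that, you still need to either double each of those backslashes or make it a raw string to remove the string literal special meaning.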
Thank you, Jon. A few more examples made it clear. I had to give "string literals" and "regular expressions" a little more thought than before.
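On the square-bracket question in these comments: in Python's re, a backslash inside [...] still acts as a regex escape, so a class containing a lone backslash is not a valid pattern. A small sketch; compiles is a hypothetical local helper for this example, not part of the re API, and the exact error message varies between Python versions, so it is not shown:

```python
import re


def compiles(pattern: str) -> bool:
    """Hypothetical helper: return True if `pattern` is a valid regex."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


# r"[\]": the backslash escapes the "]", so the character class is never closed.
print(compiles(r"[\]"))   # False
# r"[\\]": an escaped backslash, i.e. a class containing just "\".
print(compiles(r"[\\]"))  # True
print(re.findall(r"[\\]", r"a\b"))  # ['\\']
```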
