Adding re.IGNORECASE to my regex causes some matches to fail. This is what I tried:

print re.sub(r'[^a-z0-9 ]', '~', 'this (is) some tandom. text+ and [some] symbols {+/\-}', re.IGNORECASE)

>>> 'this ~is~ some tandom. text+ and [some] symbols {+/\\-}'

As you can see, many symbols were not replaced with '~' in the output above. But when I try the same thing without re.IGNORECASE, all the special characters are replaced with '~':

print re.sub(r'[^a-zA-Z0-9 ]', '~', 'this (is) some tandom. text+ and [some] symbols {+/\-}')

>>> 'this ~is~ some tandom~ text~ and ~some~ symbols ~~~~~~'

Is there something I am missing about re.IGNORECASE? Doesn't it just match both uppercase and lowercase letters while leaving everything else (digits, special characters, etc.) unchanged? (I am using Anaconda's Python 2.7, in case that helps.)

1 Answer

You passed the flag in the wrong argument position. Use either of these:

print re.sub(r'[^a-z0-9 ]', '~', 'this (is) some tandom. text+ and [some] symbols {+/\-}', flags=re.IGNORECASE)

# or

print re.sub(r'[^a-z0-9 ]', '~', 'this (is) some tandom. text+ and [some] symbols {+/\-}', 0, re.IGNORECASE)

See re.sub docs:

re.sub(pattern, repl, string, count=0, flags=0)

The optional argument count is the maximum number of pattern occurrences to be replaced; count must be a non-negative integer.

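You passed the flag where count is expected. The fourth positional argument of re.sub is count, not flags, and re.IGNORECASE has the integer value 2. So the call ran with count=2 and no flags at all: matching stayed case-sensitive, and only the first two matches (the parentheses) were replaced instead of all of them.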
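To make this concrete, here is a minimal sketch that compares the misplaced call with an explicit count=2 and with the corrected keyword form. The variable names (text, pattern, wrong, same_as_wrong, right) are only illustrative, and the code is written for Python 3 with print() rather than the Python 2.7 print statement used in the question. The warnings block is there on the assumption that recent Python 3 releases may emit a DeprecationWarning for a positional count.

```python
import re
import warnings

# Raw string so the backslash in the sample text is kept literally.
text = r'this (is) some tandom. text+ and [some] symbols {+/\-}'
pattern = r'[^a-z0-9 ]'

# re.IGNORECASE is an integer-valued flag; its numeric value is 2.
print(int(re.IGNORECASE))

with warnings.catch_warnings():
    # Assumption: newer Python 3 versions may warn about a positional count.
    warnings.simplefilter('ignore', DeprecationWarning)
    # The flag lands in the count slot, so this means count=2 and flags=0.
    wrong = re.sub(pattern, '~', text, re.IGNORECASE)

# Spelling out count=2 gives the identical result.
same_as_wrong = re.sub(pattern, '~', text, count=2)

# Passing the flag by keyword replaces every match, case-insensitively.
right = re.sub(pattern, '~', text, flags=re.IGNORECASE)

print(wrong)
print(right)
```

The first two results are identical, which shows that the flag was consumed as a replacement limit rather than as a matching option.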

1 Comment

I guess it is good practice to always pass 'count' and 'flags' explicitly as keyword arguments; that would have helped prevent such issues.
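A related way to keep the two arguments apart, sketched here as one option rather than a recommendation from the answer, is to compile the pattern first. The flag is then given once to re.compile, and the compiled pattern's sub method only takes a count. The names cleaner and sample, and the sample string itself, are made up for illustration.

```python
import re

# The flag is attached to the compiled pattern, so sub() has no flags slot
# that a count could be confused with.
cleaner = re.compile(r'[^a-z0-9 ]', flags=re.IGNORECASE)

# Hypothetical sample text, not taken from the question.
sample = 'Mixed CASE (text) + 42'

# Replace every character that is not a letter, digit, or space.
cleaned = cleaner.sub('~', sample)

# Limit the number of replacements explicitly by keyword.
first_only = cleaner.sub('~', sample, count=1)

print(cleaned)     # Mixed CASE ~text~ ~ 42
print(first_only)  # Mixed CASE ~text) + 42
```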
