re.IGNORECASE unexpected behaviour in python 2.7

Question

Adding re.IGNORECASE to my regex causes some matches to fail. This is what I tried:

print re.sub(r'[^a-z0-9 ]', '~', 'this (is) some tandom. text+ and [some] symbols {+/\-}', re.IGNORECASE)
>>> 'this ~is~ some tandom. text+ and [some] symbols {+/\\-}'

As the output shows, many symbols were not replaced with '~'. When I try the same without re.IGNORECASE (adding A-Z to the character class instead), all the special characters are replaced with '~':

print re.sub(r'[^a-zA-Z0-9 ]', '~', 'this (is) some tandom. text+ and [some] symbols {+/\-}')
>>> 'this ~is~ some tandom~ text~ and ~some~ symbols ~~~~~~'

Is there something I am missing about re.IGNORECASE? Doesn't it just make letters match in both uppercase and lowercase, while leaving everything else (digits, special characters, etc.) unchanged? (I am using Anaconda's Python 2.7, in case that helps.)

Wiktor Stribiżew · Accepted Answer · 2016-02-08 21:39:04Z

You passed the flag in the wrong argument position. Use

print re.sub(r'[^a-z0-9 ]', '~', 'this (is) some tandom. text+ and [some] symbols {+/\-}', flags=re.IGNORECASE)
# or
print re.sub(r'[^a-z0-9 ]', '~', 'this (is) some tandom. text+ and [some] symbols {+/\-}', 0, re.IGNORECASE)


See re.sub docs:

re.sub(pattern, repl, string, count=0, flags=0)

The optional argument count is the maximum number of pattern occurrences to be replaced; count must be a non-negative integer.

You passed the flag where count is expected. re.IGNORECASE is an integer constant with the value 2, so the call was treated as count=2 with no flags set: only the first two matches, '(' and ')', were replaced, and the rest were left alone.
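To make the mix-up concrete, here is a small sketch comparing the misplaced call with the explicit forms. The variable names (text, pattern, wrong, right) are mine, not from the question, and print is written with parentheses so the snippet runs on both Python 2.7 and Python 3 (recent Python 3 releases may emit a DeprecationWarning for the positional form).

```python
from __future__ import print_function
import re

# Same input as in the question; the backslash is escaped explicitly.
text = 'this (is) some tandom. text+ and [some] symbols {+/\\-}'
pattern = r'[^a-z0-9 ]'

# The 4th positional argument of re.sub is count, not flags.
# re.IGNORECASE has the integer value 2, so this replaces at most 2 matches.
wrong = re.sub(pattern, '~', text, re.IGNORECASE)

# Spelling out what the call above actually did.
same_as_wrong = re.sub(pattern, '~', text, count=2)

# Passing the flag by keyword leaves count at its default of 0 (replace all).
right = re.sub(pattern, '~', text, flags=re.IGNORECASE)

print(int(re.IGNORECASE))      # 2
print(wrong == same_as_wrong)  # True
print(wrong)
print(right)
```

The last two lines print the partially replaced string from the question and the fully replaced one, respectively.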

I guess it is good practice to always pass count and flags as keyword arguments (count=..., flags=...) rather than positionally. That would have prevented this issue.
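Another way to avoid the mix-up, offered as a sketch rather than as something from the answer, is to compile the pattern once with its flags. The compiled pattern's sub method takes only repl, string and count, so there is no flags slot to confuse with count. The names NON_ALNUM, cleaned and limited are illustrative.

```python
from __future__ import print_function
import re

# Flags are bound at compile time, so they cannot be mistaken for count later.
NON_ALNUM = re.compile(r'[^a-z0-9 ]', re.IGNORECASE)

# Replace every character that is not a letter, digit or space.
cleaned = NON_ALNUM.sub('~', 'Hello, World!')

# count is still available and is best passed by keyword.
limited = NON_ALNUM.sub('~', 'Hello, World!', count=1)

print(cleaned)  # Hello~ World~
print(limited)  # Hello~ World!
```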
