python replace and sub not working with unicode character u"\u0092"

Question

Python Version: Python 3.6. I am trying to replace the Unicode character u"\u0092" (aka curly apostrophe) with a regular apostrophe.

I have tried all of the below:

    mystring = <some string with problem character>
    # option 1
    mystring = mystring.replace(u"\u0092", u"\u0027")
    # option 2
    mystring = mystring.replace(u"\u0092", "'")
    # option 3
    mystring = re.sub('\u0092', u"\u0027", mystring)
    # option 4
    mystring = re.sub('\u0092', u"'", mystring)

None of the above updates the character in mystring. Other sub and replace operations are working - which makes me think it is either an issue with how I am using the Unicode characters, or an issue with this particular character.

Update: I have also tried the suggestions below, neither of which works:

    mystring.decode("utf-8").replace(u"\u0092", u"\u0027").encode("utf-8")
    mystring.decode("utf-8").replace(u"\u2019", u"\u0027").encode("utf-8")

But it gives me the error: AttributeError: 'str' object has no attribute 'decode'
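
For context on that error: in Python 3, str is already decoded Unicode text, and only bytes objects have a decode method. A minimal sketch, using a made-up sample string (the variable names are illustrative and not from the question):

```python
# Python 3: str holds Unicode text; bytes holds encoded data.
text = "it\u2019s"                 # hypothetical sample containing U+2019
raw = text.encode("utf-8")         # str -> bytes

print(hasattr(text, "decode"))     # False: str has no decode() in Python 3
print(hasattr(raw, "decode"))      # True: bytes can be decoded back to str

# Only bytes need decoding; a str can be passed to replace() directly.
fixed = raw.decode("utf-8").replace("\u2019", "'")
print(fixed)                       # it's
```

So the decode/encode round trip is only relevant when the data arrives as bytes, for example from a file opened in binary mode.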

Just to clarify: the IDE is not the core issue here. My question is why, when I run replace or sub with a Unicode character and print the result, the change does not register: the character is still present in the string.

Possible duplicate of How to replace unicode characters in string with something else python?
– wp78de, Commented May 30, 2018 at 16:03
str.decode("utf-8").replace(u"\u0092", u"\u0027").encode("utf-8")
– wp78de, Commented May 30, 2018 at 16:06
Thanks for the suggestion. I saw this on the other question mentioned above, but does it work for Python 3? When I try it I get the error: AttributeError: 'str' object has no attribute 'decode'
– Pamela Kelly, Commented May 30, 2018 at 16:21
All strings are Unicode in Python 3. You don't need all that folklore with u prefixes everywhere and encoding; just use string.replace("’", "'"). (In fact, I assumed in my answer that you were running Python 2.)
– bobrobbob, Commented May 30, 2018 at 16:30
I get this error if I try to use the character directly, with or without the u prefix: SyntaxError: (unicode error) 'utf-8' codec can't decode byte 0x92 in position 0: invalid start byte
– Pamela Kelly, Commented May 30, 2018 at 16:38

bobrobbob · Accepted Answer · 2018-05-30 16:53:02Z

1

Your code point is wrong: the curly apostrophe (’) is \u2019, not \u0092. From Wikipedia:

U+0092 146 Private Use 2 PU2

that's why eclipse is not happy.
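
One possible source of the mix-up, offered as an assumption rather than something the question confirms: in the Windows-1252 (cp1252) encoding, the single byte 0x92 is the curly apostrophe, while Latin-1 maps that same byte to the control character U+0092. A small sketch of the relationship, using only the standard library:

```python
import unicodedata

raw = b"\x92"                         # the byte named in the SyntaxError above

as_cp1252 = raw.decode("cp1252")      # Windows-1252 maps 0x92 to U+2019
as_latin1 = raw.decode("latin-1")     # Latin-1 maps 0x92 to the C1 control U+0092

print(hex(ord(as_cp1252)), unicodedata.name(as_cp1252))
print(hex(ord(as_latin1)), unicodedata.category(as_latin1))

try:
    raw.decode("utf-8")               # a lone 0x92 byte is not valid UTF-8
except UnicodeDecodeError as exc:
    print("utf-8 fails:", exc.reason)
```

If the data or the source file was saved as Windows-1252, this would explain both the 0x92 in the error message and why the character looks like an apostrophe.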

with the right code:

    # -*- coding: utf-8 -*-
    import re

    string = u"dkfljglkdfjg’fgkljlf"
    # Each line below replaces the curly apostrophe (U+2019) with a plain one.
    string = string.replace(u"’", u"'")
    string = string.replace(u"\u2019", u"\u0027")
    string = re.sub(u'\u2019', u"\u0027", string)
    string = re.sub(u'’', u"'", string)

all solutions work
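
If a replace still appears to do nothing, it may help to check which code points the string really contains before choosing what to replace. The sketch below is one way to do that. `show_code_points` and `normalize_apostrophes` are hypothetical helper names, and treating U+0092 as an apostrophe is an assumption that only makes sense if the data came from mis-decoded Windows-1252 text.

```python
import re

def show_code_points(text):
    """Return the non-ASCII characters in text as 'U+XXXX' labels."""
    return ["U+%04X" % ord(ch) for ch in text if ord(ch) > 127]

# Characters to treat as an apostrophe (assumption: U+0092 is a
# mis-decoded cp1252 curly apostrophe, U+2019 is the real one).
APOSTROPHE_LIKE = re.compile("[\u0092\u2019]")

def normalize_apostrophes(text):
    """Replace U+0092 and U+2019 with a plain ASCII apostrophe."""
    return APOSTROPHE_LIKE.sub("'", text)

sample = "it\u2019s and it\u0092s"    # made-up sample data
print(show_code_points(sample))       # ['U+2019', 'U+0092']
print(normalize_apostrophes(sample))  # it's and it's
```

Printing the code points first shows whether the string holds U+0092, U+2019, or something else, which decides what the replace call needs to target.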

and don't call your vars str

edited May 30, 2018 at 16:53

answered May 30, 2018 at 16:17

bobrobbob


3 Comments

Pamela Kelly Over a year ago

The first part doesn't work for me because Eclipse doesn't recognise the character directly. And same issue with the second part - when I print the result it is still the same curly comma and fails comparison test.

bobrobbob Over a year ago

i never used eclipse but i'd be most surprised if it didn't recognize regular unicode chars

Pamela Kelly Over a year ago

Sorry - as I mentioned in my question I tried a couple of those and the additional ones also don't work... using the prefix of the u or not doesn't seem to make a difference
