
Python Version: Python 3.6. I am trying to replace the Unicode character u"\u0092" (aka curly apostrophe) with a regular apostrophe.

I have tried all of the below:

    mystring = <some string with problem character>

    # option 1
    mystring = mystring.replace(u"\u0092", u"\u0027")
    # option 2
    mystring = mystring.replace(u"\u0092", "'")
    # option 3
    mystring = re.sub('\u0092', u"\u0027", mystring)
    # option 4
    mystring = re.sub('\u0092', u"'", mystring)

None of the above replaces the character in mystring. Other sub and replace operations work, which makes me think it is either an issue with how I am using the Unicode characters or an issue with this particular character.

Update: I have also tried the suggestions below, neither of which works:

    mystring.decode("utf-8").replace(u"\u0092", u"\u0027").encode("utf-8")
    mystring.decode("utf-8").replace(u"\u2019", u"\u0027").encode("utf-8")

But it gives me the error: AttributeError: 'str' object has no attribute 'decode'

Just to clarify: the IDE is not the core issue here. My question is why, when I run replace or sub with a Unicode character and print the result, the replacement does not register: the character is still present in the string.

  • Possible duplicate of How to replace unicode characters in string with something else python? Commented May 30, 2018 at 16:03
  • str.decode("utf-8").replace(u"\u0092", u"\u0027").encode("utf-8") Commented May 30, 2018 at 16:06
  • Thanks for the suggestion - I saw this on the other question mentioned above but does it work for Python3? When I try it I get the error: AttributeError: 'str' object has no attribute 'decode' Commented May 30, 2018 at 16:21
  • All strings are Unicode in Python 3. You don't need all that folklore with u prefixes everywhere and encoding. Just string.replace("’", "'") (in fact, I assumed in my answer you were running Python 2). Commented May 30, 2018 at 16:30
  • I get this error if I try to use the character directly - with or without the prefix of the u: SyntaxError: (unicode error) 'utf-8' codec can't decode byte 0x92 in position 0: invalid start byte Commented May 30, 2018 at 16:38

1 Answer


Your code point is wrong: the curly apostrophe (’) is \u2019. From Wikipedia:

    U+0092    146    Private Use 2    PU2

That's why Eclipse is not happy.
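One possible explanation for the mix-up, assuming the text originally came from a Windows-1252 (cp1252) source: in that encoding the byte 0x92 is the curly apostrophe, so the same byte ends up as U+2019 or as U+0092 depending on which codec decodes it. A minimal sketch of that assumption:

```python
raw = b"\x92"  # the byte named in the SyntaxError message from the comments

as_cp1252 = raw.decode("cp1252")   # Windows-1252 maps 0x92 to the curly apostrophe
as_latin1 = raw.decode("latin-1")  # Latin-1 maps each byte to the same code point

print(hex(ord(as_cp1252)))  # 0x2019
print(hex(ord(as_latin1)))  # 0x92
```

If this is what happened, a source file saved as cp1252 but read as UTF-8 would also produce the "invalid start byte" error when the character is typed directly.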


with the right code:

    # -*- coding: utf-8 -*-
    import re

    string = u"dkfljglkdfjg’fgkljlf"

    # each of the following replaces the curly apostrophe with a plain one
    string = string.replace(u"’", u"'")
    string = string.replace(u"\u2019", u"\u0027")
    string = re.sub(u'\u2019', u"\u0027", string)
    string = re.sub(u'’', u"'", string)

All of these solutions work.

Also, don't name your variables str, since that shadows the built-in type.
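If none of the replacements seems to take effect, it may help to check which code points the string really contains before replacing anything. A minimal sketch for Python 3, where `sample` and `normalize_apostrophes` are hypothetical names and the sample text is made up for illustration:

```python
def normalize_apostrophes(text):
    """Replace U+2019 and the C1 control U+0092 with a plain apostrophe."""
    return text.translate({0x2019: "'", 0x0092: "'"})


sample = "it\u2019s and it\u0092s"  # made-up text containing both code points

# List every non-ASCII character by code point to see what is really there.
for ch in sample:
    if ord(ch) > 127:
        print(f"U+{ord(ch):04X}")

cleaned = normalize_apostrophes(sample)
print(cleaned)  # it's and it's
```

Handling both code points in one `translate` call sidesteps the question of how the text was decoded, at the cost of also rewriting any U+0092 that was meant as a control character.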


3 Comments

The first part doesn't work for me because Eclipse doesn't recognise the character directly. And same issue with the second part - when I print the result it is still the same curly comma and fails comparison test.
i never used eclipse but i'd be most surprised if it didn't recognize regular unicode chars
Sorry - as I mentioned in my question I tried a couple of those and the additional ones also don't work... using the prefix of the u or not doesn't seem to make a difference
