Replacing special character in string not working

Question

I have a long string, which includes the text Your Sunday evening order with Uber Eats\nTo: [email protected]\n\n\n[image: map]\n\n[image: Uber logo]\n\xe2\x82\xac17.50\nThanks for choosing Uber,

I am trying to replace '\xe2\x82\xac' with 'EUR' in Python 3.6.

If I print the string, I see that it is preceded by b, i.e. it is a byte literal.

 b'<div dir="ltr"><br ...' etc.

I cannot encode it (html = html.encode('UTF-8')), because I then get "a bytes-like object is required, not 'str'", nor can I decode it ("'str' object has no attribute 'decode'").

I have tried:

html = html.replace(u"\xe2\x82\xac","EUR")
html = html.replace(u'\xe2\x82\xac',"EUR")
html = html.replace('\xe2\x82\xac',"EUR")
html = html.replace(u"€","EUR")

None of these work.

html.decode("utf-8") gets me an error 'str' object has no attribute 'decode'.

For context, the string is generated by reading the content of an e-mail with the mailbox library:

for message in mbox:
    for part in message.walk():
        html = str(part.get_payload(decode=True))

html.replace('\xe2\x82\xac',"EUR") just works in utf-8 text.
– Evhz, Commented Apr 5, 2018 at 13:15
It also works when I copy/paste the string from my question into Python. However, it does not work on my original string, which I originally copy/pasted into my question. This is a bit puzzling.
– Alexis Eggermont, Commented Apr 5, 2018 at 13:17
I'm using Python 3.6 too and it works here. Check whether your Python file is encoded as UTF-8 Unicode text, or try putting # -*- coding: utf-8 -*- at the top.
– Mateus Milanez, Commented Apr 5, 2018 at 13:19

Giacomo Catenazzi · Accepted Answer · 2018-04-05 13:46:04Z

You should use:

html = html.replace(r"\xe2\x82\xac", "EUR")

This replaces the literal text \xe2\x82\xac with EUR, assuming that the backslash \ is literally present in your html string.
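A minimal sketch of the difference between the raw and non-raw patterns, assuming the text really does contain backslash characters. The sample string and the variable names are made up for illustration.

```python
# Hypothetical sample: written as a raw string, so the backslashes
# are ordinary characters inside the text.
html = r"[image: Uber logo]\n\xe2\x82\xac17.50"

# Non-raw pattern: Python turns \xe2\x82\xac into three characters
# (U+00E2, U+0082, U+00AC), which do not occur in the text.
unchanged = html.replace("\xe2\x82\xac", "EUR")

# Raw pattern: matches the 12 literal characters \xe2\x82\xac.
fixed = html.replace(r"\xe2\x82\xac", "EUR")

print(unchanged == html)  # True, nothing was replaced
print(fixed)
```

If the non-raw replace leaves the string unchanged while the raw one alters it, the backslashes are literal.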

Otherwise, use:

html = html.replace('\u20ac', 'EUR')

But this does not seem to be the case, because your attempts with the Unicode symbol did not work.

Do not assume that Python uses UTF-8 in its strings. A Python 3 str is a sequence of Unicode code points, not UTF-8 bytes.

Note: CPython stores strings internally in a fixed-width form (1, 2, or 4 bytes per character, depending on the content), so \xe2\x82\xac would never have been written by Python from a properly decoded string. So either the \ was literal, or some output step mangled the text.
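One way such mangling could arise, given the question's `str(part.get_payload(decode=True))`: calling `str()` on a bytes object without an encoding returns its repr, so the `b'...'` prefix and the backslash escapes become ordinary characters. The sketch below uses a stand-in payload and assumes the bytes are UTF-8; a real message may declare a different charset.

```python
# Stand-in for part.get_payload(decode=True), which returns bytes.
payload = "[image: Uber logo]\n€17.50".encode("utf-8")

# str() on bytes (no encoding given) returns the repr: the b'...'
# prefix and the backslash escapes end up as plain characters.
mangled = str(payload)
print(mangled)  # b'[image: Uber logo]\n\xe2\x82\xac17.50'

# Decoding yields real text, so the euro sign can be matched directly.
# UTF-8 is an assumption here, not something the message guarantees.
html = payload.decode("utf-8")
html = html.replace("\u20ac", "EUR")
print(html)
```

Decoding the bytes instead of wrapping them in `str()` avoids the literal backslashes altogether, so neither the raw-string workaround nor the `b'` prefix is needed.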

Veera Balla Deva · Accepted Answer · 2018-04-05 13:46:18Z

import unicodedata

jil = """"Your Sunday evening order with Uber Eats\nTo: [email protected]\n\n\n[image: map]\n\n[image: Uber logo]\n\xe2\x82\xac17.50\nThanks for choosing Uber,"""
data = unicodedata.normalize("NFKD", jil)
print(data)

>>>" Your Sunday evening order with Uber Eats To: [email protected] [image: map] [image: Uber logo] â¬17.50 Thanks for choosing Uber,

Luca Di Sabatino · Accepted Answer · 2018-04-05 13:29:46Z

It does not work that way.

html = "Your Sunday evening order with Uber Eats\nTo: [email protected]\n\n\n[image: map]\n\n[image: Uber logo]\n\xe2\x82\xac17.50\nThanks for choosing Uber,"
html = html.replace(u"\xe2\x82\xac", "EUR")
html = html.replace(u'\xe2\x82\xac', "EUR")
html = html.replace('\xe2\x82\xac', "EUR")
html = html.replace(u"€", "EUR")
html = html.encode("utf-8", 'strict')
print("Encoded String: " + str(html))
print("Decoded String: " + html.decode("utf-8", 'strict'))
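In a str literal like the one above, "\xe2\x82\xac" is three separate code points, not the euro sign. If a string really holds those three characters (for example because UTF-8 bytes were decoded as Latin-1, which is an assumption here and not something the question confirms), a round trip through Latin-1 can restore the intended character. A hedged sketch with illustrative names:

```python
# Hypothetical mojibake: the UTF-8 bytes of the euro sign, misread
# as three Latin-1 characters.
text = "\xe2\x82\xac17.50"
print([hex(ord(c)) for c in text[:3]])  # ['0xe2', '0x82', '0xac']

# Re-encode to the original bytes, then decode them as UTF-8.
repaired = text.encode("latin-1").decode("utf-8")
print(repaired == "\u20ac17.50")  # True

print(repaired.replace("\u20ac", "EUR"))  # EUR17.50
```

This only applies when the characters are real code points in the string; it does not help when the text contains literal backslashes.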
