0

I have a long string, which includes the text Your Sunday evening order with Uber Eats\nTo: [email protected]\n\n\n[image: map]\n\n[image: Uber logo]\n\xe2\x82\xac17.50\nThanks for choosing Uber,

I am trying to replace '\xe2\x82\xac' with 'EUR' in Python 3.6

If I print the string, I see that it is preceded by b, i.e. it is a byte literal.

 b'<div dir="ltr"><br ...' etc. 

I cannot encode it (html = html.encode('UTF-8')), because then I get "a bytes-like object is required, not 'str'". Nor can I decode it ("'str' object has no attribute 'decode'").

I have tried:

html = html.replace(u"\xe2\x82\xac","EUR")
html = html.replace(u'\xe2\x82\xac',"EUR")
html = html.replace('\xe2\x82\xac',"EUR")
html = html.replace(u"€","EUR")

None of these work.

html.decode("utf-8") gets me an error 'str' object has no attribute 'decode'.

For context, the string is generated by reading the content of an e-mail with the mailbox library:

for message in mbox:
    for part in message.walk():
        html = str(part.get_payload(decode=True))
  • Your first line of replace worked for me, in Python 3.6. Commented Apr 5, 2018 at 13:13
  • html.replace('\xe2\x82\xac',"EUR") just works in utf-8 text. Commented Apr 5, 2018 at 13:15
  • It also works when I copy/paste the string from my question into python. However it does not work on my original string, which I copy/pasted originally into my question. This is a bit puzzling. Commented Apr 5, 2018 at 13:17
  • I'm using Python 3.6 too and it works here. Check whether your Python file is encoded as UTF-8, or try adding # -*- coding: utf-8 -*- at the top. Commented Apr 5, 2018 at 13:19
  • Note that the u"string" prefix is unnecessary in Python 3. Commented Apr 5, 2018 at 13:32

3 Answers

2

You should use:

html = html.replace(r"\xe2\x82\xac", "EUR") 

This replaces the literal text \xe2\x82\xac with EUR, assuming the backslashes are literally present in your html string.

Otherwise, you should use:

html = html.replace('\u20ac', 'EUR') 

But that does not seem to be the case here, because your attempts with the Unicode symbol did not work.
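To make the difference between the two cases concrete, here is a minimal sketch. It assumes the text was produced by calling str() on a bytes object, as the question's last snippet suggests; the sample value is made up for illustration.

```python
# Hypothetical sample: the UTF-8 bytes for "€17.50", as an e-mail payload might contain.
payload = b"\xe2\x82\xac17.50"

# str() on a bytes object returns its repr, so each escape becomes literal characters.
as_text = str(payload)
print(as_text)   # b'\xe2\x82\xac17.50'

# The non-raw pattern is three characters (U+00E2, U+0082, U+00AC) and is not found.
unchanged = as_text.replace("\xe2\x82\xac", "EUR")

# The raw pattern is twelve characters (backslash, x, e, 2, ...) and is found.
replaced = as_text.replace(r"\xe2\x82\xac", "EUR")
print(replaced)  # b'EUR17.50'
```

Note that the b'...' wrapper stays in the result, which is one reason decoding the bytes is usually the cleaner route.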

Do not assume that Python uses UTF-8 for its strings (in fact, it does not use UTF-8 internally).

Note: Python stores a str as a sequence of Unicode code points (CPython 3.3+ picks a Latin-1, UCS-2, or UCS-4 layout per string), so \xe2\x82\xac would never be written by Python from a correctly decoded string. So either the \ was literal, or some output step mangled it.
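If the literal backslashes come from wrapping the payload in str(), an alternative is to decode the bytes instead. The following is a rough sketch under that assumption: the one-part message is a hypothetical stand-in for a message read from the mailbox, and falling back to UTF-8 when no charset is declared is my own choice, not something the question specifies.

```python
import email

# Hypothetical one-part message; the body is "€17.50" as quoted-printable UTF-8.
raw = (
    b"Content-Type: text/plain; charset=utf-8\n"
    b"Content-Transfer-Encoding: quoted-printable\n"
    b"\n"
    b"=E2=82=AC17.50\n"
)
message = email.message_from_bytes(raw)

for part in message.walk():
    # With decode=True this is bytes, or None for multipart containers.
    payload = part.get_payload(decode=True)
    if payload is None:
        continue
    # Assumption: fall back to UTF-8 when the part declares no charset.
    charset = part.get_content_charset() or "utf-8"
    html = payload.decode(charset, errors="replace")
    # Now the euro sign is a single real character and a plain replace works.
    html = html.replace("\u20ac", "EUR")
    print(repr(html))
```

Decoding this way yields real text, so there is no b'...' wrapper and no escaped bytes left to patch up afterwards.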


1
import unicodedata

jil = """"Your Sunday evening order with Uber Eats\nTo: [email protected]\n\n\n[image: map]\n\n[image: Uber logo]\n\xe2\x82\xac17.50\nThanks for choosing Uber,"""
data = unicodedata.normalize("NFKD", jil)
print(data)

>>>
"Your Sunday evening order with Uber Eats
To: [email protected]


[image: map]

[image: Uber logo]
â¬17.50
Thanks for choosing Uber,


0

it does not work that way.

html = "Your Sunday evening order with Uber Eats\nTo: [email protected]\n\n\n[image: map]\n\n[image: Uber logo]\n\xe2\x82\xac17.50\nThanks for choosing Uber,"
html = html.replace(u"\xe2\x82\xac","EUR")
html = html.replace(u'\xe2\x82\xac',"EUR")
html = html.replace('\xe2\x82\xac',"EUR")
html = html.replace(u"€","EUR")
html = html.encode("utf-8",'strict')
print("Encoded String: " + str(html))
print("Decoded String: " + html.decode("utf-8",'strict'))
