Python: 'ascii' codec can't decode byte

Question

Current code:

    file.write("\"" + key + "\": " + "\"" + french[key].encode('utf8') + "\"" + ',' + '\n')

where the entries in the french dictionary look like this:

"YOU_HAVE_COMPLETED_ENROLLMENT": "Vous avez termin\u00e9 l'inscription !"

Getting this error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 19: ordinal not in range(128)

I have tried all the solutions on here, but none of them seem to work.

I looked at that thread, and it says to write the value directly without encoding. However, if I remove .encode, I get an encode error instead: 'ascii' codec can't encode character u'\xe9' in position 37: ordinal not in range(128)
– Justin, Commented Oct 4, 2017 at 18:26
@Justin (1) Are you using Python 2 or Python 3? (2) What is the output of print(type(french[key]))? (3) What is file, and how did you create it?
– ekhumoro, Commented Oct 4, 2017 at 18:59

rachid · Accepted Answer · 2017-10-04 20:35:56Z

You could convert the value to a unicode string using this function:

    def _parse_value(value):
        # Python 2: str is a byte string; decode it to unicode ("ignore" drops undecodable bytes)
        if isinstance(value, str):
            value = value.decode("utf-8", "ignore").strip()
        return value
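The function above is Python 2 code: there, str is a byte string and .decode() turns it into unicode. Under Python 3 the byte-string type is bytes, so a minimal sketch of the same idea checks for bytes instead. The name to_text is hypothetical, and note that the "ignore" error handler silently drops any bytes that are not valid UTF-8, so characters can be lost.

```python
def to_text(value):
    """Hypothetical Python 3 counterpart of _parse_value: decode bytes, pass text through."""
    if isinstance(value, bytes):
        # "ignore" skips undecodable bytes instead of raising UnicodeDecodeError.
        value = value.decode("utf-8", "ignore").strip()
    return value


text = to_text(b"Vous avez termin\xc3\xa9 l'inscription ! ")  # bytes -> str, trailing space stripped
lossy = to_text(b"caf\xe9")  # a lone Latin-1 byte is not valid UTF-8 and is dropped
same = to_text("d\u00e9j\u00e0")  # already text: returned unchanged
```

The lossy case shows why "ignore" is a trade-off: it avoids the exception, but "caf\xe9" comes back as "caf".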

user108471 · Accepted Answer · 2017-10-04 22:14:21Z

The solution: concatenate unicode strings before encoding, then encode the complete string just before writing it to the file. The codecs module simplifies this for you.

    import codecs
    import os

    # The returned file object encodes every unicode string to UTF-8 as it is written.
    file = codecs.open(os.path.join(fr_directory, 'strings.json'), 'w+', encoding='utf8')
    file.write("\"" + key + "\": " + "\"" + french[key] + "\"" + ',' + '\n')

I have opened the file with codecs.open rather than the plain open, so the file object automatically encodes unicode strings to UTF-8 when you write them. I have also removed your explicit .encode('utf8') call.
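Putting that advice into a complete loop over the dictionary might look like the minimal sketch below. It builds each line from text pieces only and lets the file object do the UTF-8 encoding. As an assumption, it uses io.open, which accepts an encoding argument in both Python 2.6+ and Python 3 (where it is the built-in open), in place of codecs.open; the sample dictionary, the temporary directory and the write_strings helper are hypothetical. The line format mirrors the question's, including the trailing comma, so the output is not a complete JSON document on its own.

```python
import io
import os
import tempfile

# Hypothetical sample data standing in for the question's french dictionary.
french = {u"YOU_HAVE_COMPLETED_ENROLLMENT": u"Vous avez termin\u00e9 l'inscription !"}


def write_strings(directory, strings):
    """Hypothetical helper: write '"key": "value",' lines as UTF-8 text."""
    path = os.path.join(directory, "strings.json")
    # encoding= makes the file object encode every text write to UTF-8;
    # newline="\n" disables platform newline translation.
    with io.open(path, "w", encoding="utf-8", newline="\n") as f:
        for key in sorted(strings):
            # Only text pieces are concatenated: no .encode() before writing.
            f.write(u'"' + key + u'": "' + strings[key] + u'",\n')
    return path


out_path = write_strings(tempfile.mkdtemp(), french)
with io.open(out_path, "rb") as f:
    raw = f.read()  # the bytes that actually landed on disk
```

Reading the file back in binary mode shows that é was stored as the two UTF-8 bytes 0xc3 0xa9.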

Further explanation:

The keys and values of your dictionary are almost certainly Unicode strings. A "Unicode string" needs to be encoded before it can be written to a file. Most operations in Python 2 assume an ASCII encoding unless told otherwise, and the file objects returned by open are among them. That's why, if you try to write a Unicode string to a file, you'll see an exception:

    >>> with open('/tmp/test.txt', 'w') as f:
    ...     f.write(u"Vous avez termin\xe9 l'inscription !")
    ...
    UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 16: ordinal not in range(128)
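In Python 2, writing a unicode object to a file opened with plain open() implicitly encodes it with the default ASCII codec. That hidden step can be reproduced explicitly with str.encode, and it fails the same way in Python 3. A minimal sketch:

```python
text = u"Vous avez termin\xe9 l'inscription !"

try:
    # The implicit step Python 2's file.write performed on unicode input.
    text.encode("ascii")
except UnicodeEncodeError as exc:
    error = exc  # keep a reference; Python 3 deletes exc after the block

# error.start is the index of the first character ASCII cannot represent.
bad_char = error.object[error.start]
```

The exception reports position 16, which is exactly where é sits in "Vous avez termin\xe9...".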

You can fix this error by encoding the string explicitly, so this works:

    >>> with open('/tmp/test.txt', 'w') as f:
    ...     f.write(u"Vous avez termin\xe9 l'inscription !".encode('utf-8'))
    ...
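Under Python 3 this fix needs one adjustment: the encoded value is a bytes object, and bytes can only be written to a file opened in binary mode ('wb'); a text-mode file rejects it with a TypeError. A minimal sketch, using a temporary file as a stand-in for /tmp/test.txt:

```python
import os
import tempfile

text = u"Vous avez termin\xe9 l'inscription !"
path = os.path.join(tempfile.mkdtemp(), "test.txt")

# Encode explicitly, then write the raw bytes through a binary-mode file.
with open(path, "wb") as f:
    f.write(text.encode("utf-8"))

with open(path, "rb") as f:
    raw = f.read()
```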

However, this alone does not solve your problem, because you are building a more complicated string. When you concatenate a Unicode string with a UTF-8 encoded "raw" (byte) string, Python 2 first tries to decode the byte string using the ASCII codec, so you get an exception even when you are not writing to a file:

    >>> u"YOU_HAVE_COMPLETED_ENROLLMENT: " + u"Vous avez termin\xe9 l'inscription !".encode('utf-8')
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 16: ordinal not in range(128)
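The byte 0xc3 in the message is simply the first of the two bytes UTF-8 uses for é. The sketch below reproduces the hidden ASCII decode from Python 2's unicode + str explicitly, then shows that Python 3 no longer converts implicitly at all: mixing str and bytes raises TypeError instead. The second half is Python 3 behaviour only.

```python
prefix = u"YOU_HAVE_COMPLETED_ENROLLMENT: "
encoded = u"Vous avez termin\xe9 l'inscription !".encode("utf-8")

try:
    # The hidden step in Python 2's unicode + str: decode the bytes as ASCII.
    encoded.decode("ascii")
except UnicodeDecodeError as exc:
    decode_error = exc

try:
    # Python 3 only: text and bytes cannot be concatenated at all.
    prefix + encoded
except TypeError as exc:
    mix_error = exc
```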

You can fix this by not encoding either string:

    >>> u"YOU_HAVE_COMPLETED_ENROLLMENT: " + u"Vous avez termin\xe9 l'inscription !"
    u"YOU_HAVE_COMPLETED_ENROLLMENT: Vous avez termin\xe9 l'inscription !"

But then, when you want to write it to a file, you have to encode the whole line:

    >>> with open('/tmp/test.txt', 'w') as f:
    ...     line = u"YOU_HAVE_COMPLETED_ENROLLMENT: " + u"Vous avez termin\xe9 l'inscription !"
    ...     f.write(line.encode('utf-8'))
    ...
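One way to see why encoding once at the end is safe: with a stateless encoding such as UTF-8, encoding the concatenated text gives exactly the same bytes as concatenating the separately encoded pieces. Here the trouble only comes from mixing text with bytes, so keeping everything as text until a single .encode() call avoids it. A minimal sketch:

```python
prefix = u"YOU_HAVE_COMPLETED_ENROLLMENT: "
value = u"Vous avez termin\xe9 l'inscription !"

# Work in text, encode exactly once at the output boundary.
line = prefix + value
once = line.encode("utf-8")

# Encoding piecewise and joining bytes with bytes gives the same result.
piecewise = prefix.encode("utf-8") + value.encode("utf-8")
```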

For convenience, the codecs module gives you tools so that you do not have to re-encode every time:

    >>> import codecs
    >>> with codecs.open('/tmp/test.txt', 'w', encoding='utf8') as f:
    ...     f.write(u"YOU_HAVE_COMPLETED_ENROLLMENT: " + u"Vous avez termin\xe9 l'inscription !")
    ...
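The same codecs module can also wrap a stream you already have: codecs.getwriter('utf-8') returns a StreamWriter class whose write() encodes text before passing it on to the underlying byte stream. A minimal sketch, assuming an in-memory io.BytesIO buffer as a stand-in for a file opened in binary mode:

```python
import codecs
import io

buffer = io.BytesIO()  # stand-in for a file opened with mode "wb"
writer = codecs.getwriter("utf-8")(buffer)

# Write text pieces; the StreamWriter encodes each one to UTF-8.
writer.write(u"YOU_HAVE_COMPLETED_ENROLLMENT: ")
writer.write(u"Vous avez termin\xe9 l'inscription !")

data = buffer.getvalue()
```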
