Current code:

 file.write("\"" + key + "\": " + "\"" + french[key].encode('utf8') + "\"" + ',' + '\n') 

where the entries in the french dictionary look like this:

"YOU_HAVE_COMPLETED_ENROLLMENT": "Vous avez termin\u00e9 l'inscription !" 

Getting this error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 19: ordinal not in range(128) 

Tried all the solutions on here but none seem to work.

  • same error it looks like still Commented Oct 4, 2017 at 18:23
  • Possible duplicate of python encoding utf-8 Commented Oct 4, 2017 at 18:23
  • looked at that thread and it says to just write it directly without encoding. However, if i remove .encode, it gives me an encode error: 'ascii' codec can't encode character u'\xe9' in position 37: ordinal not in range(128) Commented Oct 4, 2017 at 18:26
  • stackoverflow.com/questions/18403898/… Commented Oct 4, 2017 at 18:30
  • @Justin. (1) Are you using python2 or python 3? (2) what is the output of print(type(french[key])) (3) what is file, and how did you create it? Commented Oct 4, 2017 at 18:59

2 Answers

1

You could convert the byte string to a unicode string using this function:

    def _parse_value(value):
        if type(value) == str:
            value = value.decode("utf-8", "ignore").strip()
        return value
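A minimal sketch of the same idea that behaves the same on Python 2 and Python 3. The name to_text is hypothetical, and unlike the function above it decodes strictly and does not strip whitespace, so malformed input raises instead of silently losing bytes.

```python
def to_text(value, encoding="utf-8"):
    """Return value as a unicode string, decoding byte strings if needed."""
    # `bytes` is an alias of `str` on Python 2 and a separate type on Python 3.
    if isinstance(value, bytes):
        # Strict decoding (the default) raises on malformed input rather than
        # dropping bytes the way the "ignore" error handler does.
        return value.decode(encoding)
    return value


raw = b"Vous avez termin\xc3\xa9 l'inscription !"
text = to_text(raw)
```

Values that are already unicode pass through unchanged, so the helper can be applied to every dictionary value before concatenating.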

1

The solution: Concatenate unicode strings before encoding, then encode the complete string just before writing to a file. The codecs library simplifies this for you.

    import codecs
    import os

    file = codecs.open(os.path.join(fr_directory, 'strings.json'), 'w+', encoding='utf8')
    file.write("\"" + key + "\": " + "\"" + french[key] + "\"" + ',' + '\n')

I have opened the file with codecs.open rather than just open, specifying that the file should automatically handle encoding into UTF-8 when you write unicode strings. I have also removed the explicit encoding call you used.
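As a rough end-to-end sketch of this approach, the loop below writes every entry of a dictionary and reads the file back. It makes several assumptions: the french dictionary here is a one-entry stand-in built from the example in the question, the output directory is a temporary one rather than fr_directory, and io.open is swapped in for codecs.open because it takes the same encoding argument and is the built-in open on Python 3.

```python
import io
import os
import tempfile

# Hypothetical stand-in for the question's dictionary (a single entry).
french = {u"YOU_HAVE_COMPLETED_ENROLLMENT": u"Vous avez termin\xe9 l'inscription !"}

# Hypothetical output location; the question uses fr_directory instead.
out_dir = tempfile.mkdtemp()
path = os.path.join(out_dir, "strings.json")

# The file object encodes unicode strings to UTF-8 as they are written.
with io.open(path, "w", encoding="utf8") as f:
    for key in sorted(french):
        # Every piece is unicode, so nothing is implicitly decoded as ASCII.
        f.write(u'"' + key + u'": "' + french[key] + u'",\n')

# Read the file back with the same encoding to check the round trip.
with io.open(path, "r", encoding="utf8") as f:
    written = f.read()
```

The line format mirrors the one in the question, including the trailing comma; only the way the file is opened and the absence of .encode differ.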

Further explanation:

The keys and values of your dictionary are almost certainly Unicode strings. A "Unicode string" needs to be encoded before it can be written to a file. Most operations in Python 2 assume an ASCII encoding unless told otherwise, and the file objects returned by open are among them. That's why, if you try to write a Unicode string to a file, you'll see an exception:

    >>> with open('/tmp/test.txt', 'w') as f:
    ...     f.write(u"Vous avez termin\xe9 l'inscription !")
    ...
    UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 16: ordinal not in range(128)
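The implicit step can be reproduced without a file. The sketch below encodes the string as ASCII explicitly, which is roughly what the Python 2 file object does with a unicode argument, and captures the resulting exception. The variable names are illustrative only.

```python
text = u"Vous avez termin\xe9 l'inscription !"

try:
    # Roughly what Python 2's plain file object does to a unicode argument.
    text.encode("ascii")
except UnicodeEncodeError as exc:
    failure = exc

# Index of the first character that ASCII cannot represent.
print(failure.start)
```

The reported index points at the é, matching the position in the traceback above.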

This error is one that you can fix by encoding the string directly, so this works:

    >>> with open('/tmp/test.txt', 'w') as f:
    ...     f.write(u"Vous avez termin\xe9 l'inscription !".encode('utf-8'))

However, this alone does not solve your problem, because you are building a longer string from several pieces. When you concatenate a Unicode string with a UTF-8 encoded byte string (a plain str in Python 2), Python 2 first tries to decode the byte string using the ASCII codec, so you get an exception even when not writing to a file:

    >>> u"YOU_HAVE_COMPLETED_ENROLLMENT: " + u"Vous avez termin\xe9 l'inscription !".encode('utf-8')
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 16: ordinal not in range(128)
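To see where byte 0xc3 comes from, the sketch below performs the ASCII decode explicitly, which is the step Python 2 takes implicitly during the mixed concatenation. On Python 3 the concatenation itself raises a TypeError instead, so the explicit decode is used here to keep the example runnable on both versions.

```python
text = u"Vous avez termin\xe9 l'inscription !"
encoded = text.encode("utf-8")  # the e-acute becomes the two bytes 0xc3 0xa9

try:
    # The decode Python 2 attempts implicitly when mixing unicode and bytes.
    encoded.decode("ascii")
except UnicodeDecodeError as exc:
    failure = exc

# Offset of the first byte that is not valid ASCII.
print(failure.start)
```

The failing offset is the first byte of the encoded é, which is why the question's traceback mentions 0xc3.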

You can fix this by not encoding either string:

    >>> u"YOU_HAVE_COMPLETED_ENROLLMENT: " + u"Vous avez termin\xe9 l'inscription !"
    u"YOU_HAVE_COMPLETED_ENROLLMENT: Vous avez termin\xe9 l'inscription !"

But then when you want to write it to a file, you would have to encode the whole thing again:

    >>> with open('/tmp/test.txt', 'w') as f:
    ...     line = u"YOU_HAVE_COMPLETED_ENROLLMENT: " + u"Vous avez termin\xe9 l'inscription !"
    ...     f.write(line.encode('utf-8'))
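A variant of the same pattern that also runs on Python 3, where encoded data has to go to a file opened in binary mode. This is only a sketch: the temporary path is a hypothetical stand-in for /tmp/test.txt, and the file is read back to confirm which bytes ended up on disk.

```python
import os
import tempfile

line = u"YOU_HAVE_COMPLETED_ENROLLMENT: " + u"Vous avez termin\xe9 l'inscription !"

# Hypothetical scratch file standing in for /tmp/test.txt.
path = os.path.join(tempfile.mkdtemp(), "test.txt")

# Binary mode: the file receives exactly the bytes produced by encode().
with open(path, "wb") as f:
    f.write(line.encode("utf-8"))

# Read the raw bytes back to inspect what was stored.
with open(path, "rb") as f:
    data = f.read()
```

Decoding data as UTF-8 gives back the original line, with the é stored as the two bytes 0xc3 0xa9.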

For convenience, the codecs module lets you open a file that encodes on every write, so you do not have to encode each string yourself:

    >>> import codecs
    >>> with codecs.open('/tmp/test.txt', 'w', encoding='utf8') as f:
    ...     f.write(u"YOU_HAVE_COMPLETED_ENROLLMENT: " + u"Vous avez termin\xe9 l'inscription !")
