Hey, I am having a major issue with encoding in Python. I am not too familiar with Python and have been stuck on this bug for weeks. I feel like I've tried every possible thing, but I can't seem to get it.

I am reading in files to work with, and I get the following error on some files that contain Chinese characters.

    'ascii' codec can't encode characters in position 10314-10316: ordinal not in range(128)
    Traceback (most recent call last):
      File "/usr/lib/python2.7/site-packages/django/core/handlers/base.py", line 112, in get_response
        response = wrapped_callback(request, *callback_args, **callback_kwargs)
      File "/usr/lib/python2.7/site-packages/cc_counter-0.65-py2.7.egg/cc_counter/views.py", line 154, in reviewrequest_recent_cc
        prev_reviewrequest_ccdata = _reviewrequest_recent_cc(request, review_request_id, False, revision_offset=1)
      File "/usr/lib/python2.7/site-packages/cc_counter-0.65-py2.7.egg/cc_counter/views.py", line 140, in _reviewrequest_recent_cc
        filename, comparison_data = _download_comparison_data(request, review_request_id, revision, filediff_id, modified)
      File "/usr/lib/python2.7/site-packages/cc_counter-0.65-py2.7.egg/cc_counter/views.py", line 89, in _download_comparison_data
        revision, filediff_id, local_site, modified)
      File "/usr/lib/python2.7/site-packages/cc_counter-0.65-py2.7.egg/cc_counter/views.py", line 68, in _download_analysis
        temp_file.write(working_file)
    UnicodeEncodeError: 'ascii' codec can't encode characters in position 10314-10316: ordinal not in range(128)

My code in this area looks like this:

    working_file = get_original_file(filediff, request, encoding_list)
    if modified:
        working_file = get_patched_file(working_file, filediff, request)
    working_file = convert_to_unicode(working_file, encoding_list)[1]
    logging.debug("Encoding List: %s", encoding_list)
    logging.debug("Source File: " + filediff.source_file)
    temp_file_name = "cctempfile_" + filediff.source_file.replace("/","_")
    logging.debug("temp_file_name: " + temp_file_name)
    source_file = os.path.join(HOMEFOLDER, temp_file_name)
    logging.debug("File contents" + working_file)
    #temp_file = codecs.open(source_file, encoding='utf-8')
    #temp_file.write(working_file.encode('utf-8'))
    temp_file = open(source_file, 'w')
    temp_file.write(working_file)
    temp_file.close()

Notice the commented-out lines. working_file is never empty. The logged encoding list is:

Encoding List: [u'iso-8859-15'] 

Anything to help would be soooo appreciated. I have to take a break from this after 8 straight hours of debugging this + the previous two weeks.

2 Answers

The error indicates working_file is a Unicode string, but is being written to a file that was opened to expect a byte string. Python 2 uses the default ascii codec to implicitly convert the Unicode string to a byte string, and non-ASCII characters trigger the UnicodeEncodeError.
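The implicit conversion can be reproduced without Django or any files. This is a minimal sketch, assuming a short made-up sample string (not taken from the question's data); the explicit `.encode("ascii")` call stands in for the conversion Python 2 performs on its own when a Unicode string is written to a byte-oriented file:

```python
# Sample text is an assumption for illustration only.
text = u"diff \u4e2d\u6587\u5b57"  # Unicode string ending in three Chinese characters

try:
    # Stand-in for the implicit ASCII conversion done on write in Python 2.
    text.encode("ascii")
    error = None
except UnicodeEncodeError as exc:
    error = exc  # raised because the Chinese characters are outside ASCII

# Encoding explicitly with a codec that covers the characters succeeds.
data = text.encode("utf-8")
```

The exception carries the position of the first offending character in `error.start`, which corresponds to the "position 10314-10316" part of the message in the question.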

The commented-out lines are close to correct, but two things need to change. A file opened with codecs.open expects Unicode strings in write() and encodes them itself, so there is no need to encode explicitly. The file also needs to be opened for writing:

    import codecs

    temp_file = codecs.open(source_file, 'w', encoding='utf-8')
    temp_file.write(working_file)
    temp_file.close()

1 Comment

Shouldn't we be suggesting io.open for its proper newline support?
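For reference, a rough sketch of what the io.open variant could look like. The path and the file contents below are placeholders standing in for the question's source_file and working_file, and `newline=""` is just one possible choice (it turns off newline translation so line endings are written exactly as they appear in the string):

```python
import io
import os
import tempfile

# Hypothetical stand-ins for the question's source_file and working_file.
source_file = os.path.join(tempfile.mkdtemp(), "cctempfile_example")
working_file = u"line one\nline two \u4e2d\u6587\n"

# io.open in text mode accepts Unicode strings and encodes them on write.
# newline="" disables newline translation, so "\n" is written unchanged.
with io.open(source_file, "w", encoding="utf-8", newline="") as temp_file:
    temp_file.write(working_file)

# Read the file back with the same settings to check the round trip.
with io.open(source_file, "r", encoding="utf-8", newline="") as temp_file:
    round_trip = temp_file.read()
```

io.open is available in Python 2.6+ and is the same function as the built-in open in Python 3, so code written this way should carry over unchanged.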

What is the return type of your convert_to_unicode function?

If it returns bytes, you should probably change temp_file = open(source_file, 'w') to temp_file = open(source_file, 'wb'), which opens the file in binary mode so that the bytes are written as-is.
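If you are not sure which type you get back, one option is to normalize to bytes before writing in binary mode. This is only a sketch: `to_bytes` is a hypothetical helper (not part of your code or of any library), the path is a placeholder, and UTF-8 is an assumed target encoding:

```python
import os
import tempfile


def to_bytes(value, encoding="utf-8"):
    """Return value as a byte string, encoding it only if it is Unicode text."""
    if isinstance(value, bytes):
        return value  # already bytes, write as-is
    return value.encode(encoding)


# Hypothetical stand-ins for the question's source_file and working_file.
source_file = os.path.join(tempfile.mkdtemp(), "cctempfile_example")
working_file = u"caf\u00e9 \u4e2d\u6587"

# Binary mode never applies an implicit codec; it writes exactly these bytes.
with open(source_file, "wb") as temp_file:
    temp_file.write(to_bytes(working_file))
```

With this approach the encoding step is explicit, so a failure would point at the `encode` call rather than at the write.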
