Python encoding issue, can't seem to figure it out

Question

Hey, I am having a major issue with encoding in Python. I am not too familiar with Python and have been stuck on this bug for weeks. I feel like I've tried every possible thing but I can't seem to get it.

I am reading in files to work with, and I get the following error on some files that contain Chinese characters.

    'ascii' codec can't encode characters in position 10314-10316: ordinal not in range(128)
    Traceback (most recent call last):
      File "/usr/lib/python2.7/site-packages/django/core/handlers/base.py", line 112, in get_response
        response = wrapped_callback(request, *callback_args, **callback_kwargs)
      File "/usr/lib/python2.7/site-packages/cc_counter-0.65-py2.7.egg/cc_counter/views.py", line 154, in reviewrequest_recent_cc
        prev_reviewrequest_ccdata = _reviewrequest_recent_cc(request, review_request_id, False, revision_offset=1)
      File "/usr/lib/python2.7/site-packages/cc_counter-0.65-py2.7.egg/cc_counter/views.py", line 140, in _reviewrequest_recent_cc
        filename, comparison_data = _download_comparison_data(request, review_request_id, revision, filediff_id, modified)
      File "/usr/lib/python2.7/site-packages/cc_counter-0.65-py2.7.egg/cc_counter/views.py", line 89, in _download_comparison_data
        revision, filediff_id, local_site, modified)
      File "/usr/lib/python2.7/site-packages/cc_counter-0.65-py2.7.egg/cc_counter/views.py", line 68, in _download_analysis
        temp_file.write(working_file)
    UnicodeEncodeError: 'ascii' codec can't encode characters in position 10314-10316: ordinal not in range(128)

My code in this area looks like this:

    working_file = get_original_file(filediff, request, encoding_list)
    if modified:
        working_file = get_patched_file(working_file, filediff, request)
    working_file = convert_to_unicode(working_file, encoding_list)[1]
    logging.debug("Encoding List: %s", encoding_list)
    logging.debug("Source File: " + filediff.source_file)
    temp_file_name = "cctempfile_" + filediff.source_file.replace("/","_")
    logging.debug("temp_file_name: " + temp_file_name)
    source_file = os.path.join(HOMEFOLDER, temp_file_name)
    logging.debug("File contents" + working_file)
    #temp_file = codecs.open(source_file, encoding='utf-8')
    #temp_file.write(working_file.encode('utf-8'))
    temp_file = open(source_file, 'w')
    temp_file.write(working_file)
    temp_file.close()

Notice the commented-out lines. working_file is never empty. The encoding from the logged "encoding list" is

Encoding List: [u'iso-8859-15']

Anything to help would be soooo appreciated. I have to take a break from this after 8 straight hours of debugging this + the previous two weeks.

Mark Tolonen · Accepted Answer · 2015-12-08 03:07:47Z

The error indicates working_file is a Unicode string, but is being written to a file that was opened to expect a byte string. Python 2 uses the default ascii codec to implicitly convert the Unicode string to a byte string, and non-ASCII characters trigger the UnicodeEncodeError.
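To see that conversion in isolation, here is a minimal sketch that performs explicitly the ASCII encode Python 2 applies implicitly when a Unicode string is written to a byte-mode file. The sample string and the helper name try_ascii are made up for illustration; they are not from the question's code.

```python
# Hypothetical sample: ASCII text with three Chinese characters in the middle.
text = u"diff header \u4e2d\u6587\u5b57 trailer"


def try_ascii(value):
    """Attempt the ASCII encode that Python 2 would perform implicitly.

    Returns (True, "encoded") on success, or (False, error message) when
    the string contains characters outside the ASCII range.
    """
    try:
        value.encode("ascii")
        return True, "encoded"
    except UnicodeEncodeError as exc:
        return False, str(exc)


ok, message = try_ascii(text)
print(ok)
print(message)

# Encoding with a codec that covers the characters succeeds.
utf8_bytes = text.encode("utf-8")
print(len(utf8_bytes))
```

The failure has the same shape as the traceback in the question, while the UTF-8 encode succeeds, which is why choosing an explicit encoding for the output file avoids the error.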

The commented-out lines are close to correct. A file opened with codecs.open expects Unicode strings on write, so there is no need to encode explicitly, and the file must be opened for writing ('w'):

    temp_file = codecs.open(source_file, 'w', encoding='utf-8')
    temp_file.write(working_file)

Shouldn't we be suggesting io.open for its proper newline support?
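For comparison, a rough sketch of the io.open variant the comment refers to. io.open is available in Python 2.6+ and is the built-in open in Python 3. The helper name write_text, the sample text, and the temporary path are assumptions for illustration; passing newline="" is one possible choice that writes line endings unchanged, which may or may not be what the original code needs.

```python
import io
import os
import tempfile


def write_text(path, text):
    """Write a Unicode string to path as UTF-8.

    newline="" turns off newline translation, so "\r\n" and "\n" in the
    text reach the file exactly as they are.
    """
    with io.open(path, "w", encoding="utf-8", newline="") as handle:
        handle.write(text)


# Hypothetical content with mixed line endings and non-ASCII characters.
sample = u"line one \u4e2d\u6587\r\nline two\n"
target = os.path.join(tempfile.mkdtemp(), "cctempfile_example.txt")
write_text(target, sample)

# Read the raw bytes back to check what was written.
with io.open(target, "rb") as handle:
    raw = handle.read()
print(raw == sample.encode("utf-8"))
```

The with block also closes the file, so no separate close() call is needed.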

NameTooLongException · Accepted Answer · 2015-12-08 02:41:28Z

What is the return type of your convert_to_unicode function?

If it returns bytes, you should probably change temp_file = open(source_file, 'w') to temp_file = open(source_file, 'wb'), which opens the file in binary mode so the bytes are written as-is.
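If the return type is uncertain, one option is to normalize to bytes before writing in binary mode. This is only a sketch: to_bytes is a hypothetical helper, and UTF-8 is an assumed target encoding rather than something the question specifies.

```python
import os
import tempfile


def to_bytes(data, encoding="utf-8"):
    """Return data as a byte string, encoding it only if it is Unicode text."""
    if isinstance(data, bytes):
        return data
    return data.encode(encoding)


# Hypothetical inputs: one already bytes, one Unicode text.
already_bytes = b"plain ascii"
unicode_text = u"\u4e2d\u6587"

target = os.path.join(tempfile.mkdtemp(), "cctempfile_bytes_example.txt")
with open(target, "wb") as handle:  # binary mode: no implicit encoding step
    handle.write(to_bytes(already_bytes))
    handle.write(to_bytes(unicode_text))

with open(target, "rb") as handle:
    written = handle.read()
print(written == already_bytes + unicode_text.encode("utf-8"))
```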
