I'm getting an encoding error from a script, as follows:

    from django.template import loader, Context
    t = loader.get_template(filename)
    c = Context({'menus': menus})
    print t.render(c)

      File "../django_to_html.py", line 45, in <module>
        print t.render(c)
    UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 34935: ordinal not in range(128)

I don't own the script, so I can't edit it. The only thing I can do is change the file I supply so that it doesn't contain the Unicode character the script is objecting to.

This file is a text file that I'm editing in TextMate. What can I do to identify and get rid of the character that the script is barfing on?

Could I use something like iconv, and if so how?

Thanks!

4 Answers

3

How to find ALL the nasties in your file:

    import unicodedata as ucd
    import sys

    with open(sys.argv[1]) as f:
        for linex, line in enumerate(f):
            uline = line.decode('UTF-8')
            bad_line = False
            for charx, char in enumerate(uline):
                if char <= u'\xff':
                    continue
                print "line %d, column %d: %s" % (
                    linex+1, charx+1, ucd.name(char, '<unknown>'))
                bad_line = True
            if bad_line:
                print repr(uline)
                print

Sample output:

    line 1, column 6: RIGHT SINGLE QUOTATION MARK
    line 1, column 10: SINGLE LOW-9 QUOTATION MARK
    u'yadda\u2019foo\u201abar\r\n'

    line 2, column 4: IDEOGRAPHIC SPACE
    u'fat\u3000space\r\n'
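
Once the offending characters are located, one way to get rid of them is to map each one to an ASCII stand-in. This is a minimal Python 3 sketch (the scanner above is Python 2). The `REPLACEMENTS` table and the `asciify` name are illustrative assumptions, and the table only covers the characters from the sample output:

```python
# Hypothetical helper: replace a few typographic characters with ASCII.
# The mapping is an assumption; extend it with whatever the scan reports.
REPLACEMENTS = {
    0x2019: "'",   # RIGHT SINGLE QUOTATION MARK
    0x201A: ",",   # SINGLE LOW-9 QUOTATION MARK
    0x3000: " ",   # IDEOGRAPHIC SPACE
}


def asciify(text):
    """Return text with the mapped characters replaced; others are untouched."""
    return text.translate(REPLACEMENTS)


print(asciify("yadda\u2019foo\u201abar"))
```

Characters that are not in the table pass through unchanged, so it is worth re-running the scan on the result.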

2

I don't know why you're using Django's template engine to create console output, but the Python wiki shows a way to work around this on Windows using a Python-specific environment variable:

set PYTHONIOENCODING=utf_8

This sets the stdout/stderr encoding to UTF-8, so any Unicode character can be printed. Because the command-line encoding on Windows is usually not UTF-8, a special character shows up as a garbled sequence (its UTF-8 bytes displayed in the console's code page) rather than as the character itself. For example:

    >>> print u'\u2019'
    ΓÇÖ
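
To see where that three-character sequence comes from, this Python 3 sketch encodes the character as UTF-8 and then misreads the bytes in code page 437. The choice of cp437 is an assumption about the console; it is used here because it reproduces the output shown above:

```python
# U+2019 encoded as UTF-8 is three bytes: E2 80 99.
utf8_bytes = "\u2019".encode("utf-8")

# Assumption: the console interprets those bytes as code page 437,
# so each byte is displayed as a separate character.
mojibake = utf8_bytes.decode("cp437")
print(mojibake)
```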

2 Comments

I'm not on Windows unfortunately, I'm on OSX.
@AP257: I don't think that makes a difference. Your problem stays the same - and setting env variables should be possible in Mac OSX, too?!
1

The character is in position 34935 in the file. The helpful traceback tells you that.

1 Comment

Actually it's the position in the rendered output, not in the template file. But that should help, too.
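
If the rendered output can be captured as a string, the reported position can be used to look at the surrounding text, which can then be searched for in the template. A minimal Python 3 sketch; `context_at` and the sample string are hypothetical:

```python
def context_at(text, pos, width=20):
    """Return the slice of text within `width` characters of `pos`."""
    start = max(0, pos - width)
    return text[start:pos + width + 1]


# Hypothetical stand-in for the rendered output.
rendered = "<li>Today\u2019s specials</li>"
pos = rendered.index("\u2019")
print(repr(context_at(rendered, pos, width=5)))
```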
0

\u2019 is a right single quotation mark (http://www.unicode.org/charts/ has a helpful search box where you can enter the code), which may help you track it down. If your file ends up in HTML again, you could use the &#8217; notation for these characters. (As John points out, this also accepts hex notation.)
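
If the output is HTML, Python's built-in `xmlcharrefreplace` error handler can produce these references for every non-ASCII character at once. A small Python 3 sketch; the sample string is made up for illustration:

```python
# Made-up sample containing U+2019 (right single quotation mark).
text = "It\u2019s done"

# Encoding to ASCII with 'xmlcharrefreplace' turns each unencodable
# character into a decimal numeric character reference.
html_safe = text.encode("ascii", "xmlcharrefreplace").decode("ascii")
print(html_safe)  # It&#8217;s done
```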

2 Comments

No need to convert; use &#x2019;
@John: Cheers, hadn't come across that one!
