
I've tried no fewer than five different "solutions" and I can't get it to work. Please help.

This is the error:

'ascii' codec can't decode byte 0xc3 in position 1: ordinal not in range(128)
Traceback (most recent call last):
  File "/base/python_runtime/python_lib/versions/1/google/appengine/ext/webapp/__init__.py", line 636, in __call__
    handler.post(*groups)
  File "/base/data/home/apps/elmovieplace/1.350096827241428223/script/pftv.py", line 114, in post
    movie.put()
  File "/base/python_runtime/python_lib/versions/1/google/appengine/ext/db/__init__.py", line 984, in put
    return datastore.Put(self._entity, config=config)
  File "/base/python_runtime/python_lib/versions/1/google/appengine/api/datastore.py", line 455, in Put
    return _GetConnection().async_put(config, entities, extra_hook).get_result()
  File "/base/python_runtime/python_lib/versions/1/google/appengine/datastore/datastore_rpc.py", line 1219, in async_put
    for pbs in pbsgen:
  File "/base/python_runtime/python_lib/versions/1/google/appengine/datastore/datastore_rpc.py", line 1070, in __generate_pb_lists
    pb = value_to_pb(value)
  File "/base/python_runtime/python_lib/versions/1/google/appengine/api/datastore.py", line 239, in entity_to_pb
    return entity._ToPb()
  File "/base/python_runtime/python_lib/versions/1/google/appengine/api/datastore.py", line 841, in _ToPb
    properties = datastore_types.ToPropertyPb(name, values)
  File "/base/python_runtime/python_lib/versions/1/google/appengine/api/datastore_types.py", line 1672, in ToPropertyPb
    pbvalue = pack_prop(name, v, pb.mutable_value())
  File "/base/python_runtime/python_lib/versions/1/google/appengine/api/datastore_types.py", line 1485, in PackString
    pbvalue.set_stringvalue(unicode(value).encode('utf-8'))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1: ordinal not in range(128)

This is the part of the code that's giving me problems:

if imdbValues[5] == 'N/A':
    movie.director = ''
else:
    movie.director = imdbValues[5]
...
movie.put()

In this case, imdbValues[5] is equal to 'Claudio Fäh'.

    You should read this: blog.notdot.net/2010/07/Getting-unicode-right-in-Python. You need to be clear about when you are dealing with bytes, when you are dealing with strings, and which encoding to use to convert between the two. Encoding/decoding errors like the one you see usually come from misunderstandings about string handling. Commented May 2, 2011 at 3:40

2 Answers


The exception is raised by this line of code:

pbvalue.set_stringvalue(unicode(value).encode('utf-8')) 

When you assign a value to movie.director and then call movie.put(), that value is first converted to unicode with:

unicode(value) 

It is then encoded to UTF-8 with .encode('utf-8').
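The failing step can be reproduced without App Engine. This is a minimal sketch in Python 3, not the original Python 2 runtime: in Python 3, `bytes` plays the role of Python 2's `str` and `str` plays the role of `unicode`, so `raw.decode("ascii")` stands in for the implicit ASCII decode that `unicode(value)` performs. The sample value is the UTF-8 encoding of the director name from the question; `decode_as_ascii` is a hypothetical helper.

```python
# Python 3 sketch: bytes ~ Python 2 str, str ~ Python 2 unicode.
director_bytes = "Claudio Fäh".encode("utf-8")  # UTF-8 bytes; ä becomes 0xc3 0xa4


def decode_as_ascii(raw):
    """Mimic Python 2's unicode(raw), which decodes byte strings as ASCII."""
    try:
        return raw.decode("ascii")
    except UnicodeDecodeError as exc:
        return "failed: " + str(exc)


# Decoding as ASCII fails on the first non-ASCII byte...
print(decode_as_ascii(director_bytes))
# ...while decoding with the encoding the bytes were actually written in succeeds.
print(director_bytes.decode("utf-8"))
```

The first line printed reports byte 0xc3 at position 9; the position is simply the offset of the first non-ASCII byte in whatever value is being decoded.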

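When no encoding is given, the unicode() function decodes byte strings with the default encoding, which in Python 2 is normally ASCII; this means you are only safe passing these kinds of values:

  1. A unicode string
  2. A byte string (str) that contains only ASCII characters (bytes 0x00 to 0x7F)

Your code is probably passing a byte string in some non-ASCII encoding, which unicode(value) then fails to decode as ASCII. The byte 0xc3 in the error message is a strong hint: it is a UTF-8 lead byte (for example, ä is encoded in UTF-8 as 0xc3 0xa4), so the string is most likely UTF-8.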
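When it is unclear which value is at fault, a small diagnostic can point at the offending byte. This is a hypothetical Python 3 helper (`first_non_ascii` is not part of any library): it returns the index and value of the first byte outside the ASCII range, which is the byte the ASCII codec complains about.

```python
def first_non_ascii(raw):
    """Return (index, byte) for the first byte >= 0x80, or None if raw is pure ASCII."""
    for index, byte in enumerate(raw):  # iterating over bytes yields ints in Python 3
        if byte >= 0x80:
            return index, byte
    return None


sample = "Claudio Fäh".encode("utf-8")
hit = first_non_ascii(sample)
if hit is not None:
    index, byte = hit
    print(f"first non-ASCII byte 0x{byte:02x} at position {index}")
    # 0xc3 starts a two-byte UTF-8 sequence; decoding both bytes gives back the character.
    print(sample[index:index + 2].decode("utf-8"))
```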

Recommendation:
If you are dealing with byte strings, you MUST know their encoding, or your program will keep running into this kind of encoding/decoding problem.

How to fix it:
Find out which encoding the byte strings you are dealing with use (UTF-8?) and convert them to unicode strings.
If, for example, imdbValues is a list returned by some IMDb Python library and contains UTF-8-encoded byte strings, you should convert them using:

 movie.director = imdbValues[5].decode('utf-8') 
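Applied to the question's code, the decode can also happen before the 'N/A' comparison, so that every branch works with text. A minimal Python 3 sketch, assuming (as above) that the values are UTF-8 byte strings; `to_text`, `director_from` and the sample list are hypothetical stand-ins, not part of App Engine or any IMDb library, and the `movie` entity is left out so the sketch runs on its own.

```python
def to_text(value, encoding="utf-8"):
    """Hypothetical helper: decode byte strings, pass text through unchanged."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def director_from(imdb_values):
    """Return the director field (index 5, as in the question) as text."""
    director = to_text(imdb_values[5])
    # Compare after decoding: in Python 3, b"N/A" == "N/A" is False.
    return "" if director == "N/A" else director


# Hypothetical sample data: six fields, with UTF-8 bytes as assumed above.
sample = [b"", b"", b"", b"", b"", "Claudio Fäh".encode("utf-8")]
print(director_from(sample))
```

In the question's handler, the result would then be assigned with movie.director = director_from(imdbValues) before movie.put().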

3 Comments

Also, if you don't mind answering another question: is there a way to do this with lists?
@Jon Try a list comprehension: unicode_list = [item.decode('utf-8') for item in imdbValues]
I am looking to do the same thing. I have an App Engine assignment name='全部'. The string is UTF-8 encoded. I tried decode('utf-8') but still got the same error: "'ascii' codec can't decode byte".
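The list-comprehension suggestion from the comments, sketched in Python 3. If a list might mix byte strings and already-decoded text, only the byte strings should be decoded: in Python 3, `str` has no `.decode` method, so calling it on text raises AttributeError. `decode_all` and the sample list are hypothetical.

```python
def decode_all(values, encoding="utf-8"):
    """Decode every bytes item in values; leave str items unchanged."""
    return [v.decode(encoding) if isinstance(v, bytes) else v for v in values]


mixed = [b"N/A", "Claudio Fäh".encode("utf-8"), "already text"]
print(decode_all(mixed))  # ['N/A', 'Claudio Fäh', 'already text']
```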

You should start using unicode for your textual data.

Wherever your data comes from, it consists of Unicode characters encoded as bytes. The encoding could be UTF-8, UTF-16, Windows-1252, ISO-8859-1, or one of many others. If the data is on your own system, you know the encoding. If it came from a web page, the encoding is usually declared in the charset parameter of the Content-Type response header, and often near the beginning of the page as well. Use that encoding to .decode the bytes into the very useful unicode Python object, and use that object throughout your code.
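Reading the encoding from the response header and decoding with it can be sketched in Python 3 using only the standard library's `email.message.Message` to parse the header parameter. This is an illustration, not App Engine's URL-fetch API: `decode_body` is a hypothetical helper, the UTF-8 fallback is an assumption, and it does not inspect any charset declared inside the page itself.

```python
from email.message import Message


def decode_body(body, content_type, fallback="utf-8"):
    """Decode an HTTP body using the charset from its Content-Type header.

    Falls back to `fallback` (an assumption) when no charset parameter is present.
    """
    headers = Message()
    headers["Content-Type"] = content_type
    charset = headers.get_content_charset(fallback)  # lowercased charset, or fallback
    return body.decode(charset)


latin1_body = "Claudio Fäh".encode("iso-8859-1")  # here ä is the single byte 0xe4
print(decode_body(latin1_body, "text/html; charset=ISO-8859-1"))
```

The same bytes decoded as UTF-8 would fail, which is why the declared charset, not a guess, should drive the decode.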

Decode on input, encode (if necessary) on output. You don't need to encode unicode values yourself before storing them with App Engine; the datastore encodes them to UTF-8 itself.
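The "decode on input, encode on output" rule can be sketched in Python 3 with in-memory byte streams standing in for a file or network connection. `shout` is a hypothetical function, and `io.BytesIO` is only a placeholder for real input and output.

```python
import io


def shout(stream_in, stream_out, encoding="utf-8"):
    """Decode on input, work with text, encode only on output."""
    text = stream_in.read().decode(encoding)   # bytes -> text at the input boundary
    result = text.upper()                       # all processing happens on text
    stream_out.write(result.encode(encoding))   # text -> bytes at the output boundary


source = io.BytesIO("Claudio Fäh".encode("utf-8"))
sink = io.BytesIO()
shout(source, sink)
print(sink.getvalue().decode("utf-8"))  # CLAUDIO FÄH
```

Keeping every conversion at the edges means the code in between never has to guess what kind of string it holds.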

PS: that answer to a Unicode-related question might be of help.
