I have a list of keywords:
keywords = [u'encendió', u'polémica']

I am trying to load them to a django model:

    from django.db import models

    class myKeywords(models.Model):
        keyword = models.CharField(max_length=255)  # max_length value assumed

        def __unicode__(self):
            return self.keyword.encode('utf-8')

This is what I am trying:

    for k in keywords:
        keyObj, created = myKeywords.objects.get_or_create(keyword=k.decode('utf-8'))
        print created, keyObj

However, I keep getting the django.utils.encoding.DjangoUnicodeDecodeError: 'ascii' codec can't decode byte.

I have tried:

  1. adding/removing the u prefix in front of each keyword
  2. removing decode('utf-8') when creating the keyword object -- this successfully creates and saves the object, provided the keyword has the u prefix
  3. removing encode('utf-8') from the __unicode__(self) method -- this successfully prints the keyword

So, the only configuration that is working is as follows:

  1. keep the u prefix in front of each keyword
  2. don't call decode('utf-8') or encode('utf-8') anywhere else

But I am not sure if this is the right way of doing this. Ideally I should be reading a keyword and decoding it as utf-8 and then be saving it to the db. Any suggestions?

2 Answers

The __unicode__ method should return a unicode string, not a byte string. Therefore you should remove the encode() from your __unicode__ method.

If your keywords have the u'' prefix, then they are unicode strings as well, and don't have to be decoded either.
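
The mismatch can be reproduced without Django. This is only a minimal sketch: the variable names are made up for illustration, and the explicit decode('ascii') stands in for the implicit conversion Python 2 performs with its default ASCII codec when a byte string is used where a unicode string is expected.

```python
# -*- coding: utf-8 -*-
keyword = u'encendió'               # unicode string (text)
encoded = keyword.encode('utf-8')   # byte string, what the original __unicode__ returned

# Stand-in for the implicit ASCII decode that Python 2 applies to byte strings.
try:
    encoded.decode('ascii')
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

# Decoding with the codec that was actually used recovers the original text.
round_tripped = encoded.decode('utf-8')
```

If the value stays a unicode string from start to finish, neither conversion is needed, which is why dropping encode() from __unicode__ is enough.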

8 Comments

If you want to remove the u to make your code cleaner, then you might want to use from __future__ import unicode_literals. Otherwise, removing the u isn't a good idea. Unicode bugs occur when converting between unicode and byte strings. If you only deal with unicode strings, then you avoid those problems.
Yes, encode is why you're getting that specific error. __unicode__ should (unsurprisingly) return a Unicode object, not encoded bytes.
Yes, you should remove the encode() from the unicode method, for the reason that Peter gives.
In general, you should expect to handle the decoding from bytes to Unicode when taking data from an external source, like a file (best handled with codecs.open on Python 2, Python 3 has better tools) or a web request. Django handles it for you when processing the web request, though, so it's rare that you need to do this at all in Django.
The error specifically comes from mixing encoded bytestrings with Unicode. encode on a Unicode object is perfectly valid depending on what you want, but here it was causing your __unicode__ method to return an object of the wrong type.
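
As a rough illustration of decoding at the boundary, here is a sketch that reads keywords from a UTF-8 file. The file name and its contents are hypothetical, and io.open is used instead of the codecs.open mentioned above only because it behaves the same on Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

# Hypothetical input file, written here only so the sketch is self-contained.
path = os.path.join(tempfile.mkdtemp(), 'keywords.txt')
with io.open(path, 'wb') as f:
    f.write(u'encendió\npolémica\n'.encode('utf-8'))

# Decode once, while reading: io.open with an encoding yields unicode text.
with io.open(path, encoding='utf-8') as f:
    keywords = [line.strip() for line in f]
```

Everything after the read works with unicode strings only, so the values can be passed to the model without any further decode() or encode().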

You don't need to encode() the strings to utf-8 in __unicode__() method as Django returns all the strings from the database as unicode.

From the docs:

Because all strings are returned from the database as Unicode strings, model fields that are character based (CharField, TextField, URLField, etc) will contain Unicode values when Django retrieves data from the database. This is always the case, even if the data could fit into an ASCII bytestring.

Since your keywords are already unicode strings (they have the u prefix), you don't need to call decode() on them when creating the objects. Remove the decode() as well.

Your code should look like:

models.py

    from django.db import models

    class myKeywords(models.Model):
        keyword = models.CharField(max_length=255)  # max_length value assumed

        def __unicode__(self):
            return u'%s' % (self.keyword)

    keywords = [u'encendió', u'polémica']
    for k in keywords:
        keyObj, created = myKeywords.objects.get_or_create(keyword=k)
        print created, keyObj
