{u'Status': u'OK', u'City': u'Ciri\xe8', u'TimezoneName': '', u'ZipPostalCode': '', u'CountryCode': u'IT', u'Dstoffset': u'0', u'Ip': u'x.x.x.x', u'Longitude': u'7.6', u'CountryName': u'Italy', u'RegionCode': u'12', u'Latitude': u'45.2333', u'Isdst': '', u'Gmtoffset': u'0', u'RegionName': u'Piemonte'} 

This is the output of my object. I would like to access City, but it's encoded. How can I read all the parameters and decode them?

>>> data['City']
u'Ciri\xe8'
>>> data['City'].decode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe8' in position 4: ordinal not in range(128)

I want plain text, not a unicode string. Thank you!
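A likely reading of the traceback above, assuming CPython 2.7: the value is already a unicode object, so calling .decode() on it makes Python 2 first encode it with the default ASCII codec, which is why the error is a UnicodeEncodeError rather than a decode error. The sketch below assumes Python 3, where the direction is explicit: text (str) has no .decode(), and decoding only goes from bytes to str.

```python
# Sketch assuming Python 3: text is `str`, encoded data is `bytes`.
city = 'Ciri\xe8'  # the value from the question, already decoded text

# Text has no .decode() in Python 3; decoding only goes bytes -> str.
print(hasattr(city, 'decode'))  # False

# Encoding turns text into bytes for a specific encoding...
raw = city.encode('utf-8')
print(raw)  # b'Ciri\xc3\xa8'

# ...and decoding those bytes gives the original text back.
print(raw.decode('utf-8') == city)  # True
```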

  • I'm using this code github.com/sonicrules1234/pyipinfodb/blob/master/pyipinfodb.py Commented Apr 22, 2012 at 2:04
  • There is no such thing as "plaintext". Commented Apr 22, 2012 at 2:07
  • You don't have to do anything. It's already decoded... Try print data['City'] Commented Apr 22, 2012 at 2:07
  • As you can see in the post, the result of print data['City'] is u'Ciri\xe8' Commented Apr 22, 2012 at 2:10
  • No, you just typed data['City']. Try print data['City']. For me, in IPython, that makes a difference. Commented Apr 22, 2012 at 2:11

3 Answers


It's not clear what you want. If by 'plaintext' you mean removing the accents, try this:

>>> s = u'Ciri\xe8'
>>> from unicodedata import normalize
>>> normalize('NFKD', s).encode('ASCII', 'ignore')
'Cirie'
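Applied to every value of the dictionary from the question, the same idea might look like the sketch below. It assumes Python 3 and that every value is a string; strip_accents is a hypothetical helper name, not part of any library. In Python 3, encode() returns bytes, so the helper decodes back to str at the end.

```python
import unicodedata

def strip_accents(text):
    """Hypothetical helper: return `text` with combining accents removed
    and any remaining non-ASCII characters dropped (str in, str out)."""
    # NFKD splits 'è' into 'e' + U+0300 COMBINING GRAVE ACCENT.
    decomposed = unicodedata.normalize('NFKD', text)
    # 'ignore' drops the combining marks (and anything else outside ASCII);
    # decode back so the result is text, not bytes.
    return decomposed.encode('ascii', 'ignore').decode('ascii')

# A trimmed-down version of the dictionary from the question.
data = {'City': 'Ciri\xe8', 'CountryName': 'Italy', 'TimezoneName': ''}

# Apply the same conversion to every parameter.
plain = {key: strip_accents(value) for key, value in data.items()}
print(plain['City'])  # Cirie
```

Note that characters with no decomposition (for example 'ß' or 'ø') are dropped entirely rather than transliterated.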

Read this: http://nedbatchelder.com/text/unipain.html

Then just print it:

>>> data = {u'City':u'Ciri\xe8'}
>>> data['City']
u'Ciri\xe8'
>>> print data['City']
Ciriè

If you don't print it, the interactive interpreter shows a safe representation of the string (its repr()), which indicates that it is Unicode text (u'') and that it contains the non-ASCII character \xe8. print instead tries to display the character itself by encoding the Unicode string in the terminal's encoding. This can fail if the string contains characters that the terminal encoding doesn't support:

>>> print u'\xe8'
è
>>> print u'\x81'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "d:\dev\Python27\lib\encodings\cp437.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\x81' in position 0: character maps to <undefined>

In the above example, code page 437 supports Unicode character U+00E8, but not U+0081.
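To check ahead of time whether print is likely to succeed, one option is to try the same encoding the terminal uses (in a real program usually available as sys.stdout.encoding). The sketch assumes Python 3; can_display is a hypothetical helper. It also uses ascii(), which in Python 3 produces the same escaped form that Python 2 shows at the prompt.

```python
def can_display(text, encoding):
    """Hypothetical helper: return True if `text` can be encoded in
    `encoding`, which is roughly what print needs from the terminal."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True

# The two characters from the example above, checked against code page 437.
print(can_display('\xe8', 'cp437'))  # True  (U+00E8 is byte 0x8A in cp437)
print(can_display('\x81', 'cp437'))  # False (U+0081 has no cp437 byte)

# The escaped form is only a representation of the text.
city = 'Ciri\xe8'
print(ascii(city))  # 'Ciri\xe8'
```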


By plaintext, I suppose you mean ASCII. For this you can use:

data['City'].encode('ascii','ignore') 

This strips the non-ASCII character and returns:

Ciri 
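In Python 3 the same call returns bytes (b'Ciri'), so add .decode('ascii') if you need a str. Because 'ignore' silently loses information, the sketch below (assuming Python 3) compares it with the other standard error handlers accepted by str.encode.

```python
city = 'Ciri\xe8'

# Plain-ASCII text, as in the answer above (bytes decoded back to str).
plain = city.encode('ascii', 'ignore').decode('ascii')
print(plain)  # Ciri

# Compare built-in error handlers for characters ASCII cannot represent.
for handler in ('ignore', 'replace', 'xmlcharrefreplace', 'backslashreplace'):
    print(handler, city.encode('ascii', handler))
# ignore b'Ciri'
# replace b'Ciri?'
# xmlcharrefreplace b'Ciri&#232;'
# backslashreplace b'Ciri\\xe8'
```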

See this link for more information: http://docs.python.org/howto/unicode.html
