3

I cannot display the Unicode character u'\u201d'. I haven't had problems with other Unicode characters. I was using UTF-8, but then this character showed up and rained hell on my code. I tried different things in the interpreter, but basically, with:

c = u'\u201d' 

calling c.decode('utf-32') gives this error:

Traceback (most recent call last):
  File "<pyshell#154>", line 1, in <module>
    c.decode('utf-32')
  File "C:\Python27\lib\encodings\utf_32.py", line 11, in decode
    return codecs.utf_32_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u201d' in position 0: ordinal not in range(128)

I need to display it in the GUI so I can check the output and then store it as plain text. The question "Transform unicode string in python" explains a bit, but I am clearly still missing something.

5
  • What is c in c.decode('utf-32')? Commented Sep 22, 2012 at 18:55
  • The value I mentioned, u'\u201d'. Commented Sep 22, 2012 at 18:56
  • So this question is an exact duplicate of the question stackoverflow.com/questions/12545843/… asked by yourself 2 hours ago. Commented Sep 22, 2012 at 19:04
  • Yes, a person in the comments said it had drifted off topic and asked me to repost. I am fairly frustrated with this right now, so I ended up doing it. Once I resolve the issue I will delete the less useful one. Commented Sep 22, 2012 at 19:09
  • You haven't defined what you mean by "GUI" and you haven't told us which OS+application is going to open the resulting text file. They both make a difference. Commented Sep 22, 2012 at 20:21

2 Answers

7

If you're getting this exception, then you're trying to call .decode() on a unicode string. You should only call .decode() on a byte string, and only call .encode() on a unicode string. Otherwise, the interpreter will first implicitly encode or decode the string using the default codec (usually 'ascii'), which is bad news.
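The rule above can be sketched as follows. This is a minimal illustration, not the asker's actual code; the variable names are made up for the example, and it is written so that it runs on both Python 2.7 and Python 3.3+. On Python 2, calling .decode() on a unicode string first triggers an implicit .encode('ascii'), which is the step that raises the UnicodeEncodeError shown in the question.

```python
# Sketch only: illustrative names, runs on Python 2.7 and Python 3.3+.

text = u'\u201d'  # a unicode (text) string: RIGHT DOUBLE QUOTATION MARK

# Text -> bytes: call .encode() on the unicode string, naming the codec.
data = text.encode('utf-8')

# Bytes -> text: call .decode() on the byte string, with the same codec.
round_tripped = data.decode('utf-8')

# U+201D has no ASCII representation, so encoding to ASCII fails.
# This is the same failure the implicit ASCII step produces on Python 2.
try:
    text.encode('ascii')
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True
```

Here data holds the three UTF-8 bytes for the character, and round_tripped compares equal to the original text.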

In general, I recommend reading http://farmdev.com/talks/unicode/ carefully...


3 Comments

If I encode u'\u201d'.encode('utf-32') I get: '\xff\xfe\x00\x00\x1d \x00\x00'. I need to convert that symbol into plain text for the GUI and save it as a text file.
Define “plain text”. There is no such thing as “plain text” in that context. I suggest reading The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets.
I've read that article. It is useful, but I am still confused about how to approach my problem. All I need is to convert anything that is not human-readable into a readable format so I can test it for certain conditions and then write it to a file. I am not a professional coder in any sense. I simply need clean output, that's all.
2

If you had read The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!), you would know There Ain't No Such Thing As Plain Text.

But since there doesn't seem to be a meeting of the minds between what you insist you're after and what people are trying to explain, I'm starting to wonder if by "convert that symbol into plain text" you mean something like "replace the Unicode RIGHT DOUBLE QUOTATION MARK (U+201D) with QUOTATION MARK (U+0022) and then encode as ASCII". For example, something like:

In [45]: s = u"“curly quoted”"

In [46]: s
Out[46]: u'\u201ccurly quoted\u201d'

In [47]: print s
“curly quoted”

and then doing the replacements manually (search for "unicode string sanitize" and you'll find much better recipes including more "downgrades" for different characters):

In [51]: fixer = dict.fromkeys([0x201c, 0x201d], u'"')

In [52]: s.translate(fixer)
Out[52]: u'"curly quoted"'

In [53]: s.translate(fixer).encode("ascii", "replace")
Out[53]: '"curly quoted"'

where the "replace" would protect against anything we didn't fix.
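As a script rather than an interpreter session, the same idea might look like the sketch below. The DOWNGRADES table, the to_ascii helper, and the curly_demo.txt file name are all hypothetical names chosen for illustration, and the single-quote entries are an assumed extension of the two characters handled above. The second half shows the alternative of keeping the original characters and writing the file as UTF-8 with io.open, which behaves the same on Python 2.7 and Python 3.

```python
import io
import os
import tempfile

# Hypothetical downgrade table: code point -> ASCII stand-in.
# The single-quote entries are an assumption added for illustration.
DOWNGRADES = {
    0x2018: u"'",  # LEFT SINGLE QUOTATION MARK
    0x2019: u"'",  # RIGHT SINGLE QUOTATION MARK
    0x201c: u'"',  # LEFT DOUBLE QUOTATION MARK
    0x201d: u'"',  # RIGHT DOUBLE QUOTATION MARK
}


def to_ascii(text):
    """Downgrade known characters, then encode to ASCII bytes.

    Anything not covered by DOWNGRADES becomes '?' via errors="replace".
    """
    return text.translate(DOWNGRADES).encode("ascii", "replace")


s = u"\u201ccurly quoted\u201d"
ascii_bytes = to_ascii(s)

# Alternative: keep the curly quotes and store the text as UTF-8.
path = os.path.join(tempfile.gettempdir(), "curly_demo.txt")
with io.open(path, "w", encoding="utf-8") as handle:
    handle.write(s)
with io.open(path, encoding="utf-8") as handle:
    read_back = handle.read()
os.remove(path)  # clean up the demo file
```

Whether to downgrade to ASCII or keep UTF-8 depends on which application will open the resulting file, as one of the comments on the question points out.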
