UTF-32 in Python

Question

I cannot display the unicode character u'\u201d'. I haven't had problems with other unicode characters. I was using UTF-8, but then this character showed up and rained hell on my code. I tried different things in the interpreter, but basically, where:

c = u'\u201d'

and I call c.decode('utf-32'), I get this error:

Traceback (most recent call last):
  File "<pyshell#154>", line 1, in <module>
    c.decode('utf-32')
  File "C:\Python27\lib\encodings\utf_32.py", line 11, in decode
    return codecs.utf_32_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u201d' in position 0: ordinal not in range(128)

I need to display it in the GUI so I can check the output and then store it as plain text. The question "Transform unicode string in python" explains a bit, but I am still clearly missing something.

So this question is an exact duplicate of the question stackoverflow.com/questions/12545843/… asked by yourself 2 hours ago.
– Vicent, Commented Sep 22, 2012 at 19:04
Yes, a person in the comments said it got off topic and asked for reposting. I am fairly frustrated with this right now, so I ended up doing it. Once I resolve the issue I will delete the least useful one.
– rodling, Commented Sep 22, 2012 at 19:09
You haven't defined what you mean by "GUI" and you haven't told us which OS+application is going to open the resulting text file. They both make a difference.
– Mark Ransom, Commented Sep 22, 2012 at 20:21

akgood · Accepted Answer · 2012-09-22 18:56:12Z


If you're getting this exception, then you're trying to call .decode() on a unicode string. You should only call .decode() on a byte string, and only call .encode() on a unicode string. Otherwise, the interpreter will first implicitly encode or decode the string using the default codec (usually 'ascii'), which is bad news.

In general, I recommend reading http://farmdev.com/talks/unicode/ carefully...
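To make the encode/decode direction concrete, here is a minimal sketch, not the asker's actual code. It assumes the explicit little-endian codec 'utf-32-le' (so no byte order mark is prepended) and uses u'' literals so that it runs on both Python 2.7 and Python 3.3+; the variable names are made up for illustration.

```python
# Text (a unicode string): RIGHT DOUBLE QUOTATION MARK, U+201D
text = u'\u201d'

# Text -> bytes: call .encode() on the unicode string.
utf8_bytes = text.encode('utf-8')
# 'utf-32-le' is an assumption made here to avoid the BOM that plain
# 'utf-32' would prepend; it is not what the question used.
utf32_bytes = text.encode('utf-32-le')

# Bytes -> text: call .decode() on the byte string, naming the codec
# that was used to produce those bytes.
from_utf8 = utf8_bytes.decode('utf-8')
from_utf32 = utf32_bytes.decode('utf-32-le')

# Both round trips give back the original character.
print(from_utf8 == text and from_utf32 == text)
```

Calling .decode() on text rather than on utf8_bytes is what triggers the implicit ASCII encode in Python 2 and therefore the UnicodeEncodeError shown in the question.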


3 Comments

rodling Over a year ago

If I run u'\u201d'.encode('utf-32') I get: '\xff\xfe\x00\x00\x1d \x00\x00'. I need to convert that symbol into plain text for the GUI and save it as txt.

Jonas Schäfer Over a year ago

Define “plain text”. There is no such thing as “plain text” in that context. I suggest reading "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets".

rodling Over a year ago

I've read that article. It is useful, but I am still confused about how to approach my problem. All I need is to convert anything that is not human readable into a readable format so I can test it for certain conditions and then write it to a file. I am not a professional coder in any sense. I simply need clean output, that's all.

DSM · Accepted Answer · 2012-09-22 19:22:49Z

If you had read The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!), you would know that There Ain't No Such Thing As Plain Text.

But since there doesn't seem to be a meeting of the minds between what you insist you're after and what people are trying to explain, I'm starting to wonder if by "convert that symbol into plain text" you mean something like "replace the Unicode RIGHT DOUBLE QUOTATION MARK (U+201D) with QUOTATION MARK (U+0022) and then encode as ASCII". For example, something like:

In [45]: s = u"“curly quoted”"

In [46]: s
Out[46]: u'\u201ccurly quoted\u201d'

In [47]: print s
“curly quoted”

and then doing the replacements manually (search for "unicode string sanitize" and you'll find much better recipes including more "downgrades" for different characters):

In [51]: fixer = dict.fromkeys([0x201c, 0x201d], u'"')

In [52]: s.translate(fixer)
Out[52]: u'"curly quoted"'

In [53]: s.translate(fixer).encode("ascii", "replace")
Out[53]: '"curly quoted"'

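where the "replace" error handler would protect against anything we didn't fix, by substituting "?" for any remaining non-ASCII character instead of raising an exception.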
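The same idea can be wrapped in a small helper. This is only a sketch under stated assumptions: the name to_ascii and the ASCII_DOWNGRADES table are hypothetical, and the handful of characters listed is an illustrative choice, not a complete sanitizing recipe. It uses only unicode.translate and encode with the "replace" handler, as in the session above, and runs on Python 2.7 and Python 3.3+.

```python
# Hypothetical table mapping Unicode code points to ASCII stand-ins.
# The selection below is illustrative only; extend it as needed.
ASCII_DOWNGRADES = {
    0x2018: u"'",    # LEFT SINGLE QUOTATION MARK
    0x2019: u"'",    # RIGHT SINGLE QUOTATION MARK
    0x201C: u'"',    # LEFT DOUBLE QUOTATION MARK
    0x201D: u'"',    # RIGHT DOUBLE QUOTATION MARK
    0x2013: u'-',    # EN DASH
    0x2026: u'...',  # HORIZONTAL ELLIPSIS
}


def to_ascii(text):
    """Return an ASCII byte string for a unicode string (hypothetical helper).

    Characters listed in ASCII_DOWNGRADES are swapped for their stand-ins;
    anything else outside ASCII becomes '?' via the "replace" handler.
    """
    return text.translate(ASCII_DOWNGRADES).encode('ascii', 'replace')


cleaned = to_ascii(u'\u201ccurly quoted\u201d')
print(cleaned)
```

The result is a byte string, so it can be written to a file opened in binary mode without any further encoding step.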
