
Why is the following code still using "ascii" to decode the string? Didn't I tell Python to use "utf-8"? And why didn't 'ignore' suppress the error?

print data.encode('utf-8', 'ignore') 

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 12355:

2 Comments
  • How you explicitly tell Python to handle a string does not affect what it does implicitly in order to print it. Commented Mar 5, 2015 at 20:26
  • @IgnacioVazquez-Abrams I don't think it's the print ... see my answer (I think it's right... I dunno, string encodings sometimes hang me up too) Commented Mar 5, 2015 at 20:41

1 Answer


I assume data is a str:

print isinstance(data, str)

should print True.

encode expects a unicode object, so Python first tries to decode your str to unicode using the default ascii codec. That implicit decode is what fails, which is why you get a UnicodeDecodeError rather than a UnicodeEncodeError. The 'ignore' argument never comes into play because it only applies to the encode step, not to the implicit decode.

try

print data.decode("utf-8","ignore") 
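The same distinction is easier to see in Python 3, where bytes and text are separate types. A sketch (the sample bytes are an assumed stand-in for the asker's data, chosen to include the 0xc2 byte from the error message):

```python
# Python 3 makes the bytes/text split explicit.
data = b"hello\xc2\xa0world"  # UTF-8 bytes; 0xc2 starts a non-breaking space

# Decoding bytes -> str with an explicit codec works:
text = data.decode("utf-8", "ignore")
print(text)

# In Python 2, data.encode('utf-8') implicitly ran data.decode('ascii')
# first, which is what raised the UnicodeDecodeError. Python 3 refuses
# outright: bytes objects have no encode method at all.
print(hasattr(data, "encode"))  # False
```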

1 Comment

And hence why Python 3 won't automatically convert byte strings to Unicode strings - you'll get a completely different error that is much easier to interpret.
