
I expected the following code to work, but it fails. What is the reason?

>>> s = 'ö'
>>> s.decode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\encodings\utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x94 in position 0: invalid start byte
Your string is not encoded in UTF-8. What console are you using? Commented Feb 22, 2017 at 10:13

1 Answer


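In the Python 2 interactive interpreter, a string literal holds whatever bytes your terminal or console sends, so its encoding depends entirely on your terminal or console configuration. In your case, that is not set to UTF-8: the byte 0x94 in the traceback is how 'ö' is encoded in Windows OEM code pages such as cp437 and cp850, whereas UTF-8 encodes it as the two bytes 0xC3 0xB6.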
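As a minimal sketch of why the decode fails, the snippet below takes the byte 0x94 from the traceback and decodes it with two codecs. The choice of cp437 is an assumption (it is a common default on US Windows consoles, and your console may use a different code page), and try_decode is a hypothetical helper introduced only for this illustration.

```python
# The single byte reported in the traceback.
raw = b'\x94'


def try_decode(data, codec):
    """Hypothetical helper: return the decoded text, or None on failure."""
    try:
        return data.decode(codec)
    except UnicodeDecodeError:
        return None


# Assumption: the console uses code page 437, where 0x94 is 'ö' (U+00F6).
as_cp437 = try_decode(raw, 'cp437')

# 0x94 is a continuation byte in UTF-8, so it cannot start a character.
as_utf8 = try_decode(raw, 'utf-8')

# For comparison, the bytes UTF-8 actually uses for 'ö'.
utf8_bytes = u'\xf6'.encode('utf-8')
```

Here as_cp437 is u'\xf6', as_utf8 is None because the decode raises UnicodeDecodeError, and utf8_bytes is the two-byte sequence 0xC3 0xB6.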

You can use the sys.stdin.encoding attribute to determine what codec to use:

>>> s = 'ö'
>>> import sys
>>> s.decode(sys.stdin.encoding)
u'\xf6'

Alternatively, just create a unicode string literal (using the u prefix) directly; the Python interactive interpreter knows to use the sys.stdin.encoding codec for that case:

>>> s = u'ö'
>>> s
u'\xf6'

2 Comments

locale.getpreferredencoding() and sys.stdin.encoding don't necessarily return the same value. On a US Windows console the former returns cp1252 and the latter returns cp437. In the console the latter is correct.
@MarkTolonen: ah, indeed; the win32 API GetConsoleCP function is used to determine the input codepage, which is not exposed in Python.
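To check whether the two values discussed in these comments differ on a given machine, something like the following sketch could be run from the console. console_encodings is a hypothetical helper name; it only reports what Python exposes and makes no claim about which value is right for your setup. Note that sys.stdin.encoding may be None when input is piped rather than typed.

```python
import locale
import sys


def console_encodings():
    """Hypothetical helper: collect the two encodings for comparison."""
    return {
        # Encoding Python associates with console input; may be None.
        'stdin': getattr(sys.stdin, 'encoding', None),
        # Encoding derived from the locale (ANSI code page on Windows).
        'preferred': locale.getpreferredencoding(),
    }


if __name__ == '__main__':
    for name, value in sorted(console_encodings().items()):
        print('%s: %s' % (name, value))
```

If the two lines differ, decode console input with the stdin value, as the first comment recommends.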
