I'm using Python 3 (recently switched from Python 2). My code usually runs on Linux but also sometimes (not often) on Windows. According to the Python 3 documentation for `open()`, the default encoding for a text file comes from `locale.getpreferredencoding()` if the `encoding` argument is not supplied. I want this default value to be UTF-8 for a project of mine, no matter what OS it's running on (currently, it's always UTF-8 on Linux, but not on Windows). The project has many, many calls to `open()` and I don't want to add `encoding='utf-8'` to all of them. Thus, I want to change the locale's preferred encoding on Windows, as Python 3 sees it.
I found a previous question, "Changing the "locale preferred encoding"", which has an accepted answer, so I thought I was good to go. Unfortunately, neither of the commands suggested in that answer and its first comment works for me on Windows. Specifically, the accepted answer and its first comment suggest running `chcp 65001` and `set PYTHONIOENCODING=UTF-8`, and I've tried both. Please see the transcript below from my cmd window:
```
> py -i
Python 3.4.3 ...
>>> f = open('foo.txt', 'w')
>>> f.encoding
'cp1252'
>>> exit()

> chcp 65001
Active code page: 65001

> py -i
Python 3.4.3 ...
>>> f = open('foo.txt', 'w')
>>> f.encoding
'cp1252'
>>> exit()

> set PYTHONIOENCODING=UTF-8

> py -i
Python 3.4.3 ...
>>> f = open('foo.txt', 'w')
>>> f.encoding
'cp1252'
>>> exit()
```

Note that even after both suggested commands, my opened file's encoding is still cp1252 instead of the intended utf-8.
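To see what `PYTHONIOENCODING` does affect, here is a small sketch (the names `CHILD`, `probe` and `same_codec` are hypothetical) that starts a fresh interpreter with the variable set and reports two encodings: the one chosen for `sys.stdout` and the one chosen for a regular file opened without `encoding`. The expectation is that only the standard stream changes; `latin-1` is used instead of UTF-8 merely so the difference is visible even on a machine whose locale is already UTF-8.

```python
import codecs
import json
import os
import subprocess
import sys

# Script run in a fresh interpreter: it reports the encoding Python chose for
# sys.stdout and for a regular text file opened without an encoding argument.
CHILD = r"""
import json, locale, os, sys, tempfile
fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, "w") as f:
    file_encoding = f.encoding
os.remove(path)
print(json.dumps({"stdout": sys.stdout.encoding,
                  "file": file_encoding,
                  "preferred": locale.getpreferredencoding(False)}))
"""


def probe(extra_env):
    """Run CHILD with extra environment variables; return its report as a dict."""
    env = dict(os.environ)
    env.update(extra_env)
    out = subprocess.check_output([sys.executable, "-c", CHILD], env=env)
    return json.loads(out.decode("ascii"))  # json.dumps emits ASCII by default


def same_codec(a, b):
    """Compare encoding names by their canonical codec name ('UTF-8' == 'utf8')."""
    return codecs.lookup(a).name == codecs.lookup(b).name


report = probe({"PYTHONIOENCODING": "latin-1"})
print(report)
print("stdout follows PYTHONIOENCODING:", same_codec(report["stdout"], "latin-1"))
print("file follows the locale:", same_codec(report["file"], report["preferred"]))
```

On the Windows setup from the transcript, one would expect `stdout` to report `latin-1` while `file` stays at the ANSI codepage (`cp1252`), which matches what the transcript shows for `open()`.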
Don't use `chcp 65001`. The Windows console does not properly support UTF-8, and it's not doing what you want anyway. `locale.getpreferredencoding` has nothing to do with the console codepage; it's based on the Windows locale's ANSI encoding. For example, if you call the Win32 function `CreateFileA` (ANSI) instead of `CreateFileW` (UTF-16), the file path string gets decoded as an ANSI string (e.g. Windows-1252).

Windows does not allow UTF-8 to be used as the ANSI character set, and the C runtime also doesn't allow using UTF-8 for a locale. Natively, Windows works with UTF-16 `wchar_t` strings; the Windows API only supports 8-bit encodings for the legacy ANSI API, and that set unfortunately excludes UTF-8. On Windows, Python's preferred encoding simply comes from calling `GetACP` to get the ANSI codepage.

I sympathize with you and wish that `io.TextIOWrapper` defaulted to UTF-8 on all platforms (your assumption about Linux isn't always valid, either). As things stand, you need a wrapper function, as previously suggested.

If you look at the `TextIOWrapper` source and the functions it calls, you'll see that `_Py_device_encoding` is what uses the Windows console codepage (`GetConsoleCP` for input, `GetConsoleOutputCP` for output), but only for stdin, stdout, and stderr. Otherwise it calls `getpreferredencoding`, which calls `_getdefaultlocale` and thus `GetACP`.
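The lookup order described above can be modelled roughly as follows. This is a simplified sketch of the Python 3.4-era behavior, not CPython's actual code (newer versions have reworked this path), and `guess_default_text_encoding` is a hypothetical helper name. `os.device_encoding()` is the public wrapper around `_Py_device_encoding` and returns `None` for anything that isn't a terminal.

```python
import codecs
import locale
import os
import tempfile


def guess_default_text_encoding(fd):
    """Hypothetical helper modelling how encoding=None was resolved (3.4 era).

    os.device_encoding() only returns a value when fd is a terminal; on
    Windows only fds 0, 1 and 2 get a console codepage. Everything else
    falls back to the locale's preferred encoding (GetACP() on Windows).
    """
    enc = os.device_encoding(fd)
    if enc is None:
        enc = locale.getpreferredencoding(False)
    return enc


with tempfile.TemporaryFile("w") as f:
    # A regular file is not a terminal, so no console codepage applies.
    dev = os.device_encoding(f.fileno())
    guessed = guess_default_text_encoding(f.fileno())
    same = codecs.lookup(guessed).name == codecs.lookup(f.encoding).name
    print(dev, guessed, f.encoding, same)
```

For a regular file, `dev` is `None` and the guessed encoding agrees with `f.encoding`, which is why `chcp 65001` (a console setting) cannot change what `open('foo.txt', 'w')` picks.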
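A minimal sketch of such a wrapper, assuming you can import a helper into every module that calls `open()`. The name `open_utf8` is hypothetical; its signature simply mirrors the builtin `open()` in Python 3, and only the default text encoding changes. It delegates to `io.open`, which is the same function as the builtin `open`.

```python
import io
import os
import tempfile


def open_utf8(file, mode="r", buffering=-1, encoding=None, errors=None,
              newline=None, closefd=True, opener=None):
    """Hypothetical drop-in for open() whose text-mode default is UTF-8.

    Binary modes are passed through untouched (they must not get an
    encoding), and an explicit encoding argument always wins.
    """
    if encoding is None and "b" not in mode:
        encoding = "utf-8"
    return io.open(file, mode, buffering, encoding, errors,
                   newline, closefd, opener)


# Usage: write non-ASCII text, then inspect the raw bytes on disk.
path = os.path.join(tempfile.mkdtemp(), "foo.txt")
with open_utf8(path, "w") as f:
    f.write("caf\u00e9")
with open_utf8(path, "rb") as f:
    raw = f.read()
print(raw)  # b'caf\xc3\xa9'
```

In each module you could then write something like `from yourpackage.compat import open_utf8 as open` (the module path is hypothetical), so the existing `open()` calls stay as they are. Assigning it over `builtins.open` instead would change the default for every library in the process, so the per-module import is the more contained choice.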