How to make python 3 print() utf8

Question

How can I make Python 3 (3.1) print("Some text") to stdout in UTF-8, or how can I output raw bytes?

Test.py

    import sys

    TestText = "Test - āĀēĒčČ..šŠūŪžŽ"  # this is UTF-8
    TestText2 = b"Test2 - \xc4\x81\xc4\x80\xc4\x93\xc4\x92\xc4\x8d\xc4\x8c..\xc5\xa1\xc5\xa0\xc5\xab\xc5\xaa\xc5\xbe\xc5\xbd"  # just bytes

    print(sys.getdefaultencoding())
    print(sys.stdout.encoding)
    print(TestText)
    print(TestText.encode("utf8"))
    print(TestText.encode("cp1252", "replace"))
    print(TestText2)

Output (in CP1257; I replaced the characters with their byte values, written as [x00]):

    utf-8
    cp1257
    Test - [xE2][xC2][xE7][xC7][xE8][xC8]..[xF0][xD0][xFB][xDB][xFE][xDE]
    b'Test - \xc4\x81\xc4\x80\xc4\x93\xc4\x92\xc4\x8d\xc4\x8c..\xc5\xa1\xc5\xa0\xc5\xab\xc5\xaa\xc5\xbe\xc5\xbd'
    b'Test - ??????..\x9a\x8a??\x9e\x8e'
    b'Test2 - \xc4\x81\xc4\x80\xc4\x93\xc4\x92\xc4\x8d\xc4\x8c..\xc5\xa1\xc5\xa0\xc5\xab\xc5\xaa\xc5\xbe\xc5\xbd'

print is just too smart... :D There's no point in passing encoded text to print (it only ever shows the representation of the bytes, not the real bytes), and it's impossible to output bytes at all, because print always encodes its output in sys.stdout.encoding.

For example: print(chr(255)) throws an error:

    Traceback (most recent call last):
      File "Test.py", line 1, in <module>
        print(chr(255));
      File "H:\Python31\lib\encodings\cp1257.py", line 19, in encode
        return codecs.charmap_encode(input,self.errors,encoding_table)[0]
    UnicodeEncodeError: 'charmap' codec can't encode character '\xff' in position 0: character maps to <undefined>

By the way print( TestText == TestText2.decode("utf8")) returns False, although print output is the same.

How does Python 3 determine sys.stdout.encoding and how can I change it?

I made a printRAW() function which works fine (actually it encodes output to UTF-8, so really it's not raw...):

    def printRAW(*Text):
        RAWOut = open(1, 'w', encoding='utf8', closefd=False)
        print(*Text, file=RAWOut)
        RAWOut.flush()
        RAWOut.close()

    printRAW("Cool", TestText)

Output (now it prints in UTF-8):

Cool Test - āĀēĒčČ..šŠūŪžŽ

printRAW(chr(252)) also nicely prints ü (in UTF-8, [xC3][xBC]) and without errors :)

Now I'm looking for a better solution, if there is one...

TestText starts with "Test" and TestText2 starts with "Test2" so they wouldn't compare equal :D
– Philippe Carphin, Commented Jun 3, 2022 at 18:26

Mark Tolonen · Accepted Answer · 2020-01-25 22:25:47Z

69

Clarification:

    TestText = "Test - āĀēĒčČ..šŠūŪžŽ"  # this is not UTF-8...it is a Unicode string in Python 3.X.
    TestText2 = TestText.encode('utf8')  # this is a UTF-8-encoded byte string.

To send UTF-8 to stdout regardless of the console's encoding, use its buffer interface, which accepts bytes:

    import sys
    sys.stdout.buffer.write(TestText2)
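As a minimal sketch of the same idea, the helper below encodes a string, writes the bytes to a binary stream, and flushes. The name `write_utf8` and its `stream` parameter are made up for illustration and are not part of the answer. When no stream is given it flushes `sys.stdout` first, on the assumption that earlier `print()` output may still be sitting in the text layer's buffer.

```python
import sys


def write_utf8(text, stream=None):
    """Encode text as UTF-8 and write the raw bytes to a binary stream.

    write_utf8 and its stream parameter are hypothetical names for this sketch.
    Returns the number of bytes written.
    """
    if stream is None:
        # Flush pending text first so earlier print() output keeps its order.
        sys.stdout.flush()
        stream = sys.stdout.buffer  # binary layer underneath the text wrapper
    data = (text + "\n").encode("utf-8")
    stream.write(data)
    stream.flush()
    return len(data)
```

Calling `write_utf8("Test - āĀēĒčČ..šŠūŪžŽ")` with no stream argument writes to `sys.stdout.buffer`. This only works when `sys.stdout` is a regular text wrapper; a replacement object installed by an IDE may not have a `buffer` attribute.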

edited Jan 25, 2020 at 22:25

answered Aug 30, 2010 at 18:31


9 Comments

davispuh Over a year ago

thanks :) by the way when I said: "Test - āĀēĒčČ..šŠūŪžŽ" # this is UTF-8 I mean that string is written in UTF-8 with IDE, py file is encoded UTF-8 and when python parses file it converts string to Python unicode...

o17t H1H' S'k Over a year ago

i get: Traceback (most recent call last): File "<stdin>", line 1, in <module> AttributeError: '_ReplOutput' object has no attribute 'buffer'

Mark Tolonen Over a year ago

Python 3? Were you using an IDE? _ReplOutput sounds like stdout was replaced with an (incorrect) file-like object.

Van Jone Over a year ago

(ok, despite struggling I can't post multiline error msg here) Hmm... >>> sys.stdout.buffer().write(chr(255)) Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: '_io.BufferedWriter' object is not callable >>> sys.stdout.buffer.write(chr(252)) Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: 'str' does not support the buffer interface Python 3.2.2

Mark Tolonen Over a year ago

@VanJone, post a new question.


zwol · Accepted Answer · 2010-08-30 04:20:19Z

17

This is the best I can dope out from the manual, and it's a bit of a dirty hack:

    utf8stdout = open(1, 'w', encoding='utf-8', closefd=False)  # fd 1 is stdout
    print(whatever, file=utf8stdout)

It seems like file objects should have a method to change their encoding, but AFAICT there isn't one.

If you write to utf8stdout and then write to sys.stdout without calling utf8stdout.flush() first, or vice versa, bad things may happen.
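The flush-ordering warning can be sketched roughly as follows. Both helper names (`open_utf8_writer`, `print_utf8`) and the `fd` and `writer` parameters are hypothetical; the only call taken from the answer is `open(..., closefd=False)`. The idea is simply to drain one writer before the other touches the same file descriptor.

```python
import sys


def open_utf8_writer(fd=1):
    """Return a UTF-8 text writer on an existing file descriptor.

    closefd=False leaves the descriptor itself open when the writer is closed.
    (Hypothetical helper; fd defaults to 1, i.e. stdout.)
    """
    return open(fd, "w", encoding="utf-8", closefd=False)


def print_utf8(*args, writer, **kwargs):
    """Print through writer, flushing on both sides of the call."""
    sys.stdout.flush()  # drain text already queued on the regular stdout
    print(*args, file=writer, **kwargs)
    writer.flush()      # drain our text before anything else writes to the fd
```

A typical use would be `out = open_utf8_writer()` once at startup, then `print_utf8("Cool", TestText, writer=out)` wherever UTF-8 output is needed.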

answered Aug 30, 2010 at 4:20


3 Comments

iljau Over a year ago

Had issue on windows, where cp1257 was used for printing (and failed), while I wanted utf-8. Following snippet worked:

import sys; sys.stdout = open(1, 'w', encoding='utf-8', closefd=False); print("vadsэавфыаЭХÜÜÄ"); print(bytes("аЭХÜ", "utf-8"))

u936293 Over a year ago

@zwol and all: what is the rationale that the Python 3 print function was defined and designed not to handle Unicode?

zwol Over a year ago

@OldGeezer That's not correct. It was defined and designed to handle Unicode. But the interpreter thinks, for some reason that we'll probably never know, that sys.stdout is feeding to a terminal emulator that doesn't handle Unicode, only CP1257, and therefore print (actually sys.stdout.write) must convert from Unicode to CP1257 before printing, and any character not in the CP1257 repertoire can't be printed at all (unless it is escaped first, which print won't do for you).
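To illustrate the point about characters outside the CP1257 repertoire, here is a small sketch that works on plain strings, without touching stdout. The helper names are invented for this example. It assumes the standard `cp1257` codec and uses the `backslashreplace` error handler as one way of escaping characters that cannot be encoded.

```python
def can_encode(text, encoding="cp1257"):
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)  # default error handler is 'strict'
    except UnicodeEncodeError:
        return False
    return True


def encode_escaped(text, encoding="cp1257"):
    """Encode text, replacing unencodable characters with backslash escapes."""
    return text.encode(encoding, errors="backslashreplace")
```

With these, `can_encode(chr(255))` is false for CP1257, which matches the traceback in the question, while `encode_escaped(chr(255))` yields the escaped bytes instead of raising.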

CervEd · Accepted Answer · 2021-05-20 19:27:38Z

As per this answer

You can manually reconfigure the encoding of stdout as of python 3.7

    import sys
    sys.stdout.reconfigure(encoding='utf-8')
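Since `sys.stdout` is sometimes replaced by an object that is not a regular text wrapper (see the `_ReplOutput` comment above), a guarded variant might look like the sketch below. `force_utf8` is a hypothetical helper name. It assumes Python 3.7 or later for `reconfigure()` and does nothing when the method is missing.

```python
def force_utf8(stream):
    """Switch a text stream to UTF-8 if it supports reconfigure().

    Returns True when the encoding was changed, and False when the stream
    (for example an IDE's replacement for stdout) lacks the method.
    """
    reconfigure = getattr(stream, "reconfigure", None)
    if reconfigure is None:
        return False
    reconfigure(encoding="utf-8")
    return True
```

In a script this would be called once near the top, for example `force_utf8(sys.stdout)` after `import sys`, before anything is printed.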

Andreas Haferburg · Accepted Answer · 2021-06-16 17:53:18Z

I tried zwol's solution in Python 3.6, but it didn't work for me. With some strings there was no output printed to the console.

But iljau's solution worked: Reopen stdout with a different encoding.

    import sys
    sys.stdout = open(1, 'w', encoding='utf-8', closefd=False)

jumorap · Accepted Answer · 2022-02-25 23:35:31Z

You can set the encoding of stdout to UTF-8 with:

    import sys
    sys.stdout = open(sys.stdout.fileno(), mode='w', encoding='utf8', buffering=1)
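If replacing `sys.stdout` for the whole process is more than you want, one possible variation is to swap it only for a block of code and restore it afterwards. The sketch below is not from any of the answers: `utf8_stdout` and its `fd` parameter are hypothetical, and it adds `closefd=False` (borrowed from the other answers) so that closing the temporary writer does not close the descriptor.

```python
import contextlib
import sys


@contextlib.contextmanager
def utf8_stdout(fd=None):
    """Temporarily route print() through a UTF-8 writer on a descriptor.

    fd defaults to the descriptor of the current sys.stdout.
    """
    if fd is None:
        fd = sys.stdout.fileno()
    sys.stdout.flush()  # drain pending output before switching writers
    writer = open(fd, mode="w", encoding="utf8", buffering=1, closefd=False)
    try:
        # redirect_stdout restores the previous sys.stdout on exit.
        with contextlib.redirect_stdout(writer):
            yield writer
    finally:
        writer.close()  # flushes; closefd=False keeps the descriptor open
```

Usage would be `with utf8_stdout(): print("Test - āĀēĒčČ..šŠūŪžŽ")`, after which `sys.stdout` is the original object again.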
