Displaying UTF8 strings in Ubuntu's terminal with a Python script

Question

My Python script, run from the command line on Ubuntu, selects UTF8-encoded content from a MySQL database.

Then, I want to print the string to the console.

The displayed strings have an encoding problem, as they don't show the accented characters correctly. How do I fix this?

Preferably, the script itself would make the decision, rather than relying on a system environment setting, so that it runs easily on other systems.

Are you sure your locale settings match what the terminal actually does?
– hmakholm left over Monica, Commented Aug 11, 2011 at 18:24
Don’t ever rely on terminal settings. Set all the encoding stuff to UTF-8 and banish all those heisenbugs.
– tchrist, Commented Aug 11, 2011 at 22:49

tchrist · Accepted Answer · 2011-08-11 22:52:32Z

It is very strongly recommended that you not use "?" as a replacement char. Just set your output encoding to UTF-8 and be done with it.

import io
import sys

# Rewrap each standard stream (Python 3) so that it always uses UTF-8.
for s in ("stdin", "stdout", "stderr"):
    setattr(sys, s, io.TextIOWrapper(getattr(sys, s).detach(), encoding="utf8"))
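On Python 3.7 and later, a similar effect can probably be had without rebuilding the stream objects, since `io.TextIOWrapper` has a `reconfigure()` method. The following is only a sketch: it uses an in-memory stream as a stand-in for `sys.stdout` so the written bytes can be inspected, and the name `demo_reconfigure` is made up for illustration.

```python
import io


def demo_reconfigure():
    """Switch a text stream from ASCII to UTF-8 in place (Python 3.7+)."""
    raw = io.BytesIO()
    # Stand-in for a sys.stdout that was opened with a non-UTF-8 encoding.
    stream = io.TextIOWrapper(raw, encoding="ascii")
    stream.reconfigure(encoding="utf-8")
    stream.write("Ä")
    stream.flush()
    # Read the bytes back before the wrapper goes out of scope.
    return raw.getvalue()


if __name__ == "__main__":
    print(demo_reconfigure())
```

For the real streams the equivalent call would be `sys.stdout.reconfigure(encoding="utf-8")`, which only works while `sys.stdout` is still a `TextIOWrapper`.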

Alternatively, set your PYTHONIOENCODING environment variable to utf8 so that Python stops guessing about the output encoding.
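To see what PYTHONIOENCODING does, one option is to start a child interpreter with the variable set and ask it which encoding its stdout uses. This is a rough sketch; the helper name `child_stdout_encoding` is hypothetical, and the exact spelling of the reported name (`utf8` or `utf-8`) may differ between Python versions.

```python
import os
import subprocess
import sys


def child_stdout_encoding(io_encoding):
    """Return sys.stdout.encoding as seen by a child Python process."""
    # Copy the current environment and override only PYTHONIOENCODING.
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env,
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout.decode("ascii").strip()


if __name__ == "__main__":
    print(child_stdout_encoding("utf8"))
```

Because the variable is read at interpreter startup, it has to be set before the script launches, for example in the shell or in a wrapper like the one above.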

Either approach is infinitely much better than manually encoding, which is stupid.

If you refuse to upgrade to Python 3, I also recommend

from __future__ import unicode_literals

to banish all that stupid u'...' stuff.

Lately I’ve been starting all my Python programs like this:

#!/usr/bin/env python3.2
# -*- coding: UTF-8 -*-

from __future__ import print_function
from __future__ import unicode_literals

import re
import sys
import os

# Refuse to run unless the caller has pinned the I/O encoding to UTF-8.
if not (("PYTHONIOENCODING" in os.environ)
        and re.search("^utf-?8$", os.environ["PYTHONIOENCODING"], re.I)):
    sys.stderr.write(sys.argv[0] + ": Please set your PYTHONIOENCODING envariable to utf8\n")
    sys.exit(1)

import unicodedata
# Compare numerically; as strings, "10.0.0" would sort before "6.0.0".
ucd_version = tuple(int(part) for part in unicodedata.unidata_version.split("."))
if ucd_version < (6, 0, 0):
    print("WARNING: Your old UCD is out of date, expected at least 6.0.0 but got",
          unicodedata.unidata_version)

wide_enough = (sys.maxunicode >= 0x10FFFF)
if not wide_enough:
    print("WARNING: Narrow build detected, your Python lacks full Unicode support!!")

Thanks. In the edit I used:

import re
import sys
import os

if not (("PYTHONIOENCODING" in os.environ)
        and re.search("^utf-?8$", os.environ["PYTHONIOENCODING"], re.I)):
    sys.stderr.write(sys.argv[0] + ": Please set your PYTHONIOENCODING envariable to utf8\n")
    sys.exit(1)

There were import dependencies I couldn't work out to get your first bit of code running.

Boldewyn · Accepted Answer · 2011-08-11 18:53:39Z

You can get the current encoding of STDOUT like this:

>>> import sys
>>> sys.stdout.encoding
'UTF-8'

Then encode your Unicode string accordingly:

>>> enc = sys.stdout.encoding
>>> u"Ä"
u'\xc4'
>>> sys.stdout.write(u"Ä".encode(enc, 'replace'))

The 'replace' argument avoids a UnicodeEncodeError when a character cannot be represented in the terminal's encoding. Such a character is replaced with a question mark instead.
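The effect of the error handler is easiest to see with an encoding that cannot represent the character at all. The sketch below uses Python 3 syntax (the snippet above is Python 2) and a made-up helper name, `encode_for_terminal`. It also shows `backslashreplace`, a standard handler that keeps the code point visible instead of collapsing it to a question mark.

```python
def encode_for_terminal(text, encoding, errors="replace"):
    """Encode text, substituting characters the target encoding lacks."""
    return text.encode(encoding, errors)


if __name__ == "__main__":
    # 'replace' turns the unencodable character into a question mark.
    print(encode_for_terminal("Ärger", "ascii"))
    # 'backslashreplace' keeps the information as an escape sequence.
    print(encode_for_terminal("Ärger", "ascii", "backslashreplace"))
    # With a capable encoding nothing is substituted.
    print(encode_for_terminal("Ärger", "utf-8"))
```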

jfs · Accepted Answer · 2015-10-26 11:11:34Z

The input encoding of the text (utf-8 here) does not matter. Convert the utf8 bytestring into Unicode as soon as possible, then print the text:

print(unicode_text)

do not encode the text into utf8 before printing
do not modify sys.stdout to encode the text using utf8 for you
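The "decode early" advice above might look like the following in Python 3. This is a sketch under assumptions: the helper `to_text` and the sample value are hypothetical, and whether a MySQL driver hands back bytes or already-decoded text depends on how the connection is configured.

```python
def to_text(value, encoding="utf-8"):
    """Decode bytes coming from the database; pass str through unchanged."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


if __name__ == "__main__":
    raw = b"caf\xc3\xa9"      # hypothetical column value as raw UTF-8 bytes
    text = to_text(raw)
    # The bytestring has 5 bytes, the decoded text has 4 characters.
    print(len(raw), len(text))
```

Once the value is text, a plain `print(text)` leaves the choice of output encoding to Python and the environment, which is the point of the two rules above.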

The output encoding is specified by the environment that runs your script based on locale settings (LANG, LC_CTYPE, LC_ALL) or PYTHONIOENCODING envvar. Do not output utf8 unconditionally.
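When the output looks wrong, it can help to check what the environment is actually telling Python. The following diagnostic is only a sketch; the function name `describe_output_encoding` is made up, and it just reports the variables named above together with the encodings Python derived from them.

```python
import locale
import os
import sys


def describe_output_encoding():
    """Collect the settings that influence Python's output encoding."""
    info = {name: os.environ.get(name)
            for name in ("LC_ALL", "LC_CTYPE", "LANG", "PYTHONIOENCODING")}
    # Encoding implied by the locale, without changing the locale.
    info["preferred"] = locale.getpreferredencoding(False)
    # Encoding that print() will actually use for stdout.
    info["stdout"] = getattr(sys.stdout, "encoding", None)
    return info


if __name__ == "__main__":
    for key, value in describe_output_encoding().items():
        print(key, "=", value)
```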

For example, if the locale is not set, you could specify it explicitly:

$ LANG=en_US.utf8 python your_script.py

Make sure your terminal is capable of showing the corresponding Unicode characters: the fonts and the corresponding locales (see locale -a) must be installed.

In other words, to fix the output, fix the environment, e.g., configure your locale settings to use C.UTF-8 by default.
