
I'm trying to use some of the simple Unicode characters in a command-line program I'm writing, but drawing them into a table becomes difficult because Python appears to be treating single-character symbols as multi-character strings.

For example, if I try print(u"\u2714".encode("utf-8")) I see the Unicode check mark. However, if I try to add some padding to that character (as one might in a tabular layout), Python seems to treat this single-character string as a 3-character one. All three of these lines print the same thing:

print("|{:1}|".format(u"\u2714".encode("utf-8")))
print("|{:2}|".format(u"\u2714".encode("utf-8")))
print("|{:3}|".format(u"\u2714".encode("utf-8")))

Now I think I understand why this is happening: it's a multibyte string. My question is, how do I get Python to pad this string appropriately?
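To see the multibyte explanation concretely, this small check (written to run on both Python 2.7 and 3) compares the length of the check mark as text with the length of its UTF-8 encoding. The width in {:N} is only a minimum, counted in the units of the value being formatted; in Python 2 the encoded value is a byte string of length three, so {:1}, {:2} and {:3} are all already satisfied and no padding is added.

```python
# Compare the check mark as text (code points) with its UTF-8 encoding (bytes).
check = u"\u2714"                  # HEAVY CHECK MARK: one code point
encoded = check.encode("utf-8")    # three bytes: e2 9c 94

print(len(check))    # 1
print(len(encoded))  # 3

# Formatting the text (not the bytes) pads by code points.
padded = u"|{:3}|".format(check)
print(len(padded))   # 5: two bars, the check mark, two spaces
```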

  • I'm currently working in 2.7, but we need to support 3 as well. Commented Oct 25, 2015 at 17:54

2 Answers


Make your format strings Unicode:

from __future__ import print_function
print(u"|{:1}|".format(u"\u2714"))
print(u"|{:2}|".format(u"\u2714"))
print(u"|{:3}|".format(u"\u2714"))

outputs:

|✔|
|✔ |
|✔  |
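Since the goal is a table, the same idea can be wrapped in a small helper. format_row is a hypothetical name used only for illustration, not part of any library; it assumes every cell is already a Unicode string (never encoded bytes) and pads each cell to a fixed width with a nested format spec. It uses u"" literals so it should run unchanged on Python 2.7 and 3.

```python
def format_row(cells, width):
    """Hypothetical helper: join text cells into one '|'-separated table row.

    Each cell must be a Unicode string; it is left-aligned and padded with
    spaces to `width` code points.
    """
    # "{:<{w}}" left-aligns the value and pads it to w characters.
    padded = [u"{:<{w}}".format(cell, w=width) for cell in cells]
    return u"|" + u"|".join(padded) + u"|"


row = format_row([u"\u2714", u"ok", u"\u2718"], 3)
```

Printing row should then show |✔  |ok |✘  |, with every cell three characters wide. Encoding, if it is needed at all, would happen only after the whole row has been built.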

3 Comments

The print function is not required for this to work though.
@poke You're correct. OP mentioned in a comment that he was specifically targeting Python 2.7 and 3+, so importing and using unicode_literals, print_function and division is good practice even if not required.
I absolutely agree with that :) My comment was more directed at another comment that has been removed since.
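Following the comment about unicode_literals, here is a sketch of the answer's code using that import instead of u prefixes. On Python 2.7 the plain literals then become Unicode strings, so the same source should pad by characters on both 2.7 and 3:

```python
from __future__ import print_function, unicode_literals

# With unicode_literals, "..." is a unicode string on Python 2.7 too.
cell = "\u2714"
lines = ["|{:1}|".format(cell), "|{:2}|".format(cell), "|{:3}|".format(cell)]

# Lengths grow with the requested width: no padding, one space, two spaces.
print([len(line) for line in lines])  # [3, 4, 5]
```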

Don't call encode('utf-8') at that point; do it later:

>>> u"\u2714".encode("utf-8")
'\xe2\x9c\x94'

The UTF-8 encoding is three bytes long. Look at how format works with Unicode strings:

>>> u"|{:1}|".format(u"\u2714")
u'|\u2714|'
>>> u"|{:2}|".format(u"\u2714")
u'|\u2714 |'
>>> u"|{:3}|".format(u"\u2714")
u'|\u2714  |'

Tested on Python 2.7.3.
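If bytes are genuinely needed (for example when writing to a binary stream), "encode later" amounts to building the whole padded line as text and encoding it once at the end. A minimal sketch; the printed repr looks different on 2.7 and 3, but the bytes are the same, and the two padding spaces survive after the three-byte check mark:

```python
# Pad while the value is still text, then encode the finished line once.
line = u"|{:3}|".format(u"\u2714")   # 5 code points: | check space space |
data = line.encode("utf-8")          # 7 bytes: | e2 9c 94 space space |

print(repr(data))
```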

4 Comments

Exactly what I needed! Thank you.
@DanielQuinn: don't encode at all. Print Unicode directly instead. Otherwise, your code may produce mojibake if the environment uses a different character encoding.
@J.F.Sebastian If I don't encode, Python 2.7 explodes with a UnicodeEncodeError. If I do, then Python 3 prints out b'\xe2\x9c\x98'.
@DanielQuinn: If you have issues with printing Unicode then it is a different question (and hard-coding the character encoding is not the answer). Read the link from my previous comment. If you read the linked answer and you have failed to apply the solutions to your case then ask a separate question.
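The b'\xe2\x9c\x98' output mentioned above is what Python 3 shows for a bytes object: print() converts each argument with str(), and for bytes that gives the repr rather than decoded text. A Python 3 only sketch illustrating this:

```python
# Python 3 only: str() of a bytes object is its repr, not decoded text.
data = "\u2718".encode("utf-8")   # HEAVY BALLOT X as three UTF-8 bytes
shown = str(data)                 # what print(data) would write

print(shown)                      # b'\xe2\x9c\x98'
```

This is why, on Python 3, printing the Unicode string itself rather than its encoded bytes is the usual approach.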
