11

Given a character like "✮" (\xe2\x9c\xae) (it could also be others, such as "Σ", "д" or "Λ"), I want to find the "actual" length that character takes when printed on screen.

For example:

len("✮")
len("\xe2\x9c\xae")

Both return 3, but the result should be 1.

4 Comments
  • Try: len("✮".decode("utf-8")) Commented Apr 29, 2014 at 12:49
  • Won't that depend on the font used and also what characters surround it - what is the overall thing you are trying to do? Commented Apr 29, 2014 at 12:51
  • len("\xe2\x9c\xae".decode('UTF-8')) works perfectly in python2.7.5. Commented Apr 29, 2014 at 13:01
  • There are several ways to define length (and width) here. It would help to know what you want this for: for instance, are you trying to work out how many characters will fit in a row on the screen? Commented Apr 29, 2014 at 14:55
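
The decode suggestion in the comments above targets Python 2. As a minimal sketch of the same idea in Python 3 (where `str` already holds code points and `bytes` holds the encoded form), the difference between the two counts looks like this. The helper name `code_point_len` is made up for illustration:

```python
# Python 3: bytes is the encoded form, str is a sequence of code points.
raw = b"\xe2\x9c\xae"        # the three UTF-8 bytes of U+272E
text = raw.decode("utf-8")   # a str holding a single code point


def code_point_len(data: bytes, encoding: str = "utf-8") -> int:
    """Hypothetical helper: decode the bytes, then count code points."""
    return len(data.decode(encoding))


print(len(raw))                                 # counts bytes
print(len(text))                                # counts code points
print(code_point_len("Σ".encode("utf-8")))      # counts code points
```

This only counts code points; it says nothing yet about how wide the text is when displayed.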

2 Answers

2

You may try like this:

import unicodedata
len(unicodedata.normalize('NFC', u'✮'))

UTF-8 is a Unicode encoding that uses more than one byte for non-ASCII characters. See unicodedata.normalize().


3 Comments

Even this doesn't necessarily count user-perceived characters or grapheme clusters; some uses of diacritics don't have a single-code-point representation. I also don't see how UTF-8 (specifically) enters the picture?
This also returns len(unicodedata.normalize('NFC', u'✮')) = 3.
Even without diacritics, some code points map to no glyph at all (think about control characters, word joiners, soft hyphens and so on). No amount of normalization will rid you of these. (Back on topic: u'✮' is already in normal form, so normalization is a no-op here; the OP's actual problem was that the UTF-8 encoding is multibyte. Hopefully, as of 2022, we are all using Python 3, where len() correctly counts code points rather than bytes.)
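
To make the points in these comments concrete, here is a rough Python 3 sketch, not a full grapheme-cluster implementation. It normalizes to NFC and then skips code points in the general categories Mn, Me, Cc and Cf (combining marks, controls, format characters). The function name `visible_code_points` and the choice of skipped categories are assumptions for illustration; proper segmentation follows the rules of Unicode Standard Annex #29, which this does not attempt.

```python
import unicodedata

# Categories treated as "takes no cell of its own" in this sketch:
# Mn/Me = combining marks, Cc = control characters, Cf = format characters.
SKIPPED_CATEGORIES = {"Mn", "Me", "Cc", "Cf"}


def visible_code_points(text: str) -> int:
    """Rough count of code points that normally produce their own glyph."""
    normalized = unicodedata.normalize("NFC", text)
    return sum(
        1 for ch in normalized
        if unicodedata.category(ch) not in SKIPPED_CATEGORIES
    )


# "q" + COMBINING DOT ABOVE has no precomposed form, so NFC keeps 2 code points.
print(len(unicodedata.normalize("NFC", "q\u0307")))
print(visible_code_points("q\u0307"))
# A soft hyphen (U+00AD) is a format character and is skipped.
print(visible_code_points("a\u00adb"))
```

Spacing combining marks, emoji sequences and similar cases are not handled here, so treat the result as an approximation.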
0

My answer to a similar question:

You are looking for the rendering width from the current output context. For graphical UIs, there is usually a method to directly query this information; for text environments, all you can do is guess what a conformant rendering engine would probably do, and hope that the actual engine matches your expectations.
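
For text environments, one way to make such a guess in Python 3 is sketched below. It is only an approximation under stated assumptions: wide and fullwidth code points (per `unicodedata.east_asian_width`) are assumed to take two columns, combining marks and control/format characters zero, and everything else, including "ambiguous" width characters, one. The function name `guess_columns` is made up for illustration, and a real terminal may disagree with the result.

```python
import unicodedata

# Assumed to occupy no column: combining marks, control and format characters.
ZERO_WIDTH_CATEGORIES = {"Mn", "Me", "Cc", "Cf"}


def guess_columns(text: str) -> int:
    """Guess how many monospace columns `text` occupies (a heuristic only)."""
    width = 0
    for ch in unicodedata.normalize("NFC", text):
        if unicodedata.category(ch) in ZERO_WIDTH_CATEGORIES:
            continue  # assumed to draw on top of, or next to, nothing
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            width += 2  # wide / fullwidth, e.g. CJK ideographs
        else:
            width += 1  # narrow, neutral, and (by assumption) ambiguous
    return width


for sample in ("✮", "Σ", "д", "Λ", "日本"):
    print(sample, guess_columns(sample))
```

Treating ambiguous-width characters as narrow is a design choice; terminals configured for East Asian locales may render them as two columns.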

4 Comments

Rendering width in pixels is another topic. I can't see that this has been asked.
For monospaced text output, the standard glyph width is the smallest addressable unit, and we are interested in multiples of that unit -- that is not so different from pixel width.
This question has nothing to do with rendering.
In what way is "when printed onscreen" not related to rendering?
