
I get some input from the command line and want to support Unicode.

This is my error:

(screenshot: Repl.it output error)

And this is my example code:

    #include <iostream>

    int main() {
        char test = '█'; // Characters wanted: █, ▓, or ▒
    }

How can I make my program support Unicode?

  • A char only holds one byte. Try wchar_t and wcout. Commented Dec 3, 2021 at 19:23
  • Only use Unicode in strings, in UTF-8 format. Types like wchar_t don’t help much because a single Unicode character can span multiple code points: 👨‍👨‍👧‍👧👩‍👩‍👦‍👦👨‍👩‍👧‍👦👩‍👩‍👧‍👦 (see the sketch after these comments). Commented Dec 3, 2021 at 19:23
  • wchar_t works just fine for Unicode, as long as you take into account that wchar_t is different sizes on different platforms (16 bits on Windows, 32 bits on others), so use std::wstring instead of a single wchar_t so you can account for the possibility of needing multiple wchar_ts to encode a single Unicode codepoint, and multiple codepoints to encode a single Unicode grapheme. Commented Dec 3, 2021 at 19:37
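
To illustrate the UTF-8 suggestion above, here is a minimal sketch that keeps the text in an ordinary std::string; it assumes the source file is saved as UTF-8 and the terminal (as on Repl.it) expects UTF-8 output:

    #include <iostream>
    #include <string>

    int main() {
        // With UTF-8 source and a UTF-8 terminal, a plain std::string
        // holds the multi-byte encoding of these characters just fine.
        std::string blocks = "█▓▒";
        std::cout << blocks << '\n';

        // Caveat: size() counts bytes, not characters; each of these
        // block characters takes 3 bytes in UTF-8, so this prints 9.
        std::cout << blocks.size() << " bytes\n";
    }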

1 Answer


A char is usually only 1 byte, meaning it won't be able to store most Unicode characters. You should look into using wchar_t, which is required to be large enough to hold any supported character codepoint. The corresponding wide-character literal looks like this: L'█'.
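
For illustration, a minimal sketch of that approach, assuming a terminal and locale that can actually display the character (extra console setup may be needed on Windows):

    #include <clocale>
    #include <iostream>

    int main() {
        // Use the user's locale so wcout can convert wide characters
        // to the terminal's encoding when writing them out.
        std::setlocale(LC_ALL, "");

        wchar_t block = L'█';        // wide-character literal
        std::wcout << block << L'\n';
    }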


5 Comments

What about L'👨‍👨‍👦‍👦'?
👨‍👨‍👦‍👦 is not a single codepoint, but 8 codepoints making up what your browser likely shows as one emoji, so it can't be stored in a character type and must be stored in a string.
"wchar_t ... is required to be large enough to hold any supported character codepoint" - that is not the case on Windows, where wchar_t is only 16 bits, so it can't hold Unicode codepoints > U+FFFF, but it can hold UTF-16 code units, which is why Unicode wchar_t strings on Windows are encoded in UTF-16 (previously UCS-2), whereas other platforms can encode wchar_t strings using UTF-32 instead.
"A char is usually only 1 byte" - a char is always exactly 1 byte.
It depends on the definitions you use. Per the standard, a byte is defined as an "addressable unit of data storage large enough to hold any member of the basic character set of the execution environment", and a char is defined as a "single-byte character": a bit representation that fits in a byte. However, the common definition of a byte is that it contains 8 bits, which is not necessarily equivalent to the definition in the standard.
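
To make the last two comments concrete, here is a minimal sketch showing that sizeof(char) is 1 by definition, while CHAR_BIT reports how many bits that byte contains:

    #include <climits>
    #include <iostream>

    int main() {
        // sizeof(char) is 1 by definition in the standard; CHAR_BIT says
        // how many bits make up that byte (8 on virtually all platforms).
        static_assert(sizeof(char) == 1, "sizeof(char) is always 1");
        std::cout << CHAR_BIT << " bits per byte\n";
    }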
