
I get some input from the command line and want to support Unicode.

This is my error:

(screenshot: Repl.it output error)

And this is my example code:

    #include <iostream>

    int main() {
        char test = '█'; // Characters wanted: █, ▓, or ▒
    }

How can I make my program support Unicode?

  • A char only holds one byte. Try wchar_t and wcout. Commented Dec 3, 2021 at 19:23
  • Only use Unicode in strings, in UTF-8 format. Types like wchar_t don’t help much because a single Unicode character can span multiple code points: 👨‍👨‍👧‍👧👩‍👩‍👦‍👦👨‍👩‍👧‍👦👩‍👩‍👧‍👦 (see the sketch after these comments). Commented Dec 3, 2021 at 19:23
  • wchar_t works just fine for Unicode, as long as you take into account that wchar_t is different sizes on different platforms (16 bits on Windows, 32 bits on others), so use std::wstring instead of a single wchar_t so you can account for the possibility of needing multiple wchar_ts to encode a single Unicode codepoint, and multiple codepoints to encode a single Unicode grapheme. Commented Dec 3, 2021 at 19:37
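
To illustrate the UTF-8 suggestion above, here is a minimal sketch that keeps the text in an ordinary std::string; it assumes the source file is saved as UTF-8 and the terminal (as on Repl.it) expects UTF-8 output:

    #include <iostream>
    #include <string>

    int main() {
        // With UTF-8 source and a UTF-8 terminal, a plain std::string
        // holds the multi-byte encoding of these characters just fine.
        std::string blocks = "█▓▒";
        std::cout << blocks << '\n';

        // Caveat: size() counts bytes, not characters; each of these
        // block characters takes 3 bytes in UTF-8, so this prints 9.
        std::cout << blocks.size() << " bytes\n";
    }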

1 Answer


A char is usually only 1 byte, meaning it won't be able to store most Unicode characters. You should look into using wchar_t, which is required to be large enough to hold any supported character codepoint. The corresponding wide-character literal looks like this: L'█'.
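
For illustration, a minimal sketch of that approach, assuming a terminal and locale that can actually display the character (extra console setup may be needed on Windows):

    #include <clocale>
    #include <iostream>

    int main() {
        // Use the user's locale so wcout can convert wide characters
        // to the terminal's encoding when writing them out.
        std::setlocale(LC_ALL, "");

        wchar_t block = L'█';        // wide-character literal
        std::wcout << block << L'\n';
    }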


5 Comments

What about L'👨‍👨‍👦‍👦'?
👨‍👨‍👦‍👦 is not a single codepoint, but 8 codepoints making up what your browser likely shows as one emoji, so it can't be stored in a character type and must be stored in a string.
"wchar_t ... is required to be large enough to hold any supported character codepoint" - that is not the case on Windows, where wchar_t is only 16 bits, so it can't hold Unicode codepoints > U+FFFF, but it can hold UTF-16 code units, which is why Unicode wchar_t strings on Windows are encoded in UTF-16 (previously UCS-2), whereas other platforms can encode wchar_t strings using UTF-32 instead.
"A char is usually only 1 byte" - a char is always exactly 1 byte.
It depends on the definitions you use. Per the standard, a byte is defined as an "addressable unit of data storage large enough to hold any member of the basic character set of the execution environment", and a char is defined as a "single-byte character": a bit representation that fits in a byte. However, the common definition of a byte is that it contains 8 bits, which is not necessarily equivalent to the definition in the standard.
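
To make the last two comments concrete, here is a minimal sketch showing that sizeof(char) is 1 by definition, while CHAR_BIT reports how many bits that byte contains:

    #include <climits>
    #include <iostream>

    int main() {
        // sizeof(char) is 1 by definition in the standard; CHAR_BIT says
        // how many bits make up that byte (8 on virtually all platforms).
        static_assert(sizeof(char) == 1, "sizeof(char) is always 1");
        std::cout << CHAR_BIT << " bits per byte\n";
    }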
