When working with character encodings, there are several kinds of encoding standards: single-byte encodings (ASCII and Extended ASCII), multibyte encodings (Shift-JIS, UTF-16, ...), and UTF-32, which is prominent in today's programming and software development. In the Visual Studio environment, however, the only choices are multibyte encoding and Unicode.

My question is: how should I work with variable-length encodings in the Visual Studio environment? Does it support them? If so, how can I use them in Visual Studio, or even in another environment, for learning purposes? For example, how can we use UCS-2 or UTF-16 in Microsoft Visual Studio for C++ development?

  • UCS-2 is long gone (with the passing of Windows NT). MS Windows is natively UTF-16LE (note that this is variable-length), which is fully supported by Visual Studio and enabled by default on new projects (and has been for a while). Commented Jun 13, 2019 at 19:51
  • @RichardCritten How can I use UTF-16LE in the Visual Studio environment? That is, how do I declare a string array with this encoding rather than ASCII? Commented Jun 13, 2019 at 20:02
  • std::wstring should work. It's defined as std::basic_string<wchar_t>, and wchar_t is the character type used with the W versions of the WinAPI functions. Commented Jun 13, 2019 at 20:09

1 Answer

Visual Studio C++ supports:


Strings:

  • string : A type that describes a specialization of the template class basic_string with elements of type char
  • u16string : A type that describes a specialization of the template class basic_string with elements of type char16_t.
  • u32string : A type that describes a specialization of the template class basic_string with elements of type char32_t.
  • wstring : A type that describes a specialization of the template class basic_string with elements of type wchar_t.

https://learn.microsoft.com/en-us/cpp/standard-library/string-typedefs?view=vs-2019
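To make the difference between these typedefs concrete, here is a minimal sketch (assuming C++17) that checks the element type of each one and computes how many bytes the code units of a string occupy. `storage_bytes` is a hypothetical helper, not part of the standard library, and it takes a `std::basic_string_view` only so that the result can be checked at compile time. The byte counts mentioned below assume the usual 8-bit `char`, 16-bit `char16_t` and 32-bit `char32_t`.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <type_traits>

// Each typedef is std::basic_string instantiated over a different code-unit type.
static_assert(std::is_same_v<std::string::value_type, char>);
static_assert(std::is_same_v<std::u16string::value_type, char16_t>);
static_assert(std::is_same_v<std::u32string::value_type, char32_t>);
static_assert(std::is_same_v<std::wstring::value_type, wchar_t>);

// Hypothetical helper: bytes used by the code units of a string
// (terminator excluded). size() counts code units, not characters.
template <class CharT>
constexpr std::size_t storage_bytes(std::basic_string_view<CharT> s) {
    return s.size() * sizeof(CharT);
}
```

On a typical platform, `storage_bytes(std::u16string_view(u"hello"))` is 10 and the `char32_t` equivalent is 20, while the plain `char` version is 5.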


Character literals:

  • Ordinary character literals of type char, for example 'a'
  • UTF-8 character literals of type char (char8_t in C++20), for example u8'a'
  • Wide-character literals of type wchar_t, for example L'a'
  • UTF-16 character literals of type char16_t, for example u'a'
  • UTF-32 character literals of type char32_t, for example U'a'

https://learn.microsoft.com/en-us/cpp/cpp/string-and-character-literals-cpp?view=vs-2019#character-literals
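The literal types listed above can be verified by the compiler itself. The following is a small sketch, assuming C++17 or later; `code_unit_bits` is a hypothetical helper introduced only for this illustration. Note that the width of `wchar_t` is platform-dependent (16 bits with MSVC, 32 bits on most Unix-like systems), so the sketch deliberately avoids asserting it.

```cpp
#include <climits>
#include <cstddef>
#include <type_traits>

// The prefix selects the type of the character literal.
static_assert(std::is_same_v<decltype('a'), char>);
static_assert(std::is_same_v<decltype(L'a'), wchar_t>);
static_assert(std::is_same_v<decltype(u'a'), char16_t>);
static_assert(std::is_same_v<decltype(U'a'), char32_t>);

// u8'a' is char up to C++17 and char8_t from C++20 on.
#if defined(__cpp_char8_t)
static_assert(std::is_same_v<decltype(u8'a'), char8_t>);
#else
static_assert(std::is_same_v<decltype(u8'a'), char>);
#endif

// Hypothetical helper: width in bits of the code unit behind a literal.
template <class CharT>
constexpr std::size_t code_unit_bits(CharT) {
    return sizeof(CharT) * CHAR_BIT;
}
```

On mainstream compilers, `code_unit_bits(u'a')` yields 16 and `code_unit_bits(U'a')` yields 32.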


Encoding:

  • A character literal without a prefix is an ordinary character literal. The value of an ordinary character literal containing a single character, escape sequence, or universal character name that can be represented in the execution character set has a value equal to the numerical value of its encoding in the execution character set. An ordinary character literal that contains more than one character, escape sequence, or universal character name is a multicharacter literal. A multicharacter literal or an ordinary character literal that can't be represented in the execution character set is conditionally-supported, has type int, and its value is implementation-defined.

  • A character literal that begins with the L prefix is a wide-character literal. The value of a wide-character literal containing a single character, escape sequence, or universal character name has a value equal to the numerical value of its encoding in the execution wide-character set unless the character literal has no representation in the execution wide-character set, in which case the value is implementation-defined. The value of a wide-character literal containing multiple characters, escape sequences, or universal character names is implementation-defined.

  • A character literal that begins with the u8 prefix is a UTF-8 character literal. The value of a UTF-8 character literal containing a single character, escape sequence, or universal character name has a value equal to its ISO 10646 code point value if it can be represented by a single UTF-8 code unit (corresponding to the C0 Controls and Basic Latin Unicode block). If the value can't be represented by a single UTF-8 code unit, the program is ill-formed. A UTF-8 character literal containing more than one character, escape sequence, or universal character name is ill-formed.

  • A character literal that begins with the u prefix is a UTF-16 character literal. The value of a UTF-16 character literal containing a single character, escape sequence, or universal character name has a value equal to its ISO 10646 code point value if it can be represented by a single UTF-16 code unit (corresponding to the basic multi-lingual plane). If the value can't be represented by a single UTF-16 code unit, the program is ill-formed. A UTF-16 character literal containing more than one character, escape sequence, or universal character name is ill-formed.

  • A character literal that begins with the U prefix is a UTF-32 character literal. The value of a UTF-32 character literal containing a single character, escape sequence, or universal character name has a value equal to its ISO 10646 code point value. A UTF-32 character literal containing more than one character, escape sequence, or universal character name is ill-formed.

https://learn.microsoft.com/en-us/cpp/cpp/string-and-character-literals-cpp?view=vs-2019#encoding
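The rules above explain why UTF-16 is variable-length: a character outside the basic multilingual plane cannot be a single `u'...'` literal, but it can appear in a `u"..."` string, where it takes two code units (a surrogate pair). Here is a rough sketch, assuming C++17 and well-formed input; `is_low_surrogate` and `count_code_points` are hypothetical names, not library functions.

```cpp
#include <cstddef>
#include <string_view>

// A literal's value equals the code point when one code unit is enough.
static_assert(u'\u00E9' == 0x00E9);       // inside the BMP: one UTF-16 code unit
static_assert(U'\U0001F600' == 0x1F600);  // UTF-32 holds any code point directly

// Hypothetical helper: true for the second (trailing) half of a surrogate pair.
constexpr bool is_low_surrogate(char16_t unit) {
    return unit >= 0xDC00 && unit <= 0xDFFF;
}

// Hypothetical helper: counts code points in a UTF-16 string by counting
// every code unit except trailing surrogates. Assumes well-formed UTF-16.
constexpr std::size_t count_code_points(std::u16string_view s) {
    std::size_t n = 0;
    for (char16_t unit : s) {
        if (!is_low_surrogate(unit)) {
            ++n;
        }
    }
    return n;
}
```

For example, `u"a\U0001F600"` has a `size()` of 3 code units, while `count_code_points` reports 2 code points for it.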


```cpp
#include <string>
using namespace std::string_literals; // enables s-suffix for std::string literals

int main()
{
    // Character literals
    auto c0 =   'A'; // char
    auto c1 = u8'A'; // char
    auto c2 =  L'A'; // wchar_t
    auto c3 =  u'A'; // char16_t
    auto c4 =  U'A'; // char32_t

    // String literals
    auto s0 =   "hello"; // const char*
    auto s1 = u8"hello"; // const char*, encoded as UTF-8
    auto s2 =  L"hello"; // const wchar_t*
    auto s3 =  u"hello"; // const char16_t*, encoded as UTF-16
    auto s4 =  U"hello"; // const char32_t*, encoded as UTF-32

    // Raw string literals containing unescaped \ and "
    auto R0 =   R"("Hello \ world")"; // const char*
    auto R1 = u8R"("Hello \ world")"; // const char*, encoded as UTF-8
    auto R2 =  LR"("Hello \ world")"; // const wchar_t*
    auto R3 =  uR"("Hello \ world")"; // const char16_t*, encoded as UTF-16
    auto R4 =  UR"("Hello \ world")"; // const char32_t*, encoded as UTF-32

    // Combining string literals with standard s-suffix
    auto S0 =   "hello"s; // std::string
    auto S1 = u8"hello"s; // std::string
    auto S2 =  L"hello"s; // std::wstring
    auto S3 =  u"hello"s; // std::u16string
    auto S4 =  U"hello"s; // std::u32string

    // Combining raw string literals with standard s-suffix
    auto S5 =   R"("Hello \ world")"s; // std::string from a raw const char*
    auto S6 = u8R"("Hello \ world")"s; // std::string from a raw const char*, encoded as UTF-8
    auto S7 =  LR"("Hello \ world")"s; // std::wstring from a raw const wchar_t*
    auto S8 =  uR"("Hello \ world")"s; // std::u16string from a raw const char16_t*, encoded as UTF-16
    auto S9 =  UR"("Hello \ world")"s; // std::u32string from a raw const char32_t*, encoded as UTF-32
}
```
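With the prefixed literals the compiler does the encoding for you. For learning purposes it can help to see what it produces, so here is a minimal sketch of the standard UTF-16 encoding step for a single code point. `Utf16Units` and `encode_utf16` are hypothetical names made up for this illustration, and the function assumes its argument is a valid Unicode scalar value (not a surrogate, at most 0x10FFFF).

```cpp
#include <cstddef>

// Hypothetical result type: one or two UTF-16 code units.
struct Utf16Units {
    char16_t units[2];
    std::size_t count;
};

// Hypothetical helper: encodes one code point as UTF-16.
// Assumes cp is a valid Unicode scalar value.
constexpr Utf16Units encode_utf16(char32_t cp) {
    if (cp < 0x10000) {
        // Basic multilingual plane: the code point is the code unit.
        return {{static_cast<char16_t>(cp), 0}, 1};
    }
    // Supplementary planes: split the 20-bit offset into a surrogate pair.
    const char32_t v = cp - 0x10000;
    return {{static_cast<char16_t>(0xD800 + (v >> 10)),
             static_cast<char16_t>(0xDC00 + (v & 0x3FF))},
            2};
}
```

For U+1F600 this gives the pair 0xD83D, 0xDE00, which matches the two code units the compiler stores for `u"\U0001F600"`.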