
For example:

s = "兰蔻面膜"

In Python, its length is:

>>> len("兰蔻面膜")
4

But in C++, the length is 12, as shown below:

cout << s.length() << endl;
12

Why is that? I am simply checking the length of a Chinese string in a C++ IDE and found that its length is 12, even though 's' has only 4 characters.

  • Actually, in C++, its length is 12. Commented Nov 20, 2021 at 5:15
  • I imagine it is the number of symbols vs. the number of bytes required to encode the symbols. Commented Nov 20, 2021 at 5:15
  • len("兰蔻面膜".encode("utf8")) is 12. Commented Nov 20, 2021 at 5:24
  • @yyyy Try something like L"👨‍🌾", L"👨‍👩‍👦‍👦", L"👩🏻‍❤️‍💋‍👩🏿", L"🇪🇺", L"Å", L"각", L"நி", L"षि", L"👧🏻", L"❤️", L"é"... to see whether any of them has a length of 1, even when sizeof(wchar_t) == 4. UTF-32 doesn't mean fixed-length characters. Commented Nov 20, 2021 at 7:49
  • @Scheff'sCat That's possible after normalization, but only a few characters have such an equivalence, mostly Latin letters. Lots of characters are composed of multiple code points, so you can't count the code units to get the string length. Commented Nov 20, 2021 at 8:54
