
I'm trying to understand what std::string::size() returns.

According to https://en.cppreference.com/w/cpp/string/basic_string/size it's the "number of CharT elements in the string", but I'm not sure how that relates to the number of printed characters, especially if string termination characters are involved somehow.

This code

```cpp
#include <iostream>
#include <string>

using std::cout;
using std::endl;

int main() {
    std::string str0 = "foo" "\0" "bar";
    cout << str0 << endl;
    cout << str0.size() << endl;

    std::string str1 = "foo0bar";
    str1[3] = '\0';
    cout << str1 << endl;
    cout << str1.size() << endl;

    return 0;
}
```

prints

```
foo
3
foobar
7
```
  • In the case of str0, the size matches the number of printed characters. I assume the constructor iterates over the characters of the string literal until it reaches \0, which is why only 'f', 'o' and 'o' are put in the std::string, i.e. 3 characters, and the string termination character is not put in the std::string.
  • In the case of str1, the size doesn't match the number of printed characters. I assume the same thing happened as described above, but that I broke something by assigning a character. According to cppreference.com, "the behavior is undefined if this character is modified to any value other than CharT()", so I assume I've walked into undefined behavior here.

My question is this: outside of undefined behavior, is it possible that the size of a std::string doesn't match the number of printed characters, or is it actually something guaranteed by the standard?

(note: if the answer to that question changed between versions of the standard I'm interested in knowing that too)

  • "According to https://en.cppreference.com/w/cpp/string/basic_string/operator_at, the behavior is undefined if this character is modified to any value other than CharT(), so I assume I've walked into undefined behavior here." Why? '\0' is a valid CharT in your case. "is it possible that the size of a std::string doesn't match the number of printed characters": size returns the number of CharTs in your string; it has nothing to do with whether they are printable or not. Strings can contain binary data. Commented Nov 30, 2020 at 12:25
  • Note: std::string str0 = "foo" "\0" "bar"; is equivalent to std::string str0 = "foo";. You probably wanted to construct it like this: std::string str0("foo\0bar", 7);? Commented Nov 30, 2020 at 12:26
  • @AlgirdasPreidžius just std::string str0("foo\0bar"s) is enough. See How do you construct a std::string with an embedded null? Commented Nov 30, 2020 at 12:32
  • @phuclv That works too. I forgot about existence of string_literals namespace. Commented Nov 30, 2020 at 12:43
  • @tkausl The ref says CharT(), i.e. specifically the default value for CharT, and I assumed it was something other than \0 Commented Nov 30, 2020 at 13:44

2 Answers


In the case of str1 ... the behavior is undefined if this character is modified to any value other than CharT(), so I assume I've walked into undefined behavior here.

Your assumption is wrong. There is no UB for two reasons:

  • You assigned '\0' to the element, which happens to be the same as CharT(), so even assigning that value to str1[str1.size()] would be well defined.
  • Furthermore, str1.size() is 7, as you demonstrated. Since 3 is less than 7, index 3 is within bounds, and it is well defined to assign any value to that element. The CharT() restriction only concerns the element at index size().

is it possible that the size of a std::string doesn't match the number of printed characters

Yes, it is possible. std::string can contain non-printable characters as well, so its size is not necessarily the same as the number of printed characters. Your example str1 has no undefined behaviour and demonstrates how the size can differ from the number of printed characters.

Besides non-printable characters, in some character encodings, notably the Unicode encodings such as UTF-8, a single user-perceived character (a grapheme cluster) may consist of multiple code points, and each code point may consist of multiple code units (with UTF-8 in a std::string, a code unit is a single char object). The size of the string is the number of chars, i.e. the number of code units. Thus, one should not expect the size of the string to match the number of printed characters.

or is it actually something guaranteed by the standard?

No such guarantee exists.

if the answer to that question changed between versions of the standard I'm interested in knowing that too

There has been no change regarding this.



std::string has several constructors, one of which takes a const char*, and that's the one that constructs str0. Because no length information is provided, the string is initialized with the characters up to, but not including, the first null character.

In the case of str1, the string length really is 7 characters. When you replace str1[3] with '\0', the string doesn't change its length, but the content is now "foo\0bar". Unlike a C string, a std::string can contain embedded nulls because it stores its length. Therefore cout << str1 << endl; prints exactly 7 bytes, followed by the newline. You just don't see the '\0' byte in the output because ASCII NUL isn't a printable character.

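It's recommended to use the s suffix (C++14, from the std::string_literals namespace) to construct the std::string. The literal operator receives the literal's length, so it doesn't have to scan for a terminator and can construct a string with embedded nulls directly, without resorting to another constructor. Try auto str0 = "foo\0bar"s; and see.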

2 Comments

Why did str1[3] = '\0'; act as an insert rather than a replace? And how do you replace a character, e.g. if you'd like to replace 'b' with '\0'?
@ytlu who said that? It replaces '0' with '\0'. If you want to replace the 'b' in the 5th position, then use str1[4] = '\0'.
