
In a 2008 post on his site, Herb Sutter states the following:

There is an active proposal to tighten this up further in C++0x and require null-termination and possibly ban copy-on-write implementations, for concurrency-related reasons. Here’s the paper: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2008/n2534.html . I think that one or both of the proposals in this paper is likely to be adopted, but we’ll see at the next meeting or two.

I know that C++11 now guarantees that the std::string contents get stored contiguously, but did they adopt the above in the final draft?

Will it now be safe to use something like &str[0]?

  • Is the guarantee that the contents are stored contiguously provided in C++03 as well? Commented Jul 22, 2013 at 15:35

3 Answers


Yes, per [string.accessors] p1, std::basic_string::c_str():

Returns: A pointer p such that p + i == &operator[](i) for each i in [0,size()].

Complexity: constant time.

Requires: The program shall not alter any of the values stored in the character array.

This means that given a string s, the pointer returned by s.c_str() must be the same as the address of the initial character in the string (&s[0]).


Comments

Note that the same requirement holds true for data(), which I believe wasn't the case in C++98/03.
Yes, it's illuminating that basic_string<>::c_str() and basic_string<>::data() now have exactly identical semantics.
This doesn't appear to answer the question in the post's title, i.e. "Will std::string always be null-terminated in C++11?", to which the answer is no. operator[](str.length()) will return '\0', but that doesn't mean that the string actually contains it in memory.
@AndrewMarshall: operator[] is required to return a reference to the actual stored element, so the c_str() requirement (21.4.7.1/1) also means that the element at operator[](str.length()) must be part of the storage.
@S.S.Anne No, in this case the terminator is part of the sequence. That is not to say it is always part of the sequence; see e.g. .at(), which throws std::out_of_range for pos == size().

&str[0] is safe to use, so long as you do not assume it points to a null-terminated string.

Since C++11 the requirements include (section [string.accessors]):

  • str.data() and str.c_str() point to a null-terminated string.
  • &str[i] == str.data() + i, for 0 <= i <= str.size()
    • note that this implies the storage is contiguous.

However, there is no requirement that &str[0] + str.size() points to a null terminator.

A conforming implementation must place the null terminator contiguously in storage when data(), c_str() or operator[](str.size()) are called; but there is no requirement to place it in any other situation, such as calls to operator[] with other arguments.


To save you reading the long comment discussion below: the objection was raised that if c_str() were to write a null terminator, it would cause a data race under [res.on.data.races] paragraph 3, and I disagreed that it would be a data race.

Comments

The constexpr const CharT* data() const noexcept; overload can't modify anything, so the terminator has to be there from the start.
@Caleth The text you quote was added in C++20
@M.M it's been a const member function and had an O(1) requirement since at least C++11, if not earlier. De facto, it had to be null-terminated internally. Edit: yes, it was const before C++11 too.
@Mgetz placing a null terminator is O(1) since the length is known. A const member function is allowed to modify mutable internal storage of an object, as well as any dynamically allocated storage that the object holds an internal pointer to.
If it were allowed to modify the buffer, it would have to do it in a way where there was no possibility of a data race, which I don't think is possible without a per-string mutex or similar.

Although c_str() returns a null-terminated version of the std::string, surprises may await when mixing C++ std::string with C char* strings.

Null characters may end up within a C++ std::string, which can lead to subtle bugs as C functions will see a shorter string.

Buggy code may overwrite the null terminator. This results in undefined behaviour. C functions would then read beyond the string buffer, potentially causing a crash.

```cpp
#include <cstdio>
#include <cstring>
#include <iostream>
#include <string>

int main()
{
    std::string embedded_null = "hello\n";
    embedded_null += '\0';
    embedded_null += "world\n";

    // C string functions finish early at the embedded \0
    std::cout << "C++ size: " << embedded_null.size() << " value: " << embedded_null;
    std::printf("C strlen: %zu value: %s\n", std::strlen(embedded_null.c_str()), embedded_null.c_str());

    std::string missing_terminator(3, 'n');
    missing_terminator[3] = 'a'; // BUG: undefined behaviour (overwrites the terminator)

    // C string functions read beyond the buffer and may crash
    std::cout << "C++ size: " << missing_terminator.size() << " value: " << missing_terminator << '\n';
    std::printf("C strlen: %zu value: %s\n", std::strlen(missing_terminator.c_str()), missing_terminator.c_str());
}
```

Output:

```
$ c++ example.cpp
$ ./a.out
C++ size: 13 value: hello
world
C strlen: 6 value: hello

C++ size: 3 value: nnn
C strlen: 6 value: nnna�
```

Comments

"missing_terminator[3] = 'a';" That's explicitly UB. You can read from the NUL terminator, but you cannot write to it. Well, you can't write any value other than NUL to it.
I wouldn't say "c_str() generally returns", since C++11 it "returns a pointer to a null-terminated character array with data equivalent to those stored in the string.".
Replacing the null terminator with another character is UB. However, is an embedded null allowed? Both lead to problems, and neither is caught by GCC or Clang.
Yes they are allowed.
