73

In C++11 basic_string::c_str is defined to be exactly the same as basic_string::data, which is in turn defined to be exactly the same as *(begin() + n) and *(&*begin() + n) (when 0 <= n < size()).

I cannot find anything that requires the string to always have a null character at its end.

Does this mean that c_str() is no longer guaranteed to produce a null-terminated string?

4
  • 24
    surely such a drastic change would break lots of old code... Commented Sep 26, 2011 at 10:59
  • 3
    @Nim: I agree completely, but I was wondering where in the standard this requirement is stated. Commented Sep 26, 2011 at 11:05
  • 6
    If c_str didn't return a NULL terminated string, it would be the most misnamed function ever. Commented Oct 17, 2011 at 14:09
  • 2
    You missed an = in 0 <= n <= size() ... everything is fine when you include it, as the Standard does Commented Aug 28, 2015 at 21:08

4 Answers

81

Strings are now required to use null-terminated buffers internally. Look at the definition of operator[] (21.4.5):

Requires: pos <= size().

Returns: *(begin() + pos) if pos < size(), otherwise a reference to an object of type T with value charT(); the referenced value shall not be modified.

Looking back at c_str (21.4.7.1/1), we see that it is defined in terms of operator[]:

Returns: A pointer p such that p + i == &operator[](i) for each i in [0,size()].

And both c_str and data are required to be O(1), so the implementation is effectively forced to use null-terminated buffers.

Additionally, as David Rodríguez - dribeas points out in the comments, the return value requirement also means that you can use &operator[](0) as a synonym for c_str(), so the terminating null character must lie in the same buffer (since *(p + size()) must be equal to charT()); this also means that even if the terminator is initialised lazily, it's not possible to observe the buffer in the intermediate state.
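The chain of requirements above can be checked directly. The following is only a minimal sketch, assuming a C++11-conforming library; the helper name `has_contiguous_terminator` is hypothetical and not part of the standard. It tests that `c_str()` and `data()` agree, that `p + i == &operator[](i)` holds for every `i` in `[0, size()]`, and that the element at index `size()` is `charT()`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not from the standard): checks the C++11 accessor
// guarantees quoted above for one particular string.
bool has_contiguous_terminator(const std::string& s) {
    const char* p = s.c_str();

    // c_str() and data() share one specification in C++11.
    if (p != s.data()) return false;

    // p + i == &operator[](i) for each i in [0, size()], including size().
    for (std::size_t i = 0; i <= s.size(); ++i) {
        if (p + i != &s[i]) return false;
    }

    // operator[](size()) refers to charT(), which is '\0' for char,
    // and it lives in the same buffer that c_str() points to.
    return s[s.size()] == '\0' && p[s.size()] == '\0';
}
```

On a conforming implementation the helper should return true for any string, including an empty one, for example `has_contiguous_terminator(std::string("hello"))`.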


22 Comments

That doesn't say anything about the string being null-terminated.
While that does not say that the string must be null-terminated, it can be inferred from the string requirements. Both c_str and data must be O(1) operations, which means that they cannot create a copy on the fly. Additionally, the requirement of matching operator[] output means that either it is already null-terminated, or the call to data/c_str must add the null terminator prior to returning the pointer. Additionally, the string must have space for that terminator before the call to maintain the O(1) requirement. Technically the string need not be null-terminated, but data() does
Also, the last quote, "Returns: A pointer p such that p + i == &operator[](i) for each i in [0,size()]", means that &operator[](size()) == &operator[](size()-1) + 1, i.e. if operator[](size()) returned a reference to a \0 outside of the string, this requirement could never be met.
@jalf: "That doesn't say anything about the string being null-terminated." Yes, it does. 21.4.7.1 says that the pointer returned by c_str() must point to a buffer of length size()+1. 21.4.5 says that the last element of this buffer must have a value of charT(), in other words the null character.
@jalf: "This answer only gives us half of the inference chain." It gives two thirds of the full chain. The one thing that is missing is that the value produced by the value-initialization charT() is the null character. This is clearly the case when charT is char. The standard is a bit vague (more than a bit vague) on the meaning of wchar_t.
23

Well, in fact it is true that the new standard stipulates that .data() and .c_str() are now synonyms. However, it doesn't say that .c_str() is no longer zero-terminated :)

It just means that you can now rely on .data() being zero-terminated as well.

Paper N2668 defines c_str() and data() members of std::basic_string as follows:

 const charT* c_str() const;
 const charT* data() const;

Returns: A pointer to the initial element of an array of length size() + 1 whose first size() elements equal the corresponding elements of the string controlled by *this and whose last element is a null character specified by charT().

Requires: The program shall not alter any of the values stored in the character array.

Note that this does NOT mean that any valid std::string can be treated as a C-string because std::string can contain embedded nulls, which will prematurely end the C-string when used directly as a const char*.
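The embedded-null caveat can be illustrated with a small sketch. The function names `make_embedded`, `c_length` and `demo_embedded_null` are hypothetical and exist only for this example. It shows that `size()` counts every character, while C functions such as `std::strlen` stop at the first null they meet.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: builds a 7-character string with a null in the middle.
// The explicit length is needed; std::string("abc\0def") would stop at "abc".
std::string make_embedded() {
    return std::string("abc\0def", 7);
}

// Hypothetical helper: the length as seen by C APIs that stop at the first '\0'.
std::size_t c_length(const std::string& s) {
    return std::strlen(s.c_str());
}

void demo_embedded_null() {
    const std::string s = make_embedded();
    assert(s.size() == 7);                // the string holds all 7 characters
    assert(c_length(s) == 3);             // a C-string view ends at the embedded null
    assert(s.c_str()[s.size()] == '\0');  // the terminator still follows the data
}
```

Code that must pass such a string to a C API generally needs to pass `size()` alongside the pointer rather than rely on the terminator.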

Addendum:

I don't have access to the actual published final spec of C++11 but it appears that indeed the wording was dropped somewhere in the revision history of the spec: e.g. http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2011/n3242.pdf

§ 21.4.7 basic_string string operations [string.ops]

§ 21.4.7.1 basic_string accessors [string.accessors]

 const charT* c_str() const noexcept;
 const charT* data() const noexcept;
  1. Returns: A pointer p such that p + i == &operator[](i) for each i in [0,size()].
  2. Complexity: constant time.
  3. Requires: The program shall not alter any of the values stored in the character array.

8 Comments

@R.MartinhoFernandes: my edit and your comment must have crossed posts?
Yeah, sorry about that. Regarding your edit I'd like to note that the FDIS wording is very different from this and the requirement for null-termination is not this obvious, but it's ninja'ed in :)
dug up some more revisions. Now, who buys me that copy of the spec ;)
Please escape the Square brackets that appear as part of Operator[](i) in your post, since they are currently interpreted as a link, which makes the text impossible to understand.
@Kevin: sry about that, fixed
10

The "history" was that a long time ago when everyone worked in single threads, or at least the threads were workers with their own data, they designed a string class for C++ which made string handling easier than it had been before, and they overloaded operator+ to concatenate strings.

The issue was that users would do something like:

s = s1 + s2 + s3 + s4; 

and each concatenation would create a temporary that had to be a complete string object.

Therefore someone had the brainwave of "lazy evaluation" such that internally you could store some kind of "rope" with all the strings until someone wanted to read it as a C-string at which point you would change the internal representation to a contiguous buffer.

This solved the problem above but caused a load of other headaches, particularly in the multi-threaded world, where one expects .c_str() to be a read-only operation that changes nothing and therefore needs no locking. Premature internal locking in the class implementation, just in case someone was using it from multiple threads (when there wasn't even a threading standard), was also not a good idea. In fact, doing any of this was more costly than simply copying the buffer each time. The "copy on write" implementation was abandoned for string implementations for the same reason.

Thus making .c_str() a truly immutable operation turned out to be the most sensible thing to do. But could one "rely" on that in a standard that is now thread-aware? The new standard therefore decided to state clearly that you can, and thus the internal representation needs to hold the null terminator.

1 Comment

The old string also had the strange property that the first non const begin() would invalidate iterators!
2

Well spotted. This is certainly a defect in the recently adopted standard; I'm sure that there was no intent to break all of the code currently using c_str. I would suggest a defect report, or at least asking the question in comp.std.c++ (which will usually end up before the committee if it concerns a defect).

2 Comments

Well, there are bits in the FDIS that are arguably shaky. 21.4.2/2 says that .data() for an empty string isn't actually null-terminated (.data()+1 is not valid, but should be a pointer one beyond the \0)
