I know that a trivial std::string_view is not guaranteed to be null-terminated. However, I don't know if a std::string_view literal is guaranteed to be null-terminated.

For example:

#include <string_view>

using namespace std::literals;

int main()
{
    auto my_sv = "hello"sv;
}

Does C++17 or later guarantee that my_sv.data() is null-terminated?

=== Below is updated ===

All of the following are from N4820:

  1. As per 5.13.5.14, a string literal is null-terminated.
  2. As per 5.13.8, a user-defined-string-literal is composed of a string literal plus a custom suffix. For example, in "hello"sv, "hello" is the string literal and sv is the suffix.
  3. As per 5.13.8.5, "hello"sv is treated as a call of the form operator "" sv(str, len); as per 5.13.5.14, str is null-terminated.
  4. As per 21.4.2.1, sv's data() must return str.

Do these prove that "hello"sv.data() is guaranteed by the C++ standard to be null-terminated?

  • 2
    On this site use the green checkmark on an answer to indicate "Solved"; absence of such a checkmark indicates "Unsolved". You should not write "solved" etc. in the title Commented Jul 29, 2019 at 14:09
  • string_view is a class. Classes aren't null-terminated. It would improve the question to explain exactly what you are asking (perhaps give a code sample of usage of the string_view that demonstrates the case you are asking about) Commented Jul 29, 2019 at 14:11
  • @M.M Isn't it pretty clear what the OP is asking? Do you think that adding a few lines of code like those would be beneficial? Commented Jul 29, 2019 at 14:25
  • @M.M sv[sv.size()] is UB, sv.data()[sv.size()] isn't necessarily UB -- but should cause squeamishness. It's also not clear to me which one the OP is asking about. Commented Jul 29, 2019 at 14:38
  • @M.M So the question may be rephrased as something like "given auto my_sv = "hello"sv;, does C++17 or later guarantee that my_sv.data() is null-terminated?" Commented Jul 29, 2019 at 14:41

1 Answer


So let's get the simple parts out of the way. No string_view is ever "NUL-terminated", in the sense that the object represents a sized range of characters. Even if you create a string_view from a NUL-terminated sequence of characters, the string_view itself is still not "NUL-terminated".

The question you're really asking is this: does the implementation have some leeway to make the statement "some literal"sv yield a string_view whose data member does not point into the NUL-terminated string literal represented by "some literal"? That is, is this:

string_view s = "some literal"sv; 

permitted to behave in any way differently from this:

const char *lit = "some literal";
string_view s(lit, <number of chars in lit>);

In the latter case, s.data() is guaranteed to be a pointer to the string literal, and thus you could treat that pointer as a pointer to a NUL-terminated string. You're asking if the former is just as valid.
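To make the comparison concrete, here is a minimal sketch of both forms side by side. The helper ends_at_literal_terminator and the variable names are mine, not anything from the standard, and the length 12 is just the character count of "some literal" written out by hand. Reading data()[size()] is only acceptable here because each view is known to cover an entire string literal.

```cpp
#include <cstddef>
#include <string_view>

using namespace std::literals;

// Hypothetical helper: true when the view has the expected size and the
// character one past its last element is '\0'. Only meaningful when the
// caller knows the view spans a whole string literal.
constexpr bool ends_at_literal_terminator(std::string_view sv,
                                          std::size_t expected_size) {
    return sv.size() == expected_size && sv.data()[sv.size()] == '\0';
}

// Form 1: the user-defined literal.
constexpr std::string_view from_udl = "some literal"sv;

// Form 2: pointer plus explicit length (12 chars, terminator excluded).
constexpr const char* lit = "some literal";
constexpr std::string_view from_pointer(lit, 12);

// Both views hold the same characters and both stop right before a '\0'.
static_assert(from_udl == from_pointer);
static_assert(ends_at_literal_terminator(from_udl, 12));
static_assert(ends_at_literal_terminator(from_pointer, 12));
```

The sketch deliberately does not compare from_udl.data() with lit: whether two identical string literals share storage is up to the implementation.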

Let's investigate. The operator""sv overloads are specified like this:

constexpr string_view operator""sv(const char* str, size_t len) noexcept; 

Returns: string_view{str, len}.

That is the standard specification for the behavior of this function: it returns a string_view which points into the memory supplied by str. Therefore, the implementation cannot allocate some hidden memory and use that or whatever; the returned string_view::data is required to return the same pointer as str.
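One way to check the "same pointer" claim is to call the literal operator explicitly with a named array, so the pointer going in can be compared with the pointer coming out. This is only a sketch: the array name buffer and the variable view are my own, and the length 5 is supplied by hand.

```cpp
#include <string_view>

// A named array, so the pointer passed in can be compared with data().
// It has 6 elements: 'h', 'e', 'l', 'l', 'o', '\0'.
constexpr char buffer[] = "hello";

// Explicit call of the standard literal operator with (str, len).
constexpr std::string_view view =
    std::literals::string_view_literals::operator""sv(buffer, 5);

static_assert(view.data() == buffer);             // same pointer that went in
static_assert(view.size() == 5);                  // terminator is not counted
static_assert(view.data()[view.size()] == '\0');  // but it is still there
```

Nothing was copied or reallocated along the way; data() hands back exactly the address that was given to the operator.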

Now, this brings us to a different question: is str required to be a NUL-terminated string? That is, is it legal for a compiler to see that you are using the sv UDL implementation and therefore remove the NUL character from the array it was going to create for the string literal passed as str?

Let's look at how UDLs for strings work:

If L is a user-defined-string-literal, let str be the literal without its ud-suffix and let len be the number of code units in str (i.e., its length excluding the terminating null character). The literal L is treated as a call of the form

operator "" X(str, len) 

Note two phrases in that passage: "the literal without its ud-suffix" and "excluding the terminating null character". We know the behavior of the first: the literal without its ud-suffix is an ordinary string literal. The second makes specific mention of the expected NUL terminator for str. I'd say that's a pretty clear statement that str will be given a string literal. That string literal will be built in accordance with the regular string literal rules in C++, and therefore will be NUL-terminated.
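You can watch those two arguments arrive with a small probe literal of your own. The type LiteralProbe and the suffix _probe below are made up for illustration; the sketch simply records len and the character sitting at str[len].

```cpp
#include <cstddef>

// Hypothetical probe type: records what the compiler passes to a
// string literal operator.
struct LiteralProbe {
    std::size_t len;   // the `len` argument
    char char_at_len;  // str[len], the element just past the counted chars
};

// Hypothetical literal operator with the same parameter list as operator""sv.
constexpr LiteralProbe operator""_probe(const char* str,
                                        std::size_t len) noexcept {
    return LiteralProbe{len, str[len]};
}

constexpr LiteralProbe hello = "hello"_probe;
static_assert(hello.len == 5);             // length excludes the terminator
static_assert(hello.char_at_len == '\0');  // the terminator is right after
static_assert(sizeof("hello") == 6);       // and it counts toward the array size

constexpr LiteralProbe empty = ""_probe;
static_assert(empty.len == 0 && empty.char_at_len == '\0');
```

Since the read of str[len] happens inside a constant expression, a compiler that accepts these static_asserts has confirmed that the element exists and holds '\0'.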

Given the above, I think it is safe to say that there is no wiggle room for the implementation here. The string_view returned by the UDL must point to the array defined by the string literal specified in the UDL, and like any other string literal, that array will be NUL-terminated.

That having been said, please review my first paragraph. You should not write any code which assumes that a string_view is NUL-terminated. I would call it a code smell even if the creator of the string_view and its consumer are right next to each other.
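Here is a short sketch of why I call it a smell. A perfectly ordinary operation like substr gives you a view that still points into the literal but no longer ends at the terminator. The helper c_api_length is hypothetical; it stands in for any C-style function that expects a NUL-terminated string, and it goes through an owning std::string to get one.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <string_view>

using namespace std::literals;

constexpr std::string_view whole = "hello"sv;
constexpr std::string_view prefix = whole.substr(0, 3);  // views "hel"

// The prefix still points into the literal "hello", so the character just
// past its end is 'l', not '\0'. A C API given prefix.data() would see "hello".
static_assert(prefix.size() == 3);
static_assert(prefix.data()[prefix.size()] == 'l');

// Hypothetical helper: measures the string the way a C API would see it,
// copying into an owning std::string so the terminator is guaranteed.
std::size_t c_api_length(std::string_view sv) {
    const std::string owned(sv);  // copies exactly sv.size() chars, adds '\0'
    const std::size_t n = std::strlen(owned.c_str());
    assert(n <= sv.size());  // shorter only if sv contains an embedded '\0'
    return n;
}
```

The copy costs an allocation, but the result does not depend on where the view happened to come from.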


5 Comments

Just speculation, but I suspect that this may not be as strict a requirement as you suggest. It would not surprise me if it is only possible to observe the null-terminator through some form of undefined behavior. If that is the case, I suspect the compilers would be free to drop the null-terminator because any non-UB code would produce the same result as if there were a null-terminator.
@SirNate: "It would not surprise me if it is only possible to observe the null-terminator through some form of undefined behavior." The NUL-terminator is part of the specification. Any string literal must have one, and it is very much observable. It's even part of the size of a string literal. When you create a non-empty string_view, you give it a pointer to some character array. All of the accessors of string_view return pointers or references to that array. If the array given was NUL-terminated, then the NUL-terminator is observable from those pointers/references.
The standard explicitly states that "data() can return a pointer to a buffer that is not null-terminated." and that "operator[](size()) has undefined behavior". For me, that's sufficient justification for a compiler writer to drop the null-terminator and claim they are compliant with the standard (I'm not sure I would agree with that, but compiler writers & the standard do several things I disagree with already without caring about my opinion).
@SirNate: "For me, that's sufficient justification for a compiler writer to drop the null-terminator" The first thing you linked to is a notation, not a normative part of the standard. So it has precisely zero weight. What matters are "Constructs a basic_string_view, initializing data_" and "Returns: data_.". There is no way for the standard to change what data_ points to between the time it is given that pointer and when it gets returned.
Seems the meaning of "Note:" is very different from how I normally use it: stackoverflow.com/questions/21364398/…
