109

If I want to construct a std::string with a line like:

std::string my_string("a\0b"); 

Where I want to have three characters in the resulting string (a, null, b), I only get one. What is the proper syntax?

1
  • 5
    You'll have to be careful with this. If you replace 'b' with any numeric character, you will silently create the wrong string. See: stackoverflow.com/questions/10220401/… Commented Oct 14, 2012 at 16:27

12 Answers 12

159

Since C++14

we have been able to create std::string literals:

#include <iostream>
#include <string>
int main()
{
    using namespace std::string_literals;
    std::string s = "pl-\0-op"s; // <- Notice the "s" at the end
                                 // This is a std::string literal, not
                                 // a C-String literal.
    std::cout << s << "\n";
}

Before C++14

The problem is the std::string constructor that takes a const char* assumes the input is a C-string. C-strings are \0 terminated and thus parsing stops when it reaches the \0 character.

To compensate for this, you need to use the constructor that builds the string from a char array (not a C-String). This takes two parameters - a pointer to the array and a length:

std::string x("pq\0rs");    // Two characters because input assumed to be C-String
std::string y("pq\0rs", 5); // 5 characters as the input is now a char array with 5 characters.

Note: contrary to what other posts suggest, a C++ std::string does NOT use \0 as a terminator; its length is stored separately, so an embedded \0 is an ordinary character. However, you can extract a pointer to an internal buffer that contains a C-String with the method c_str().
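As a rough sketch of the difference between the two constructors (the helper names below are hypothetical, chosen only for illustration), the following compares the lengths each one produces and shows that C-string functions applied to c_str() still stop at the first \0:

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical helper names, used only for this illustration.

// const char* constructor: the input is treated as a C-string,
// so copying stops at the first '\0'.
std::string from_c_string() {
    return std::string("pq\0rs");
}

// (pointer, length) constructor: exactly 5 chars are copied,
// including the embedded '\0'.
std::string from_char_array() {
    return std::string("pq\0rs", 5);
}

void demo_constructors() {
    const std::string truncated = from_c_string();
    const std::string full = from_char_array();

    assert(truncated.size() == 2);  // 'p', 'q'
    assert(full.size() == 5);       // 'p', 'q', '\0', 'r', 's'
    assert(full[2] == '\0');

    // C-string functions only see the part before the first '\0'.
    assert(std::strlen(full.c_str()) == 2);
}
```

Calling demo_constructors() from main should run without tripping an assertion.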

Also check out Doug T's answer below about using a vector<char>.

Also check out RiaD for a C++14 solution.


2 Comments

update: as of c++11 strings are null-terminated. That being said, Loki's post remains valid.
@mna: They're null-terminated in terms of storage, but not in the sense that they are null-terminated with meaningful null termination (i.e. with string-length-defining semantics), which is the usual meaning of the term.
23

If you are doing manipulation like you would with a c-style string (array of chars) consider using

std::vector<char> 

You have more freedom to treat it like an array in the same manner you would treat a c-string. You can then construct a std::string from its iterator range:

std::vector<char> vec(100);
strncpy(&vec[0], "blah blah blah", 100);
std::string vecAsStr(vec.begin(), vec.end());

and you can use it in many of the same places you can use c-strings

printf("%s", &vec[0]);
vec[10] = '\0';
vec[11] = 'b';

Naturally, however, you suffer from the same problems as c-strings. You may forget your null terminator or write past the allocated space.
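A minimal sketch of the same idea with an embedded null, assuming the buffer is filled by hand rather than with strncpy (the function names are hypothetical). It shows that the iterator-range constructor keeps every element, while treating the buffer as a c-string truncates it:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: builds a std::string from every element of the
// buffer. The iterator-range constructor copies by position, so an
// embedded '\0' does not end the copy.
std::string to_string_keeping_nulls(const std::vector<char>& buffer) {
    return std::string(buffer.begin(), buffer.end());
}

void demo_vector_buffer() {
    std::vector<char> vec(3);
    vec[0] = 'a';
    vec[1] = '\0';
    vec[2] = 'b';

    const std::string s = to_string_keeping_nulls(vec);
    assert(s.size() == 3);
    assert(s[1] == '\0');

    // Treating the same buffer as a C-string stops at the first '\0'.
    const std::string truncated(&vec[0]);
    assert(truncated == "a");
}
```

The same range constructor should work for any byte buffer whose length is known, which is the situation the comment below describes.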

1 Comment

If you are, say, trying to encode bytes into a string (gRPC stores bytes as a string), use the vector method from the answer, not the usual way shown below, which will NOT construct the entire string:
byte *bytes = new byte[dataSize];
std::memcpy(bytes, image.data, dataSize * sizeof(byte));
std::string test(reinterpret_cast<char *>(bytes));
std::cout << "Encoded String length " << test.length() << std::endl;
13

I have no idea why you'd want to do such a thing, but try this:

std::string my_string("a\0b", 3); 

11 Comments

What are your concerns for doing this? Are you questioning the need to store "a\0b" ever? or questioning the use of a std::string for such storage? If the latter, what do you suggest as an alternative?
@Constantin then you're doing something wrong if you're storing binary data as a string. That's what vector<unsigned char> or unsigned char * were invented for.
I came across this while trying to learn more about security of strings. I wanted to test my code to make sure that it still works even if it reads a null character in while reading from a file / network what it expects to be textual data. I use std::string to indicate that the data should be considered as plain-text, but I am doing some hashing work and I want to make sure everything still works with null characters involved. That seems like a valid use of a string literal with an embedded null character.
@DuckMaestro No, that's not true. A \0 byte in a UTF-8 string can only be NUL. A multi-byte encoded character will never contain \0--nor any other ASCII character for that matter.
I came across this when trying to provoke an algorithm in a test case. So there are valid reasons; albeit few.
13

The question "What new capabilities do user-defined literals add to C++?" presents an elegant answer. Define:

std::string operator "" _s(const char* str, size_t n) { return std::string(str, n); } 

then you can create your string this way:

std::string my_string("a\0b"_s); 

or even so:

auto my_string = "a\0b"_s; 

There's an "old style" way:

#define S(s) s, sizeof s - 1 // trailing NUL does not belong to the string 

then you can define

std::string my_string(S("a\0b")); 
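To check that both approaches really keep the embedded null, here is a small sketch that puts the literal operator and the macro side by side (the demo function name is hypothetical; the operator and macro are as given above):

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// User-defined literal as in the answer: the compiler passes the literal's
// length (excluding the trailing terminator), so an embedded '\0' is kept.
std::string operator""_s(const char* str, std::size_t n) {
    return std::string(str, n);
}

// The "old style" macro expands to two constructor arguments.
#define S(s) s, sizeof s - 1

void demo_literal_helpers() {
    const std::string from_udl = "a\0b"_s;
    const std::string from_macro(S("a\0b"));

    assert(from_udl.size() == 3);
    assert(from_macro.size() == 3);
    assert(from_udl == from_macro);
}
```

Both strings should hold a, null, b. Note that the macro only works with string literals or arrays, since sizeof on a pointer would give the pointer's size instead.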

Comments

8

The following will work...

std::string s; s.push_back('a'); s.push_back('\0'); s.push_back('b'); 

Comments

6

You'll have to be careful with this. If you replace 'b' with any numeric character, you will silently create the wrong string using most methods. See: Rules for C++ string literals escape character.

For example, I dropped this innocent looking snippet in the middle of a program

// Create '\0' followed by '0' 40 times ;)
std::string str("\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00", 80);
std::cerr << "Entering loop.\n";
for (char & c : str) {
    std::cerr << c;
    // 'Q' is way cooler than '\0' or '0'
    c = 'Q';
}
std::cerr << "\n";
for (char & c : str) {
    std::cerr << c;
}
std::cerr << "\n";

Here is what this program output for me:

Entering loop. Entering loop. vector::_M_emplace_ba QQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQ 

That was my first print statement twice, several non-printing characters, followed by a newline, followed by something in internal memory, which I just overwrote (and then printed, showing that it has been overwritten). The cause is that each "\00" is parsed as a single octal escape, i.e. one null character, so the literal holds only 40 characters plus the terminator, yet the constructor was told to read 80. Worst of all, even compiling this with thorough and verbose gcc warnings gave me no indication of something being wrong, and running the program through valgrind didn't complain about any improper memory access patterns. In other words, none of the tools I tried detected it.

You can get this same problem with the much simpler std::string("0", 100);, but the example above is a little trickier, and thus harder to see what's wrong.

Fortunately, C++11 gives us a good solution to the problem using initializer list syntax. This saves you from having to specify the number of characters (which, as I showed above, you can do incorrectly), and avoids combining escaped numbers. std::string str({'a', '\0', 'b'}) is safe for any string content, unlike versions that take an array of char and a size.
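To make the escape rule concrete, here is a small sketch (the function name is hypothetical) that measures what the compiler actually stores. An octal escape may consume up to three octal digits, so a digit after \0 is absorbed into the escape unless the literal is split:

```cpp
#include <cassert>
#include <string>

void demo_octal_escape_pitfall() {
    // "\00" is ONE character: the '0' after "\0" becomes part of the
    // octal escape sequence.
    static_assert(sizeof("\00") == 2, "NUL plus the terminator");

    // Same trap with a letter in front: "\01" is a single character.
    static_assert(sizeof("a\01") == 3, "'a', '\\01', terminator");

    // Splitting the literal ends the escape early; adjacent literals are
    // concatenated only after escapes have been processed.
    static_assert(sizeof("a\0" "1") == 4, "'a', NUL, '1', terminator");

    // The initializer-list form needs no length argument at all.
    const std::string s({'a', '\0', '1'});
    assert(s.size() == 3);
    assert(s[1] == '\0' && s[2] == '1');
}
```

If the code has to work before C++11, splitting the literal as in the third check is one way to keep a digit from being swallowed by the escape.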

1 Comment

As part of my preparation for this post, I submitted a bug report to gcc in hopes that they will add a warning to make this a little safer: gcc.gnu.org/bugzilla/show_bug.cgi?id=54924
6

In C++14 you now may use literals

using namespace std::literals::string_literals; std::string s = "a\0b"s; std::cout << s.size(); // 3 

1 Comment

and the 2nd line can alternatively be written, more nicely imho, as auto s{"a\0b"s};
1

anonym's answer is excellent, but there's a non-macro solution in C++98 as well:

template <size_t N>
std::string RawString(const char (&ch)[N])
{
    return std::string(ch, N-1); // Again, exclude trailing `null`
}

With this function, RawString(/* literal */) will produce the same string as S(/* literal */):

std::string my_string_t(RawString("a\0b")); std::string my_string_m(S("a\0b")); std::cout << "Using template: " << my_string_t << std::endl; std::cout << "Using macro: " << my_string_m << std::endl; 

Additionally, there's an issue with the macro: the expression is not actually a std::string as written, and therefore can't be used e.g. for simple assignment-initialization:

std::string s = S("a\0b"); // ERROR! 

...so it might be preferable to use:

#define S(s) std::string(s, sizeof s - 1)

Obviously you should only use one or the other solution in your project and call it whatever you think is appropriate.
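As a sketch of why a full-expression helper is more convenient, the following uses both the template and a macro that expands to a complete std::string expression in copy-initialization. The macro name S_STR and the demo function name are assumptions made for this example, chosen to avoid clashing with S:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Template from the answer: N counts the trailing terminator, so drop it.
template <std::size_t N>
std::string RawString(const char (&ch)[N]) {
    return std::string(ch, N - 1);
}

// Macro variant that expands to a complete std::string expression.
// S_STR is a hypothetical name used only in this sketch.
#define S_STR(s) std::string(s, sizeof s - 1)

void demo_raw_string() {
    // Both forms are full expressions, so copy-initialization compiles.
    std::string from_template = RawString("a\0b");
    std::string from_macro = S_STR("a\0b");

    assert(from_template.size() == 3);
    assert(from_macro.size() == 3);
    assert(from_template == from_macro);
}
```

The template has the extra advantage that it refuses to compile when given a plain pointer, whereas the macro would silently use the pointer's size.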

Comments

1

Better to use std::vector<char> if this question isn't just for educational purposes.

Comments

0

Another method from C++17 is to construct from std::string_view with the sv suffix:

using namespace std::literals; // or using namespace std::literals::string_view_literals;
auto sv = "a\0b"sv;
auto s = std::string{sv};
std::cout << s.size(); // 3

This may be more useful if you need the view later; otherwise, just construct the string directly with the ""s suffix.
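One caveat worth sketching, assuming a C++17 compiler: a std::string_view built from a plain const char* has the same truncation problem as std::string, so the sv suffix (or an explicit length) is what preserves the null. The function name below is hypothetical:

```cpp
#include <cassert>
#include <string>
#include <string_view>

void demo_string_view() {
    using namespace std::literals;

    // The sv suffix receives the literal's length, so '\0' is kept.
    constexpr std::string_view with_suffix = "a\0b"sv;
    static_assert(with_suffix.size() == 3, "a, NUL, b");

    // A view built from a plain const char* finds its length by scanning
    // for '\0', so it truncates just like std::string does.
    const std::string_view without_suffix("a\0b");
    assert(without_suffix.size() == 1);

    // An explicit length also works when no literal suffix is available.
    const std::string_view explicit_length("a\0b", 3);
    const std::string s{explicit_length};
    assert(s.size() == 3 && s[1] == '\0');
}
```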

Comments

-5

I know it has been a long time since this question was asked, but anyone who is having a similar problem might be interested in the following code.

CComBSTR(20,"mystring1\0mystring2\0") 

1 Comment

This answer is too specific to Microsoft platforms and doesn't address the original question (which asked about std::string).
-8

Almost all implementations of std::strings are null-terminated, so you probably shouldn't do this. Note that "a\0b" is actually four characters long because of the automatic null terminator (a, null, b, null). If you really want to do this and break std::string's contract, you can do:

std::string s("aab"); s.at(1) = '\0'; 

but if you do, all your friends will laugh at you, you will never find true happiness.

5 Comments

std::string is NOT required to be NULL terminated.
It's not required to, but in almost all implementations, it is, probably because of the need for the c_str() accessor to provide you with the null terminated equivalent.
For efficiency a null character may be kept on the back of the data buffer. But none of the operations (i.e. methods) on a string use this knowledge or are affected by a string containing a NULL character. The NULL character will be manipulated in exactly the same way as any other character.
This is why it's so funny that string is std:: - its behaviour is not defined on ANY platform.
I wish user595447 was still here so that I could ask them what on Earth they thought they were talking about.
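To illustrate the point made in the comments above, that string operations treat an embedded null like any other character, here is a minimal sketch (the function name is hypothetical):

```cpp
#include <cassert>
#include <string>

void demo_null_is_ordinary() {
    const std::string s("a\0b", 3);

    // find, substr, comparison and concatenation all count the '\0' as a
    // regular element; the stored length decides where the string ends.
    assert(s.find('\0') == 1);
    assert(s.find('b') == 2);
    assert(s.substr(1) == std::string("\0b", 2));
    assert(s != std::string("a"));
    assert((s + s).size() == 6);
}
```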
