12

My code is like this:

    string s = "abc";
    char* pc = const_cast<char*>( s.c_str() );
    pc[ 1 ] = 'x';
    cout << s << endl;

When I compiled the snippet above with GCC, I got the result "axc", as expected. My question is: is it safe and portable to modify the underlying char array of a C++ string in this way? Or are there alternative approaches to manipulating a string's data directly?

FYI, my intention is to write some pure C functions that can be called from both C and C++; therefore, they can only accept char* arguments. I know that going from char* to string involves copying, and I would like to avoid that cost. Could anybody suggest how to deal with this sort of situation?

7 Answers

6

To the first part, c_str() returns const char* and it means what it says. All the const_cast achieves in this case is that your undefined behavior compiles.

To the second part, in C++0x std::string is guaranteed to have contiguous storage, just like std::vector in C++03. Therefore you could use &s[0] to get a char* to pass to your functions, as long as the string isn't empty. In practice, all string implementations currently in active development already have contiguous storage: there was a straw poll at a standard committee meeting and nobody offered a counter-example. So you can use this feature now if you like.

However, std::string uses a fundamentally different string format from C-style strings: it is data plus length rather than nul-terminated. If you modify the string data from your C functions, you can't change the length of the string, and you can't be sure there's a nul byte at the end without calling c_str(). And std::string can contain embedded nuls which are part of the data, so even if you did find a nul, without knowing the length you still don't know that you've found the end of the string. You're very limited in what you can do in functions that must operate correctly on both kinds of data.
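
As a minimal sketch of the data-plus-length point above: the function below is C-compatible, takes an explicit length, and never searches for a terminating nul, so it also handles embedded nuls. The names upcase_buffer and upcase_in_place are hypothetical, and the case conversion assumes an ASCII-compatible character set.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical C-compatible function: upper-cases `len` bytes in place.
// It relies on the explicit length and never looks for a terminating nul.
// Assumes an ASCII-compatible character set ('a'..'z' are contiguous).
extern "C" void upcase_buffer(char* buf, std::size_t len) {
    assert(buf != 0 || len == 0);  // a null buffer is only valid for length 0
    for (std::size_t i = 0; i < len; ++i) {
        if (buf[i] >= 'a' && buf[i] <= 'z') {
            buf[i] = static_cast<char>(buf[i] - 'a' + 'A');
        }
    }
}

// Hypothetical C++ wrapper: hands the string's own storage to the C function.
void upcase_in_place(std::string& s) {
    if (!s.empty()) {  // &s[0] is only used when the string is non-empty
        upcase_buffer(&s[0], s.size());
    }
}
```

The C function can change characters but not the length, which matches the limitation described above; anything that needs to grow or shrink the string has to happen on the C++ side.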


5

(a) This is not necessarily the underlying string. std::string::c_str() should be a copy of the underlying string (though a bug in the C++ Standard means that, actually, it's often not... I believe that this is fixed in C++0x).

(b) const_casting away the constness only hacks the variable type: the actual object is still const, and your modifying it is Undefined Behaviour — very bad.

Simply speaking, do not do this.


Can you use &myString[0] at all? It has a non-const version; then again, it's stated to be the same as data()[0] which has no non-const version. Someone with a decent library reference to hand can clear this up.

6 Comments

So, is &mystring[0] the safe way?
@Need4Steed: Sort of. In C++98/C++03, the string contents aren't technically guaranteed to be contiguous... however, a bug in the standard means that all mainstream implementations do make it contiguous anyway, and this was made standard in C++0x. (Be aware that the pointer you get does not point to a null-terminated char array, so you'll have to pass the length around too.)
Yes, with the newest standards. And there are no known implementations where it isn't. Be careful not to overrun the reserved length, though.
& @Coder: Thanks a lot! This is exactly what I wanted to know.
[string.require] 21.4.1.5 "The char-like objects in a basic_string object shall be stored contiguously. That is, for any basic_string object s, the identity &*(s.begin() + n) == &*s.begin() + n shall hold for all values of n such that 0 <= n < s.size()."
4

The obvious answer is no, it's undefined behavior. On the other hand, if you do:

char* pc = &s[0]; 

you can access the underlying data, in practice today, and guaranteed in C++11.
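
A small sketch of what that access looks like, together with a check of the contiguity identity quoted from the standard in the comments above. The helper names is_contiguous and poke are hypothetical, chosen only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns true if every element of s sits at base + n, i.e. the
// contiguity identity holds for this particular string object.
bool is_contiguous(std::string& s) {
    if (s.empty()) return true;
    char* base = &s[0];
    for (std::size_t n = 0; n < s.size(); ++n) {
        std::string::difference_type d =
            static_cast<std::string::difference_type>(n);
        if (&*(s.begin() + d) != base + n) return false;
    }
    return true;
}

// Replaces the character at index i by writing through a raw pointer
// obtained from the non-const operator[] (no const_cast involved).
void poke(std::string& s, std::size_t i, char c) {
    assert(i < s.size());  // stay inside the string's current length
    char* pc = &s[0];
    pc[i] = c;
}
```

With s holding "abc", calling poke(s, 1, 'x') leaves s holding "axc", which is the result the question was after, without casting away const.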


3

As others said, it is not portable. But there are more dangers. Some std::string implementations (I know that GCC does it) use COW (copy on write).

    #include <iostream>
    #include <string>

    int main()
    {
        std::string x("abc");
        std::string y;
        y = x; // x and y share the same buffer
        std::cout << (void*)&x[0] << '\n';
        std::cout << (void*)&y[0] << '\n';
        x[0] = 'A'; // COW triggered
        // x and y no longer share the same buffer
        std::cout << (void*)&x[0] << '\n';
        std::cout << (void*)&y[0] << '\n';
        return 0;
    }

2 Comments

Not all std::string implementations use copy-on-write semantics. Some deep-copy the underlying character array when you copy a std::string. In any case, one shouldn't rely on implementation details like this.
I would expect the first &x[0] to un-share the buffer, because it cannot tell whether I store the pointer and use it later: char* p = &x[0]; ...; *p = 'X'; What is y[0] now?
1

This is relying on undefined behaviour, and is therefore not portable.

Comments

1

This would depend on your standard library implementation. In the GNU C++ library (libstdc++), std::string is implemented using a copy-on-write (CoW) pattern. If multiple std::string objects were created as copies of one another, they all point to the same internal data. So if you modify one of them in the way you show in your question, the content of all of those (seemingly) unrelated std::string objects will change.

On Windows, I think the implementation doesn't use CoW, so I'm not sure what would happen there.

Anyway, it's undefined behavior, so I'd stay clear of it. Chances are, even if you get it working, you'll eventually start running into very hard-to-trace bugs.
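
For contrast, here is a minimal sketch of the well-defined route: take the pointer from the non-const operator[] of a copy rather than casting away const on c_str(). Whether or not the library shares storage between copies internally, the standard requires the original to stay unchanged. The function name with_second_char_replaced is hypothetical.

```cpp
#include <cassert>
#include <string>

// Modifies a copy through a pointer obtained from the non-const operator[]
// and returns that copy; the original string is left untouched.
std::string with_second_char_replaced(const std::string& original, char c) {
    assert(original.size() >= 2);  // index 1 must exist
    std::string copy = original;   // may or may not share storage internally
    char* pc = &copy[0];           // non-const access: a CoW library must un-share here
    pc[1] = c;
    return copy;
}
```

Starting from x holding "abc", with_second_char_replaced(x, 'x') returns "axc" while x still holds "abc"; the const_cast version gives no such guarantee.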


0

You should not mess with the underlying string. At the end of the day, a string is an object; would you mess with any other object this way?

Have you profiled your code to see whether there actually is a penalty?
