What is the internal structure of std::wstring? Does it include the length? Is it null terminated? Both?
3 Answers
Does it include the length
Yes. It's required by the C++11 standard.
§ 21.4.4
size_type size() const noexcept;
1. Returns: A count of the number of char-like objects currently in the string.
2. Complexity: constant time.
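Note, however, that size() is not Unicode-aware: it counts wchar_t elements (code units), not code points or user-perceived characters. Where wchar_t is 16 bits, as on Windows, a character outside the Basic Multilingual Plane occupies two elements.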
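To make the code-unit point concrete, here is a minimal sketch. It uses std::u16string as a stand-in for a platform with 16-bit wchar_t so the result does not depend on the compiler. The helper names count_code_points and demo_code_units are made up for illustration, and the counting assumes well-formed UTF-16.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: counts code points in well-formed UTF-16 by
// skipping low (trailing) surrogates, which never start a code point.
std::size_t count_code_points(const std::u16string& s) {
    std::size_t n = 0;
    for (char16_t c : s) {
        if (c < 0xDC00 || c > 0xDFFF) {
            ++n;
        }
    }
    return n;
}

// Small self-check, meant to be called from main().
void demo_code_units() {
    const std::u16string s = u"a\U0001F600"; // 'a' plus one non-BMP character
    assert(s.size() == 3);                   // three 16-bit code units
    assert(count_code_points(s) == 2);       // but only two code points
}
```

With a 32-bit wchar_t (typical on Linux) the same text in a std::wstring would usually report a size of 2, so the value of size() for identical text can differ between platforms.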
Is it null terminated
Yes. C++11 also requires that std::basic_string::c_str return a pointer that is valid over the whole range [0, size()]. Since my_string[my_string.size()] must refer to a character with value charT(), the element at that position is a null character.
§ 21.4.7.1
const charT* c_str() const noexcept;
const charT* data() const noexcept;
1. Returns: A pointer p such that p + i == &operator[](i) for each i in [0, size()].
2. Complexity: constant time.
3. Requires: The program shall not alter any of the values stored in the character array.
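The guarantee quoted above can be checked directly. This is only a sketch for a C++11 or later compiler; is_null_terminated and demo_null_terminated are names invented for the example.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: verifies the C++11 guarantees for one string.
bool is_null_terminated(const std::wstring& s) {
    const wchar_t* p = s.c_str();
    return p[s.size()] == L'\0'   // the terminator is readable through c_str()
        && s[s.size()] == L'\0'   // const operator[] at size() yields charT()
        && s.data() == p;         // data() and c_str() agree since C++11
}

// Small self-check, meant to be called from main().
void demo_null_terminated() {
    assert(is_null_terminated(std::wstring(L"hello")));
    assert(is_null_terminated(std::wstring())); // holds for an empty string too
}
```

Reading the element at index size() is fine; writing anything other than a null character there is not.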
14 Comments
We don't know. It's completely up to the implementation, at least up through C++03; C++11 apparently requires the internal buffer to be 0-terminated. If the C++ standard library implementation you are using is open source, you can have a look at its source code.
Apart from that, I'd find it logical if it was NUL-terminated and it stored an explicit length as well. This is good because then it takes constant time to return the length and a valid C string:
size_t length() { return m_length; }
const wchar_t *c_str() { return m_cstr; }
If it didn't store an explicit length, then size() would have to count the characters up to the NUL in O(n), which is wasteful if you can avoid it.
If, however, the internal buffer weren't NUL-terminated and only the length were stored, it would be costly to produce a proper NUL-terminated C string: the string would have to either reallocate its storage and append the 0 (and reallocation is an expensive operation), or copy the entire buffer over, which is again an O(n) operation.
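The trade-off described above can be sketched as a tiny class that keeps both an explicit length and a terminator. SimpleWString is a hypothetical type written for this answer, not the layout of any real standard library, and it uses std::vector for storage purely to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <vector>

// Hypothetical string type: stores the length AND keeps the buffer terminated.
class SimpleWString {
public:
    // Copies the characters of s including its terminating NUL.
    explicit SimpleWString(const wchar_t* s)
        : m_length(std::wcslen(s)), m_buf(s, s + m_length + 1) {}

    std::size_t length() const { return m_length; }        // O(1), no scan
    const wchar_t* c_str() const { return m_buf.data(); }  // O(1), no copy

    void push_back(wchar_t c) {
        assert(!m_buf.empty());   // invariant: the terminator is always present
        m_buf.back() = c;         // overwrite the old terminator
        m_buf.push_back(L'\0');   // and put a new one after it
        ++m_length;
    }

private:
    std::size_t m_length;         // declared first: initialized before m_buf
    std::vector<wchar_t> m_buf;   // always holds m_length + 1 elements
};
```

The cost is one extra element of storage per string; in exchange both length() and c_str() stay constant time.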
(Warning: shameless self-promotion - in a C language project I am currently working on, I've taken exactly this approach to implement flexible string objects.)
1 Comment
basic_string (of which wstring is a typedef) has no need for terminators.
Yes, it manages its own length.
If you need a null-terminated (aka C string) version of a string/wstring, call c_str(). But the string can contain a null character in the middle, in which case pretty much every C function that handles C strings will fail to see the entire string.
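A short sketch of the embedded-null problem, using wcslen from the C library as the "C function". The helper names make_embedded_null, c_length and demo_embedded_null are made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <string>

// Builds a 5-character wstring with a null in the middle: a, b, \0, c, d.
// The explicit count is needed; without it the constructor stops at the null.
std::wstring make_embedded_null() {
    return std::wstring(L"ab\0cd", 5);
}

// Length as a C function sees it: scanning stops at the first null.
std::size_t c_length(const std::wstring& s) {
    return std::wcslen(s.c_str());
}

// Small self-check, meant to be called from main().
void demo_embedded_null() {
    const std::wstring s = make_embedded_null();
    assert(s.size() == 5);     // the string knows its full length
    assert(c_length(s) == 2);  // the C view ends at the embedded null
}
```

If the full contents have to cross a C interface, passing s.data() together with s.size() avoids relying on the terminator.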
3 Comments
.c_str() member function and knows why and when to use it. Also, I hope you know about the wide-string handling functions in the C standard library, such as wcslen().