Implementing a String class with implicit conversion to char* (C++)

Question

It might not be advisable according to what I have read at a couple of places (and that's probably the reason std::string doesn't do it already), but in a controlled environment and with careful usage, I think it might be ok to write a string class which can be implicitly converted to a proper writable char buffer when needed by third party library methods (which take only char* as an argument), and still behave like a modern string having methods like Find(), Split(), SubString() etc. While I can try to implement the usual other string manipulation methods later, I first wanted to ask about the efficient and safe way to do this main task. Currently, we have to allocate a char array of roughly the maximum size of the char* output that is expected from the third party method, pass it there, then convert the return char* to a std::string to be able to use the convenient methods it allows, then again pass its (const char*) result to another method using string.c_str(). This is both lengthy and makes the code look a little messy.

Here is my very initial implementation so far:

MyString.h

    #pragma once
    #include <string>
    using namespace std;

    class MyString
    {
    private:
        bool mBufferInitialized;
        size_t mAllocSize;
        string mString;
        char *mBuffer;

    public:
        MyString(size_t size);
        MyString(const char* cstr);
        MyString();
        ~MyString();

        operator char*() { return GetBuffer(); }
        operator const char*() { return GetAsConstChar(); }
        const char* GetAsConstChar() { InvalidateBuffer(); return mString.c_str(); }

    private:
        char* GetBuffer();
        void InvalidateBuffer();
    };

MyString.cpp

    #include "MyString.h"

    MyString::MyString(size_t size)
        :mAllocSize(size)
        ,mBufferInitialized(false)
        ,mBuffer(nullptr)
    {
        mString.reserve(size);
    }

    MyString::MyString(const char * cstr)
        :MyString()
    {
        mString.assign(cstr);
    }

    MyString::MyString()
        :MyString((size_t)1024)
    {
    }

    MyString::~MyString()
    {
        if (mBufferInitialized)
            delete[] mBuffer;
    }

    char * MyString::GetBuffer()
    {
        if (!mBufferInitialized)
        {
            mBuffer = new char[mAllocSize]{ '\0' };
            mBufferInitialized = true;
        }

        if (mString.length() > 0)
            memcpy(mBuffer, mString.c_str(), mString.length());

        return mBuffer;
    }

    void MyString::InvalidateBuffer()
    {
        if (mBufferInitialized && mBuffer && strlen(mBuffer) > 0)
        {
            mString.assign(mBuffer);
            mBuffer[0] = '\0';
        }
    }

Sample usage (main.cpp)

    #include "MyString.h"
    #include <iostream>

    void testSetChars(char * name)
    {
        if (!name)
            return;
        //This length is not known to us, but the maximum
        //return length is known for each function.
        char str[] = "random random name";
        strcpy_s(name, strlen(str) + 1, str);
    }

    int main(int, char*)
    {
        MyString cs("test initializer");
        cout << cs.GetAsConstChar() << '\n';
        testSetChars(cs);
        cout << cs.GetAsConstChar() << '\n';
        getchar();
        return 0;
    }

I plan to call InvalidateBuffer() at the start of almost every method, before doing anything else. My questions are:

Is there a better way to do this in terms of memory, performance and/or safety, especially in C++11 (apart from the usual move constructor and move assignment operator, which I plan to add soon)?
I had initially implemented the 'buffer' using a std::vector of chars, which was easier to implement and more C++-like, but I was concerned about performance. In that version the GetBuffer() method would just return a pointer to the beginning of the resized vector of chars. Do you think there are any major pros/cons of using a vector instead of char* here?
I plan to add wide char support to it later. Do you think a union of two structs : {char,string} and {wchar_t, wstring} would be the way to go for that purpose (it will be only one of these two at a time)?
Is it too much overkill, compared to the usual way of passing a char array pointer, converting the result to a std::string and working with that? The third-party function calls expecting char* arguments are used heavily in the code, and I plan to completely replace both char* and std::string with this new string if it works.

Thank you for your patience and help!

"It might not be advisable according to what I have read at a couple of places" And I'll write it again: This looks like a fairly bad idea. — Baum mit Augen
– Baum mit Augen ♦, Commented Aug 21, 2016 at 21:07
Why not just write var.c_str() when calling the third-party library? It seems to me like a lot of work for something that could be solved easily and safely just by calling c_str() on the std::string. — Amadeus
– Amadeus, Commented Aug 21, 2016 at 21:10
Why not add a small adaptor interface that mirrors all the functions of the third-party library, only taking no char pointer as an argument and returning a std::string instead? It would be transparent to all existing code, without the hassle of a naked char pointer. — Jakub Zaverka
– Jakub Zaverka, Commented Aug 21, 2016 at 21:13
std::string already allows you to pass a char*: stackoverflow.com/questions/38702943/… — Galik
– Galik, Commented Aug 21, 2016 at 21:17
The most important point is that it is completely unclear what char *p = someString; does when looking at the code. Is that \0 terminated? May I write to it? How much memory does it point to? It also allows nonsense code such as if(someString) and someString + 5; and so on, which tends to hide bugs. — Baum mit Augen
– Baum mit Augen ♦, Commented Aug 21, 2016 at 21:22

Daniel Jour · Accepted Answer · 2016-08-21 22:07:02Z

If I understood you correctly, you want this to work:

    mystring foo;
    c_function(foo);
    // use the filled foo

with a c_function like ...

    void c_function(char * dest)
    {
        strcpy(dest, "FOOOOO");
    }

Instead, I propose this (ideone example):

    template<std::size_t max>
    struct string_filler
    {
        char data[max+1];
        std::string & destination;

        string_filler(std::string & d) : destination(d)
        {
            data[0] = '\0'; // paranoia
        }

        ~string_filler()
        {
            destination = data;
        }

        operator char *()
        {
            return data;
        }
    };

and using it like:

    std::string foo;
    c_function(string_filler<80>{foo});

This way you provide a "normal" buffer to the C function with a maximum size that you specify (which you need to know either way; otherwise calling the function would be unsafe). When the temporary is destroyed (which, according to the standard, happens at the end of the full-expression containing the function call), the contents of the buffer are copied (using the std::string assignment operator) into memory managed by the std::string.
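To make the temporary's lifetime concrete, here is a minimal compilable sketch of the same idea. The c_function body is only a stand-in for a third-party function, fill_example is a hypothetical helper, and the deleted copy operations are an addition of this sketch (a copy would assign to the destination a second time), not part of the answer's code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Stand-in for a third-party C function (hypothetical body): writes a
// NUL-terminated string into dest and assumes dest is large enough.
void c_function(char* dest) {
    std::strcpy(dest, "FOOOOO");
}

template <std::size_t Max>
struct string_filler {
    char data[Max + 1];        // room for Max characters plus the terminator
    std::string& destination;  // string that receives the result

    explicit string_filler(std::string& d) : destination(d) {
        data[0] = '\0';        // an untouched buffer yields an empty string
    }

    // Assumption of this sketch: forbid copies so that only one object
    // ever writes back to destination.
    string_filler(const string_filler&) = delete;
    string_filler& operator=(const string_filler&) = delete;

    // Copy the C string into the std::string when the filler dies.
    ~string_filler() { destination = data; }

    operator char*() { return data; }
};

// Hypothetical helper showing the call pattern end to end.
std::string fill_example() {
    std::string foo = "old contents";
    c_function(string_filler<80>{foo});  // the temporary is destroyed at the ';'
    assert(foo.size() <= 80);            // the result fits the declared maximum
    return foo;
}
```

Calling fill_example() returns whatever the stand-in wrote. Note that the assignment in the destructor may allocate, so an allocation failure there would terminate the program.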

Addressing your questions:

Do you think there are any major pros/cons of using a vector instead of char* here?

Yes: Using a vector frees you from manual memory management. This is a huge pro.

I plan to add wide char support to it later. Do you think a union of two structs : {char,string} and {wchar_t, wstring} would be the way to go for that purpose (it will be only one of these two at a time)?

A union is a bad idea. How do you know which member is currently active? You need a flag outside of the union. Do you really want every string to carry that around? Instead, look at what the standard library does: it uses templates to provide this abstraction (std::string and std::wstring are both instantiations of std::basic_string).
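As a rough illustration of the template route, the sketch below parameterizes a helper on the character type, the way std::basic_string is parameterized. The name from_c_buffer and its signature are hypothetical and only show that one definition can serve both char and wchar_t without a union or a runtime flag.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: build a std::basic_string<CharT> from a C buffer of
// known capacity, stopping at the first terminator (or at capacity if the
// buffer contains none).
template <typename CharT>
std::basic_string<CharT> from_c_buffer(const CharT* buffer, std::size_t capacity) {
    assert(buffer != nullptr);
    std::size_t len = 0;
    while (len < capacity && buffer[len] != CharT()) {
        ++len;
    }
    return std::basic_string<CharT>(buffer, len);
}
```

With a char buffer the call deduces CharT = char and returns a std::string; with a wchar_t buffer it returns a std::wstring.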

Is it too much overkill [..]

Writing a string class? Yes, way too much.

Your implementation is nice. I just have one question - why a template? The purpose here could be solved by a plain old struct as well taking two arguments (string, and max capacity) in its constructor right?
The template allows the buffer to be part of the structure, both having automatic storage duration. Thus the template avoids the heap allocation that would otherwise be necessary: passing in the size means that you need to allocate the buffer dynamically.
Aah. I see. Actually, my implementation is inspired from yours, but a little bit different. It completely removes the char allocation part (C++ 11 only). Do you see any problems with it?: struct string_filler { string& destination; string_filler(string& d, size_t capacity) : destination(d) { destination.resize(capacity); } ~string_filler() { destination.resize(strlen(destination.c_str())); } operator char *() { return &destination[0]; } };
Apologies for the bad formatting (indentation is lost). I think that's all I can do in comments here. The above code assumes that all the functions using it will return proper null terminated C strings, which is the case with our third party library.
Looks good to me. Though you could use c_str() in the operator char * implementation, too (to be consistent with the use in the destructor, so just a minor style issue).
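For readability, here is the resize-based variant from the comment above laid out as a compilable sketch. It relies on the C++11 guarantee that std::string storage is contiguous. The names resizing_filler and resize_example are hypothetical, c_function is a stand-in, and two details are assumptions of this sketch and not the commenter's code: the constructor uses assign instead of resize so that stale contents never survive, and copying is disabled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Stand-in for a third-party C function (hypothetical body) that writes a
// NUL-terminated string into dest.
void c_function(char* dest) {
    std::strcpy(dest, "FOOOOO");
}

struct resizing_filler {
    std::string& destination;

    resizing_filler(std::string& d, std::size_t capacity) : destination(d) {
        assert(capacity > 0);
        // Zero-fill so the string is a valid empty C string if the callee
        // writes nothing. The callee may write at most `capacity` bytes,
        // terminator included.
        destination.assign(capacity, '\0');
    }

    resizing_filler(const resizing_filler&) = delete;
    resizing_filler& operator=(const resizing_filler&) = delete;

    // Shrink the string to the length of the C string the callee produced.
    ~resizing_filler() {
        destination.resize(std::strlen(destination.c_str()));
    }

    operator char*() { return &destination[0]; }
};

// Hypothetical helper showing the call pattern.
std::string resize_example() {
    std::string foo;
    c_function(resizing_filler(foo, 80));
    return foo;
}
```

Compared with the template version, this writes directly into the string's own storage, so there is no second copy, at the cost of a heap allocation when the capacity exceeds the small-string buffer.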

Garf365 · Accepted Answer · 2016-08-22 09:11:02Z

What you want to do already exists. For example with this plain old C function:

    /**
     * Write n characters into buffer.
     * n can't be more than size.
     * Return the number of written characters.
     */
    ssize_t fillString(char * buffer, ssize_t size);

Since C++11:

    std::string str;
    // Resize string to be sure to have memory
    str.resize(80);
    auto newSize = fillString(&str[0], str.size());
    str.resize(newSize);

or without first resizing:

    std::string str;
    if (!str.empty()) // To avoid UB
    {
        auto newSize = fillString(&str[0], str.size());
        str.resize(newSize);
    }

But before C++11, std::string isn't guaranteed to be stored in a single chunk of contiguous memory, so you have to go through a std::vector<char> first:

    std::vector<char> v;
    // Resize vector to be sure to have memory
    v.resize(80);
    ssize_t newSize = fillString(&v[0], v.size());
    std::string str(v.begin(), v.begin() + newSize);

You can easily combine this with something like Daniel's proposal.
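One way to package the C++11 pattern is a small adaptor function that hides the buffer handling and returns a std::string. The sketch below is only an illustration: the body of fillString is a stand-in (the real function belongs to the C library), fillStringToStd is a hypothetical name, and ssize_t is assumed to come from the POSIX header sys/types.h.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <sys/types.h>  // ssize_t (POSIX; an assumption of this sketch)

// Stand-in for the C function described above (hypothetical body): writes
// at most `size` characters and returns how many were written.
ssize_t fillString(char* buffer, ssize_t size) {
    static const char text[] = "hello from C";
    const ssize_t n =
        std::min<ssize_t>(size, static_cast<ssize_t>(sizeof text - 1));
    std::memcpy(buffer, text, static_cast<std::size_t>(n));
    return n;
}

// Hypothetical adaptor: callers get a std::string and never see a char*.
std::string fillStringToStd(std::size_t capacity) {
    std::string str(capacity, '\0');
    if (str.empty()) {
        return str;  // nothing to write into
    }
    const ssize_t written =
        fillString(&str[0], static_cast<ssize_t>(str.size()));
    if (written < 0) {
        return std::string();  // treat a negative count as "no data"
    }
    assert(static_cast<std::size_t>(written) <= str.size());
    str.resize(static_cast<std::size_t>(written));
    return str;
}
```

With this stand-in, fillStringToStd(80) yields the full text and fillStringToStd(5) yields only its first five characters, because the callee respects the size it is given.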

Hi Garf365. Yes, I just learned about this change (contiguous memory) in C++11 from Galik. I am confused by your 'without first resizing' example, though. Wouldn't it fail in cases where the fillString function tries to write more characters than the string has allocated by default?
The second argument of fillString is the size of the buffer, so fillString never writes more than size characters into the string (see edit). A lot of functions have this parameter. If your function doesn't, make sure that your string is correctly sized because, as you mentioned, if fillString tries to write beyond the memory allocated by the string, you get UB. Also, without first resizing, you have to check that the string has at least one element to avoid UB.
