Since std::string is actually a typedef of a templated class, how can I override it? I want to make a UTF-8 std::string that will return the correct length, among other things.
8 Answers
If you must define your own string type, then don't inherit from std::string but define your own Character Traits class and do something like
typedef std::basic_string<unsigned char, utf8_traits> utf8string;

See also Herb Sutter's website.
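A minimal sketch of what that might look like; utf8_traits is a hypothetical name, it simply reuses std::char_traits<char> (the standard only specifies char_traits for char and the wide/UTF character types, not unsigned char), and note that traits customize per-code-unit operations, so size()/length() still count bytes:

#include <string>

// Hypothetical traits type: starts from the default char traits and would
// be extended with UTF-8-aware behaviour where it makes sense.
struct utf8_traits : std::char_traits<char> {
    // custom compare()/find()/... could go here
};

// The string type the answer suggests (using char rather than unsigned
// char, since std::char_traits is only specified for char).
typedef std::basic_string<char, utf8_traits> utf8string;

int main() {
    utf8string s("\xC3\xA9t\xC3\xA9");   // "été" encoded as UTF-8
    return s.size() == 5 ? 0 : 1;        // size() still counts code units (bytes)
}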
3 Comments
std::string is a typedef, and this is it.
Don't try to press basic_string into handling UTF-8 sequences. You'll do more harm than good whenever you try to manipulate it.
You could either track the multibyte decoding in the traits' state_type, or pack the characters as UTF-32 and convert at the boundary. Either way, it's a lot of work, but it retains compatibility with STL algorithms.
DON'T DERIVE FROM STRING
std::string, that is, basically the whole basic_string template, is not designed to be derived from. There are zillions of articles about that already. It doesn't have any virtual functions, so there is nothing to override; the best you can do is hide something. Best is to use composition/aggregation! That is, just keep a member of type std::string in your class and forward the calls! Again, just to make sure:
DON'T DERIVE FROM STRING
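A sketch of that composition/forwarding idea; Utf8String and its member names are made up, and the length() shown counts UTF-8 code points by skipping continuation bytes, which assumes the stored bytes are valid UTF-8 and is only one possible notion of "length":

#include <cstddef>
#include <string>
#include <utility>

// Hypothetical wrapper: owns a std::string and forwards what it needs
// instead of deriving from std::string.
class Utf8String {
public:
    explicit Utf8String(std::string bytes) : data_(std::move(bytes)) {}

    // Number of bytes, exactly as std::string reports it.
    std::size_t byte_size() const { return data_.size(); }

    // Number of code points: count every byte that is not a UTF-8
    // continuation byte (10xxxxxx). Assumes valid UTF-8 input.
    std::size_t length() const {
        std::size_t n = 0;
        for (unsigned char c : data_)
            if ((c & 0xC0) != 0x80) ++n;
        return n;
    }

    // Further forwarded operations get added as needed.
    const std::string& bytes() const { return data_; }

private:
    std::string data_;
};

int main() {
    Utf8String s(std::string("\xC3\xA9t\xC3\xA9"));          // "été"
    return (s.byte_size() == 5 && s.length() == 3) ? 0 : 1;
}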
4 Comments
Hiding a std::string (or std::basic_string) in this case raises the issue of char signedness, since UTF-8 is an 8-bit multibyte encoding.
If basic_string doesn't have virtual functions, then the guideline should be: DON'T USE IT POLYMORPHICALLY. There is nothing wrong with inheritance if you have documented it properly!
It is generally considered a mistake in C++ to derive from a standard library container. However, the functionality you are looking for has already been implemented. Have a look at Glib::ustring.
Hope this helps!
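If glibmm is available, a small usage sketch (treat the build setup as an assumption; the exact pkg-config module name depends on your glibmm version):

#include <glibmm/ustring.h>
#include <iostream>

int main() {
    // Glib::ustring stores UTF-8 internally.
    Glib::ustring s("\xC3\xA9t\xC3\xA9");   // "été"
    std::cout << s.length() << '\n';        // 3 -- characters (code points)
    std::cout << s.bytes()  << '\n';        // 5 -- UTF-8 bytes
}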
Comments
Just be sure you know what you are doing first. What exactly is the "correct length" you want your string objects to return? The number of code points? That does not always correspond to the number of characters as perceived by the user: "é" written as 'e' followed by the combining accent U+0301, for example, is two code points but one user-perceived character.
Anyway, take a look at the utf8-cpp library to see an alternative approach to deriving from std::string.
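For instance, counting code points with it might look like this (a sketch assuming the header-only utfcpp headers are on the include path as "utf8.h"; utf8::distance throws if the sequence is not valid UTF-8):

#include <iostream>
#include <string>
#include "utf8.h"   // utf8-cpp (utfcpp), header-only

int main() {
    std::string s = "\xC3\xA9t\xC3\xA9";                      // "été" as UTF-8
    std::cout << s.size() << '\n';                            // 5 -- bytes
    std::cout << utf8::distance(s.begin(), s.end()) << '\n';  // 3 -- code points
}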
Comments
Writing a Unicode implementation that conforms and works properly in every circumstance is very difficult. I would advise you to use an existing library or implementation instead of rolling your own. For example, Windows, OSX and Qt all have libraries which support UTF-16 and other encoded strings.
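As an illustration with Qt (a sketch assuming a Qt build environment; QString stores UTF-16, so size() counts UTF-16 code units rather than bytes or code points):

#include <QString>
#include <iostream>

int main() {
    // "été" plus an emoji outside the Basic Multilingual Plane.
    const QString s = QString::fromUtf8("\xC3\xA9t\xC3\xA9\xF0\x9F\x99\x82");
    std::cout << s.size() << '\n';           // 5 -- UTF-16 code units (the emoji needs 2)
    std::cout << s.toUcs4().size() << '\n';  // 4 -- code points
}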
Comments
As has already been stated by others: don't derive from std::string, it's just not designed for this.
You should have a look at this article, which shows how to create a case-insensitive string class as an example. You will see that the logic implemented in std::basic_string is independent of the character type, and that providing some custom char_traits should do the trick.
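Sketched from memory rather than copied from the article, the core of that technique looks roughly like this: char_traits whose comparison functions ignore case, plugged into basic_string.

#include <cctype>
#include <cstddef>
#include <string>

// Traits identical to char_traits<char> except that comparisons ignore case.
struct ci_char_traits : std::char_traits<char> {
    static bool eq(char a, char b) {
        return std::toupper(static_cast<unsigned char>(a)) ==
               std::toupper(static_cast<unsigned char>(b));
    }
    static bool lt(char a, char b) {
        return std::toupper(static_cast<unsigned char>(a)) <
               std::toupper(static_cast<unsigned char>(b));
    }
    static int compare(const char* a, const char* b, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i) {
            if (lt(a[i], b[i])) return -1;
            if (lt(b[i], a[i])) return 1;
        }
        return 0;
    }
    static const char* find(const char* s, std::size_t n, char c) {
        for (std::size_t i = 0; i < n; ++i)
            if (eq(s[i], c)) return s + i;
        return nullptr;
    }
};

typedef std::basic_string<char, ci_char_traits> ci_string;

int main() {
    ci_string a("Hello"), b("hELLO");
    return (a == b) ? 0 : 1;   // equal under the case-insensitive traits
}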