Since std::string is actually a typedef of a templated class, how can I override it? I want to make a UTF-8 std::string that will return the correct length, among other things.

Answer

If you must define your own string type, then don't inherit from std::string but define your own Character Traits class and do something like

typedef std::basic_string<unsigned char, utf8_traits> utf8string; 

See also Herb Sutter's website.
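Here is a minimal sketch of what such a traits class could look like, assuming C++11 or later. The name utf8_traits comes from the answer, but the member definitions below are an assumption: they treat every unsigned char as an opaque UTF-8 code unit. Note that traits do not change what size() counts. It still returns bytes, so counting code points has to be written separately.

```cpp
#include <cstddef>  // std::size_t
#include <cstring>  // std::memcmp, std::memmove, std::memcpy, std::memset
#include <cwchar>   // std::mbstate_t
#include <ios>      // std::streamoff, std::streampos
#include <string>   // std::basic_string

// Hypothetical traits: each unsigned char is one UTF-8 code unit (a byte).
struct utf8_traits {
    using char_type  = unsigned char;
    using int_type   = int;
    using off_type   = std::streamoff;
    using pos_type   = std::streampos;
    using state_type = std::mbstate_t;

    static void assign(char_type& r, const char_type& a) noexcept { r = a; }
    static bool eq(char_type a, char_type b) noexcept { return a == b; }
    static bool lt(char_type a, char_type b) noexcept { return a < b; }

    // Bytewise comparison; memcmp compares as unsigned char, matching lt().
    static int compare(const char_type* a, const char_type* b, std::size_t n) {
        return n == 0 ? 0 : std::memcmp(a, b, n);
    }
    // Length of a NUL-terminated sequence, in bytes (not code points).
    static std::size_t length(const char_type* s) {
        std::size_t n = 0;
        while (s[n] != 0) ++n;
        return n;
    }
    static const char_type* find(const char_type* s, std::size_t n, const char_type& c) {
        for (std::size_t i = 0; i < n; ++i)
            if (s[i] == c) return s + i;
        return nullptr;
    }
    static char_type* move(char_type* d, const char_type* s, std::size_t n) {
        if (n != 0) std::memmove(d, s, n);
        return d;
    }
    static char_type* copy(char_type* d, const char_type* s, std::size_t n) {
        if (n != 0) std::memcpy(d, s, n);
        return d;
    }
    static char_type* assign(char_type* d, std::size_t n, char_type c) {
        if (n != 0) std::memset(d, c, n);
        return d;
    }
    static int_type eof() noexcept { return -1; }
    static int_type not_eof(int_type c) noexcept { return c == eof() ? 0 : c; }
    static char_type to_char_type(int_type c) noexcept { return static_cast<char_type>(c); }
    static int_type to_int_type(char_type c) noexcept { return c; }
    static bool eq_int_type(int_type a, int_type b) noexcept { return a == b; }
};

typedef std::basic_string<unsigned char, utf8_traits> utf8string;
```

This instantiation works as an ordinary string of bytes. For example, "héllo" stored as UTF-8 has a size() of 6, not 5.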

Comments

+1 - there is a reason std::string is a typedef, and this is it.
On the other hand, I would not use a basic_string to handle UTF-8 sequences. You'll do more harm than good whenever you try to manipulate it.
@Matthieu M., that depends entirely on how you implement the Character Traits. I can see two options: either pack them as UTF-8 directly and implement the state_type, or pack them as UTF-32 and convert at the boundary. Either way, it's a lot of work, but it retains compatibility with STL algorithms.

Answer

DON'T DERIVE FROM STRING

std::string, and indeed the whole basic_string template, is not designed to be derived from. There are zillions of articles about that already. It doesn't have any virtual functions, so there is nothing to override; the best you can do is hide something. Best is to use composition/aggregation! That is, just keep a member of type string in your class and forward the calls! Again, just to make sure:

DON'T DERIVE FROM STRING
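Here is a minimal sketch of the composition approach, assuming C++11 and that the stored bytes are valid UTF-8. The class name utf8_string and the members length(), size_bytes(), empty(), operator+= and str() are hypothetical. Only a few calls are forwarded here; a real wrapper would forward whatever subset of std::string it actually needs.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical wrapper: holds UTF-8 bytes in a std::string member
// instead of deriving from std::string.
class utf8_string {
public:
    utf8_string() = default;
    explicit utf8_string(std::string bytes) : bytes_(std::move(bytes)) {}

    // Number of code points: counts every byte that is not a
    // continuation byte (10xxxxxx). Assumes valid UTF-8.
    std::size_t length() const {
        std::size_t n = 0;
        for (unsigned char c : bytes_)
            if ((c & 0xC0) != 0x80) ++n;
        return n;
    }

    // Byte-level operations are forwarded to the member.
    std::size_t size_bytes() const { return bytes_.size(); }
    bool empty() const { return bytes_.empty(); }
    utf8_string& operator+=(const utf8_string& other) {
        bytes_ += other.bytes_;
        return *this;
    }

    // Read-only access to the underlying bytes for APIs that expect std::string.
    const std::string& str() const { return bytes_; }

private:
    std::string bytes_;
};
```

Because nothing is inherited, the wrapper cannot be passed to functions taking std::string& by accident. Conversions back to std::string go through str() explicitly.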

Comments

-1. Best practice for this task is to provide a Character Traits object and instantiate std::basic_string. Hiding an std::string in this case raises the issue of char signedness, since UTF-8 is an 8-bit multibyte encoding.
@larsmans: No objections :) I was just too preoccupied with the fact that someone somewhere had the thought to derive from a nonpolymorphic type :)
-1. Doesn't really answer the question. larsmans's answer at least gives a good idea for a direction to go.
-1. I don't find this argument convincing. It presumes that only classes with virtual functions can be derived from. If basic_string doesn't have virtual functions, then the guideline should be: DON'T USE IT POLYMORPHICALLY. There is nothing wrong with inheritance if you have documented it properly!

Answer

It is generally considered a mistake in C++ to derive from a standard library container. However, the functionality you are looking for has already been implemented. Have a look at Glib::ustring.

Hope this helps!

Answer
  1. Have you looked at ICU?

  2. A typedef is just a convenient label.

    class foo : public bar {};

This works just fine when bar is a typedef for a parameterized type (a class template specialization such as std::basic_string<char>).

It may not be a good idea in this case, but the language supports it.
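To illustrate that point, here is a minimal sketch that derives publicly from the std::string typedef. It assumes C++11 inheriting constructors. The names utf8_string and code_points are hypothetical. The class adds no data members and is meant to be used by value only, because std::string has no virtual destructor.

```cpp
#include <cstddef>
#include <string>

// std::string is a typedef for std::basic_string<char>, yet it can
// appear as a base class like any other class type.
class utf8_string : public std::string {
public:
    using std::string::string;  // inherit all std::string constructors

    // Hypothetical helper: number of code points, assuming valid UTF-8.
    // Counts every byte that is not a continuation byte (10xxxxxx).
    std::size_t code_points() const {
        std::size_t n = 0;
        for (unsigned char c : *this)
            if ((c & 0xC0) != 0x80) ++n;
        return n;
    }
};

// Caveat: never delete a utf8_string through a std::string*, since
// std::string's destructor is not virtual.
```

Any std::string member function still works on the derived object, and size() still reports bytes. Only the added code_points() knows about UTF-8.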

Answer

Just be sure you know what you are doing first. What exactly is the "correct length" you want to return from your string objects? The number of code points? That does not always correspond to the number of characters as perceived by the user.
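To make the distinction concrete, here is a small sketch. The function name count_code_points is hypothetical, and it assumes valid UTF-8 input. The same user-perceived character "é" can be one or two code points, and two or three bytes. The difference depends on whether it is stored precomposed (U+00E9) or as "e" followed by the combining acute accent U+0301.

```cpp
#include <cstddef>
#include <string>

// Counts code points in a UTF-8 byte string by skipping continuation
// bytes (10xxxxxx). Assumes valid UTF-8.
std::size_t count_code_points(const std::string& utf8) {
    std::size_t n = 0;
    for (unsigned char c : utf8)
        if ((c & 0xC0) != 0x80) ++n;
    return n;
}

// Both strings render as the single character "é":
const std::string precomposed = "\xC3\xA9";   // U+00E9: 2 bytes, 1 code point
const std::string decomposed  = "e\xCC\x81";  // U+0065 U+0301: 3 bytes, 2 code points
```

Counting user-perceived characters (grapheme clusters) needs the Unicode segmentation rules, which is one reason to lean on an existing library.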

Anyway, take a look at the utf8-cpp library to see an alternative approach to deriving from std::string.

Answer

Better idea: create an STL-compatible utf8_string container without inheriting from std::string.
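Here is one possible shape for such a container, sketched under several assumptions. The bytes live in a std::string member, the input is valid UTF-8 (nothing is validated), and code points are exposed as char32_t through a read-only iterator. The iterator is tagged as an input iterator because it returns decoded values rather than references. That is still enough for range-for, std::distance, std::find and similar algorithms. All names are hypothetical.

```cpp
#include <cstddef>
#include <iterator>
#include <string>
#include <utility>

// Hypothetical STL-compatible container: UTF-8 bytes stored in a
// std::string member, code points exposed as char32_t via iterators.
// Assumes the stored bytes are valid UTF-8; no validation is done.
class utf8_string {
public:
    class const_iterator {
    public:
        using iterator_category = std::input_iterator_tag;
        using value_type        = char32_t;
        using difference_type   = std::ptrdiff_t;
        using pointer           = const char32_t*;
        using reference         = char32_t;  // decoded on the fly, not stored

        const_iterator() = default;
        explicit const_iterator(std::string::const_iterator it) : it_(it) {}

        // Decode the code point that starts at the current lead byte.
        char32_t operator*() const {
            unsigned char lead = static_cast<unsigned char>(*it_);
            int extra = sequence_length(lead) - 1;
            char32_t cp = extra == 0 ? lead
                        : extra == 1 ? (lead & 0x1F)
                        : extra == 2 ? (lead & 0x0F)
                        :              (lead & 0x07);
            auto p = it_;
            for (int i = 0; i < extra; ++i)
                cp = (cp << 6) | (static_cast<unsigned char>(*++p) & 0x3F);
            return cp;
        }
        // Advance by the byte length of the current sequence.
        const_iterator& operator++() {
            it_ += sequence_length(static_cast<unsigned char>(*it_));
            return *this;
        }
        const_iterator operator++(int) { auto tmp = *this; ++*this; return tmp; }
        bool operator==(const const_iterator& o) const { return it_ == o.it_; }
        bool operator!=(const const_iterator& o) const { return it_ != o.it_; }

    private:
        // Byte length of a UTF-8 sequence, derived from its lead byte.
        static int sequence_length(unsigned char lead) {
            if (lead < 0x80) return 1;
            if ((lead >> 5) == 0x6) return 2;
            if ((lead >> 4) == 0xE) return 3;
            return 4;
        }
        std::string::const_iterator it_;
    };

    explicit utf8_string(std::string bytes) : bytes_(std::move(bytes)) {}

    const_iterator begin() const { return const_iterator(bytes_.begin()); }
    const_iterator end() const { return const_iterator(bytes_.end()); }

    // Length in code points, computed by walking the iterators.
    std::size_t length() const {
        return static_cast<std::size_t>(std::distance(begin(), end()));
    }
    const std::string& bytes() const { return bytes_; }

private:
    std::string bytes_;
};
```

Mutating operations such as inserting or erasing at a code-point position are deliberately left out. Each would have to translate code-point positions into byte offsets in the underlying std::string.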

Answer

Writing a Unicode implementation that conforms and works properly in every circumstance is very difficult. I would advise you to use an existing library or implementation instead of rolling your own. For example, Windows, OS X and Qt all have libraries which support UTF-16 and other encodings.

Answer

As has already been stated by others: don't derive from std::string; it's just not designed for this.

You should have a look at this article, which shows how to create a case-insensitive string class as an example. You will see that the logic implemented in std::basic_string is independent of the character type, and that providing a custom char_traits should do the trick.
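The widely cited case-insensitive pattern looks roughly like the sketch below. It is not necessarily identical to the article's code. The names ci_char_traits and ci_string are the conventional ones, used here for illustration. It assumes single-byte characters and the current C locale for std::toupper. The traits class reuses std::char_traits<char> and overrides only the comparison and search members.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Case-insensitive traits: inherit everything from std::char_traits<char>
// and hide only the members that compare or search characters.
struct ci_char_traits : std::char_traits<char> {
    static char up(char c) {
        return static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    }
    static bool eq(char a, char b) { return up(a) == up(b); }
    static bool lt(char a, char b) { return up(a) < up(b); }
    static int compare(const char* a, const char* b, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i) {
            if (lt(a[i], b[i])) return -1;
            if (lt(b[i], a[i])) return 1;
        }
        return 0;
    }
    static const char* find(const char* s, std::size_t n, char c) {
        for (std::size_t i = 0; i < n; ++i)
            if (eq(s[i], c)) return s + i;
        return nullptr;
    }
};

using ci_string = std::basic_string<char, ci_char_traits>;
```

With this, ci_string("Hello") == ci_string("HELLO") holds. However, a ci_string is a distinct type from std::string, so values must be converted explicitly whenever they cross between the two.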

Comments

Actually, I would not recommend doing this. I did this a few years ago and came to regret that decision. What will happen is that you will have to convert back and forth between this new type and the standard string type all over your code base. It's not pretty. A great article that explains this in detail is here: lafstern.org/matt/col2_new.pdf. Summary: case insensitivity isn't about an object, it's about how you use an object.
