I lean heavily on interned strings as Basile suggests, where a string lookup translates to a 32-bit index to store and compare. That's useful in my case because I sometimes have hundreds of thousands to millions of components with a property named "x", for example, which still needs a user-friendly string name since scripters often access it by name.
I use a trie for the lookup (I also experimented with unordered_map, but my tuned trie backed by memory pools started out performing better and was also easier to make thread-safe without locking on every access), though it isn't as fast to construct as creating a std::string. The point is more to speed up subsequent operations, like checking string equality, which in my case boils down to comparing two integers, and to drastically reduce memory usage.
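The interning scheme above can be sketched roughly as follows. Note that this is a minimal illustration, not the trie-based structure described: it uses an unordered_map purely to keep the sketch short, and the `Interner` class name and its members are made up for the example. The payoff it demonstrates is the same, though: equality checks become a comparison of two 32-bit integers.

```cpp
#include <cstdint>
#include <deque>
#include <string>
#include <string_view>
#include <unordered_map>

// Minimal string interner: maps each distinct string to a stable 32-bit ID.
// (The structure described above is a pool-backed trie; an unordered_map
// stands in here only for brevity.)
class Interner {
public:
    std::uint32_t intern(std::string_view s) {
        auto it = ids_.find(std::string(s));
        if (it != ids_.end()) return it->second;  // already interned
        const std::uint32_t id = static_cast<std::uint32_t>(strings_.size());
        strings_.emplace_back(s);   // deque keeps references stable
        ids_.emplace(strings_.back(), id);
        return id;
    }

    // Recover the user-friendly name, e.g. for scripting APIs.
    const std::string& name(std::uint32_t id) const { return strings_[id]; }

private:
    std::deque<std::string> strings_;                    // id -> string
    std::unordered_map<std::string, std::uint32_t> ids_; // string -> id
};
```

With this, every component property named "x" stores the same 32-bit ID, and comparing two property names is a single integer comparison rather than a character-by-character string compare.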
I guess one option would be to maintain some kind of registry of already-allocated values, but is it even possible to make registry lookups faster than the redundant memory allocations?
It's going to be tough to make a search through a data structure much faster than a single malloc. If you have a case where you're reading a boatload of strings from an external input like a file, my temptation would be to use a sequential allocator if possible. That comes with the downside that you can't free the memory of an individual string: all the memory pooled by the allocator has to be freed at once or not at all. But a sequential allocator can be handy in cases where you just need to allocate a boatload of tiny, variable-sized chunks of memory in a straight sequential fashion, only to toss it all away later. I don't know whether that applies in your case, but when it does, it can be an easy way to fix a hotspot related to frequent tiny memory allocations (which often has more to do with cache misses and page faults than with the underlying algorithm used by, say, malloc).
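A minimal sketch of such a sequential (arena) allocator, under some simplifying assumptions: the `Arena` name, block size, and `intern_copy` helper are all invented for illustration, and alignment is ignored since char data doesn't need it. It bump-allocates out of large blocks and only frees everything at once, as described above.

```cpp
#include <cstddef>
#include <cstring>
#include <memory>
#include <vector>

// Sequential (arena) allocator: hands out tiny variable-sized chunks by
// bumping an offset within large blocks. Individual chunks can never be
// freed; all memory is released at once via reset() or destruction.
// Alignment is ignored for brevity (fine for raw char data).
class Arena {
public:
    explicit Arena(std::size_t block_size = 1 << 16)  // 64 KiB blocks
        : block_size_(block_size) {}

    void* allocate(std::size_t n) {
        if (offset_ + n > current_size_) grow(n);  // need a fresh block
        void* p = blocks_.back().get() + offset_;
        offset_ += n;                               // bump the offset
        return p;
    }

    // Copy a string into the arena; the pointer lives as long as the arena.
    const char* intern_copy(const char* s, std::size_t len) {
        char* p = static_cast<char*>(allocate(len + 1));
        std::memcpy(p, s, len);
        p[len] = '\0';
        return p;
    }

    // Toss everything away at once.
    void reset() { blocks_.clear(); offset_ = 0; current_size_ = 0; }

private:
    void grow(std::size_t n) {
        const std::size_t size = n > block_size_ ? n : block_size_;
        blocks_.push_back(std::make_unique<char[]>(size));
        offset_ = 0;
        current_size_ = size;
    }

    std::size_t block_size_;
    std::size_t offset_ = 0;        // next free byte in current block
    std::size_t current_size_ = 0;  // size of current block
    std::vector<std::unique_ptr<char[]>> blocks_;
};
```

Reading a file's worth of strings then becomes one `intern_copy` per string with no per-string free; when the batch is done, a single `reset()` returns all the memory, which also keeps the strings densely packed and cache-friendly.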