`std::unordered_map` without duplicating key data

Question

I have a Person class, which has a name property (std::string).

I want to create a lookup-table, a std::unordered_map, so I can find a Person by their name. However, given a Person, I also want to be able to get their name.

This necessitates storing the name twice - once as the key of the map, and once inside the person object, as shown in my code below.

Since I have many Persons loaded into memory at once, I don't want the overhead of storing their names twice.

I've tried using references/pointers to the keys instead inside the Person class, but this creates problems as the map seems to reshuffle its data when it is modified, and the references become invalid.

I've also tried using std::unordered_set, but this means I need to construct a whole Person object every time I want to perform a lookup.

Is there any way for the key and value of a unordered map to share the same data?

#include <iostream> #include <unordered_map> class Person { private: const std::string _name; public: Person( const std::string& name ) : _name( name ) { } const std::string& get_name() const { return _name; } }; int main() { auto my_set = std::unordered_map<std::string, std::shared_ptr<Person>>(); my_set.insert( { "alice", std::shared_ptr<Person>( new Person( "alice" )) } ); my_set.insert( { "bob", std::shared_ptr<Person>( new Person( "bob" )) } ); my_set.insert( { "charlie", std::shared_ptr<Person>( new Person( "charlie" )) } ); std::cout << my_set.find( "bob" )->second->get_name() << std::endl; return 0; }

@cz - Why not use an unordered set with a custom hasher instead? This seems to be what you are after anyway. — StoryTeller - Unslander Monica
– StoryTeller - Unslander Monica, Commented Aug 24, 2017 at 13:16
Unless the person object is large, using unordered_set is probably your best bet here. It's space efficient, still O(1). Default constructing a few fields, including things like vectors and strings, is very cheap. — Nir Friedman
– Nir Friedman, Commented Aug 24, 2017 at 14:25

Slava · Accepted Answer · 2017-08-24 14:05:07Z

You can use Boost.Multi-index for this purpose. Though there is a learning curve for this library you will find it very usable very fast. So for your case:

namespace mpi = boost::multi_index; boost::multi_index_container< Person, mpi::indexed_by< mpi::hashed_unique< mpi::const_mem_fun< Person, const std::string &, &Person::get_name > > > > my_set;

Now you can use it as hashed set with a string key:

auto f = my_set.find( "bob" ); if( f != my_set.end() ) std::cout << f->get_name() << std::endl;

This may look like a bit overkill, but you will see full power of this library when you start to add more members to the class Person you will need to provide different index to access them by that member. Let's say you added a phone number which is also unique (method const std::string &get_phone() const):

boost::multi_index_container< Person, mpi::indexed_by< mpi::hashed_unique< mpi::const_mem_fun< Person, const std::string &, &Person::get_name >, mpi::hashed_unique< mpi::const_mem_fun< Person, const std::string &, &Person::get_phone >> > > my_set; // lookup by phone: const auto &idx = boost::get<1>( my_set ); auto f = idx.find( "1234567890" ); if( f != my_set.end() ) std::cout << f->get_name() << std::endl;

Note: you can change stored data as a shared pointer instead of storing by value of course, I just omitted that for example simplicity.

I'm honestly not sure if this will end up using less space than the simple unordered_map solution. A string is often 3-4 words in size, so that would imply that the overhead of multi_index is less than 4N or so where N is the number of entries.
@NirFriedman I doubt that OPs main concern is actual space, bigger problem of data duplication as that creates bugs in the program - in his example it is easy to store person with different name than key for example.
While it may seem improbable to you... you can't just flat out second guess questions for absolutely no reason. Direct quote: "Since I have many Persons loaded into memory at once, I don't want the overhead of storing their names twice.".
@NirFriedman you assumed that overhead in the question means memory overhead, but that is not mandatory true, is it? There are multiple overheads in this approach and memory only one of them.
The "data duplication" "bug overhead" is not proportional to the number of Persons being created, so to say OP was talking about some kind of other overhead in that quote is dubious at best. In the time you've spent arguing with me you could have written a trivial 10 line allocator that will show much memory is being used, and improved your answer considerably instead.

Jarod42 · Accepted Answer · 2017-08-24 15:36:08Z

With std::set, you might use transparent comparer (std::unordered_set doesn't seems to support that :/ ):

struct LessPerson { using is_transparent = void; // enable "transparent" comparer template <typename T1, typename T2> bool operator ()(const T1& t1, const T2& t2) const { // Compare only "name". return toString(t1) < toString(t2); } // trivial one const std::string& toString(const std::string& s) const { return s; } // the one why we create the class const std::string& toString(const Person& p) const { return p.get_name(); } // A tricky one to handle dereference of (smart) pointers. template <typename T, std::enable_if_t<std::is_same<Person, std::decay_t<decltype(*std::declval<T>())>>::value>* = nullptr> const std::string& toString(const T& p) const { return (*p).get_name(); } };

And then use it:

auto my_set = std::set<std::shared_ptr<Person>, LessPerson>(); my_set.insert( { std::make_shared<Person>("alice") } ); my_set.insert( { std::make_shared<Person>("bob") } ); my_set.insert( { std::make_shared<Person>("charlie") } ); auto it = my_set.find("bob"); // search using "bob" directly without creating a new Person

Demo

anatolyg · Accepted Answer · 2017-08-24 14:15:13Z

If your "persons" are never copied or moved, and their names are never copied or moved, you can use a pointer to string instead of string as your key. This requires using a custom hash and equal functors.

struct myhash { unsigned operator()(std::string* s) const { return std::hash<std::string>()(*s); } }; struct myequal { unsigned operator()(std::string* s1, std::string* s2) const { return *s1 == *s2; } }; ... auto my_set = std::unordered_map<std::string*, std::shared_ptr<Person>, myhash, myequal>();

This also complicates lookup a bit: you have to lookup a pointer to string.

std::string b = "bob"; std::cout << my_set.find(&b)->second->get_name() << std::endl;

Here it is impossible to have the string bob inline, because your code has to get a pointer to it.

The shortcoming you mention could probably be removed by simply using reference_wrapper<const std::string> as the key instead.
@Nir Friedman Nope. Second the call to .find with an rvalue would not be possible and first when copying object and reference, the reference_wrapper is simply duplicated and the object is copied, the reference in the wrapper still pointing to the original string.
@DrSvanHay You're right, since they delete that overload, but it would be trivial to write something yourself that has the correct behavior. I don't understand your second point at all, it has the same behavior on copy as a pointer which is what the answer uses.

NoSenseEtAl · Accepted Answer · 2017-08-24 21:31:12Z

If you are really struggling with memory you should use boost::flat_set. It has very low memory overhead, the only problem is that if you update your set of persons it has horrible performance. If you just create and never modify it performance is worse than unordered_ but not terrible.

In case you insist on using unordered_map I think you need to use unordered_multiset since I see no point in having your class using just one field to determine if 2 instances are equal. This is possible, but very ugly, you need to define your own hashing and equality functions.

Another simpler but more error prone solution is to use hash as a key like this:

#include <string> #include <iostream> #include <unordered_map> class Person { public: Person(const std::string& name, const int age) : name_(name), age_(age) {} public: const std::string& name() const { return name_; } int age() const { return age_; } private: std::string name_; int age_; }; int main() { Person p1("Joe", 11), p2("Jane", 22), p3("James", 33), p4("Joe", 44); std::unordered_multimap<size_t, Person> persons{ {std::hash<std::string>()(p1.name()), p1}, {std::hash<std::string>()(p2.name()), p2},{std::hash<std::string>()(p3.name()), p3}, {std::hash<std::string>()(p4.name()), p4} }; auto potential_joes = persons.equal_range(std::hash<std::string>()("Joe")); for (auto it = potential_joes.first; it != potential_joes.second; ++it) { if (it->second.name() == "Joe") { std::cout << it->second.name() << " is " << it->second.age() << " years old" << std::endl; } } }

I would use this only if your strings are long, you actually measured memory usage and you are uncomfortable with writing custom comparators. As you see from code you are reimplemening a lot of unordred_map logic yourself, and it is easy to mess up.

important note If your key depends on your value in the map you must be sure not to modify value. So for example in the code I posted you should probably make member name_ const and comment why it is const.

Collectives™ on Stack Overflow

`std::unordered_map` without duplicating key data

4 Answers 4

8 Comments

Comments

3 Comments

Comments

Linked

Hot Network Questions

Collectives™ on Stack Overflow

4 Answers 4

8 Comments

Comments

3 Comments

Comments

Linked

Related