I've recently noticed something odd: calculating Collatz sequence lengths with no caching at all is over twice as fast as using std::unordered_map to cache all elements.

Note I did take hints from the question Is gcc std::unordered_map implementation slow? If so - why?, and I tried to use that knowledge to make std::unordered_map perform as well as I could: I used g++ 4.6 (it performed better than recent versions of g++), and I specified a sound initial bucket count, exactly equal to the maximum number of elements the map must hold.

In comparison, using std::vector to cache a few elements was almost 17 times faster than no caching at all and almost 40 times faster than using std::unordered_map.

Am I doing something wrong, or is this container really that slow, and if so, why? Can it be made faster? Or are hashmaps inherently inefficient and best avoided whenever possible in high-performance code?

The problematic benchmark is:

#include <iostream>
#include <unordered_map>
#include <cstdint>
#include <ctime>

std::uint_fast16_t getCollatzLength(std::uint_fast64_t val)
{
    static std::unordered_map<std::uint_fast64_t, std::uint_fast16_t> cache({{1, 1}}, 2168611);
    if (cache.count(val) == 0)
    {
        if (val % 2 == 0)
            cache[val] = getCollatzLength(val / 2) + 1;
        else
            cache[val] = getCollatzLength(3 * val + 1) + 1;
    }
    return cache[val];
}

int main()
{
    std::clock_t tStart = std::clock();
    std::uint_fast16_t largest = 0;
    for (int i = 1; i <= 999999; ++i)
    {
        auto cmax = getCollatzLength(i);
        if (cmax > largest)
            largest = cmax;
    }
    std::cout << largest << '\n';
    std::cout << "Time taken: " << (double)(std::clock() - tStart) / CLOCKS_PER_SEC << '\n';
}

It outputs: Time taken: 0.761717

Whereas a benchmark with no caching at all:

#include <iostream>
#include <unordered_map>
#include <cstdint>
#include <ctime>

std::uint_fast16_t getCollatzLength(std::uint_fast64_t val)
{
    std::uint_fast16_t length = 1;
    while (val != 1)
    {
        if (val % 2 == 0)
            val /= 2;
        else
            val = 3 * val + 1;
        ++length;
    }
    return length;
}

int main()
{
    std::clock_t tStart = std::clock();
    std::uint_fast16_t largest = 0;
    for (int i = 1; i <= 999999; ++i)
    {
        auto cmax = getCollatzLength(i);
        if (cmax > largest)
            largest = cmax;
    }
    std::cout << largest << '\n';
    std::cout << "Time taken: " << (double)(std::clock() - tStart) / CLOCKS_PER_SEC << '\n';
}

It outputs: Time taken: 0.324586

  • @WhiZTiM Am I doing something wrong or is this container THAT slow and why? Commented Mar 3, 2017 at 20:57
    unordered_map is just pretty slow for a lot of uses. Most of the time std::vector is the fastest container. Data locality > everything else. Commented Mar 3, 2017 at 21:00
  • @WhiZTiM Edited the question, better now? Commented Mar 3, 2017 at 21:00
    If you're having complexity issues, I feel bad for you son, std::unordered_map returns in Big-O(N). (worst case) Commented Mar 3, 2017 at 21:02
  • Have you checked the actual values? The use of a uint_fast16_t cache value IMO is suspicious given that half of 999999 is greater than 65536. Commented Mar 3, 2017 at 21:02

1 Answer

The standard library's maps are, indeed, inherently slow. Google's Chandler Carruth explains this in his CppCon 2014 talk "Efficiency with Algorithms, Performance with Data Structures"; in a nutshell:

  • std::map maintains its data's sort order, which leads to a tree-based implementation; and trees are slow: non-contiguous, with a lot of pointer chasing.
  • std::unordered_map is not as bad, but it's still cache-unfriendly, because its buckets are implemented as linked lists.

This SO question mentions some efficient hash map implementations; use one of those instead.


1 Comment

The linked talk is really an excellent video to watch. That guy has a much more in-depth understanding of performance issues than the vast majority of people who think they know about performance issues.