In a framework that uses both the `std::int*` types (such as `std::int16_t`) and the `std::int_fast*` types (such as `std::int_fast16_t`), there may be general rules for when one is preferable to the other. For brevity, I will refer to the `std::int*` types as "exact" types and the `std::int_fast*` types as "fast" types throughout this post.
As an example, it is generally best to use exact types for array elements because they are expected to use fewer bytes, so there is more opportunity for caching improvements. When elements are read from such an array, they can be converted to the fast type for arithmetic:
```cpp
void foo() {
    std::array<std::int16_t, N> arr {};
    // ...
    std::int_fast16_t e = arr[index];
}
```

What is less obvious is which type is more appropriate for the key type of a `std::set`, `std::unordered_set`, or the related map types. Would it make sense to assume that the container's internal storage generally benefits from the compactness of exact types, as in the array case?
Edit: I added benchmarks for `std::set` and `std::unordered_set`, each with `std::uint16_t` and `std::uint_fast16_t`. What the test does:
- Create a vector of N random numbers (once)
- In a loop that runs N times: clear the container being tested, then emplace a specific random number from the vector M times

N is the number of times the benchmark is repeated; the total time is then averaged. M is the number of emplace calls the benchmark measures.
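For clarity, the procedure above is roughly equivalent to the following sketch (names and structure are illustrative, not my framework's actual code):

```cpp
#include <cstdint>
#include <random>
#include <set>
#include <vector>

// Sketch of the benchmark body; the framework's real code also times the
// inner loop and averages over the N repetitions.
template <typename Key>
std::size_t run_benchmark(std::size_t n, std::size_t m) {
    std::mt19937 gen{12345};  // fixed seed here for repeatability
    std::uniform_int_distribution<unsigned> dist{0, 65535};

    // Create the vector of N random numbers once, outside the measured region.
    std::vector<Key> values(n);
    for (auto& v : values) v = static_cast<Key>(dist(gen));

    std::set<Key> container;
    for (std::size_t i = 0; i < n; ++i) {
        container.clear();
        const Key v = values[i];      // a specific random number from the vector
        for (std::size_t j = 0; j < m; ++j)
            container.emplace(v);     // measured: M emplaces per iteration
    }
    return container.size();  // 1, since the same key was emplaced M times
}
```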
```
-------------------------------------------------------
std::set           | std::uint16_t | std::uint_fast16_t
-------------------+---------------+-------------------
Emplace 10         | 359.072 nsec  | 344.232 nsec
Emplace 100        | 3.455 usec    | 3.298 usec
Emplace 1000       | 34.285 usec   | 32.525 usec
Emplace 10000      | 341.679 usec  | 334.176 usec
Emplace 100000     | 3.432 msec    | 3.229 msec

-------------------------------------------------------
std::unordered_set | std::uint16_t | std::uint_fast16_t
-------------------+---------------+-------------------
Emplace 10         | 405.566 nsec  | 418.305 nsec
Emplace 100        | 3.873 usec    | 3.888 usec
Emplace 1000       | 38.617 usec   | 38.708 usec
Emplace 10000      | 386.67 usec   | 387.169 usec
Emplace 100000     | 3.888 msec    | 3.894 msec
```

Edit again: I later realized that emplacing a single specific random number was a silly thing to benchmark. Below is another benchmark where a series of M random numbers (between 0 and 65,535) is emplaced:
```
-------------------------------------------------------
std::set           | std::uint16_t | std::uint_fast16_t
-------------------+---------------+-------------------
Emplace 10         | 490.606 nsec  | 524.057 nsec
Emplace 100        | 5.696 usec    | 5.631 usec
Emplace 1000       | 97.678 usec   | 95.026 usec
Emplace 10000      | 1.616 msec    | 1.617 msec
Emplace 100000     | 8.353 msec    | 8.494 msec

-------------------------------------------------------
std::unordered_set | std::uint16_t | std::uint_fast16_t
-------------------+---------------+-------------------
Emplace 10         | 543.433 nsec  | 536.635 nsec
Emplace 100        | 5.799 usec    | 5.737 usec
Emplace 1000       | 57.326 usec   | 57.391 usec
Emplace 10000      | 630.508 usec  | 629.023 usec
Emplace 100000     | 4.663 msec    | 4.692 msec
```

I won't post the benchmark code because my framework's benchmark code is non-trivial, but it:
- Performs the work without measuring to "warm up"
- Measures with monotonic time
- Sandwiches benchmarks with `asm volatile("" : "+m"(container));` to try to prevent reordering
- Performs reads on the data after the benchmarks to avoid the work being optimized out
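Put together, one measured run looks roughly like this sketch (illustrative only; `measure_emplaces` is not my framework's API). The empty `asm volatile` with a `"+m"` constraint tells the compiler the container's memory may be read and modified at that point, discouraging it from moving the work across the timing calls:

```cpp
#include <chrono>
#include <cstdint>
#include <set>
#include <vector>

// Sketch of one measured run (illustrative, not the framework's code).
std::int64_t measure_emplaces(std::set<std::uint16_t>& container,
                              const std::vector<std::uint16_t>& values) {
    asm volatile("" : "+m"(container));  // compiler barrier before the work
    const auto start = std::chrono::steady_clock::now();  // monotonic clock

    for (auto v : values)
        container.emplace(v);            // the work being measured

    asm volatile("" : "+m"(container));  // compiler barrier after the work
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start)
        .count();
}
```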