In a framework that uses both the `std::int*` types (such as `std::int16_t`) and the `std::int_fast*` types (such as `std::int_fast16_t`), there may be general rules for when one is preferable to the other. For brevity, I will refer to the `std::int*` types as "exact" types and the `std::int_fast*` types as "fast" types throughout this post.
As an example, it is generally best to use exact types for array elements because they are expected to use fewer bytes, so there is more opportunity for caching improvements. When elements are read from such an array, they can be converted to the fast type for arithmetic:
```cpp
void foo() {
    std::array<std::int16_t, N> arr {};
    // ...
    std::int_fast16_t e = arr[index];
}
```

What is less obvious is which type is more appropriate for the key type of a `std::set`, `std::unordered_set`, or the related map types. Would it make sense to assume that the container's internal storage generally benefits from the compactness of exact types, as in the array case?
Edit: I added benchmarks for `std::set` and `std::unordered_set`, each with `std::uint16_t` and `std::uint_fast16_t`. What the test does:
- Create a vector of N random numbers (once)
- In a loop that runs N times: clear the container being tested, then emplace a specific random number from the vector M times

N is the number of times the benchmark is repeated; the total time is then averaged. M is the number of emplace calls the benchmark measures.
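For clarity, the procedure above is roughly equivalent to the following sketch (names and structure are illustrative, not my framework's actual code):

```cpp
#include <cstdint>
#include <random>
#include <set>
#include <vector>

// Sketch of the benchmark body; the framework's real code also times the
// inner loop and averages over the N repetitions.
template <typename Key>
std::size_t run_benchmark(std::size_t n, std::size_t m) {
    std::mt19937 gen{12345};  // fixed seed here for repeatability
    std::uniform_int_distribution<unsigned> dist{0, 65535};

    // Create the vector of N random numbers once, outside the measured region.
    std::vector<Key> values(n);
    for (auto& v : values) v = static_cast<Key>(dist(gen));

    std::set<Key> container;
    for (std::size_t i = 0; i < n; ++i) {
        container.clear();
        const Key v = values[i];      // a specific random number from the vector
        for (std::size_t j = 0; j < m; ++j)
            container.emplace(v);     // measured: M emplaces per iteration
    }
    return container.size();  // 1, since the same key was emplaced M times
}
```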
```
-------------------------------------------------------
std::set           | std::uint16_t | std::uint_fast16_t
-------------------+---------------+-------------------
Emplace 10         | 359.072 nsec  | 344.232 nsec
Emplace 100        | 3.455 usec    | 3.298 usec
Emplace 1000       | 34.285 usec   | 32.525 usec
Emplace 10000      | 341.679 usec  | 334.176 usec
Emplace 100000     | 3.432 msec    | 3.229 msec

-------------------------------------------------------
std::unordered_set | std::uint16_t | std::uint_fast16_t
-------------------+---------------+-------------------
Emplace 10         | 405.566 nsec  | 418.305 nsec
Emplace 100        | 3.873 usec    | 3.888 usec
Emplace 1000       | 38.617 usec   | 38.708 usec
Emplace 10000      | 386.67 usec   | 387.169 usec
Emplace 100000     | 3.888 msec    | 3.894 msec
```

Edit again: I later realized that emplacing a single specific random number was a silly thing to benchmark. Below is another benchmark where a series of M random numbers (between 0 and 65,535) is emplaced:
```
-------------------------------------------------------
std::set           | std::uint16_t | std::uint_fast16_t
-------------------+---------------+-------------------
Emplace 10         | 490.606 nsec  | 524.057 nsec
Emplace 100        | 5.696 usec    | 5.631 usec
Emplace 1000       | 97.678 usec   | 95.026 usec
Emplace 10000      | 1.616 msec    | 1.617 msec
Emplace 100000     | 8.353 msec    | 8.494 msec

-------------------------------------------------------
std::unordered_set | std::uint16_t | std::uint_fast16_t
-------------------+---------------+-------------------
Emplace 10         | 543.433 nsec  | 536.635 nsec
Emplace 100        | 5.799 usec    | 5.737 usec
Emplace 1000       | 57.326 usec   | 57.391 usec
Emplace 10000      | 630.508 usec  | 629.023 usec
Emplace 100000     | 4.663 msec    | 4.692 msec
```

I won't post the benchmark code because my framework's benchmark code is non-trivial, but it:
- Performs the work without measuring to "warm up"
- Measures with monotonic time
- Sandwiches benchmarks with `asm volatile("" : "+m"(container));` to try to prevent reordering
- Performs reads on the data after the benchmarks to avoid the work being optimized out
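Put together, one measured run looks roughly like this sketch (illustrative only; `measure_emplaces` is not my framework's API). The empty `asm volatile` with a `"+m"` constraint tells the compiler the container's memory may be read and modified at that point, discouraging it from moving the work across the timing calls:

```cpp
#include <chrono>
#include <cstdint>
#include <set>
#include <vector>

// Sketch of one measured run (illustrative, not the framework's code).
std::int64_t measure_emplaces(std::set<std::uint16_t>& container,
                              const std::vector<std::uint16_t>& values) {
    asm volatile("" : "+m"(container));  // compiler barrier before the work
    const auto start = std::chrono::steady_clock::now();  // monotonic clock

    for (auto v : values)
        container.emplace(v);            // the work being measured

    asm volatile("" : "+m"(container));  // compiler barrier after the work
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start)
        .count();
}
```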