
I need to calculate the hash of a large string on Windows and Linux, and the result should be the same on both OSes.

With a simple test program, I get different hashes on Windows and Linux using std::hash. This makes sense, since each compiler's implementation of std::hash may use a different algorithm.

This raises the question: is there a way to achieve this using the standard library?

The most straightforward answer for me is to implement my own hash algorithm, so it's the same on both OSes. But this seems like overkill; I don't want to reinvent the wheel.

  • Does it need to be the same output size/speed as std::hash? Or could you just use a third-party MD5 or SHA-# library? Commented Mar 23, 2021 at 13:53
  • std::hash really is not a versatile hashing algo. Its purpose in life is to provide hashing machinery to unordered containers, with a lifetime limited to a single program run (and accessible from the same program). Because of that, it has no guarantees of compatibility between different compilers (or, in fact, between different runs of the same program!). You have no other option but to write your own hashing algo. Commented Mar 23, 2021 at 14:03
  • Can you assume the character encoding is the same? And newlines? Commented Mar 23, 2021 at 14:07
  • @MSalters There will be newlines, and there could be different encodings. Commented Mar 23, 2021 at 14:14
  • With std::hash you can't even rely on the hash value being the same between different runs on the same platform. The Standard only guarantees that std::hash returns the same value for the current invocation of the program. This is to allow implementers of std::hash to use a salted hash. "...Hash functions are only required to produce the same result for the same input within a single execution of a program; this allows salted hashes that prevent collision denial-of-service attacks...." en.cppreference.com/w/cpp/utility/hash Commented Mar 23, 2021 at 14:18

3 Answers


The hash algorithm in the standard library is not fixed; it may vary between platforms and compilers.

But you can use the very short and fast FNV-1a algorithm for hashing: a function of just a few lines of code, shown below. You can read about it here.

It will give the same result on all machines for the same input bytes. But you have to fix one set of parameters, 32-bit or 64-bit (the 32-bit parameters are commented out in my code).


```cpp
#include <cstdint>
#include <iostream>
#include <string>

inline std::uint64_t fnv1a(std::string const & text) {
    // 32 bit params
    // std::uint32_t constexpr fnv_prime = 16777619U;
    // std::uint32_t constexpr fnv_offset_basis = 2166136261U;

    // 64 bit params
    std::uint64_t constexpr fnv_prime = 1099511628211ULL;
    std::uint64_t constexpr fnv_offset_basis = 14695981039346656037ULL;

    std::uint64_t hash = fnv_offset_basis;

    for (auto c : text) {
        // Convert through unsigned char: where char is signed, bytes >= 0x80
        // would otherwise be sign-extended and change the result.
        hash ^= static_cast<unsigned char>(c);
        hash *= fnv_prime;
    }

    return hash;
}

int main() {
    std::cout << fnv1a("Hello, World!") << std::endl;
}
```

Output:

7993990320990026836 
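The answer fixes the 64-bit parameters and leaves the 32-bit ones commented out. A minimal sketch of choosing the width at compile time instead, assuming C++17 (for `std::string_view` and `constexpr` loops); `fnv_params` and `fnv1a_as` are hypothetical names, and the constants are the same FNV-1a parameters as in the answer. Each `char` is read as `unsigned char`, so bytes at or above 0x80 hash identically whether the platform's `char` is signed or unsigned.

```cpp
#include <cstdint>
#include <string_view>

// FNV-1a parameters for the two widths mentioned in the answer.
template <typename UInt> struct fnv_params;

template <> struct fnv_params<std::uint32_t> {
    static constexpr std::uint32_t prime = 16777619U;
    static constexpr std::uint32_t offset_basis = 2166136261U;
};

template <> struct fnv_params<std::uint64_t> {
    static constexpr std::uint64_t prime = 1099511628211ULL;
    static constexpr std::uint64_t offset_basis = 14695981039346656037ULL;
};

// FNV-1a over the bytes of `text`; UInt selects a 32- or 64-bit result.
template <typename UInt>
constexpr UInt fnv1a_as(std::string_view text) {
    UInt hash = fnv_params<UInt>::offset_basis;
    for (char c : text) {
        hash ^= static_cast<unsigned char>(c);  // XOR in one unsigned byte
        hash *= fnv_params<UInt>::prime;        // unsigned multiply wraps modulo 2^N
    }
    return hash;
}
```

For example, `fnv1a_as<std::uint32_t>("a")` is `0xe40c292c` and `fnv1a_as<std::uint64_t>("a")` is `0xaf63dc4c8601ec8c`. Because the function is `constexpr`, such checks can even be done with `static_assert`.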

4 Comments

Well, the same hash assuming the same character encoding. Hashing "é" still has opportunities for surprises.
@MSalters Actually, I assumed the OP just has bytes, so in my code it might be clearer to use std::vector<uint8_t> instead of std::string. But if the std::string contents are bit-identical on all platforms, my algorithm will give the same result.
The size of std::size_t is system-dependent. To obtain the same result on different systems, I had to use std::uint64_t.
@cauchy Yes, nice comment: you have to use uint64_t. The only reason I used size_t there is that std::hash also returns size_t, and I wanted to be std::hash-compliant. In that sense, std::hash is always different on 32-bit and 64-bit machines. But if matching std::hash's return type doesn't matter to you, then of course you should use uint64_t. You may have noticed that I commented out the 32-bit lines. If uint64_t is too much for you, you can use the 32-bit params with uint32_t in the same way. Up to you.
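Following up on the two points raised in these comments (hash bytes rather than `std::string`, and return a fixed-width type rather than `std::size_t`), here is a minimal sketch of a byte-oriented 64-bit FNV-1a. The name `fnv1a_bytes` is hypothetical; the `std::string` overload simply copies the characters into unsigned bytes, which is an assumption about how you want text treated.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// 64-bit FNV-1a over raw bytes. Taking std::uint8_t makes it explicit that
// the input is a byte sequence, independent of whether char is signed.
std::uint64_t fnv1a_bytes(const std::vector<std::uint8_t>& bytes) {
    std::uint64_t hash = 14695981039346656037ULL;  // 64-bit offset basis
    for (std::uint8_t b : bytes) {
        hash ^= b;
        hash *= 1099511628211ULL;                   // 64-bit FNV prime
    }
    return hash;  // same width on every platform, unlike std::size_t
}

// Convenience overload: copy the string's chars as unsigned bytes
// (char -> std::uint8_t conversion is well-defined modulo 256).
std::uint64_t fnv1a_bytes(const std::string& text) {
    return fnv1a_bytes(std::vector<std::uint8_t>(text.begin(), text.end()));
}
```

Both overloads agree on the same bytes, so `fnv1a_bytes(std::string("a"))` and `fnv1a_bytes(std::vector<std::uint8_t>{0x61})` return the same `std::uint64_t`.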

Is there a way to achieve this using the standard library?

No, not using only the standard library, since the algorithm behind std::hash is not standardized.

I don't want to reinvent the wheel.

Unfortunately, you would have to.


You could, however, try to get rid of the requirement that the hashes be the same. If you can't compare the data after a hash hit, you can't be sure you got a true hit anyway.
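To illustrate the point about confirming a hash hit: a hash mismatch proves two inputs differ, but a match only suggests they are equal, because different inputs can collide. A minimal sketch, assuming the original data is still available when hashes are compared; `HashedText`, `make_hashed` and `same_text` are hypothetical names, and the hash is 64-bit FNV-1a borrowed from the first answer.

```cpp
#include <cstdint>
#include <string>
#include <string_view>
#include <utility>

// 64-bit FNV-1a (bytes read as unsigned char).
std::uint64_t fnv1a64(std::string_view text) {
    std::uint64_t hash = 14695981039346656037ULL;
    for (char c : text) {
        hash ^= static_cast<unsigned char>(c);
        hash *= 1099511628211ULL;
    }
    return hash;
}

// Hypothetical record: a string together with its precomputed hash.
struct HashedText {
    std::string text;
    std::uint64_t hash;
};

HashedText make_hashed(std::string text) {
    std::uint64_t h = fnv1a64(text);  // hash before moving the string
    return {std::move(text), h};
}

// Use the hash as a cheap filter, then confirm with a full comparison.
bool same_text(const HashedText& a, const HashedText& b) {
    if (a.hash != b.hash) return false;  // different hashes: definitely different
    return a.text == b.text;             // equal hashes: verify, to rule out collisions
}
```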



You should think about whether this requirement is useful. I'm told that PHP now has a hashing implementation that intentionally gives different results on each program run, and Swift definitely does the same. You shouldn't store or transmit hash codes, and you shouldn't expect them to be the same when you run a program twice.

2 Comments

I see no reason for such a sweeping statement without understanding the OP's issue.
The C++ Standard allows implementers of std::hash to choose to do exactly as you describe. See my comment to OP.
