My situation: I've been working now for a couple of months on my own unique hash function, I've changed it many times and had two main versions but I won't bore anyone with the details of my work; at least not here. I'm here to ask a question that's specific and helpful to anyone attempting to design a cryptographic hash. (I hope)
Now I'm more than aware that simply being a normally-distrusted random function, does not make a cryptographically secure hash.
MY QUESTION however IS: If collisions are found LESS often than would be predicted by a random function, is this an alarm-bell that we may have some empirical weakness?
Allow me to elaborate:
In my tests, I found that any given set of 32 bits (from a set position in the hash) from an array of unrelated input (or related input but different output) hash functions were indeed seemingly random until I noticed that: In order to find a collision: one would have to go through about 89000 different attempts on average (although obviously the numbers swayed massively). Since the birthday bound is only 65536 (or ~77000), something seems amiss. Since my testing involved using batteries of different PRNGS to create the inputs aswell as non-random methodical ones and I ran these tests hundreds of millions of times, meaning I've made literally billions of hashes... I don't think it's my testing that is flawed or that my sample sizes are too small.
I can say with almost 100% confidence, that it is taking more attempts to find a collision than would be expected of a truly random (ideal) output.
Side note: I tried with 16 bit, 40 bit samples, etc and always came up with the same results: collisions were LESS frequent.
2nd Side note: When describing the hash as a PRNG or indeed a PRNG using the hash, it passes birthday-spacing tests on the dieharder suite just fine (I find this bit very confusing).
If anyone can explain this strange phenomenom (the passing on dieharder but failing on my own tests) then extra points but what I want to know is:
So given we do have a "Desired probability of random collision" in a given hash (and logically speaking any smaller word within that hash) then what happens when in a given word (and presumably the hash in general) collisions are found less often than expected? What info do we gain about the likely hood of pre-image attacks? Non ideal distribution or any other info in fact?
I know this is probably a hard question to answer but maybe not.
