All hash functions have a finite (but quite large) number of possible outputs. Let's say a hash function can produce X distinct hashes in total, and I feed it 10X inputs (I know there are computational limits; this is just for theory). Would every hash then appear roughly 10 times, so that each hash in the possible set is equally likely every time? Or would some hashes show up much more often and others much less? The following image better describes what I mean.
- I think this is more of a statistics question. All cryptographic hashes have a uniform distribution, whereas it is difficult to calculate the bias, or to derive the distribution function, of non-cryptographic digest algorithms such as CRC-32. – DannyNiu, Dec 23, 2024 at 9:17
- I, too, realized that later. It basically asks how random hash generation is and whether we have some scale to rate hash functions on that. As a result, it ended up focusing on the likelihood of an event. – Amit, Dec 23, 2024 at 14:55
1 Answer
It is very unlikely that a good cryptographic hash function will have the kind of flat empirical distribution that you are asking for, unless it has some structural weakness, such as linearity.
Even when we model a hash function by a uniform distribution, that is, when we consider (say) throwing $m$ balls into $n$ bins independently and uniformly at random, some deviations from a perfectly flat distribution are highly likely.
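To see the balls-into-bins picture with an actual hash, here is a minimal sketch. It assumes a toy hash built by keeping only the top 12 bits of SHA-256, so $n = 4096$ outputs, and feeds it $m = 10n$ distinct inputs (the integers $0,\dots,m-1$). The function name `bucket_loads` and the parameters `bits` and `factor` are made up for this illustration and are not part of any library.

```python
import hashlib
from collections import Counter


def bucket_loads(bits: int = 12, factor: int = 10) -> list[int]:
    """Count how often each output of a toy `bits`-bit hash is hit.

    Assumption: the toy hash is SHA-256 truncated to its top `bits` bits
    (bits <= 32), standing in for a hash with n = 2**bits outputs.
    """
    n = 1 << bits          # number of possible outputs (bins)
    m = factor * n         # number of inputs (balls)
    loads = [0] * n
    for i in range(m):
        digest = hashlib.sha256(i.to_bytes(8, "big")).digest()
        # Keep the top `bits` bits of the first 32 bits of the digest.
        index = int.from_bytes(digest[:4], "big") >> (32 - bits)
        loads[index] += 1
    return loads


loads = bucket_loads()
histogram = Counter(loads)  # load value -> number of outputs with that load

print("mean load:", sum(loads) / len(loads))
print("min load:", min(loads), "max load:", max(loads))
for load in sorted(histogram):
    print(f"{load:3d} hits: {histogram[load]} outputs")
```

The mean load is exactly 10 by construction, but the printed histogram spreads over a range of loads rather than collapsing onto the single value 10.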
The answer to the question here goes into a lot of detail about this.
The average "load" of each output will be $m/n = 10$, but as soon as $X$ is cryptographically large, with high probability the output with maximum load will have load roughly $$ \frac{m}{n}+\sqrt{\frac{2 m \ln n}{n}}=10 +\sqrt{20 \ln X}, $$ since $m=10X=10n.$
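To get a feel for the size of that maximum, the estimate can be evaluated numerically. This is only a sketch of the formula above, not a simulation; the helper name `max_load_estimate` and the chosen output sizes (32, 128 and 256 bits) are picked for illustration.

```python
import math


def max_load_estimate(m: int, n: int) -> float:
    """Evaluate m/n + sqrt(2 * m * ln(n) / n) for m balls and n bins."""
    ratio = m / n  # average load
    return ratio + math.sqrt(2 * ratio * math.log(n))


for bits in (32, 128, 256):
    n = 2 ** bits       # X, the number of possible hashes
    m = 10 * n          # 10X inputs
    print(f"{bits}-bit output: estimated max load {max_load_estimate(m, n):.1f}")
```

Because the correction term grows only like $\sqrt{\ln X}$, the estimate stays within a small multiple of the average load of 10 even for very large $X$.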