
What happens to entropy after hashing?

Suppose you have a key with entropy $k$. Can the entropy be increased beyond $k$ by hashing the key?

  • Hmm ... do you mean any "hashing" (e.g., what about $H(k) = 0$?), do you mean the entropy of $k$ or of $H(k)$, and do you mean increase or decrease? – Commented Dec 21, 2013 at 8:10
  • @Drux I meant the entropy of $H(k)$; I want to increase it. – Commented Dec 21, 2013 at 8:50

3 Answers


The answer rather depends on what you mean by 'entropy'; if you mean 'Shannon Entropy', then no, a deterministic function cannot increase entropy.

For example, if the unhashed password has only 7 different possible values, then the hashed version of the password will also have (at most) 7 different possible values; you've made things look more obscure, but you haven't actually changed anything as far as entropy is concerned.

The standard definition of Shannon entropy is:

$H(X) = -\sum_i p_i \log_2 p_i$

where $p_i$ is the probability of the $i$-th possible sample value. That is, Shannon entropy is defined solely in terms of the probability distribution, not in terms of how that distribution happens to be represented.
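For concreteness, here is a minimal Python sketch (illustrative only; the helper name is not any standard API) that evaluates this definition over an explicit probability distribution:

```python
from math import log2

def shannon_entropy(probs):
    """Shannon entropy in bits: H(X) = -sum(p_i * log2(p_i)).

    Terms with p_i = 0 contribute nothing, by the usual convention.
    """
    return -sum(p * log2(p) for p in probs if p > 0)

# A uniform choice among 7 possible passwords: log2(7) ~ 2.81 bits
print(shannon_entropy([1/7] * 7))
# A skewed distribution over the same 7 values carries less entropy:
print(shannon_entropy([0.5, 0.2, 0.1, 0.1, 0.05, 0.03, 0.02]))  # ~2.11 bits
```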

If we have a hash function $SHA$ which doesn't have any collisions, then it has no effect on entropy; that is, $H(X) = H(SHA(X))$. That's because the probabilities involved in the two distributions are precisely the same: if a specific password $x_i$ had a certain probability, then the hashed form of that password, $SHA(x_i)$, will appear with that exact probability in the hashed distribution.

And if the hash function $SHA$ does have a collision, that is, if $SHA(x_i) = SHA(x_j)$ for distinct $x_i$ and $x_j$ that both have nonzero probability, then $SHA$ will actually reduce the entropy. This is not a violation of collision resistance; collision resistance only states that it is hard to find such a collision, not that one does not exist.
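A toy sketch makes both cases concrete (the functions and the three-value distribution here are invented for illustration): an injective map leaves the probabilities untouched, while a collision pools probability mass and lowers the entropy.

```python
from collections import Counter
from math import log2

def entropy_bits(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

def entropy_after(f, dist):
    """Entropy of f(X): colliding inputs pool their probability mass."""
    image = Counter()
    for x, p in dist.items():
        image[f(x)] += p
    return entropy_bits(image.values())

dist = {"a": 0.5, "b": 0.25, "c": 0.25}
print(entropy_bits(dist.values()))                            # 1.5 bits
print(entropy_after(str.upper, dist))                         # injective: still 1.5 bits
print(entropy_after(lambda x: "*" if x != "a" else x, dist))  # "b","c" collide: 1.0 bit
```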

This is easier to see in the extreme case. Suppose we have two possible passwords ("Mary" and "Susan"), and both of those passwords have probability $0.5$; this means the entropy of the password is 1 bit, and an attacker can go through all possible values by checking just two candidate passwords. If you hash the password, you may come up with two images (39e74b4e80ba656f773669ed60315a and 98e15403b2b1ea5022fc42b3490cc76e in hex) that look different; however, there are still only two possible values, and so the effort an attacker would spend trying those two values is unchanged.
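The same point can be checked mechanically; the sketch below uses SHA-256 (an assumption, since the answer doesn't name the hash that produced those digests), and the entropy calculation is unaffected by what the digests look like:

```python
import hashlib
from math import log2

# Two equiprobable passwords -> 1 bit of entropy, hashed or not.
candidates = ["Mary", "Susan"]
for pw in candidates:
    print(pw, "->", hashlib.sha256(pw.encode()).hexdigest())

# The digests look random, but the attacker's search space still has size 2:
print(-sum(p * log2(p) for p in [0.5, 0.5]))  # 1.0 bit
```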

  • Oh dear information theory... [+1] (and thanks for correcting me) – Commented Dec 21, 2013 at 11:23
  • I think the question makes sense only in the case of entropy computed at the byte or character level, and then E(H(X)) is not equal to E(X), if X represents the password. In fact, studying the numerical function f(x) defined by f(x) = E(H(X))/E(X), x = E(X), is in itself interesting: what sort of properties does that function have? Can hashes be somewhat characterized by some properties of that function? Etc. – Commented Mar 7, 2015 at 23:54
  • @KarlZorn: I have no idea what you mean by 'entropy computed at byte or character level'. In addition, since how E(H(X))/E(X) acts depends a great deal on the precise distribution of X (for example, consider a distribution defined such that only values $x$ with $H(x)=0$ have nonzero probabilities), it's not clear that your $f(x)$ is actually well-defined. – Commented Mar 9, 2015 at 12:26
  • I mean, for example, entropy defined as in troydhanson.github.io/misc/Entropy.html, and of course values of X such that H(X)=0 are excluded from the definition; byte streams X such that H(X)=0 are easy to determine. The whole point is that in that case, intuitively, we would expect that f(x)=E(H(X))/E(X), x=E(X), x>0, would be such that f(x) is 'often' greater than 1 or 'around' 1. – Commented Mar 9, 2015 at 13:09
  • Now it is possible that, if we get a uniform random generator for samples of X, the values of f(x) will be distributed along a Gaussian curve centered at 1, which would fit the fact that Shannon entropy stays the same. In fact I am looking for papers about these topics. – Commented Mar 9, 2015 at 13:19

If the hash function is collision resistant, hashing does not change the entropy of the key. It will be $k$ after hashing.

  • This is not correct. Consider a completely random $2n$-bit key, i.e., one with min-entropy $2n$, and a hash function of the form $\{0,1\}^{2n}\rightarrow \{0,1\}^n$. Even if the hash function is a perfect random function, the result has just $n$ bits of entropy (see the sketch after these comments). – Commented Dec 21, 2013 at 11:32
  • @acerberus In that case what you are saying is true; I did not consider that case. Thank you for pointing it out. – Commented Dec 21, 2013 at 11:59
  • Even for a random transformation of n bits to n bits, there will probably be some collisions, but the entropy won't decrease significantly. It does tend to decrease a little, however. – Commented Jan 25, 2014 at 20:47
  • @sellibitze If the collision probability for a single step is (a single) constant, the collision probability increases exponentially, does it not? – Commented Jan 26, 2017 at 12:57
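A scaled-down sketch of the $2n \rightarrow n$ point from the first comment (16-bit keys and a one-byte truncation of SHA-256 stand in for the real sizes; both choices are illustrative):

```python
import hashlib
from collections import Counter
from math import log2

# Uniform 16-bit keys compressed to 8 bits: entropy drops from 16 to ~8.
counts = Counter(
    hashlib.sha256(k.to_bytes(2, "big")).digest()[0]  # keep first byte only
    for k in range(2**16)
)
probs = [c / 2**16 for c in counts.values()]
print(len(counts))                        # at most 256 distinct outputs
print(-sum(p * log2(p) for p in probs))   # ~8 bits of output entropy
```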

This is just an aside but I found it interesting to consider.

If I have a key $k$ with 128 bits of entropy, and I hash it twice with two different constants: $$x = hash(k + A), y = hash(k + B)$$

then the resulting two 'parts' $x,y$ will each have 64 bits of entropy.

This is because we can't increase the entropy, as described above (e.g., we don't suddenly have 256 bits), so the entropy must be 'shared' between the two parts... Somehow I find this both intuitive and non-intuitive at the same time!

Adding this here just in case it's useful for anyone else.

  • "The resulting two 'parts' $x,y$ will each have 64 bits of entropy"; no, each one individually (assuming a good hash function) will have circa 128 bits of entropy; however, jointly they will have 128 bits. That's because the values of $x, y$ are correlated (even if it may be computationally difficult to reconstruct the correlation, or to compute $y$ from an example $x$ value); from an information-theoretic standpoint, it is little different from the construction $x = \text{hash}(k); y = \text{hash}(k) + 1$ – Commented Aug 2, 2022 at 11:43
  • Thanks. I'm confused as to how they can each have 128 bits, but both together also have 128 bits. I understand what you mean re. correlation, but doesn't the latter argument imply that x (given y) has 64 bits, and y (given x) has 64 bits? Cheers – Commented Aug 3, 2022 at 15:20
  • No, x (given y) has close to 0 bits: a computationally unbounded entity could iterate through all possible values of the key, determine which keys give that known y value (and likely there'll be only one), and with those keys compute the possible range of x values (of which likely there'll be only one); see the sketch below. Now, this is infeasible for us (being computationally bounded); however, entropy is an information-theoretic concept, and bounds on computation don't apply to it. – Commented Aug 3, 2022 at 15:38
  • Thanks for the clear explanation. How does one resolve the apparent conflict: x = 128 bits, y = 128 bits, x and y together = 128 bits? – Commented Aug 5, 2022 at 8:47
  • What conflict? Consider the similar case I mentioned earlier: $x = \text{hash}(k), y = \text{hash}(k) + 1$; both $x$ and $y$ can individually have 128 bits of entropy, and together they have 128 bits of entropy. Is there a conflict there? If not, wouldn't the same reasoning apply to the original case? – Commented Aug 5, 2022 at 12:02
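The brute-force argument in these comments can be run directly at toy scale (an 8-bit key and a truncated SHA-256 stand in for the 128-bit key and the answer's unnamed hash; all names here are illustrative). Given y, the conditional entropy of x is essentially zero:

```python
import hashlib

def h(data: bytes) -> bytes:
    # Stand-in for the answer's hash(); truncated SHA-256 for readability.
    return hashlib.sha256(data).digest()[:8]

A, B = b"A", b"B"
k = bytes([42])                      # toy 8-bit key instead of 128 bits
x, y = h(k + A), h(k + B)

# An unbounded attacker who sees y can enumerate the whole keyspace,
# recover the matching key(s), and then compute x exactly:
candidates = [bytes([i]) for i in range(256) if h(bytes([i]) + B) == y]
print(candidates)                              # almost certainly just [k]
print(all(h(c + A) == x for c in candidates))  # True: y pins down x
```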
