caveman

How to estimate the maximum computational cost bound for Key Derivation Functions (KDFs) before it becomes useless security-wise?

From my understanding of Key Derivation Functions (KDFs), e.g. scrypt, Argon2, etc., we can tune their parameters so that it eventually becomes harder for an attacker to brute-force the password through the KDF than to brute-force the derived key directly. At that point, the attacker may as well directly brute-force the, say, AES128 key.

It is nice not to over-tune the parameters of KDFs, so that the user does not needlessly suffer with a slow application. I think it is ideal if the KDF is tuned just enough that the user suffers the least while still getting the maximum possible security out of, say, AES128-CBC (or whatever other symmetric cipher).

An easy way is to survey all improvements in hardware and algorithm design, in order to estimate how long certain well-funded organisations would have to wait until they finally manage to decrypt my ciphers. But I think this approach is needlessly complex, as we can probably say a lot about the computational bounds of KDFs by simply studying the problem from an information-theoretic perspective.

Below is my attempt (which contains ideas from other people on other networks), with the goal of helping you help me better. Plus there is a question at the end of this post to make it clearer.


What I have done so far:

1. Converting number of entropy bits to number of guesses

When the answer to a question has $b$ bits of entropy, it means that the best model has $2^{b}$ equally-likely possible answers to pick from, and that the best model can do no better than brute-forcing (no shortcuts to cheat around this). It goes like this:

  • The best possible model that is tasked to answer the question, by submitting a single answer, has $\frac{1}{2^{b}}$ probability of finding the correct answer.

  • Given that its $1$st answer was wrong, its $2$nd answer is correct with probability $\frac{1}{2^{b} - (2-1)}$.

  • Given that its first $2$ answers were wrong, its $3$rd answer is correct with probability $\frac{1}{2^{b} - (3-1)}$.

  • In general, given that its first $n-1$ answers were wrong, its $n$th answer is correct with probability $\frac{1}{2^{b} - (n-1)}$.

So, how many guesses must the best possible model make in order to guarantee finding the answer (probability $1$)? We need to find the $n$ that satisfies $$ \frac{1}{2^{b} - (n-1)} = 1, $$ which means $n = 2^{b}$. (Equivalently, the probability of success within $n$ guesses is $\frac{n}{2^{b}}$, which reaches $1$ at $n = 2^{b}$.)
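This counting can be sanity-checked with a toy brute-force simulation (a Python sketch; the $b = 16$ toy entropy is just to keep it instant):

```python
import secrets

b = 16                                 # toy entropy: 2**16 equally-likely answers
n_space = 2 ** b
secret = secrets.randbelow(n_space)    # the "correct answer"

# Worst case: the model submits every candidate once until it hits the answer.
guesses = 0
for candidate in range(n_space):
    guesses += 1
    if candidate == secret:
        break

# Never more than n = 2**b guesses are needed to guarantee success.
assert guesses <= n_space
```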

2. Converting number of guesses to cost

Let's say that $c_{i,h}$ is the cost of the model testing its $i^{th}$ guess (to see if it is the true answer) by using a program-hardware combination named $h$.

Then, since we have $n$ many guesses to make in order to guarantee finding the answer, the total cost is $\sum_{i=1}^n c_{i,h}$.

In password/key brute forcing, the cost of testing a guess is constant across the guesses, i.e. $c_{1,h} = c_{2,h} = \ldots = c_{n,h}$, so let's drop the attempt-number subscript and call it just $c_h$. So: $$ \sum_{i=1}^n c_{h} = nc_{h} $$ is our total cost when using the program-hardware combination named $h$.

So now we basically have a function/mapping from number of guesses to cost on program-hardware $h$. Let's call it function $f_1$: $$ f_1(n, c_h) = nc_h $$

3. Converting number of entropy bits to cost

If we have $b$ many entropy bits, we know that number of guesses is $n=2^b$, so, we can modify $f_1$ into an entropy-to-cost on $h$ mapping, let's call it $f_2$:

$$ f_2(b, c_h) = 2^{b}c_h = nc_h = f_1(n,c_h) $$
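Spelled out in code (a Python sketch; the unit of $c_h$ is arbitrary, e.g. seconds per guess on $h$), $f_1$ and $f_2$ are one-liners, with $f_2$ growing exponentially since $n = 2^{b}$:

```python
def f1(n, c_h):
    """Total cost of n guesses at a constant per-guess cost c_h."""
    return n * c_h

def f2(b, c_h):
    """Total cost of exhausting b bits of entropy: n = 2**b guesses."""
    return f1(2 ** b, c_h)

# Example: 20 bits of entropy at a per-guess cost of 1 unit.
assert f2(20, 1) == 2 ** 20
assert f2(20, 1e-6) == f1(2 ** 20, 1e-6)
```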

4. Consequences

Let's say that $h_{aes128}$ is the program-hardware combination that implements some 128 bit AES encryption.

A user chose a password which, when perfectly compressed (using a theoretical perfect lossless compressor), contains, say, $70$ bits of entropy. Let's call this password $p70$.

The user then used a key derivation function, $k$, to (practically) make it seem to the adversary as if his $70$-bit password were a $128$-bit password. In other words, from the adversary's perspective, the key derivation function injects entropy into the user's password to create a richer password with more entropy. Let's call this $p128 = k(p70)$.

Obviously, this key derivation function, $k$, is implemented as a program-hardware combination different from $h_{aes128}$, so let's give it its own name: $h_k$.

Finally, the user used $p128$ and $h_{aes128}$ to encrypt a bunch of cleartext blocks $x_1, x_2, x_3$ into a bunch of ciphertext blocks $y_1, y_2, y_3$. He did something like $(y_1,y_2,y_3) = h_{aes128}(p128, x_1, x_2, x_3)$ (just like a typical symmetric encryption).

Now, an adversary got his hands on the ciphertext blocks $y_1,y_2,y_3$. The adversary also knows the cleartext $x_1$, but doesn't know $x_2$, $x_3$. His objective is to find the key $p128$ in order to decrypt $y_2$ and $y_3$ into $x_2$, $x_3$.

Right now, all the adversary knows is that the original password most likely had less than $128$ bits of entropy (this is common sense; most users hardly pick any better). Based on this educated guess, the adversary creates these password attempts: $a_1, a_2, \ldots, a_m$. Now the adversary has 2 options:

  • Option 1: try each attempt $a_i$ with key $k(a_i)$: compute something like $\hat y_1 = h_{aes128}(k(a_i), x_1)$, then test if $\hat y_1 = y_1$; if true, assume with high probability that $k(a_i) = p128$, i.e. he found the key and can decrypt $y_2$, $y_3$.

    In this case, the total cost for the attacker is $f_1(m, c_{h_k}) + f_1(m, c_{h_{aes128}})$, where $c_{h_k}$ and $c_{h_{aes128}}$ are the per-guess costs on the two program-hardware combinations (abbreviated below as just $h_k$ and $h_{aes128}$ to keep the formulas short).

  • Option 2: or, he may treat each attempt $a_i$ directly as a $128$-bit key, and avoid $k$ altogether. Something like: $\hat y_1 = h_{aes128}(a_i, x_1)$, tested the same way.

    This way, his cost is $f_2(128, h_{aes128})$.

Obviously, the adversary will choose whichever path is cheaper for him, meaning:

  • If $f_1(m, h_k) + f_1(m, h_{aes128}) < f_2(128, h_{aes128})$, he will choose option 1.
  • Else he will choose option 2.

Meaning, there is no point in making the function $k$ so expensive that option 1 costs more than option 2, because then the attacker will ignore option 1 and move to option 2, hence the user only gained extra suffering from over-tuning.

Meaning, ideally, the costs must be equal, i.e.: $$ f_1(m, h_k) + f_1(m, h_{aes128}) = f_2(128, h_{aes128}) $$ because in such a case, the user suffers minimally without giving the adversary any shorter path to decrypting the ciphers.

Let's expand the equation and see where it goes: $$\begin{split} f_1(m, h_k) + f_1(m, h_{aes128}) &= f_2(128, h_{aes128}) \\ mh_k + mh_{aes128} &= 2^{128}h_{aes128} \\ m(h_k + h_{aes128}) &= 2^{128}h_{aes128} \\ h_k + h_{aes128} &= \frac{2^{128}h_{aes128}}{m} \\ h_k &= \frac{2^{128}h_{aes128}}{m} - h_{aes128}\\ h_k &= \frac{2^{128}h_{aes128}}{m} - \frac{mh_{aes128}}{m}\\ h_k &= \frac{2^{128}h_{aes128} - mh_{aes128}}{m}\\ h_k &= \frac{h_{aes128} (2^{128} - m)}{m}\\ \end{split}$$
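The last line of the derivation can be sanity-checked numerically (a Python sketch; `break_even_kdf_cost` is my own name for it, and `Fraction` keeps the huge numbers exact):

```python
from fractions import Fraction

def break_even_kdf_cost(m, h_aes):
    """Per-guess KDF cost h_k at which option 1 and option 2 cost the same:
    h_k = h_aes * (2**128 - m) / m  (last line of the derivation above)."""
    return Fraction(h_aes) * (2 ** 128 - m) / m

# The result satisfies the original balance equation for any m:
for m in (2 ** 70, 2 ** 100, 2 ** 128):
    h_aes = 1
    h_k = break_even_kdf_cost(m, h_aes)
    assert m * h_k + m * h_aes == 2 ** 128 * h_aes
```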

4.1 Identifying a crisp computational upper bound limit of KDF without losing any security

If the user's password $p70$ had been a long sentence containing $128$ bits of entropy instead of $70$, then we could tell whether a key derivation function is perfect purely by information-theoretic properties, independent of any hardware knowledge of the adversary. Here is how:

  • By definition, since the user has input a password with $128$ bits of entropy in this case, the adversary's only realistic hope is to assume that $m=2^{128}$.
  • Test that the key derivation function does not delete any information, so that $k(p70)$ still carries $128$ bits of information and AES128 gets a key with $128$ bits of entropy. This test can be done purely on a statistical/information-theoretic basis, independent of hardware design improvements.

Then by substituting $m=2^{128}$ in the last equation, we get: $$\begin{split} h_k &= \frac{h_{aes128} (2^{128} - m)}{m}\\ &= \frac{h_{aes128} (2^{128} - 2^{128})}{2^{128}}\\ &= \frac{h_{aes128} (0)}{2^{128}} = \frac{0}{2^{128}} = 0 \\ \end{split}$$

Meaning, the key derivation function should cost $0$ CPU cycles in order to be optimal: no user suffering, while the confidentiality remains the maximum that AES128 can guarantee, regardless of the user's or adversary's software/hardware implementation.

Obviously $h_k=0$ is impossible in practice, since any implementation will consume some units of time (such as CPU cycles). But knowing that $h_k=0$ is the optimum is reason enough for us to simply seek the cheapest implementation of the key derivation function in this scenario, and still guarantee that we are as secure as AES128 can ever be.

4.2 Identifying an approximate computational upper bound limit of KDF without losing any security

Similar to 4.1, except realistic: the user's password $p70$ does not have $128$ bits of entropy, but less, such as $70$ bits. And even though the adversary knows that the key derivation function has made the key look like it has $128$ bits of entropy, the adversary still knows that the user might have a lower-entropy password.

  • We need to test the key derivation function statistically to see that its output behaves as if $p128 = k(p70)$ did contain $128$ bits of entropy. This test is done on a statistical/information-theoretic basis (exactly as in 4.1, independent of hardware implementations of any algorithm).
  • The user measures the entropy of his own password ($p70$). This is also doable in the same way as the point above, and if he finds it to have $70$ bits, then it means that $m=2^{70}$.
  • Suppose that the adversary somehow predicted that the original password $p70$ had $70$ bits of entropy.

Then by substituting $m=2^{70}$ in the last equation, we get: $$\begin{split} h_k &= \frac{h_{aes128} (2^{128} - m)}{m}\\ &= \frac{h_{aes128} (2^{128} - 2^{70})}{2^{70}}\\ &= h_{aes128} \times (2^{58} - 1)\\ &= h_{aes128} \times 288230376151711743\\ \end{split}$$
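The factor can be double-checked with Python's exact integer arithmetic (note that it comes out to exactly $2^{58}-1$):

```python
m = 2 ** 70
# 2**70 divides 2**128 - 2**70 evenly, so integer division is exact here.
factor = (2 ** 128 - m) // m
assert factor == 2 ** 58 - 1
assert factor == 288230376151711743
```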

Here we don't know how low the adversary can squeeze $h_k$ and $h_{aes128}$, so we cannot be definitive. But at least we can get some approximate guidelines in order to fully achieve AES128's maximum security.

The general guideline we obtain here is that our estimate of the cost of a single run of $h_k$ on our adversary's fancy hardware (e.g. ASICs, parallelism, GPUs, etc.) should be about $2^{58} \approx 2.9 \times 10^{17}$ times more expensive than our estimate of a single run of the adversary's implementation of $h_{aes128}$.

Of course, we can afford to make $h_k$ very slow, because we only have to derive $1$ key out of it. But the adversary has to repeat it $2^{70}$ times, effectively becoming as difficult as attacking AES128's $128$-bit-entropy keys directly!

The nice thing about this number is that the key derivation system can warn the user that he had better choose a stronger password, unless he wants to end up with key derivation parameters so expensive that a single run is about $2^{58}$ times slower than the adversary's estimated single run of an AES128 implementation.
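As a sketch of such a warning (Python; `kdf_slowdown_factor` and the $128$-bit target are my illustrative assumptions, not an existing API):

```python
def kdf_slowdown_factor(password_entropy_bits, key_bits=128):
    """How many times slower than one AES trial a single KDF run must be so
    that brute-forcing the password through the KDF costs as much as a direct
    exhaustive key search: h_k / h_aes = (2**key_bits - 2**b) / 2**b."""
    b = password_entropy_bits
    # 2**b divides both terms, so integer division is exact: 2**(key_bits-b) - 1.
    return (2 ** key_bits - 2 ** b) // 2 ** b

for b in (70, 100, 127, 128):
    f = kdf_slowdown_factor(b)
    print(f"{b:3d}-bit password -> KDF must cost {f}x one AES trial")
```

With a full-entropy $128$-bit password the factor is $0$, matching section 4.1; weaker passwords demand astronomically slower KDF runs.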

Here I made an implementation of this idea.

Question: Can we make this any tighter? Less dependent on estimating the adversary's hardware? Or can we relax that estimation further?
