9

I have a theoretical question regarding the comparison of password-based encryption and password hashing. I'm not sure whether Stack Overflow or Crypto is the best place, but this is more on the side of programming and API/database/application security, so I figured here works best.

First, let me define the two. These are my current understandings, and if my assumptions are wrong, please correct me.

  • Password-based encryption: Uses a password-based key derivation function (PBKDF) to stretch a password into an encryption key (examples are Argon2, PBKDF2, bcrypt, scrypt). This key is then used for encryption/decryption schemes (AES, or other algorithm).
  • Password hashing: Uses a password in a hashing function (similarly: Argon2, PBKDF2, bcrypt, scrypt, or even less secure choices such as a plain SHA-256 hash). Passwords are validated by hashing and comparing outputs.

Now, for most standard applications, the typical accepted practice is something as follows:

  1. User creates an account (email, password)
  2. The password is hashed (let’s say using Argon2) and this is stored in the database (the used salt is stored as part of the Argon2 hash output).
  3. Upon login, the user’s entered password is compared to the stored hash. Specifically, the hash is retrieved from the database, salt is extracted (and reused) and then hashing is performed using said salt and password to see if the outputs match. If they do, the login is a success.
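The traditional flow above can be sketched as follows. This is a minimal illustration, not a reference implementation: it assumes `hashlib.scrypt` from the Python standard library as a stand-in for Argon2, and the `scrypt$n$r$p$salt$digest` storage string and the function names are hypothetical, chosen only to mirror how the Argon2 encoded form carries its own salt and parameters.

```python
import hashlib
import hmac
import os

# Assumption: scrypt (stdlib) stands in for Argon2. The cost parameters and
# the "$"-separated storage format below are illustrative, not a standard.
N, R, P = 2**14, 8, 1


def hash_password(password: str) -> str:
    """Hash a new password with a fresh random salt and return one string
    that carries the parameters, the salt, and the digest."""
    salt = os.urandom(16)
    digest = hashlib.scrypt(password.encode("utf-8"), salt=salt,
                            n=N, r=R, p=P, dklen=32)
    return f"scrypt${N}${R}${P}${salt.hex()}${digest.hex()}"


def verify_password(password: str, stored: str) -> bool:
    """Re-derive the digest with the stored salt and parameters, then
    compare in constant time."""
    _, n, r, p, salt_hex, digest_hex = stored.split("$")
    expected = bytes.fromhex(digest_hex)
    candidate = hashlib.scrypt(password.encode("utf-8"),
                               salt=bytes.fromhex(salt_hex),
                               n=int(n), r=int(r), p=int(p),
                               dklen=len(expected))
    return hmac.compare_digest(candidate, expected)
```

Calling `hash_password` at signup and `verify_password` at login corresponds to steps 2 and 3; hashing the same password twice gives different strings because each call draws a new salt.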

My question stems from a different approach to this traditional user authentication implementation using password-based encryption.

Let’s define a database schema as follows.

| email(varchar) | salt (varchar) | data(blob) |

Now, let’s reimagine the authentication flow:

  1. User creates an account (email, password)
  2. The password is turned into a key using PBKDF (once again, Argon2 in this scenario). The salt used for this key derivation is stored in the database along with the user’s email.
  3. The generated key is used to encrypt some arbitrary data using a stream cipher (so we do not need to worry about padding) such as ChaCha20. This arbitrary data could be something like JSON containing user data or just a plaintext message. The resulting encryption ciphertext is stored in data along with user email and salt.
  4. Upon login, the user’s salt is extracted and used in conjunction with Argon2 to regenerate the key and decrypt the data. If decryption is successful, we have now validated the user’s password.
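A rough sketch of this flow is below, under several loudly stated assumptions. `hashlib.scrypt` stands in for Argon2, and because the Python standard library has no ChaCha20, an HMAC-SHA256 keystream stands in for the stream cipher; it is for illustration only. One detail the steps leave open is what "decryption is successful" means: a bare stream cipher will turn any key into some output without complaint, so the sketch adds an HMAC tag for login to check. All function names and the blob layout (`nonce + ciphertext + tag`) are hypothetical.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

# Assumptions: scrypt stands in for Argon2; the HMAC-based keystream and tag
# stand in for an authenticated stream cipher. Not for production use.


def derive_keys(password: str, salt: bytes) -> Tuple[bytes, bytes]:
    """Stretch the password into 64 bytes: an encryption key and a MAC key."""
    material = hashlib.scrypt(password.encode("utf-8"), salt=salt,
                              n=2**14, r=8, p=1, dklen=64)
    return material[:32], material[32:]


def keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """XOR data with a keystream of HMAC(key, nonce || counter) blocks.
    The same call encrypts and decrypts."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        block = hmac.new(key, nonce + counter, hashlib.sha256).digest()
        out.extend(b ^ k for b, k in zip(data[offset:offset + 32], block))
    return bytes(out)


def register(password: str, plaintext: bytes) -> dict:
    """Steps 1-3: derive keys, encrypt the data, keep only salt and blob."""
    salt, nonce = os.urandom(16), os.urandom(16)
    enc_key, mac_key = derive_keys(password, salt)
    ciphertext = keystream_xor(enc_key, nonce, plaintext)
    tag = hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()
    return {"salt": salt, "data": nonce + ciphertext + tag}


def login(password: str, row: dict) -> Optional[bytes]:
    """Step 4: re-derive the keys and return the plaintext only if the tag
    verifies; otherwise return None."""
    blob = row["data"]
    nonce, ciphertext, tag = blob[:16], blob[16:-32], blob[-32:]
    enc_key, mac_key = derive_keys(password, row["salt"])
    expected = hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return None
    return keystream_xor(enc_key, nonce, ciphertext)
```

Note that the tag check in `login` plays the same role a stored password hash would: each password guess still costs exactly one key derivation.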

In this scenario, there are a couple of positive implications that I see:

  • Encryption at rest: If this is necessary for your application, you can just store your user attributes in the data column. If it isn’t necessary, then you can have columns separate from email, salt, and data.
  • Breach protection: If the database is breached (say through some web application logic vulnerability like SQL injection), then there is no hash to crack with tools like Hashcat. Instead, it’s an encrypted blob and the parameters to get there are hidden in server logic (is it Argon2? Bcrypt? AES? ChaCha20?). Figuring that out would take an infeasible amount of complexity and time. Compare that to typical hash storage, where it is pretty trivial to determine the logic used (example Argon2 hash: $argon2id$v=19$m=65536,t=3,p=4$JHO1a6fFZyLFgdZ10BBdLw$2bXgltb0IC6JPz02Vb2Gn7DHTzHEdO5v1Zj2fK6lFDw).

Of course, I do see a potential implication:

  • Performance? Adds some overhead with encryption/decryption, but I feel on modern systems this is negligible.

But overall, from this (albeit basic) scenario, it seems like the password-based encryption scenario is more secure.

With all of that laid out, I’m interested to know how this approach sounds from an actual application security perspective. Any assumptions that I made that are incorrect? Security & performance implications? Pros & cons of one approach vs. the other? If it is truly more secure, why do more authentication frameworks not take this approach?

Like I said in the preface, I know this is a bit crypto heavy but I think this errs more on the application side of crypto rather than the theory.

6
  • 1
    Keep in mind that your system relies on the secrecy of the derived key. If the key derivation parameters (salt and algorithm) are discovered, the security of the entire system could be compromised. In traditional hashing, even if the hashes are stolen, users can reset their passwords without needing to worry about the hashes being reversible. Commented Feb 25 at 2:28
  • 6
    if 'encryption at rest' is not a requirement for the application, then the only benefit to this scheme is 'breach protection'. The breach protection methodology relies on the algorithm used to create the encrypted blob being hidden in server logic. This is basically 'security by obscurity', which is generally frowned upon by most security experts. Commented Feb 25 at 2:56
  • 1
    @mti2935 - I would argue that this would be more defense in depth rather than security through obscurity. Security through obscurity would imply that if that obscurity gets discovered, the security goes away. If the server-side implementation is revealed (attacker discovers PBKDF & encryption scheme), the data is not insecure. An attacker still needs to crack the password, which then at this point is no different than storing just a hash. No security is lost through the unobfuscation of the authentication scheme. Just makes it a bit harder for an attacker to get everything they need. Commented Feb 25 at 3:52
  • 5
    @LandonCrabtree: What you’ve just described is security through obscurity. Instead of giving a concrete technical justification for your scheme, you just hope it will somehow confuse attackers. It’s also overly optimistic to think your scheme will be at least as strong as classical Argon2. This may be the case if you don’t make any mistakes, but how do you know that when you cannot publish your implementation and let everybody review it? Cryptography doesn’t always fail gracefully. Even seemingly minor implementation mistakes can lead to significant weaknesses. Commented Feb 25 at 10:33
  • 1
    Note that at least one existing service uses a similar scheme: StartMail Technical white paper, User Vault Commented Feb 25 at 15:19

2 Answers

13

I can think of a few shortcomings on the "password encryption" approach:

  • You have no way to change any of the user's data server-side. For instance, let's say the user should be revoked somehow. If all you have is the encrypted blob, the user has to log in and give you the password before you can revoke them. You would start moving user settings out of that JSON file, and end up having to maintain an opaque blob and a transparent one.

  • You cannot have a "forgot password" function: because all relevant user data is protected by that password, a forgotten password is definitive. Lost the password, lost the account.

  • You cannot query your user database for anything relevant. You cannot say which countries your users are from, for instance. Anything in that blob is invisible to you. To have queryable data, you would have to store it unencrypted somewhere, invalidating the premise.

  • The performance penalty will not be negligible. Don't think in terms of "works on my computer" but "how does this scale to thousands of users?"

    Argon2 is deliberately CPU- and memory-heavy. Add AES encryption on top and the overhead is even larger. Add a couple thousand users and you can suffer from a self-inflicted DoS, or an attacker can just throw random passwords at your system and take it down.
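To put a rough number on the scaling concern, the arithmetic below uses the memory parameter from the example hash in the question (`m=65536`, which Argon2 expresses in KiB, so 64 MiB per hash). The figure of 1,000 simultaneous login attempts is an assumption for illustration, and the helper name is hypothetical.

```python
def argon2_memory_gib(m_kib: int, concurrent_logins: int) -> float:
    """Total memory, in GiB, needed if every concurrent login runs one
    Argon2 computation with memory cost m_kib (KiB)."""
    return m_kib * concurrent_logins / (1024 * 1024)


# m=65536 comes from the example hash in the question; 1000 is assumed.
print(argon2_memory_gib(65536, 1000))  # 62.5
```

The same cost applies to any scheme that runs Argon2 on every login attempt, which is why login endpoints are usually rate-limited.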

"If decryption is successful, we have now validated the user’s password."

What if this JSON blob is large? What if it contains a picture of the user, and somehow it's a 2048x2048px 32-bit BMP? You will have to decrypt several megabytes of data just to know if the password is correct.

"If the database is breached (say through some web application logic vulnerability like SQL injection), then there is no hash to crack with tools like Hashcat."

Hashcat won't work, but plain old bruteforce will. As a large majority of users don't have strong, unique passwords, an attacker will just throw "the list of worst passwords" at the dataset and recover more than half of the passwords.

"But overall, from this (albeit basic) scenario, it seems like the password-based encryption scenario is more secure."

No, it's exactly the same thing, but with more steps. Salting and hashing with Argon2 and storing the hash is exactly the same operation as salting and hashing with Argon2 to create an encryption key. Because the security of the entire operation rests on that Argon2 output (which is the same in both cases), and encryption is reversible by definition, using AES or Base64 or ROT13 would be the same. Everything after getting the output of Argon2 can only reduce security, not add to it.

I would use the right tool for the job: salt plus hash for password checks, and a PBKDF applied to the password to derive a key-encryption key (KEK), which is used to encrypt a data-encryption key (DEK). See Envelope Encryption for details.

4
  • 1
    Thank you, this is a good analysis. Basically any user data we want to retrieve would either (a) require the user's password again or (b) have to be stored unencrypted. For the sake of UX (like your user profile picture point & forgot password), this makes the approach pretty poor. It would require, like you mentioned, unencrypted data. At that point, the encryption at rest benefit sort of goes away and the performance overhead wouldn’t be worth it just for password comparisons. Commented Feb 25 at 3:57
  • @LandonCrabtree your approach isn't entirely useless though if you have data that a) you (the service provider) should not have access to (think E2E encryption, personal data) and b) needs to be available on every log-in. Commented Feb 25 at 10:06
  • 2
    @Hobbamok Protecting data at rest is definitely useful, but proper E2E encryption requires that only the (end) user is able to read the unencrypted data. In the scheme described here, the key material (the password) leaves the trusted "end", i.e. the device/system of the user and the server actually "sees" the unencrypted data. If you fully trust the server/service provider, you could maybe argue it is close/similar to E2E encryption, but I would recommend not to call it that (because it creates a potentially false sense of security). Commented Feb 25 at 14:01
  • @SimonLehmann yeah, E2E was a bad example of what I meant. Commented Feb 25 at 14:06
10

What you describe is largely security through obscurity: You assume that using a nonstandard approach and “hiding” the implementation details will frustrate attackers and make them give up. This is a poor and rather naive approach, especially in cryptography. “Hiding” cryptographic algorithms was tried many times in the past, and it keeps failing (for example, RC4, A5/1 and A5/2 were all leaked or reverse-engineered at some point). Even worse, not publishing an algorithm means that you don't get feedback from legitimate reviewers like security researchers, cryptographers or programmers. So you risk that your mistakes will go unnoticed until an attacker exploits them. And they will be discovered and exploited as soon as a competent attacker thinks it's worthwhile to take a closer look. This is why there's now a broad consensus that cryptographic algorithms should be completely open and not depend on any secrets (except for the key, of course). Many algorithms were even chosen as part of an open competition where the candidates are publicly assessed, discussed, tweaked and either accepted or rejected.

If you take away the obscurity aspect, not much is left of your proposal. Argon2 followed by encryption/decryption doesn't provide any better brute-force protection than Argon2 itself. In fact, it makes no sense to use an extremely efficient cipher like ChaCha20 for this purpose, because it provides virtually no slowdown effect. You're much better off just tweaking the Argon2 parameters.

If you want to encrypt additional data with the password, this is valid, but you don't need any tricks for this. You can, for example, simply generate twice the amount of data with Argon2, so that you can use the first half as a password hash and the second half as a key. Older algorithms like bcrypt with a fixed-length output can be combined with HKDF to produce enough data.
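The "twice the amount of data" idea can be sketched like this. It is a minimal illustration under stated assumptions: `hashlib.scrypt` from the standard library stands in for Argon2, the 32/32 split and the row layout are arbitrary choices, and the function names are hypothetical.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

# Assumption: scrypt stands in for Argon2; parameters are illustrative.


def derive(password: str, salt: bytes) -> Tuple[bytes, bytes]:
    """Produce 64 bytes in one KDF call: the first half is the password
    verifier to store, the second half is an encryption key."""
    material = hashlib.scrypt(password.encode("utf-8"), salt=salt,
                              n=2**14, r=8, p=1, dklen=64)
    return material[:32], material[32:]


def enroll(password: str) -> dict:
    """Store the salt and the verifier half only; the key half is never
    written to the database."""
    salt = os.urandom(16)
    verifier, _key = derive(password, salt)
    return {"salt": salt, "verifier": verifier}


def login(password: str, row: dict) -> Optional[bytes]:
    """Return the encryption key on a correct password, otherwise None."""
    verifier, key = derive(password, row["salt"])
    if hmac.compare_digest(verifier, row["verifier"]):
        return key
    return None
```

With this layout the password check works like an ordinary hash comparison, and the key returned by `login` can be handed to whatever cipher protects the additional data.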

4
  • Hypothetically, if I was to create an encryption algorithm, and then invite the best cryptographers in the world to review it at an undisclosed underground facility, and after they all agree it is secure I was to have them executed, would security through obscurity work? Commented Feb 27 at 1:28
  • 1
    @suchislife: If the cryptographers agree that your algorithm is secure, then what do you need obscurity for? Shouldn’t the algorithm work just fine even if it’s publicly known? Security through obscurity is typically used when you aren’t convinced that an algorithm/protocol/application/system is secure and now try to hide its weaknesses. I know some people (like the OP) justify security through obscurity as a defense-in-depth mechanism. But I’m very skeptical of this. In reality, you cannot simply keep ideas away from everybody else. Commented Feb 27 at 2:13
  • I just thought: What would Lex Luthor do? Commented Feb 27 at 2:23
  • @Ja1024 - I know I'm a bit late to the response, but just to clarify: I am not trying to justify security through obscurity as a defense-in-depth mechanism; I am trying to separate the two to avoid conflation. In my opinion, my approach is not STO as it's not inventing a new encryption scheme, hashing scheme, etc. STO is something like XORing a password with the reverse of the plaintext, where knowing that defeats the entire scheme. With this hypothetical auth scheme, even if the attacker knew the "hidden parameters", it doesn't defeat or break the scheme. Argon2, ChaCha20, etc. are still secure. Commented Mar 9 at 17:37
