
TL;DR

Does hashing a private key, then XORing the digest against the secret (twice, recursively, with an offset) have any glaring issues?

Is partial digest recognition possible?

Description

  1. User provides a private key (16+ chars)
  2. User provides a secret (< 128 chars, for now)
  3. We compute a digest: sha512(privateKey)
  4. Scramble: digest XOR secret, offset based on ASCII values % length
  5. Do this again, with the output of step #4 as the private key (XORs, etc.)
  6. Decryption uses the same method with the same privateKey
  7. This will get a lot more complex, but not yet.

General Overview

I convinced my boss to entertain the idea of adding encrypted posts to our web application, with the caveat that he wants something simple, since this will be a gimmicky intro for most of our users (elementary school students). We will not rework our backend, so unfortunately there is no IV, just a key.

Again, this won't be hiding critical information (perhaps classroom notes); in my mind it's just a cool feature which may spark future generations to take an interest in the field. Anyway, I came up with this code:

```javascript
/**
 * Encrypt
 * encrypt text using given key
 */
async function encrypt(text, key, loop = true) {
  // step 0. validate input
  if (text.length > 128) throw "message too large (128 max)";
  if (key.length < 16) throw "key size too small (16 min)";

  // step 1. generate hash from the key
  const hash = await sha512(btoa(key));
  const pads = hash.length - text.length;

  // step 2. convert to Uint8Array
  const ecode = new TextEncoder();
  const codex = ecode.encode(hash);
  const array = ecode.encode(text);

  // step 3. offset the XORs
  const multiplier = text.length * key.length;
  const round = array.map((value, i) => {
    const index = ((i + 1) * multiplier) % codex.length;
    return value ^ codex[index];
  });

  // step 4. convert to string
  const dcode = new TextDecoder();
  let value = dcode.decode(round);

  // step 5. add additional rounds
  if (loop === true) {
    value = encrypt(value, hash + key, false);
  }

  // step 6. return value as string
  return value;
}

/**
 * Decrypt
 * decrypt cipher using given key
 */
async function decrypt(text, key) {
  const encoder = new TextEncoder();
  const decoder = new TextDecoder();

  // step 1. generate hash from the key
  const hash = await sha512(btoa(key));

  // step 2. convert to Uint8Array
  const codex = encoder.encode(hash);
  const array = encoder.encode(text);

  // step 3. xor the results
  const output = array.map((value, i) => {
    const index = ((i + 1) * value) % codex.length;
    return value ^ codex[index];
  });

  // step 4. convert to string
  const value = decoder.decode(output);

  // step 5. return results
  return value;
}

/**
 * Generate sha512 digest
 * https://developer.mozilla.org/en-US/docs/Web/API/SubtleCrypto/digest
 */
async function sha512(message) {
  const msgUint8 = new TextEncoder().encode(message); // encode as (utf-8) Uint8Array
  const hashBuffer = await crypto.subtle.digest("SHA-512", msgUint8); // hash the message
  const hashArray = Array.from(new Uint8Array(hashBuffer)); // convert buffer to byte array
  const hashHex = hashArray
    .map((b) => b.toString(16).padStart(2, "0"))
    .join(""); // convert bytes to hex string
  return hashHex;
}

/**
 * Test case
 */
async function main() {
  const key = "n,mWqrE@!#sS+Gbc$5";
  const msg = "my namme is colin";
  const out = await encrypt(msg, key);
  console.log({ out, len: out.length });
  const txt = await encrypt(out, key);
  console.log({ txt, len: txt.length });
}

main();
```

The decrypt function isn't used; encrypt works both ways (for now). What I'm curious about is what information this leaks and which attack vectors would need to be mitigated. When I have more time, I plan to slap the FFT on this as well as some other tricks.

  1. This can only be called once recursively
  2. The output length is the same, but that's ok for now

I know it isn't secure, but how insecure?

Play around here!

Future Moves

The FFT works with polynomials and we can use the digit value plus position as position ** value to construct complex values.

Later the idea is to compress intermediate values to distort the length leak and transmission size as many times as possible [?].

These are just abstract thoughts, along with some others.

  • Note: this site is not for review of code and new cryptosystems. Also, we prefer text descriptions of algorithms, in particular to lower the risk of misinterpretation (which occurred in my answer, now removed). So in retrospect, the question was not a perfect fit for migration. Commented Jul 21, 2022 at 11:24
  • @fgrieu sure, yeah, migrate it; where should it go? Commented Jul 21, 2022 at 11:25
  • Tbh this was the perfect site, @fgrieu, your answer was immaculate, but where else could I speak on this? Commented Jul 21, 2022 at 11:28
  • As far as I know, there is no perfect site for this. If you want to learn how to create a cipher (a problem solved hundreds of times over, with various amounts of success), you should first learn cryptography and understand the various well-known attacks on ciphers. It is not so different from having no knowledge about cars, and then asking a mechanic what they think of your push cart. Commented Jul 22, 2022 at 0:50
  • @MaartenBodewes well, the first step in my proof was being cocky on here. As the question says, we need fast symmetric encryption without changing our db; as far as I'm concerned, someone would've broken this by now for my bounty. Commented Jul 25, 2022 at 7:08

1 Answer


There are a lot of problems here, but I'll focus on the ones that jumped out at me. The attempt to scramble the XOR ordering is doing basically nothing at all. First of all, it's almost entirely dependent on information that the attacker knows. Using the raw key length adds some entropy, but extremely little. Even leaving aside that a single fast hash is inadequate for passwords, and that the "key" is therefore probably a pre-hashed or random input of a known length, there are few enough possibilities that an attacker can simply try them all.

It is entirely possible for many, even most, codex values to never be used at all. For example, if the multiplier is even, then at least half of the codex is never used, since any even number shares a common factor (2) with 128. In fact, in the not-that-unlikely case that the multiplier is an exact multiple of the codex length - e.g. if the text is exactly 128 bytes long (the max), or the text is 56 (or some other multiple of 8) bytes long and the key is 16 bytes long (the shortest allowed value, and a very common length of random keys: 128 bits) - the result is disaster. Every single ((i + 1) * multiplier) % codex.length will return 0, so every single byte of the plaintext (array) will be XORed with the same value (the first byte of the hash digest).
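This degenerate case is easy to verify with a few lines. The sketch below (assuming the 128-character hex digest the code produces) reproduces the question's index formula for a 56-byte message and a 16-byte key:

```javascript
// Reproduce the index calculation from the question's encrypt():
// a 56-byte text and a 16-byte key give multiplier = 896 = 7 * 128,
// so every ((i + 1) * multiplier) % 128 evaluates to 0.
const codexLength = 128;       // length of the hex-encoded SHA-512 digest
const multiplier = 56 * 16;    // text.length * key.length
const indices = Array.from({ length: 56 }, (_, i) =>
  ((i + 1) * multiplier) % codexLength
);
console.log(new Set(indices)); // a single index: Set { 0 }
```

Every plaintext byte would be XORed with codex[0], i.e. a one-byte repeating key.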

Even in the case where the key length is totally unknown and the message length is not a multiple of 128, an attacker can probably find repeats more often than every 128 characters knowing only the message length. At most, the cycle of repeats will have a length of 128 / GCF(128, text.length), where GCF is the greatest common factor function. So if the text length is even, that cuts the cycle length in half (by necessity cutting out half the codex, as mentioned above). If the text length is a multiple of 8, that cuts the cycle length by 8x (down to merely 16 at most), eliminating 87.5% of the codex entropy... and that's a best-case outcome, assuming the key length is relatively prime with the codex length (which is to say, assuming the key length is odd).
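This bound can be checked directly. The sketch below (again assuming a 128-character codex; the multiplier values are hypothetical) counts the distinct indices the formula can reach:

```javascript
// Count how many distinct codex indices the question's index formula
// ((i + 1) * multiplier) % codexLength can ever reach.
const gcd = (a, b) => (b === 0 ? a : gcd(b, a % b));

function distinctIndices(multiplier, codexLength = 128) {
  const seen = new Set();
  for (let i = 0; i < codexLength; i++) {
    seen.add(((i + 1) * multiplier) % codexLength);
  }
  return seen.size;
}

// An odd multiplier reaches all 128 indices; a multiple of 8 reaches only 16.
console.log(distinctIndices(35 * 17));  // 128 (odd multiplier)
console.log(distinctIndices(56 * 17));  // 16  (multiplier is a multiple of 8)
console.log(128 / gcd(128, 56 * 17));   // 16, matching the formula above
```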

For each such repeat, an attacker who can predict the locations of the repeat can XOR any two corresponding bytes of ciphertext together to mask off the codex byte and get the XOR of the two plaintext bytes. If the attacker knows the value of any such byte, they can immediately decrypt all other locations where that codex entry was used. Even without knowing the values, it might be possible to perform a frequency analysis on the repeats, if you have enough of them, to determine likely plaintext values. Such an analysis would rely on the fact that certain characters occur more often than others, in predictable ratios, in text. The fact that the XOR between different characters would be known makes it even easier, and would compensate somewhat even if there were a low number of repeats for any given codex entry.
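A minimal sketch of that cancellation, with made-up byte values:

```javascript
// Two ciphertext bytes produced with the same codex byte: XORing them
// cancels the key material and leaves the XOR of the two plaintext bytes.
const codexByte = 0x61;          // same codex entry reused at both positions
const p1 = "e".charCodeAt(0);    // plaintext byte at position i
const p2 = "t".charCodeAt(0);    // plaintext byte at position j
const c1 = p1 ^ codexByte;       // ciphertext byte at position i
const c2 = p2 ^ codexByte;       // ciphertext byte at position j
console.log(c1 ^ c2);            // equals p1 ^ p2: the codex byte is gone
console.log(p1 ^ p2);
```

If the attacker learns p1 (a crib, or frequency analysis), p2 falls out for free.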

This sort of problem - where a failed attempt to reliably avoid one problem ended up introducing a potentially catastrophic failure mode - is very common in cryptography, especially among amateurs. Cryptography is very hard, especially if you don't even start from the known failures of past attempts.

With that said... I'm not even a number theorist or math major, much less a cryptographer; most of my math knowledge is merely what was required for a CS degree. Consequently I expect tons of people can and will spot this issue (and more knowledgeable ones will spot many, many more) if they bother to look.


But wait, there's more!

During the XOR step, you're combining each byte of the message (text) with a single byte of the codex (the hash of the base64 of the key) to produce ciphertext. However, while the message bytes may be any value in the range 0-255 (0x0-0xFF), the codex is generated from a hex-encoded string. Each byte of the codex can only be one of 16 values (0-9, a-f, encoded as decimal 48-57, 97-102). Thus, for every byte of ciphertext, there are only 16 possible bytes of plaintext (and the attacker knows what they are). This makes even simple brute-forcing of short parts of the message possible, especially for commonly-used characters that are far from any other common character in the character set (such as space), regardless of your key or multiplier. This is a catastrophic weakness, enabling cryptanalysis of even relatively short messages that avoid the multiplier-introduced problems, without any form of "crib" (known plaintext or message format).
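A quick sketch of the candidate-set size, using an arbitrary observed ciphertext byte:

```javascript
// The codex is a hex string, so each codex byte is one of only 16 values:
// the ASCII codes of the characters "0123456789abcdef".
const hexBytes = [..."0123456789abcdef"].map((c) => c.charCodeAt(0));

// For any observed ciphertext byte, the plaintext byte must be one of at
// most 16 known candidates, regardless of the key or the multiplier.
const ciphertextByte = 0x12; // an arbitrary observed byte
const candidates = new Set(hexBytes.map((k) => ciphertextByte ^ k));
console.log(candidates.size); // 16
```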


Some thoughts:

  • If you really want to do this, why not just use a stream cipher (with no IV/nonce or integrity) or a block cipher in ECB mode? Neither is safe, but they're much, much safer than this. All three approaches have the problem that two identical messages encrypted with the same key will have the same ciphertext. Your cipher shares another problem with the stream cipher (bit-flipping attacks are trivial) but even badly broken stream ciphers don't have the short cycle problem, much less the 4-bits-entropy-per-byte problem. ECB has the problem of leaking structure through repeats of blocks, but it only applies to blocks where all 16 bytes (if you use a modern block cipher like AES) are the same between block-boundary-aligned blocks.
  • Alternatively, why can't you store an IV and authentication/integrity tag? The messages are variable length to start with.

Simple design or code quality problems, that don't necessarily impact the security of the cipher:

  • The code will throw if the key contains any multi-byte characters, as btoa doesn't allow those.
  • Why are you calling btoa anyhow? It isn't needed.
  • The code may throw if the text contains any multi-byte characters (on the step where the ciphertext is decoded as UTF-8), as not all byte sequences are valid UTF-8.
  • You didn't encounter this problem in your testing because you never set or clear the most significant bit, and all byte sequences where the MSB of each byte is unset are valid UTF-8 (that's literally just ASCII), but you never state or validate this precondition.
  • Even if the code doesn't throw, if the text contains any characters that encode as multiple bytes in UTF-8, you might get the string length to appear within bounds (that is, text.length <= 128), but after encoding as UTF-8, the length of array could exceed 128 considerably (a single Unicode character could take 3, maybe more, UTF-8 bytes), guaranteeing the presence of repeated use of codex values even in the best case.
  • You compute a value, pads, and then never use it.
  • A cipher that only "works" on up to 128 bytes at a time is nigh-worthless unless it can be used in a block-cipher mode of operation. Presumably this limit is to try to avoid repeats of the codex value, but at what cost? You say classroom notes, but this can't even hold a tweet.
  • A cipher that can't do any work until it knows the final message length is completely unsuited to use anywhere performance is needed, since that is achieved through streaming data. Of course, the previous point largely renders it moot.
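The multi-byte length mismatch mentioned above is easy to demonstrate:

```javascript
// A string can pass the `text.length <= 128` check while its UTF-8
// encoding is far longer: "€" is 1 UTF-16 code unit but 3 UTF-8 bytes.
const text = "€".repeat(100);
console.log(text.length);                           // 100, passes the check
console.log(new TextEncoder().encode(text).length); // 300 bytes after encoding
```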
  • Well, all I can say is that's why I came here to post this and hear that, thank you! I probably should've gone with my instinct and added more primes in the process; this is a rough draft, so thanks for this: "assuming the key length is relatively prime". Still not decrypting this tho lol, I'll give you $300: "cbc12bi;9". And wait, even if what you are saying is true and I managed to change only a single digit within my SHA-512 digest, this is a weakness in hashing, no? Commented Jul 21, 2022 at 11:13
  • Also, without the secret text, how are you computing const multiplier = text.length * key.length;? You can only go in reverse (decrypt), but without even a known 1 here it would leave you guessing on the shift alone, no? The future moves will erase a lot of the frequency and statistical analysis, albeit still theoretical. I know I'm sounding like a drunk dick, but where else can I argue with people who know more than me? The assumption in my code is twofold: SHA-512 has no vulnerabilities, so if I change just one of those digits, is that a problem with me or the hash? Commented Jul 21, 2022 at 11:23
  • @Asleepace Please read this answer again. Your code cannot be salvaged; you'll just add more issues. Use a real encryption algorithm instead. Also, please don't be a Dave: security.stackexchange.com/questions/25585/… Commented Jul 21, 2022 at 12:00
  • @Asleepace: None of the issues discussed in this answer is related to SHA-512 properties (save for its fixed output size). Commented Jul 21, 2022 at 12:01
  • I'm agreeing with about everything in the answer, but the recommendation to use ECB or CTR mode is not great. Maybe some form of Format-Preserving Encryption (FPE) would be a better fit. If a 16-byte tag is not an issue, then modes that use a synthetic IV (AES-SIV or AES-GCM-SIV) would be the prime candidates; those ciphers would not leak any information except for the message size (of course) and whether the plaintext is fully identical or not. Commented Jul 22, 2022 at 0:37
