Understanding how SHA-512 achieves its design goals

Question

The Wikipedia entry on SHA-2 contains a usable pseudocode recipe. In the hope of some deeper understanding, I implemented SHA-256 and SHA-512 from it. This was helpful, but I still don't think I have joined together my understanding of how the hashing works, and how it achieves its goals.

In short, I can see some non-linear bit-twiddling and likely chaotic behaviour from the components that SHA-2 strings together, but I'm missing how the whole algorithm results in provably achieving the avalanche effect or anything similar.

In discussions here, posters tend to treat the hash output as essentially a random map from any set of input messages whether they have any correlations or not (e.g. they could be random messages to hash, or they could be from a simple series such as integers from 1 to 1000). I'd like to know whether and how this property of SHA-2 is proven or verified.

Here is my understanding so far:

The initial hash data and round constants, derived from square roots and cube roots of prime numbers, are not relevant. The numbers are chosen this way to demonstrate lack of bias (or hidden backdoors) by the designer. You could equally use the binary expansion of $\pi$ sequentially in each constant.
The main design purpose I see for the hash and round constants is that, when they are added in, they push the non-linear repeated rotations into a more chaotic mode; in combination they basically ensure there is no single input word value that has simple short loops when repeatedly calculating e.g. the sum $s0 = (a \ggg 28) \oplus (a \ggg 34) \oplus (a \ggg 39)$ in each round.
The sums used in the rounds and to initialise the workspace seem to have the effect of "spreading" message bit values throughout each word (either in the workspace, or in one of the hash sub-components). As far as I can see, the specific rotation amounts (28, 34, 39 in the example) are not highly important, other than that each value differs between sums/sigmas and that they do not form short loops modulo either each other or the word size.
I can see that the additions in the rounds provide some important pieces of feedback (e.g. the value of h is used to create temp1; if it were not used somehow for either temp1 or temp2, its value would become irrelevant). But in general I have no clue why the results from the sums are combined in the specific way given. For instance, why is e negated as part of creating ch?
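To make the "spreading" point concrete, here is a minimal Python sketch of the sum $s0$ quoted above on 64-bit words. The helper names `rotr64`, `big_sigma0` and `popcount` are made up for this illustration and do not come from the Wikipedia pseudocode. It only shows that every single input bit lands in exactly three output bits and that the sum, taken on its own, is linear over XOR; it says nothing about the security of the full round.

```python
MASK64 = (1 << 64) - 1  # SHA-512 works on 64-bit words


def rotr64(x, n):
    """Rotate the 64-bit word x right by n bits."""
    return ((x >> n) | (x << (64 - n))) & MASK64


def big_sigma0(a):
    """The sum s0 from the question: three rotations XORed together."""
    return rotr64(a, 28) ^ rotr64(a, 34) ^ rotr64(a, 39)


def popcount(x):
    """Number of set bits in x."""
    return bin(x).count("1")


# Each single-bit input is copied to three distinct output positions.
fan_out = {popcount(big_sigma0(1 << k)) for k in range(64)}
print(fan_out)  # {3}

# On its own the sum is linear over XOR; x and y are arbitrary test words.
x, y = 0x0123456789ABCDEF, 0xFEDCBA9876543210
print(big_sigma0(x ^ y) == big_sigma0(x) ^ big_sigma0(y))  # True
```

Because the sum is XOR-linear, any non-linearity in a round has to come from elsewhere, which is part of why I am asking about the way the pieces are combined.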

I do not expect to go from this level of understanding to full comprehension in one step. I'm also not 100% sure about any of the above. In general, I can see how SHA-2 works its magic in a hand-waving way - I might say "it is a complex system built using a few base algorithms with chaotic behaviour interacting". But I am stuck getting to the next level.

How do I find out why the specific recipes for e.g. ch and maj are used (the names chosen in the Wikipedia article imply they have meaning in the design)?

What form does the proof that SHA-2 does what it intends to do take?

The big picture should include at its top that SHA-2 is built using the Merkle–Damgård structure, from a one-way compression function itself built using the Davies-Meyer structure from a specialized block cipher. The arbitrary initial hash data is instrumental in the derivation of the security of the Merkle–Damgård construction (making SHA-2 a random instance of a hash family), while the round constants are part of the block cipher design.
– fgrieu ♦, Commented May 2, 2014 at 15:40

forest · Accepted Answer · 2018-05-26 02:50:54Z

SHA-2, like SHA-1, is an ARX hash function: that is, it uses Addition, Rotation, and eXclusive-or for bit diffusion. The purpose of each one is explained very simply and clearly by Khovratovich & Nikolić in their paper Rotational Cryptanalysis of ARX, so I will simply quote them here:

Addition provides diffusion and nonlinearity, while XOR does not. Although the diffusion is relatively slow, it is compensated by a low price of addition in both software and hardware, so primitives with a relatively high number of additions (tens per byte) are still fast. The intraword rotation removes disbalance between left and right bits (introduced by addition) and speeds up the diffusion.

So there is no "chaotic behavior interacting", but there is bit-diffusion, and that is what you want in a hash. Sufficient bit-diffusion is called an "avalanche effect". The message expansion mixes the bits between words, and the actual compression function is where the nonlinearity comes in, using these pre-mixed bits.
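The avalanche effect is easy to observe empirically, even though observing it is not a proof. The following is a minimal sketch using Python's standard `hashlib`; the helper names `sha512_int` and `flipped_bit_distance` and the test message are arbitrary choices for this illustration. It flips each input bit in turn and counts how many of the 512 output bits change.

```python
import hashlib


def sha512_int(data: bytes) -> int:
    """SHA-512 digest of data, as a 512-bit integer."""
    return int.from_bytes(hashlib.sha512(data).digest(), "big")


def flipped_bit_distance(message: bytes, bit_index: int) -> int:
    """Hamming distance between the digests of message and of message
    with the single bit at bit_index flipped."""
    flipped = bytearray(message)
    flipped[bit_index // 8] ^= 1 << (bit_index % 8)
    return bin(sha512_int(message) ^ sha512_int(bytes(flipped))).count("1")


message = b"The quick brown fox jumps over the lazy dog"
distances = [flipped_bit_distance(message, i) for i in range(len(message) * 8)]
average = sum(distances) / len(distances)

# With good diffusion, about half of the 512 output bits should change.
print(min(distances), round(average, 1), max(distances))
```

If each output bit flips independently with probability one half, the distances should cluster around 256 with a standard deviation of roughly 11 bits.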

As to the second part of your question, $\operatorname{ch}(a,b,c)$ is the "choice" function, applied bit by bit: where a bit of $a$ is $1$, the result bit equals the corresponding bit of $b$, and where it is $0$, the result bit equals the corresponding bit of $c$. So $a$ acts as a selector between $b$ and $c$, which is why the negated selector appears in the formula. $\operatorname{maj}(a,b,c)$ is the "majority" function, which takes the majority value as the final result. So if two or three of the variables are $1$, then the result is $1$; otherwise, the result is $0$. SHA-1 additionally uses a $\operatorname{par}$ function, which is "parity" and is simply the XOR of the operands; the SHA-2 round function uses only $\operatorname{ch}$ and $\operatorname{maj}$.
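A minimal Python sketch of these three functions on 64-bit words, assuming the usual bitwise definitions (the lowercase function names follow the text above, and the test values are arbitrary):

```python
MASK64 = (1 << 64) - 1  # keep results within a 64-bit word


def ch(a, b, c):
    """Choice: each bit of a selects the matching bit of b (a=1) or c (a=0)."""
    return ((a & b) ^ (~a & c)) & MASK64


def maj(a, b, c):
    """Majority: a result bit is 1 when at least two input bits are 1."""
    return (a & b) ^ (a & c) ^ (b & c)


def par(a, b, c):
    """Parity: plain XOR of the three operands."""
    return a ^ b ^ c


b_word, c_word = 0x1111111111111111, 0x2222222222222222
print(hex(ch(MASK64, b_word, c_word)))              # 0x1111111111111111
print(hex(ch(0, b_word, c_word)))                   # 0x2222222222222222
print(hex(ch(0xFFFFFFFF00000000, b_word, c_word)))  # 0x1111111122222222
print(bin(maj(0b1100, 0b1010, 0b1001)))             # 0b1000
```

The `~a & c` term is where the negation comes from: it keeps the bits of `c` exactly where the selector `a` is 0.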

The truth density of each of these functions is 50%. In other words, given independent, uniformly random values for $a$, $b$, and $c$, each result bit is $1$ with probability 50% and $0$ with probability 50%.
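The 50% figure can be checked by enumerating all eight single-bit inputs. A small sketch, with single-bit versions of the functions written out here for illustration:

```python
from itertools import product


def ch(a, b, c):
    """Single-bit choice: b if a is 1, otherwise c."""
    return (a & b) ^ ((a ^ 1) & c)


def maj(a, b, c):
    """Single-bit majority."""
    return (a & b) ^ (a & c) ^ (b & c)


def par(a, b, c):
    """Single-bit parity."""
    return a ^ b ^ c


inputs = list(product((0, 1), repeat=3))  # all 8 combinations of (a, b, c)
ones = {f.__name__: sum(f(a, b, c) for a, b, c in inputs) for f in (ch, maj, par)}
print(ones)  # {'ch': 4, 'maj': 4, 'par': 4}
```

Each function outputs $1$ for exactly 4 of the 8 input combinations, so none of them biases its output towards $0$ or $1$.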

No, nothing to do with modular arithmetic. Let's say I told you the number "9124124", and said I'd added 4 unsigned numbers to get it. What were those numbers? There is no direct relationship that can give you any confidence in your answer. Addition involves carries, which means that the actual answer for the MSB must take into account all the carries that have occurred since the LSB. That's where the nonlinearity comes in. — Rhyme
– Rhyme, Commented May 9, 2014 at 8:46
A criticism: that's a bit like explaining a CPU by discussing CMOS transistors only. This answer is currently only about the principle used in the lower layers of SHA-512. It does not consider A) the overall Merkle–Damgård structure; B) the Davies-Meyer construction of the compression function from a block cipher; C) that cipher's overall construction (in particular: why 80 rounds, which I can't quantitatively explain).
– fgrieu ♦, Commented May 9, 2014 at 9:25
@fgrieu: Personally, I'd be happy if the Merkle–Damgård structure and Davies-Meyer construction were just briefly addressed (just saying which part of the algorithm is which, plus links as per your comment; I have them bookmarked for study). Identifying ch and maj is very useful to me.
– Neil Slater, Commented May 9, 2014 at 9:36
@fgrieu Not exactly? Merkle-Damgard and Davies-Meyer only realistically come in for block-sizes of > n bits, where n is the length that will extend it beyond one "chunk". Apart from that, you can pretty much discard it and look at the compression function in isolation. As for 80 rounds, I haven't seen any explanation for it other than "Yep, looked OK to us", so I wonder whether there IS an explanation for that. — Rhyme
– Rhyme, Commented May 9, 2014 at 9:38
@Rhyme Instead of using the above as an argument, you might as well have put it in your answer as a paragraph :)
– Maarten Bodewes ♦, Commented May 9, 2014 at 11:50
