The Wikipedia entry on SHA-2 contains a usable pseudocode recipe. In the hope of gaining some deeper understanding, I implemented SHA-256 and SHA-512 from it. This was helpful, but I still don't think I have connected my understanding of how the hashing works with how it achieves its goals.
In short, I can see some non-linear bit-twiddling and likely chaotic behaviour from the components that SHA-2 strings together, but I'm missing how the whole algorithm results in provably achieving the avalanche effect or anything similar.
In discussions here, posters tend to treat the hash output as essentially a random map from any set of input messages whether they have any correlations or not (e.g. they could be random messages to hash, or they could be from a simple series such as integers from 1 to 1000). I'd like to know whether and how this property of SHA-2 is proven or verified.
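To make the "simple series" case concrete, here is a minimal sketch of the kind of empirical check I have in mind, using Python's standard `hashlib` rather than my own implementation. The helper names `bit_difference` and `mean_avalanche` are my own, and encoding each integer as its decimal string is an arbitrary assumption. It only measures behaviour on a handful of inputs; it proves nothing.

```python
import hashlib


def bit_difference(a: bytes, b: bytes) -> int:
    """Count the bit positions in which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def mean_avalanche(n_messages: int = 1000) -> float:
    """Average number of SHA-256 output bits that differ between the
    hashes of consecutive integers i and i + 1 (as decimal strings)."""
    total = 0
    for i in range(1, n_messages + 1):
        h1 = hashlib.sha256(str(i).encode()).digest()
        h2 = hashlib.sha256(str(i + 1).encode()).digest()
        total += bit_difference(h1, h2)
    return total / n_messages


# A random map would change about half of the 256 output bits on average.
print(mean_avalanche())
```

If the output behaves like a random map, the printed average should sit close to 128 even though the inputs are highly correlated.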
Here is my understanding so far:
The initial hash data and round constants, derived from square roots and cube roots of prime numbers, are not relevant. The numbers are chosen this way to demonstrate lack of bias (or hidden backdoors) by the designer. You could equally use the binary expansion of $\pi$ sequentially in each constant.
The main design purpose I see for the hash and round constants is that, when they are added in, they push the repeated rotations into a more chaotic mode; in combination they basically ensure there is no single input word value that has simple short loops when repeatedly calculating, e.g., the sum $s0 = (a \ggg 28) \oplus (a \ggg 34) \oplus (a \ggg 39)$ in each round.
The sums in the round, and those used to initialise the workspace, seem to have the effect of "spreading" message bit values throughout each word (either in the workspace, or in one of the hash sub-components). As far as I can see, the specific rotation amounts (28, 34, 39 in the example) are not highly important, other than that each value differs between sums/sigmas and that they do not form short loops modulo either each other or the word size.
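Both observations about the rotation sum can be poked at directly. The following is a small sketch, assuming 64-bit words as in SHA-512 and my reading of the rotation amounts from the pseudocode; the names `rotr64`, `big_sigma0`, `flipped_bits` and `cycle_length` are my own.

```python
MASK64 = (1 << 64) - 1


def rotr64(x: int, n: int) -> int:
    """Rotate a 64-bit word right by n bits."""
    return ((x >> n) | (x << (64 - n))) & MASK64


def big_sigma0(a: int) -> int:
    """The SHA-512 round sum s0 = (a >>> 28) xor (a >>> 34) xor (a >>> 39)."""
    return rotr64(a, 28) ^ rotr64(a, 34) ^ rotr64(a, 39)


def flipped_bits(a: int, bit: int) -> list:
    """Output bit positions that change when input bit `bit` of `a` is flipped."""
    diff = big_sigma0(a) ^ big_sigma0(a ^ (1 << bit))
    return [i for i in range(64) if (diff >> i) & 1]


def cycle_length(start: int, limit: int = 1000):
    """Number of applications of big_sigma0 needed to return to `start`,
    or None if that does not happen within `limit` steps."""
    x = big_sigma0(start)
    steps = 1
    while x != start and steps < limit:
        x = big_sigma0(x)
        steps += 1
    return steps if x == start else None


print(flipped_bits(0, 0))   # where a flip of input bit 0 ends up
print(cycle_length(1))      # how soon iterating the sum alone repeats
```

Run on its own, the sum spreads a single flipped input bit to exactly three output bits, whatever the rest of the word contains, and iterating it from the word 1 returns to the start after 64 steps. So the sum by itself is very regular, which I take as some support for the idea that the added constants and message words are what break up such loops.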
I can see some important pieces of feedback in the additions in the rounds (e.g. to create `temp1` the value of `h` is used - if it were not used somehow for either `temp1` or `temp2`, its value would become irrelevant). But in general I have no clue why the results from the sums are combined in the specific way given. For instance, why is `e` negated as part of creating `ch`?
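To pin down what I am asking about, this is how I transcribed the two functions from the Wikipedia pseudocode, again assuming 64-bit words. The table printed at the end is just my own way of looking at them one bit at a time.

```python
MASK64 = (1 << 64) - 1


def ch(e: int, f: int, g: int) -> int:
    """ch := (e and f) xor ((not e) and g), on 64-bit words."""
    return ((e & f) ^ (~e & g)) & MASK64


def maj(a: int, b: int, c: int) -> int:
    """maj := (a and b) xor (a and c) xor (b and c), on 64-bit words."""
    return ((a & b) ^ (a & c) ^ (b & c)) & MASK64


# Tabulate both functions over every combination of three single bits.
print("x y z  ch maj")
for x in (0, 1):
    for y in (0, 1):
        for z in (0, 1):
            print(x, y, z, "", ch(x, y, z), " ", maj(x, y, z))
```

Bit by bit, `ch` outputs the bit of `f` wherever `e` is 1 and the bit of `g` wherever `e` is 0, and `maj` outputs whichever value at least two of its inputs share. That matches the names, but it does not tell me why these two functions in particular were chosen.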
I do not expect to go from this level of understanding to full comprehension in one step. I'm also not 100% sure about any of the above. In general, I can see how SHA-2 works its magic in a hand-waving way - I might say "it is a complex system built using a few base algorithms with chaotic behaviour interacting". But I am stuck getting to the next level.
How do I find out why the specific recipes for e.g. ch and maj are used (the names chosen in the Wikipedia article imply they have meaning in the design)?
What form does the proof that SHA-2 does what it intends to do take?