Skip to main content
1 of 8
fgrieu
  • 150.9k
  • 13
  • 327
  • 631

I'll consider only a non-adversarial model for the requirement of a low collision probability (that is, we are considering naturally-occurring strings only).

My proposed $h(s)$ is $256$-bit ($32$ bytes), among which the first $64$ bits $l(s)$ are the number of bits in $s$ (truncated to the lower $64$ bits), and the $192$ last bits are $m(s)=2^s\bmod p$, where $p$ is a fixed $192$-bit prime such that $(p-1)/2$ is prime and $2^{(p-1)/2}\bmod p=p-1$. Conversions from bitstring to integer and back are big-endian (most significant bit first).

The function $m$ is such that $m(s)=m(s\bmod(p-1))=2^{s\bmod(p-1)}\bmod p$, which allows computation of that hash of a string in time about proportional to its length.

We define $g(h_1,h_2)$ as follows:

  • from $h_1$ and $h_2$ we get the $l_1=\lfloor h_1/2^{192}\rfloor$ and $l_2=\lfloor h_2/2^{192}\rfloor$, and build the first $64$ bits of $g(h_1,h_2)$ as $$l_1+l_2\bmod2^{64}$$
  • from $h_1$ and $h_2$ we get the $m_1=h_1\bmod2^{192}$ and $m_2=h_2\bmod2^{192}$, and build the last $192$ bits of $g(h_1,h_2)$ as $$m_1^{2^{l_2}\bmod(p-1)}\cdot m_2\bmod p$$

The desired $g(h(s_1),h(s_2))=h(s_1\|s_2)$ holds, and its computation is fast, independently of the length of the original strings.

A suitable choice of $p$ is $\lceil325\cdot\pi\cdot2^{182}\rceil+78356$, that is in hexadecimal ff41208fd469fe8cfdcfdb1b1a977095890fed10d7b6f8a3 (Note: $325$ was selected so that the leftmost byte of $p$ is FF, $78356$ is the lowest integer so that $p$ matches the other requirements).

Examples (with this choice of $p$)
The hash of the empty string is
0000000000000000000000000000000000000000000000000000000000000001
The hash of the one-byte string 00 is
0000000000000008000000000000000000000000000000000000000000000001
The hash of the one-byte string 4c is
0000000000000008800000000000000000000000000010000000000000000000
The hash of the one-byte string bf is
0000000000000008800000000000000000000000000000000000000000000000
The hash of the one-byte string c0 is
000000000000000800bedf702b960173023024e4e5688f6a76f012ef2849075d
The hash of the one-byte string c1 is
0000000000000008017dbee0572c02e6046049c9cad11ed4ede025de50920eba
The hash of the two-byte string c0c1 is
0000000000000010f4cf3a34057419042ee1a3d61690616770b8532f43255211

It is easy to find a collision, but it won't occur by accident. For example, the $24$-byte strings
000000000000000000000000000000000000000000000001 and
ff41208fd469fe8cfdcfdb1b1a977095890fed10d7b6f8a3 both hash to
00000000000000C0000000000000000000000000000000000000000000000002.

fgrieu
  • 150.9k
  • 13
  • 327
  • 631