
1. Scenario

Suppose that we have a source that is generating one random value per, say, minute. So we have random value $x_1$ in $1$st minute, $x_2$ in $2$nd minute, $\ldots$, $x_n$ in the $n$th minute, and so forth.

The distribution of the values $x_1, x_2, \ldots$ is not entirely uniformly random, but follows this rule: for any $i \ge 1$, $x_i = (y_i, y_{i+1})$, where $y_i$ is the unique identifier of $x_i$, and $y_{i+1}$ is the unique identifier of the upcoming $x_{i+1}$ that is going to arrive in the next minute.

In other words, at any given $i$th minute, the current $x_i$ and the next $x_{i+1}$ are known, but the one after that, $x_{i+2}$, is completely unknown, i.e. random. Below is an example:

    Minute 1: x_1 = (334234,    234829129)
    Minute 2: x_2 = (234829129, 983220)
    Minute 3: x_3 = (983220,    831231236347)
    ...
    Minute n: x_n = (643532754, 3534611)

One way to hash those $n$ values, in order, is to hash each value as it arrives, e.g. $h_1 = f(x_1)$, then chain it with the next newly arriving one, e.g. $h_2 = f(x_2 \parallel h_1)$.

In other words, the hash of the ordered list of input, in the $n$th minute is defined recursively as $h_n = f(x_n \parallel h_{n-1})$, with the base case being $h_1 = f(x_1)$.

The advantage of this approach is that, for someone who has been running this process from the beginning, both the running time and the space at every minute are $O(1)$, since he can cache the hash from the previous minute.

The disadvantage of this approach is that someone who was not following the process, and wishes to verify that $h_n$ is indeed the hash of the entire sequence, has to start over from the beginning with $h_1$ and repeat the entire chain all the way up to $h_n$. Effectively, this verification process takes $O(n)$ space and $O(n)$ time.
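For concreteness, the chained construction, and the $O(n)$ replay a non-follower must perform, can be sketched as follows (instantiating $f$ with SHA-256 and encoding each $x_i$ as bytes; the function names are illustrative, not from the question):

```python
import hashlib

def f(data):
    # The hash function f, instantiated here with SHA-256.
    return hashlib.sha256(data).digest()

def chain_step(x_i, h_prev):
    # h_1 = f(x_1); h_i = f(x_i || h_{i-1}) for i > 1.
    return f(x_i) if h_prev is None else f(x_i + h_prev)

def verify_chain(xs, h_claimed):
    # A verifier who did not follow along must replay the whole chain
    # from x_1: O(n) time, and O(n) space to hold all the inputs.
    h = None
    for x in xs:
        h = chain_step(x, h)
    return h == h_claimed
```

A follower, by contrast, only ever keeps the latest `h` in memory between minutes, which is the $O(1)$ per-minute cost described above.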

2. Wormhole

It would be nice if, after every $n$ chained hashes, we could discover a wormhole $w_n$ with the following properties:

  • Once the $n$th hash, $h_n$, is legitimately calculated from $h_1$ by following the full recursion above, only then is the wormhole $w_n$ discovered. Otherwise, finding $w_n$ is practically impossible.
  • For a given $h'_n$ hash that is claimed to be $h_n$, the wormhole can shortcut the verification as follows:

$$ w_n(h_1, h'_n) = \begin{cases} 1 & \text{if $h'_n = h_n$}\\ 0 & \text{else}\\ \end{cases} $$

  • The asymptotic worst-case running time, as well as the asymptotic worst-case space, for computing $w_n(h_1, h'_n)$ is no worse than $O(\log n)$. If it is possible to achieve $O(1)$, that would be even better, of course.

Note. This is different from a pre-image attack on hash functions. The difference is as follows:

  • Pre-image attacks allow the attacker to forge an arbitrary input that gives some desired target hash.

  • This wormhole $w_n$ does not allow forging an arbitrary input; rather, it reveals a hidden shortcut that works only for linking $h_n$ back to $h_1$, and the only way to discover such a wormhole is by legitimately calculating $h_n$ first.

  • Perhaps we can call such a wormhole a conditional pre-image attack that does not benefit the adversary.

3. Question

Are such verification wormholes known, or even possible?

  • I think there must be an extra input beside $n$, $h_1$ and $h'_n$ to the algorithm computing $w_n(h_1, h'_n)$. In particular, what it does should depend on $x_1$ to $x_n$, right? Therefore, why single out $h_1$, and what in the problem statement prevents making $h_n$ that extra input, which allows a trivial implementation of $w_n(h_1, h'_n)$? Is it assumed $x_1$ to $x_n$ are implicit inputs to said algorithm? Commented Feb 14, 2022 at 16:57
  • Do the random values have to be genuinely random and outside of the control of the source, or do the values only need to be indistinguishable from random to an observer? Commented Feb 15, 2022 at 2:36
  • @fgrieu - Right, it depends on $x_1, x_2, \ldots$. However, I wonder, can we pass information about such dependence in the output hashes $h_1, h_2, \ldots$? In other words, can it be that $h_2 = f(x_2, h_1)$ is effectively passing related information in $x_1$ into $h_2$? Subsequently, as the chain goes on, can $h_n$ effectively have related information from $x_n, x_{n-1}, \ldots, x_1$, that's sufficient to create a verification wormhole $w_n(h_1, h'_n)$? Commented Feb 15, 2022 at 11:38
  • @fgrieu - As for your question about the problem statement preventing trivial solutions, if I understand you correctly, it's the requirement that the space complexity for the "wormhole user" must be constrained in $O(\log n)$. But the "wormhole discoverer" must do the $O(n)$ process. Commented Feb 15, 2022 at 11:44
  • I assume the source can't be trusted to assert anything about the values? Because if the source is trusted, the source can just release a "checkpoint" every $n$th time, where a signed message containing the latest hash is announced. If the source can't be trusted to assert anything, how can the source be trusted to introduce a wormhole that attests to the correct value of the hash? Commented Feb 15, 2022 at 12:02

1 Answer


Edit: clarified the answer by using a Merkle tree as an example.

For a list of values $x_1,\ldots,x_n$, you can compute a Merkle tree with root $h_n$. For a given $h_n'$ claimed to be $h_n$, you can shorten the verification by revealing the authentication path, of length $\log_2(n)$, between any known leaf hash $f(x_i)$ and the claimed $h_n'$.

By defining the wormhole $w_{i,n}$ as the authentication path between $f(x_i)$ and $h_n$, you can verify that $h_n' = h_n$ in $\log_2(n)$ calls to $f$. As pseudo-code:

    def verify(f_x_i, w, h_n):
        # w is the authentication path: (sibling_hash, sibling_is_left)
        # pairs, ordered from the leaf level up to the root.
        h_i = f_x_i
        for h_j, is_left in w:
            h_i = f(h_j + h_i) if is_left else f(h_i + h_j)
        return h_i == h_n

You can then call verify(f(x_i), w_i_n, h_n'). Crucially, verifying $h_n'$ does not require redoing all the previous hashing, only a subset of logarithmic size called the authentication path.
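Putting the pieces together, here is a minimal self-contained sketch of this idea (assuming $n$ is a power of two and instantiating $f$ with SHA-256; `merkle_root_and_path` is an illustrative helper, not part of the answer):

```python
import hashlib

def f(data):
    # The hash function f, instantiated here with SHA-256.
    return hashlib.sha256(data).digest()

def merkle_root_and_path(leaves, i):
    # Build the tree bottom-up and collect, at every level, the sibling
    # of the node on the path from leaf i to the root, tagged with
    # whether that sibling sits to the left.
    level = [f(x) for x in leaves]
    path = []
    idx = i
    while len(level) > 1:
        sib = idx ^ 1
        path.append((level[sib], sib < idx))
        level = [f(level[j] + level[j + 1]) for j in range(0, len(level), 2)]
        idx //= 2
    return level[0], path

def verify(f_x_i, w, h_n):
    # Recompute the root from one leaf hash and its authentication path.
    h_i = f_x_i
    for h_j, is_left in w:
        h_i = f(h_j + h_i) if is_left else f(h_i + h_j)
    return h_i == h_n
```

For $n = 4$ leaves the path has only $\log_2(4) = 2$ entries, so the verifier touches 2 hashes instead of replaying all 4 leaves.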


The drawback of using a Merkle tree here is that we have to assume it is perfectly balanced, i.e. that $n = 2^k$ for some $k$, and appending a new leaf requires rebuilding the tree, so nothing is gained. If you append your $h_i$ to a Merkle Mountain Range instead, there is something similar to an authentication path showing that a leaf lives under a set of peaks rather than under one root.

  • How would those peak hash digests link to $h_1$? Commented Feb 14, 2022 at 12:30
