Supposing that one has reliably random data of uniform distribution to use as an input, how can an integer in the range $[0, k-1]$ be selected at random, without bias and in constant time with respect to the size of $k$?
- Can we have a little bit more? Uniformly distributed whats? – Paul Uszak, May 22 at 19:57
- You want this algorithm: arxiv.org/abs/1805.10941. Notes by someone trying to understand the algorithm: sts10.github.io/2020/10/10/…. Note that this method uses rejection sampling, so it's not strictly constant-time, but it does not leak timing data. – swineone, May 22 at 23:14
- @swineone How does it thwart timing attacks? – Melab, May 23 at 11:36
- See discussion in a cryptographic context here: cic.iacr.org/p/1/2/14/pdf – swineone, May 24 at 3:21
2 Answers
I'll assume that, by 'random data of uniform distribution', you mean a source of uniformly and independently distributed random bits. The argument below can be modified to work if the source isn't binary; however, binary is the usual case.
Obviously, unless $k$ happens to be a power of two, any constant-time algorithm (that is, one with an upper bound on the number of random bits it can sample) can't be precisely uniform - there will always be a bias.
However, in practice this bias can be made small enough for any conceivable purpose.
One approach is to sample $\lceil \log_2 k \rceil + 128$ random bits, interpret that as an integer $z$ and compute $z \bmod k$; that's your random value. This can be done in constant time, and the maximum bias between possible values is $< 2^{-128}$; to detect such a bias, you would need to examine $2^{256}$ such values, and we'll never generate that many. Hence, this output is effectively uniformly distributed, even if it is theoretically not.
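For concreteness, here is a minimal Python sketch of that approach, using the standard `secrets` module as the source of uniform random bits (the function name is mine, and Python's big-integer arithmetic is not itself constant-time, so this only illustrates the sampling logic):

```python
import secrets

def rand_below(k: int) -> int:
    """Return an integer in [0, k-1] with bias below 2**-128.

    Samples ceil(log2 k) + 128 (or one more) uniform random bits,
    interprets them as an integer z, and reduces z mod k, as in the
    answer above; the number of sampled bits is fixed for a given k.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    nbits = k.bit_length() + 128   # at least ceil(log2 k) + 128 bits
    z = secrets.randbits(nbits)    # uniform integer in [0, 2**nbits)
    return z % k
```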
- Where does that +128 come from? – b degnan, May 22 at 19:53
- @bdegnan: that's to reduce any potential bias to undetectable. With $m$ extra bits, the number of outputs needed to detect the bias is around $2^{2m}$; with $m=128$, that is an unfeasibly huge amount. With $m=64$, it's almost always sufficient; I decided to go with massive overkill when writing this. – May 22 at 19:57
- Ah, I see. I appreciate the explanation. – b degnan, May 22 at 23:32
- @Melab: the proof is quite simple. Let us call the bound on sampled random bits $a$. Any deterministic algorithm that relies on at most $a$ random bits necessarily assigns each specific output a probability that is an integer multiple of $2^{-a}$. For the output of the algorithm to be precisely uniform, each possible output must occur with probability $k^{-1}$. If $k$ is not a power of two, then $k^{-1}$ is not an integer multiple of $2^{-a}$ for any $a$, hence the probability of that output must be either higher or lower than precisely uniform. – May 23 at 11:38
- I never understood why even competent people like you often use the more complicated modular reduction mod $k$ in this situation, instead of simply multiplying by $k$ and cutting off the lowest 128 bits. – j.p., May 24 at 7:48
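For comparison, here is a sketch of the multiply-and-shift variant suggested in the last comment (this is my reading of "multiplying by $k$ and cutting off the low bits"; the shift discards all of the sampled bits, and the residual bias is of the same order as with the modular reduction):

```python
import secrets

def rand_below_mulshift(k: int) -> int:
    """Map a uniform l-bit integer z to floor(z * k / 2**l).

    Each output in [0, k) corresponds to either floor(2**l / k) or
    ceil(2**l / k) values of z, so each output probability deviates from
    1/k by at most 2**-l, i.e. a relative bias below k / 2**l < 2**-128
    for l = bit_length(k) + 128; no division or reduction is needed.
    """
    nbits = k.bit_length() + 128
    z = secrets.randbits(nbits)
    return (z * k) >> nbits
```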
One possible interpretation is that we have a source that produces a fixed-length uniform random bit string (say, a 256-bit string), and we want to generate a random number in the range $[0,k)$. We could even have $k \geq 2^{256}$, since at this point we only expect computational indistinguishability.
Construction based on a PRG: Let $s$ be a 256-bit uniform random secret and $G$ a PRG that outputs bitstrings of length $l$; we sample random numbers in range as $r = int(G(s)) \bmod k$. The security of this scheme against a computationally limited attacker can be summarized as follows:
- $G(s)$ can be replaced with a random bitstring $x$ by the security of $G$
- $X = int(x)$ is uniformly distributed in $[0, 2^l)$. If $Y$ is the uniform distribution on $[0, k)$ and $Z = X \bmod k$, then the statistical distance $\Delta(Y; Z)$ is upper bounded by $k / 2^l$ (for $k \leq 2^l$). The bound is loose, and something tighter can be computed, but it is good enough for the sake of choosing secure instantiations.
In other words, the probability that a computationally bounded attacker distinguishes $Z$ from $Y$ is upper bounded by $\Pr[\text{Break } G] + \Delta(Y; Z)$. Assuming that $k \leq 2^n$, the statistical distance is at most $1/2^{l-n}$, so to keep it negligible we would pick, for example, $l = n + 128$.
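A small Python sketch of this PRG construction, with SHAKE-256 used as a stand-in for $G$ and $l = n + 128$; the choice of SHAKE-256 and the function name are assumptions of this sketch, not part of the answer:

```python
import hashlib

def prg_sample(seed: bytes, k: int) -> int:
    """Sketch of r = int(G(seed)) mod k, with SHAKE-256 playing the PRG G.

    The output length l is bit_length(k) + 128 bits (rounded up to whole
    bytes), so the statistical distance k / 2**l stays around 2**-128.
    """
    l_bits = k.bit_length() + 128
    l_bytes = (l_bits + 7) // 8
    stream = hashlib.shake_256(seed).digest(l_bytes)  # G(s)
    return int.from_bytes(stream, "big") % k
```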
Sample several times: The construction above can be generalized so that the same seed generates several random numbers. For this, we replace $G$ by a PRF $F$ that outputs bitstrings of length $l$. To generate the next value in the range $[0, k)$, we use a fresh nonce $i$ and compute $int(F(seed, i)) \bmod k$. The justification of security is essentially the same as in the PRG case.
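And a matching sketch for the PRF variant, with HMAC-SHA-512 standing in for $F$ and the nonce encoded as a 16-byte counter (again, the instantiation and names are my assumptions for illustration):

```python
import hashlib
import hmac

def prf_sample(seed: bytes, nonce: int, k: int) -> int:
    """Sketch of int(F(seed, nonce)) mod k, with HMAC-SHA-512 as the PRF F.

    HMAC-SHA-512 has a fixed 512-bit output, so this sketch only keeps
    ~128 bits of slack for k of up to 384 bits; a variable-output PRF
    would lift that restriction.
    """
    if k.bit_length() > 384:
        raise ValueError("k too large for this fixed-output-length sketch")
    tag = hmac.new(seed, nonce.to_bytes(16, "big"), hashlib.sha512).digest()
    return int.from_bytes(tag, "big") % k
```

Varying the nonce (e.g. a counter 0, 1, 2, ...) then yields as many in-range values as needed from a single 256-bit seed.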
- I don't understand a bit of this. – Melab, May 23 at 11:28
- @Melab, is there anything in particular that I can help clarify? In summary, this is essentially @poncho's answer with a twist. Namely, we approximate sampling a long random string by sampling a short random string, stretching it, and then doing what @poncho said. – Marc Ilunga, May 23 at 11:36