Supposing that one has reliably random data of uniform distribution to use as an input, how can an integer in the range $[0, k-1]$ be selected at random, without bias and in constant time with respect to the size of $k$?
- Can we have a little bit more? Uniformly distributed whats? – Paul Uszak, May 22 at 19:57
- You want this algorithm: arxiv.org/abs/1805.10941. Notes by someone trying to understand the algorithm: sts10.github.io/2020/10/10/…. Note that this method uses rejection sampling, so it's not strictly constant-time, but it does not leak timing data. – swineone, May 22 at 23:14
- @swineone How does it thwart timing attacks? – Melab, May 23 at 11:36
- See discussion in a cryptographic context here: cic.iacr.org/p/1/2/14/pdf – swineone, May 24 at 3:21
2 Answers
I'll assume that, by 'random data of uniform distribution', you mean a source of uniformly and independently distributed random bits. The argument below can be modified to work if the source isn't binary; however, binary is the usual case.
Obviously, unless $k$ happens to be a power of two, any constant-time algorithm (that is, one with an upper bound on the number of random bits it can sample) can't be precisely uniform - there will always be a bias.
However, in practice this bias can be made small enough for any conceivable purpose.
One approach is to sample $\lceil \log_2 k \rceil + 128$ random bits, interpret that as an integer $z$ and compute $z \bmod k$; that's your random value. This can be done in constant time, and the maximum bias between possible values is $< 2^{-128}$; to detect such a bias, you would need to examine $2^{256}$ such values, and we'll never generate that many. Hence, this output is effectively uniformly distributed, even if it is theoretically not.
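For concreteness, here is a minimal Python sketch of that approach, using the standard `secrets` module as the source of uniform random bits (the function name is mine, and Python's big-integer arithmetic is not itself constant-time, so this only illustrates the sampling logic):

```python
import secrets

def rand_below(k: int) -> int:
    """Return an integer in [0, k-1] with bias below 2**-128.

    Samples ceil(log2 k) + 128 (or one more) uniform random bits,
    interprets them as an integer z, and reduces z mod k, as in the
    answer above; the number of sampled bits is fixed for a given k.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    nbits = k.bit_length() + 128   # at least ceil(log2 k) + 128 bits
    z = secrets.randbits(nbits)    # uniform integer in [0, 2**nbits)
    return z % k
```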
- Where does that +128 come from? – b degnan, May 22 at 19:53
- @bdegnan: that's to reduce any potential bias to undetectable. With $m$ extra bits, the number of outputs needed to detect the bias is around $2^{2m}$; with $m=128$, that is an unfeasibly huge amount. With $m=64$, it's almost always sufficient; I decided to go with massive overkill when writing this. – May 22 at 19:57
- Ah, I see. I appreciate the explanation. – b degnan, May 22 at 23:32
- @Melab: the proof is quite simple. Let us call the bound on sampled random bits $a$. Any deterministic algorithm that relies on at most $a$ random bits necessarily assigns each specific output a probability that is an integer multiple of $2^{-a}$. For the output of the algorithm to be precisely uniform, each possible output must occur with probability $k^{-1}$. If $k$ is not a power of two, then $k^{-1}$ is not an integer multiple of $2^{-a}$ for any $a$, hence the probability of that output must be either higher or lower than precisely uniform. – May 23 at 11:38
- I never understood why even competent people like you often use the more complicated modular reduction mod $k$ in this situation, instead of simply multiplying by $k$ and cutting off the lowest 128 bits. – j.p., May 24 at 7:48
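For comparison, here is a sketch of the multiply-and-shift variant suggested in the last comment (this is my reading of "multiplying by $k$ and cutting off the low bits"; the shift discards all of the sampled bits, and the residual bias is of the same order as with the modular reduction):

```python
import secrets

def rand_below_mulshift(k: int) -> int:
    """Map a uniform l-bit integer z to floor(z * k / 2**l).

    Each output in [0, k) corresponds to either floor(2**l / k) or
    ceil(2**l / k) values of z, so each output probability deviates from
    1/k by at most 2**-l, i.e. a relative bias below k / 2**l < 2**-128
    for l = bit_length(k) + 128; no division or reduction is needed.
    """
    nbits = k.bit_length() + 128
    z = secrets.randbits(nbits)
    return (z * k) >> nbits
```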
One possible interpretation is that we have a source that produces a fixed-length uniform random bit string (say, a 256-bit string), and we want to generate a random number in the range $[0,k)$. We could even have $k \geq 2^{256}$, since at this point we only expect computational indistinguishability.
Construction based on a PRG: Let $s$ be a 256-bit uniform random secret and $G$ a PRG that outputs bitstrings of length $l$; we sample random numbers in range as $r = int(G(s)) \bmod k$. The security of this scheme against a computationally limited attacker can be summarized as follows:
- $G(s)$ can be replaced with a random bitstring $x$ by the security of $G$
- $X = int(x)$ is uniformly distributed in $[0, 2^l)$. If $Y$ is the uniform distribution on $[0, k)$ and $Z = X \bmod k$, then the statistical distance $\Delta(Y; Z)$ is upper bounded by $k / 2^l$ (for $k \leq 2^l$). The bound is loose, and something tighter can be computed, but it is good enough for the sake of choosing secure instantiations.
In other words, the probability that a computationally bounded attacker distinguishes $Z$ from $Y$ is upper bounded by $\Pr[\text{Break } G] + \Delta(Y; Z)$. Assuming that $k \leq 2^n$, the statistical distance is at most $1/2^{l-n}$, so to keep it negligible we would pick, for example, $l = n + 128$.
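A small Python sketch of this PRG construction, with SHAKE-256 used as a stand-in for $G$ and $l = n + 128$; the choice of SHAKE-256 and the function name are assumptions of this sketch, not part of the answer:

```python
import hashlib

def prg_sample(seed: bytes, k: int) -> int:
    """Sketch of r = int(G(seed)) mod k, with SHAKE-256 playing the PRG G.

    The output length l is bit_length(k) + 128 bits (rounded up to whole
    bytes), so the statistical distance k / 2**l stays around 2**-128.
    """
    l_bits = k.bit_length() + 128
    l_bytes = (l_bits + 7) // 8
    stream = hashlib.shake_256(seed).digest(l_bytes)  # G(s)
    return int.from_bytes(stream, "big") % k
```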
Sample several times: The construction above can be generalized so that the same seed generates several random numbers. For this, we replace $G$ by a PRF $F$ that outputs bitstrings of length $l$. To generate the next value in the range $[0, k)$, we use a fresh nonce $i$ and compute $int(F(seed, i)) \bmod k$. The justification of security is essentially the same as in the PRG case.
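And a matching sketch for the PRF variant, with HMAC-SHA-512 standing in for $F$ and the nonce encoded as a 16-byte counter (again, the instantiation and names are my assumptions for illustration):

```python
import hashlib
import hmac

def prf_sample(seed: bytes, nonce: int, k: int) -> int:
    """Sketch of int(F(seed, nonce)) mod k, with HMAC-SHA-512 as the PRF F.

    HMAC-SHA-512 has a fixed 512-bit output, so this sketch only keeps
    ~128 bits of slack for k of up to 384 bits; a variable-output PRF
    would lift that restriction.
    """
    if k.bit_length() > 384:
        raise ValueError("k too large for this fixed-output-length sketch")
    tag = hmac.new(seed, nonce.to_bytes(16, "big"), hashlib.sha512).digest()
    return int.from_bytes(tag, "big") % k
```

Varying the nonce (e.g. a counter 0, 1, 2, ...) then yields as many in-range values as needed from a single 256-bit seed.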
- I don't understand a bit of this. – Melab, May 23 at 11:28
- @Melab, is there anything in particular that I can help clarify? In summary, this is essentially @poncho's answer with a twist. Namely, we approximate sampling a long random string by sampling a short random string, stretching it, and then doing what @poncho said. – Marc Ilunga, May 23 at 11:36