Random factorized numbers

Question

Input

The code should take an integer \$n\$ between 1 and 1000.

Output

The code should output positive integers with \$n\$ bits. Accompanying each integer should be its full factorization. Each integer should be a uniformly random \$n\$ bit number.

Score

The score for your code will be the number of integers it outputs, with their factorizations, in ten seconds on my Ubuntu PC when \$n = 500\$.

Primality testing

If you need to test if a number is prime, you can use any method that is any of:

Provably always correct
Not proven to always be correct but for which there is no known example where it is incorrect
Provably has probability at most \$2^{-30}\$ of giving the wrong answer.

As always,this challenge is per language and I will keep a scoreboard for the answers given.

Is this possible?

The two best known methods are Bach's algorithm and a simpler one by Kalai.

Leaderboard

4 in Rust by mousetail
500 in Python 3.10 by Command Master
14406 in C++ by Command Master

This is probably impossible; you've got about a 1 in 100,000 chance of your 1000-bit number being the product of two 500 bit primes, which you won't be able to factorise before the heat death of the universe. — Neil
– Neil, Commented Oct 8, 2023 at 11:59
@Neil It turns out there are methods to do this. You don't do any factorization but rather construct the number with its factors — Simd
– Simd, Commented Oct 8, 2023 at 12:27
Am I allowed to do some preprocessing for all numbers together, or do I have to do it separately for each number? — Command Master
– Command Master, Commented Oct 9, 2023 at 3:42

Command Master · Accepted Answer · 2023-10-09 15:39:06Z

Python

Thanks to @AndersKaseorg for many improvements and bug fixes.

import math import random import time import sympy def gen_factor(N): while True: j = random.randint(1, N.bit_length() - 1) q = (1 << j) + random.randrange(1 << j) if q > N: continue Nt = N // q if random.random() >= math.log(q, N) * (((Nt + 1) >> 1) << j) / N: continue p, a = sympy.perfect_power(q) or (q, 1) if random.random() >= 1 / a: continue if sympy.isprime(p): return p, a, Nt def bach(N): if N <= 10 ** 5: x = random.randint(N//2 + 1, N) return x, sympy.factorint(x) while True: p, a, Nt = gen_factor(N) newx, factx = bach(Nt) newx *= p ** a if random.random() < math.log(N / 2, newx): factx[p] = factx.get(p, 0) + a return newx, factx def gen(n): return bach((1 << n) - 1) if __name__ == '__main__': t = time.perf_counter() a = [] while time.perf_counter() - t < 10: a.append(gen(500)) print('Generated', len(a), 'numbers in 10 seconds:') print(*a, sep='\n')

Attempt This Online!

This is a simple implementation of Bach's algorithm, just to get the challenge going. Make sure you have gmpy2 installed when running. Can generate around \$130\$ numbers on my computer.

@Simd could you give the amount of numbers it generates in 10 seconds on your computer? — Command Master
– Command Master, Commented Oct 8, 2023 at 14:39
It was quite a lot faster without pypy (but with gmpy2). I think sympy isprime is slow in pypy but fast with gmpy2. — Simd
– Simd, Commented Oct 8, 2023 at 14:48
AndersKaseorg Thanks, fixed! It also seems a bit faster. @Simd , can you give an updated score? — Command Master
– Command Master, Commented Oct 8, 2023 at 17:17
Swapping the random() * N < … rejection before the sympy.isprime(p) test seems to speed it up a bit. — Anders Kaseorg
– Anders Kaseorg, Commented Oct 8, 2023 at 19:23

mousetail · Accepted Answer · 2023-10-12 08:18:14Z

Rust + rand + bnum, score=4 on my PC

Basic implementation of Kalai's algorithm. Apparently it's not very good. The random rejection at the end rejects 99% of candidates so makes it take a lot longer.

use bnum::BUint; use rand::{distributions::uniform::UniformSampler, Rng}; #[inline] fn mod_exponentiate(mut base: BUint<16>, mut exponent: BUint<16>, modulus: BUint<16>) -> BUint<16> { if modulus == (1u64).into() { return 0u64.into(); } let mut result: BUint<16> = (1u64).into(); while !exponent.is_zero() { if exponent.digits()[0] & 1 == 1 { result = result * base % modulus } exponent >>= 1; base = (base * base) % modulus } return result; } #[inline] fn miller_test(candidate: BUint<16>, mut d: BUint<16>, witness: BUint<16>) -> bool { let mut x = mod_exponentiate(witness, d, candidate); if x == (1usize).into() || x == candidate - BUint::<16>::from(1u64) { return true; } while d != candidate - BUint::<16>::from(1u64) { x = (x * x) % candidate; d <<= 1; if x == (1u64).into() { return false; } if x == candidate - BUint::<16>::from(1u64) { return true; } } return false; } #[inline] fn prime_test(candidate: BUint<16>, iterations: usize, rng: &mut rand::rngs::ThreadRng) -> bool { if candidate == BUint::ONE { return false; } if candidate == BUint::TWO { return true; } if candidate == BUint::THREE { return true; } if candidate.digits()[0] & 1 == 0 { return false; } let mut d = candidate - BUint::<16>::from(1u64); if !d.is_zero() { d >>= d.trailing_zeros(); } let distribution = bnum::random::UniformInt::<BUint<16>>::new(BUint::TWO, &(candidate - BUint::TWO)); for _ in 0..iterations { let witness = distribution.sample(rng); if !miller_test(candidate, d, witness) { return false; } } return true; } #[inline] fn kalai_algorithm(max: BUint<16>, rng: &mut rand::rngs::ThreadRng) -> (Vec<BUint<16>>, BUint<16>) { loop { let mut candidates: Vec<BUint<16>> = vec![]; while candidates.last() != Some(&BUint::ONE) { candidates.push(rng.gen_range(BUint::ONE..*candidates.last().unwrap_or(&max))) } candidates.retain(|candidate| prime_test(*candidate, 8, rng)); if let Some(possible_output) = candidates.iter().cloned().reduce(|a, b| (a * b).min(max)) { if possible_output < max && rng.gen_range(BUint::ONE..max) < possible_output { break (candidates, possible_output); } } } } fn main() { let start_time = std::time::Instant::now(); let max = BUint::ONE << 500; let threads: Vec<_> = (0..8) .map(|thread_id| { std::thread::spawn(move || { let mut rng = rand::thread_rng(); loop { let (factors, product) = kalai_algorithm(max, &mut rng); if std::time::Instant::now() - std::time::Duration::from_secs(10) > start_time { break; } println!("{thread_id}\t{product:?} {factors:?}") } }) }) .collect(); threads.into_iter().for_each(|i| i.join().unwrap()); }

Run with:

opt-level=3 lto=true panic="abort"

\$\begingroup\$ Hey, it scores 4 more than a Charcoal implementation... \$\endgroup\$

Neil
– Neil

2023-10-12 12:14:55 +00:00
Commented Oct 12, 2023 at 12:14 — Neil
– Neil, Commented Oct 12, 2023 at 12:14

Command Master · Accepted Answer · 2023-10-18 08:28:04Z

C++ (gcc), score = 14406

#include <vector> #include <chrono> #include <cassert> #include <map> #include <random> #include <cmath> #include <cstdio> #include <iostream> #include <gmp.h> #include <gmpxx.h> std::minstd_rand rng; gmp_randstate_t gmp_rng; double prob[1011]; const int maxV = 510510; #define Vfactors {2, 3, 5, 7, 11, 13, 17} const int sqrtV = 715; int sieve[maxV+1]; int primeps[maxV]; int primecnt = 0; double logpq_sum[maxV]; int coprimes[92160]; std::uniform_int_distribution<> coprime_dist(0, 92159); int coprimecnt = 0; const double RAT = 510510. / 92160; const double LOGV = 18.96157969569485; void gen_factor(mpz_t N, int Nexp, double Nbase, mpz_t q, long int* qexp, double* qbase, mpz_t p, int* a, mpz_t Nq) { const double invN = 1 / Nbase; const double Nlog2 = Nexp + log2(Nbase); int I; for (I = 0; ; I++) { int i = I; double div = ldexp(invN*maxV, i-Nexp); if (div >= 1) break; prob[I] = (LOGV + i)*(1 + 1/(ldexp(maxV, i)-1)) + div*Nlog2; } int directI = I; const double finvN = ldexp(invN, -Nexp); prob[I++] = RAT * (logpq_sum[primecnt-1] + finvN * Nlog2 * (primecnt-1)); // small prime powers std::discrete_distribution<> distribution(std::begin(prob), std::begin(prob)+I); while (1) { int J = distribution(rng); if (J < directI) { const int i = J; mpz_urandomb(p, gmp_rng, i); mpz_setbit(p, i); mpz_mul_ui(p, p, maxV); mpz_add_ui(p, p, coprimes[coprime_dist(rng)]); if (mpz_cmp(p, N) > 0) continue; long pexp; double pbase = mpz_get_d_2exp(&pexp, p); double p_d = mpz_get_d(p); *a = 0; double logp = pexp + log2(pbase); double Nlogp = Nlog2 / logp; double choice_probability = (LOGV + i) / logp + Nlogp * finvN * (ldexp(maxV, i) - 1); #ifdef DEBUG std::cout << choice_probability << '\n'; #endif double f = std::uniform_real_distribution<>(0, choice_probability)(rng); while (*a <= Nlogp && f >= 0) { ++*a; f -= (ldexp(maxV, i) - 1) * pow(p_d, -*a) + (ldexp(maxV, i) - 1)*finvN; } if (*a > Nlogp) continue; if (!mpz_probab_prime_p(p, 15)) continue; mpz_pow_ui(q, p, *a); } else { // small prime double f = std::uniform_real_distribution<>(0, logpq_sum[primecnt-1] + finvN * Nlog2 * (primecnt-1) )(rng); int l = 0; int r = primecnt-2; int ans = primecnt-1; while (l <= r) { int m = (l + r) / 2; double v = logpq_sum[m] + finvN * Nlog2 * m; if (v >= f) { ans = m; r = m-1; } else { l = m+1; } } int pv = primeps[ans]; mpz_set_ui(p, pv);   double Nlogq = Nlog2 / log2(pv); f = std::uniform_real_distribution<>(0, 1 + Nlogq * (pv-1) * finvN )(rng); *a = 0; while (*a <= Nlogq && f >= 0) { ++*a; f -= (pv-1) * (pow(pv, -*a) + finvN); } if (*a > Nlogq) continue; mpz_ui_pow_ui(q, pv, *a); } *qbase = mpz_get_d_2exp(qexp, q); mpz_fdiv_q(Nq, N, q); int is_odd = mpz_odd_p(Nq); if (is_odd) mpz_add_ui(Nq, Nq, 1); double Nqd = mpz_get_d(Nq); if (!std::bernoulli_distribution(Nqd / (ldexp(Nbase / *qbase, Nexp - *qexp) + 1))(rng)) continue; if (is_odd) mpz_sub_ui(Nq, Nq, 1); return; } } void bach(mpz_t v, mpz_t target, std::vector<mpz_class>& factors) { if (mpz_cmp_ui(v, 1e7) < 0) { unsigned int vI = mpz_get_ui(v); int x = std::uniform_int_distribution<>(vI/2+1, vI)(rng); mpz_set_ui(target, x); factors.clear(); for (int d = 2; d*d <= x; d++) { while (x % d == 0) { factors.emplace_back(d); x /= d; } } if (x != 1) { factors.emplace_back(x); } return; } mpz_t q, p, Nt; mpz_inits(q, p, Nt, 0); long int Nexp; double Nbase = mpz_get_d_2exp(&Nexp, v); while (1) { int a; long int qexp; double qbase; gen_factor(v, Nexp, Nbase, q, &qexp, &qbase, p, &a, Nt); bach(Nt, target, factors); long int targetexp; double targetbase = mpz_get_d_2exp(&targetexp, target); double logv = (log2(Nbase) + Nexp - 1)/(log2(qbase) + qexp + log2(targetbase) + targetexp); if (std::bernoulli_distribution(logv)(rng)) { for (int i = 0; i < a; i++) factors.emplace_back(p); mpz_mul(target, target, q); mpz_clears(q, p, Nt, 0); return; } } } void gen(int n, mpz_t target, std::vector<mpz_class>& factors) { mpz_t v; mpz_init2(v, n); mpz_setbit(v, n); mpz_sub_ui(v, v, 1); bach(v, target, factors); } int main() { rng.seed(time(0)); gmp_randinit_lc_2exp_size(gmp_rng, 64); time_t end = time(0) + 10; sieve[0] = sieve[1] = 1; for (int v : Vfactors) { for (int i = v; i <= maxV; i += v) sieve[i] = 1; } for (int i = 1; i < maxV; i++) if (!sieve[i]) coprimes[coprimecnt++] = i; for (int v : Vfactors) sieve[v] = 0; double logpq_sum_v = 0; for (int v = 2; v <= maxV; v++) { if (!sieve[v]) { if (v <= sqrtV) for (int i = v*v; i <= maxV; i += v) sieve[i] = 1; logpq_sum[primecnt] = logpq_sum_v += log2(v) / (v-1); primeps[primecnt++] = v; } } int i; int tsz = 0; int primecnt = 0; for (i = 0; time(0) <= end; i++) { mpz_t target; mpz_init(target); std::vector<mpz_class> factors; gen(500, target, factors); tsz += factors.size(); if (factors.size() == 1) primecnt++; #ifdef OUTPUT mpz_out_str(stdout, 10, target); printf(" = "); for (int j = 0; j < factors.size(); j++) { if (j) printf(" * "); std::cout << factors[j]; } printf("\n"); #endif mpz_clear(target); } std::cout << "generated " << i << " numbers\n"; std::cout << "prime probability " << primecnt/double(i) << ", expected apprx. " << 0.0028879 /* (Li(2^500) - Li(2^499)) / 2^499 */ << '\n'; std::cout << "average number of prime divisors " << tsz/double(i) << " expected apprx. " << log(500 * M_LN2) + 1.0345 /* B_2, see Wikipedia */ << '\n';   }

Try it online!

Can output around 15,000 numbers on my computer. Compile with -lgmp -lgmpxx -O3. Please make sure there aren't other programs active while running this, because I've noticed that on my computer it changed the number by almost 1,000, even with just Firefox.

Explanation

This is an implemention of Bach's algorithm, although with a lot of optimizations.

The main function is gen_factor, which given \$n\$ returns \$q = p^\alpha\$ with probability \$\log(p) \#(\frac{n}{2q}, \frac nq] = \log(p) \lfloor\frac{\frac{N}{q} + 1}2\rfloor\$, where \$\#(a, b]\$ is the number of integers in that range.

We approximate that the floor isn't important, and then use rejection sampling to get back the appropriate distribution.

After multiplying by (the constant) \$\frac2N\$, we get \$\log(p)(\frac1q+\frac1N)\$. Now we group the terms by \$p\$, and for a given \$p\$ we have its probability being $$\log(p) \sum_{\alpha=1}^{\lfloor \log_p(N) \rfloor} (\frac1{p^\alpha}+\frac1N) = \log(p) (\frac1{p-1} - \frac1{p^{\lfloor \log_p(N) \rfloor}} + \frac{\lfloor \log_p(N) \rfloor}N) \leq \frac{\log(p)}{p-1} + \frac{\log(N)}{N}$$

We split the range to \$[2, 510510)\$, and \$[510510 \cdot2^i, 510510 \cdot2^{i+1})\$ for all \$i\$. For the first range, we precalculate stuff to get numbers with the correct probability. For each of the \$[510510 \cdot2^i, 510510 \cdot2^{i+1})\$ ranges we notice that the probability is decreasing, so we can just take the probability for \$p = 510510\cdot 2^i\$ and use rejection sampling. To lower the amount of numbers we have to generate, we require the numbers to be coprime to \$510510 = 2\cdot3\cdot5\cdot7\cdot11\cdot13\cdot17\$, so we need to multiply the probability by \$\frac{510510}{\varphi(510510)}\$ for the other segment.

@Simd Can you update how many numbers it generates on your computer? — Command Master
– Command Master, Commented Oct 11, 2023 at 6:28
Another large improvement! I now get "unused variable ‘sz’ " in case you hadn't noticed. — Simd
– Simd, Commented Oct 11, 2023 at 8:25
@Simd oh, the order of operations there is wrong (so the distribution is slightly off), I'll fix that in a few hours. I should really compile it with -Wall — Command Master
– Command Master, Commented Oct 14, 2023 at 11:47
@Simd now a significant part of the rejections are due to the primality tests failing, and I'm not sure what can be done about that. A slightly tighter envelope is possible, I might try that later. — Command Master
– Command Master, Commented Oct 16, 2023 at 3:09
@Simd I tried some stuff and it wasn't all that effective. Maybe you can get a few hundred more numbers, but not much more — Command Master
– Command Master, Commented Oct 18, 2023 at 8:47

Neil · Accepted Answer · 2023-10-10 00:32:03Z

Charcoal, 66 bytes, score 0 (usually)

≔Ｘ²ＮθＷ¹«≔⟦⊕‽⊖θ⟧υＷ⊖⌊υ⊞υ⊕‽⌊υ≔Φυ▷”8±cGＮºb6﹪←”κυ¿∧‹Πυθ‹‽θΠυ«⟦Ｉ⊞ＯυΠυ⟧Ｄ⎚

Try it online! Link is to verbose version of code with extra check because the version of Charcoal on TIO returns None for the product of an empty list and I can't use the version of Charcoal on ATO for multiple reasons. Note that you need to interrupt (^C if running locally or TIO's button) the code manually after ten seconds otherwise it will never print anything. Explanation: Uses Kalai's inefficient algorithm.

≔Ｘ²Ｎθ

Get 2ⁿ.

Ｗ¹«

Repeat forever.

≔⟦⊕‽⊖θ⟧υ

Start with a random integer between 0 and 2ⁿ exclusive.

Ｗ⊖⌊υ

Until a 1 is generated...

⊞υ⊕‽⌊υ

... push a random integer between 1 and the previous integer to the list.

≔Φυ▷”8±cGＮºb6﹪←”κυ

Filter out all non-primes from the list.

¿∧‹Πυθ‹‽θΠυ«

If the product is less than 2ⁿ and passes a random chance, then...

⟦Ｉ⊞ＯυΠυ⟧Ｄ⎚

... output the primes and their product.

83 bytes for a version that times itself:

≔Ｘ²Ｎθ≔▷clock⟦⟧ηＷ‹⁻▷clock⟦⟧ηχ«≔⟦⊕‽⊖θ⟧υＷ⊖⌊υ⊞υ⊕‽⌊υ≔Φυ▷”8±cGＮºb6﹪←”κυ¿∧‹Πυθ‹‽θΠυ⟦Ｉ⊞ＯυΠυ

Try it online! Link is to verbose version of code with extra check because the version of Charcoal on TIO returns None for the product of an empty list and I can't use the version of Charcoal on ATO because it doesn't have access to sympy.isprime. Note that the timing is approximate and might generate an extra result which actually finished outside the desired time.

64 bytes for a version that finds one factorisation and exits:

⊞υＸ²Ｎ≔⊖⌈υθＷ∨›Πυθ›‽θΠυ«≔⟦⊕‽θ⟧υＷ⊖⌊υ⊞υ⊕‽⌊υ≔Φυ▷”8±cGＮºb6﹪←”κυ»Ｉ⊞ＯυΠυ

Try it online! Link is to verbose version of code. Still usually crashes out because it can't calculate the product of an empty list, and even when it does find a result, it usually takes more than 10 seconds, but just in case, here's a bash script that restarts the program on error and times out after 10 seconds: Try it online!

Nicolas · Accepted Answer · 2023-10-15 04:40:11Z

Jelly (12 bytes) Score: 0

Edit: old code was wrong

2*³XȮÆ!Ṅ ç1¿

New code: 2*³XȮÆEṄ >! ç1¿

2*³..... 2^input ...X.... Pick a random number in [0, 2^input) ....Ȯ... Print the random number .....ÆE. Factorize it .......Ṅ Print the factors and go to line ç1¿..... Loop the function above forever

Old code:

Jelly beginner, the code can likely be shortened. It generated 141'808 random numbers between [0, 2^500) in 10 seconds on my MacBook Pro M2, the generated filesize is 68Mb. Try the code here: https://tio.run/##ASAA3/9qZWxsef//MirCs1jIrsOGIeG5hArDpzHCv////zUwMA

New code has a score of 0.

Testing code:

> ./jelly fu prime.jelly 500 > primelist.txt & pid=$!; sleep 10; kill $pid > wc -l primelist.txt

I don't think this is correct. You can't factorize a 500 bit number quickly. — Simd
– Simd, Commented Oct 14, 2023 at 20:55
According to the Jelly codepage, Æ starts a two byte arithmetic monad, so you're actually doing Æ! which converts the number to factorial base. — Command Master
– Command Master, Commented Oct 15, 2023 at 2:04

Stack Exchange Network

Random factorized numbers

Input

Output

Score

Primality testing

Is this possible?

Leaderboard

5 Answers 5

Python

Rust + rand + bnum, score=4 on my PC

C++ (gcc), score = 14406

Explanation

Charcoal, 66 bytes, score 0 (usually)

Jelly (12 bytes) Score: 0

Hot Network Questions

Input

Output

Score

Primality testing

Is this possible?

Leaderboard

5 Answers 5

Rust + rand + bnum, score=4 on my PC

C++ (gcc), score = 14406

Explanation

Charcoal, 66 bytes, score 0 (usually)

Jelly (12 bytes) Score: 0

Related