0
$\begingroup$

Lets say we have an arbitrarily large stream of numbers, numbers ranging from 1 to 100. You know these numbers follow a known distribution, e.g exponential distribution.

Is it possible predict the next number in the stream knowing the PDF and the stream? What kind of approach could be taken?

$\endgroup$
0

1 Answer 1

2
$\begingroup$

Assuming the random numbers are being generated by an unknown random number generator that is practically indistinguishable from a "true" random source, then the answers are:

Is it possible predict the next number in the stream knowing the PDF and the stream?

No, this is either completely impossible even in theory, or a practical impossibility for most modern PRNGs. By practical impossibility I mean you would need to collect some huge number of samples e.g. $2^{128}$ and it would take longer than the lifetime of the universe and use up more energy than the suns output over its entire lifetime.

What kind of approach could be taken?

None, it is impossible, and AI won't help.

What if the random number generator was not so good . . .

Poorly designed random number generators can be predicted. Really old ones might fall to statistical analysis, and you could even train something like an LSTM to predict the next number in the sequence. This has nothing to do with the PDF, although you will need to reverse-engineer how the PDF is being produced in order to get at the raw PRNG outputs for analysis. Also, almost no modern computer language or library uses a random number function that is so simple.

For modern but insecure generators, such as the Mersenne Twister algorithm if you know or can reverse-engineer the algorithm being used, sometimes it may be possible to infer the current state of a random number generator from a certain number of samples. This is not really an AI issue though, but more closely related to cryptography. I believe the minimum number of samples to infer Mersenne Twister state is something like 640.

$\endgroup$
6
  • $\begingroup$ Thank you for the response, very helpful. Now, what if we shift the goal from predicting the values, to predicting the range in which the next value will fall? Assuming we have N bins, would decreasing the size of N make the problem more "approachable" ? Or would it still be impossible? @Neil Slater $\endgroup$ Commented Apr 23, 2024 at 23:25
  • $\begingroup$ If it's a true random number generator, then there is no way to "predict" the next number, the next bin, or next anything. By definition, if it's random there should be no information available about it until you see the number. $\endgroup$ Commented Apr 24, 2024 at 0:06
  • $\begingroup$ However, it's rare that you actually have random numbers. For example, Microsoft thought PhotoDNA hashes were random enough, until they got inverted with the help of neural networks. $\endgroup$ Commented Apr 24, 2024 at 0:08
  • $\begingroup$ If it is enough to predict it going in a certain bin it can be feasible for pseudorandom variables. I remember having a huge excel that showed the drop rate of an item in a chest in an rpg based on what value the cure magic had in battle or something like that... $\endgroup$ Commented Apr 24, 2024 at 6:06
  • 1
    $\begingroup$ @Emil: You can always predict a probability that matches the probability derived from the PDF. This is normal in machine learning, where the "residual" unknown factors impact prediction accuracy, but you can still often make predictions that are useful (effectively the ML learns the PDF, or at least the value to predict from it that leads to lowest error). It is also very different from "hacking" an underlying PRNG, which would have to be very low quality to achieve results using ML. $\endgroup$ Commented Apr 24, 2024 at 8:05

You must log in to answer this question.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.