I've been interested in studying information content in the context of algorithms, especially PRNGs. Originally inspired by the entropy-extracting properties of the XOR gate, I wanted to simulate arithmetic operators on bitstrings and look, for cryptanalytical purposes, at how information content changes in PRNGs as they run.
The aim was to start by implementing arithmetic operators as circuits that propagate truth-table probabilities, and to simulate, for example, an LCG (which is known to be sub-par, but it's a starting point). The vision was to take a perfectly random bitstring seed value as input, i.e. each bit $b_n$ is independent and has probability of being true $p(b_n = 1) = 0.5$, and look at what the probabilities are after a pass through the LCG*.
I'm already struggling to interpret my results from simple applications of the addition operator. I went into this expecting that the addition of a "probabilistic" number to a "concrete" (definite) number would somehow preserve information in the probabilities across the string.
Let (little-endian) bitstring $A = a_0 \Vert a_1$ with probabilities of truth $a = \{p(a_0 = 1), p(a_1 = 1)\} = \{0.5, 0.5\}$. I refer to $A$ as a "probabilistic bitstring" because not all of its probabilities are zero or one. It could realise the values 0, 1, 2 or 3 if realisations are interpreted as little-endian binary integers, i.e. $A = \sum_m a_m \times 2^m$.
Let bitstring $B = b_0$ be a single bit with $p(b_0 = 1) = 1$. Thus, $B$ is a "concrete bitstring" that always realises the value 1.
If we let $C = A + B$ (using a full adder circuit), the resultant probabilistic bitstring $C$ has probabilities $c = \{0.5, 0.5, 0.25\}$.
This result is entirely expected. $A$ can realise a value up to the binary integer value 3, and we always add 1, so it is possible for the third bit to be set; this happens only when both bits of $A$ are set, with probability $p(a_0 = 1 \cap a_1 = 1) = 0.5 \times 0.5 = 0.25$.
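For concreteness, here is a minimal sketch of how I compute these marginals in Python (the function names are mine; it assumes the bits entering each adder stage are independent, which holds when adding a constant to a string of independent bits):

```python
def xor_p(p, q):
    """P(X ^ Y = 1) for independent bits with P(X=1) = p, P(Y=1) = q."""
    return p + q - 2 * p * q

def maj_p(p, q, r):
    """P(majority(X, Y, Z) = 1) for independent bits."""
    return p * q + p * r + q * r - 2 * p * q * r

def prob_add(a, b):
    """Ripple-carry addition of little-endian per-bit probability vectors,
    tracking only marginals. Assumes independence at each stage; when b is
    a constant, the carry into stage m depends only on a_0..a_{m-1}, so it
    is independent of a_m and the assumption holds."""
    n = max(len(a), len(b))
    a = a + [0.0] * (n - len(a))
    b = b + [0.0] * (n - len(b))
    out, carry = [], 0.0
    for pa, pb in zip(a, b):
        out.append(xor_p(xor_p(pa, pb), carry))  # sum bit of the full adder
        carry = maj_p(pa, pb, carry)             # carry out of the full adder
    out.append(carry)  # final carry becomes the most significant bit
    return out

print(prob_add([0.5, 0.5], [1.0]))  # -> [0.5, 0.5, 0.25]
```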
What I am struggling to understand, though, is how to quantify the information content before and after. A definite value obviously has no entropy, so entropy $E(B) = -b_0 \log_2 b_0 = 0$. Entropy of the probabilistic bitstring $A$ is $E(A) = -\sum_m a_m \log_2 a_m = 1$ and for $C$ it is $E(C) = -\sum_\ell c_\ell \log_2 c_\ell = 1.5$.
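In code, the quantity I'm computing is just the sum of $-p \log_2 p$ over each bit's probability (again a sketch; the helper name is mine):

```python
import math

def E(p):
    """Sum -x * log2(x) over each bit's marginal probability x;
    bits with x = 0 or x = 1 contribute zero."""
    return -sum(x * math.log2(x) for x in p if x > 0)

print(E([1.0]))             # E(B) = 0.0
print(E([0.5, 0.5]))        # E(A) = 1.0
print(E([0.5, 0.5, 0.25]))  # E(C) = 1.5
```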
While it is intuitive that adding two numbers together might increase the result's length (i.e. the most significant bit position with non-zero probability), conceptually I wouldn't expect the information content to change. It is clear, though, that the entropy summation now runs over more terms, and if I divide by the length to quantify an information "density" then in this case I get the expected result: both are 0.5.
However, extending $A$ further does not continue this pattern. For example:
| $a$ | $b$ | $c$ | Entropy $E(C)$ | Density $E(C) / \text{len}(C)$ |
|---|---|---|---|---|
| {0.5, 0.5} | {1} | {0.5, 0.5, 0.25} | 1.5 | 0.5 |
| {0.5, 0.5, 0.5} | {1} | {0.5, 0.5, 0.5, 0.125} | 1.875 | 0.46875 |
| {0.5, 0.5, 0.5, 0.5} | {1} | {0.5, 0.5, 0.5, 0.5, 0.0625} | 2.25 | 0.45 |
| {0.5, 0.5, 0.5, 0.5, 0.5} | {1} | {0.5, 0.5, 0.5, 0.5, 0.5, 0.03125} | 2.65625 | 0.442708 |
| {0.5, 0.5, 0.5, 0.5, 0.5, 0.5} | {1} | {0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.015625} | 3.09375 | 0.441964 |
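The rows above can be reproduced by combining the two sketches from earlier:

```python
for n in range(2, 7):
    c = prob_add([0.5] * n, [1.0])
    print([round(p, 6) for p in c], E(c), E(c) / len(c))
```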
The information content appears to grow perfectly linearly as the length of $A$ (where all $a_m = 0.5$) increases, but a plot of the information density is below:

[Plot: information density of $C$ versus length of $A$]
Note that, as the length of $A$ increases significantly beyond this plot, the density appears to converge back up to 0.5.
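Extending the same computation well past the plotted range shows this (using the sketches above; the printed values are approximate):

```python
for n in (10, 100, 1000):
    c = prob_add([0.5] * n, [1.0])
    print(n, E(c) / len(c))  # ~0.4554, ~0.4950, ~0.4995 -- climbing back towards 0.5
```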
Please can someone demystify what's going on here? Why the dip in the graph? Am I fundamentally misunderstanding what information means or how to quantify it?
I assume addition preserves information because, conceptually, a random number with a constant added to it is still just as unpredictable; it's just that the minimum and maximum values it can hold have shifted. Is it a faulty assumption that a full adder circuit preserves information in the same way that adding a constant to a random number does?
Footnote
*I realise this may be a flawed way to look at the information "recycling" that goes on inside a PRNG. Even if the seed input is truly random, once it's been through one round, the output is fed back in, and I'm not sure you could still argue for the independence of the bits. So even if the information content after multiple rounds is only mathematically meaningful rather than metaphysically so, it's fun to think about.