To estimate the entropy of a sequence, I came up with something like this:
```cpp
float log2_dirty(float x) {
    return std::bit_cast<float>((std::bit_cast<int>(x) >> 4) + 0x3D800000) - 15.0f;
}
```

This is loosely based on the identity log2(1+x) ~= x, which over the interval x = [0..1] has a maximum error of about 8%.
Also, starting from log2(2^e * (1+a)) ~= e + a, the idea is to move part of the exponent field into the linear part; after adding a bias, some range of the original value x matches the true value reasonably closely, as in:
```
   x     appr     std::log2
 2.0     1        1
 3.0     1.5      1.58496
 4.0     2        2
 5.0     2.25     2.32193
 6.0     2.5      2.58496
 ...
29.0     4.8125   4.85798
30.0     4.875    4.90689
31.0     4.9375   4.9542
```

Another parametrisation would be `(x >> 5) + 0x40000000` in the integer domain, then `- 31.0f` in the fp domain.
The range where this works is 2.0 <= x < 32.f (or 2.0 <= x < 64.f for the second parametrisation), but I'm wondering if and how I could also include the range 1..2, or even 1/16 <= x < 32, with as little tinkering as possible.
The best I came up with was `return log2_dirty(x * 16) - 4.0f` for the 1/16 <= x < 2 range, but maybe something better is available (I'm also targeting SIMD and branchless programming).
One additional target is to compute n log2 n, as that is used for entropy computation. (Here I'm actually not using p log2 p with p <= 1.0, but positive values n, which I can clamp to 1.0 if necessary, and which in my case are also limited to n < 31.)
Entropy is traditionally computed as -∑ log2(p_i)*p_i, where p_i = n_i / ∑(n_i), but it can also be expressed via ∑ log2(n_i)*n_i - ∑(n_i) * log2(∑(n_i)), allowing reuse of the logarithms of overlapping sequences. The traditional method has the nice property that 0 <= p_i <= 1 -- the input is automatically limited -- but it requires division; and if we want to scan over a linear signal to find a position where there's a sudden change in entropy (similar to Otsu thresholding, i.e. maximising inter-class variance), we end up with an O(n^2) algorithm, since all the p_i must be divided individually.
TL;DR: how to best extend the applicable domain of the algorithm -- or maybe there's another reasonable variant, just like the traditional fast_exp algorithm as in https://stackoverflow.com/a/78509807/1716339
---

Comments:

- `log2(2^e + (1+a)) ~= e + a` -- a typo? You mean `log2(2^e * (1+a)) ~= e + a`? And by `Sigma` you mean `\sum_i`? (Unicode actually has distinct characters for sum ∑ and Sigma Σ.)
- Consider `static_cast<float>(std::bit_cast<int>(x)) * magic1 + magic2`, which on modern CPUs is just 2 uops: `bit_cast` is a NOP, `static_cast<float>` is `cvtdq2ps` on x86, and the multiplication and addition can be joined into one FMA instruction. `magic1` should be something like `1.f/(1<<23)` I think; `magic2` you can fiddle with to reduce the max or the average error.