
To estimate the entropy of a sequence, I came up with something like this:

float log2_dirty(float x) { return std::bit_cast<float>((std::bit_cast<int>(x) >> 4) + 0x3D800000) - 15.0f; }

This is loosely based on the approximation log2(1+x) ~= x, which over the interval x = [0..1] has a maximum absolute error of about 0.086.

Also, since log2(2^e * (1+a)) = e + log2(1+a) ~= e + a, the idea is to shift part of the exponent field into the linear (mantissa) part; then, after adding a bias, the result matches the true value reasonably closely over some range of the original value x, as in

           appr     std::log2
 x = 2.0   1        1
 x = 3.0   1.5      1.58496
 x = 4.0   2        2
 x = 5.0   2.25     2.32193
 x = 6.0   2.5      2.58496
 ...
 x = 29.0  4.8125   4.85798
 x = 30.0  4.875    4.90689
 x = 31.0  4.9375   4.9542

Another parametrisation would be (x >> 5) + 0x40000000 in the integer domain, then subtracting 31.0f in the floating-point domain.

The ranges where these work are 2.0 <= x < 32.f and 2.0 <= x < 64.f respectively, but I'm wondering if and how I could also include the range 1..2, or even 1/16 <= x < 32, with as little tinkering as possible.

The best I could think of was return log2_dirty(x * 16) - 4.0f for the 1/16 <= x < 2 range, but maybe something better is available (I'm also targeting SIMD and branchless programming).

One additional target is computing n log2 n, as that is used for entropy computation. (Here I'm actually not using p log2 p with p <= 1.0, but positive values n, which I can clamp to 1.0 if necessary and which in my case are also limited to n < 31.)

  • Entropy is traditionally computed as -∑ log2(p_i)·p_i, where p_i = n_i / ∑ n_i, but it can also be expressed via ∑ log2(n_i)·n_i - (∑ n_i)·log2(∑ n_i), allowing the logarithms of overlapping sequences to be reused. The traditional method has the nice property that 0 <= p_i <= 1 -- the input is automatically limited -- but it requires a division per term; and if we want to scan over a linear signal to find a position where the entropy changes suddenly (similar to Otsu thresholding, i.e. maximising inter-class variance), we end up with an O(n^2) algorithm, since all the p_i must be divided individually for each candidate split.

TL;DR: how to best extend the applicable domain of the algorithm -- or maybe there's another reasonable variant, just like the traditional fast_exp algorithm as in https://stackoverflow.com/a/78509807/1716339

  • Is log2(2^e + (1+a)) ~= e + a a typo and you mean log2(2^e * (1+a)) ~= e + a? Commented Jun 26 at 8:25
  • In your entropy formula, I guess by Sigma you mean \sum_i? (Unicode actually has distinct characters for sum ∑ and Sigma Σ) Commented Jun 26 at 8:33
  • I would usually do static_cast<float>(std::bit_cast<int>(x))*magic1 + magic2, which on modern CPUs is just 2 uops: bit_cast is a NOP, static_cast<float> is cvtdq2ps on x86, and multiplication+addition can be joined to one fma instruction. magic1 should be something like 1.f/(1<<23) I think, magic2 you can fiddle around to reduce the max or the average error. Commented Jun 26 at 8:36
  • "the identity of log2(1+x) ~= x which over the interval of x=[1..2] has the maximum error of about 8%" Errr. The "identity" is only true at 0 and 1. If you are thinking Taylor, it's log(1+x)~x near 0, with the base-e log. As to the approximation log2(1+x)~x on [1..2], note that at 2, the relative error is 26%. Commented Jun 26 at 8:56
  • You might want to read the extensive discussion of this idea in stackoverflow.com/questions/75772363/… Commented Jun 26 at 9:57

1 Answer


On x86-architectures with native FMA support, I guess the fastest way is

return static_cast<float>(std::bit_cast<int>(x)) * (1.f/(1<<23)) - 127.f;

You get slightly less bit-loss (i.e., a more monotonic behavior), if you first subtract the offset in the integer domain and scale afterwards:

return static_cast<float>(std::bit_cast<int>(x) - 0x3f800000) * (1.f/(1<<23));

If you want to avoid the static_cast, your idea is generally sound, and by scaling the input value you can move the window of valid values. Note that scaling by a power of two just adds to or subtracts from the exponent (as long as you don't leave the range of normalized numbers). Thus, by subtracting (the bit pattern of) your lowest possible input value, you get an exponent in the range [0, Emax-Emin]; if Emax-Emin is 2^N, you can shift right by N to move the reduced exponent into the significand of the floating-point representation. After that, add (still in the integer domain) the float representation of the corresponding power of two, bit-cast back to float, and subtract (as a float) (2^N) - Emin.

Example, N=3, Emin=-3, Emax=5 (i.e., valid between 1.f/8 and 32.f):

return std::bit_cast<float>(((std::bit_cast<int32_t>(x) - std::bit_cast<int32_t>(1.f/8)) >> 3) + std::bit_cast<int32_t>(8.f)) - (8.f - -3.f);

Now by moving the subtraction after the shift, you get an expression exactly in your desired form:

return std::bit_cast<float>((std::bit_cast<int32_t>(x) >> 3) + (std::bit_cast<int32_t>(8.f) - (std::bit_cast<int32_t>(1.f/8) >> 3))) - (8.f - -3.f);

If you shift by 4 instead of 3, you can change the constants to get a range of 16 in the exponent, e.g., -8 to 8, i.e. 1/256 to 256.

N.B.: Since the approximation is always too small, except for powers of two, you could add something around 0.043 to reduce the maximal absolute error.


2 Comments

Interesting idea. I first discovered the inverse about 2 years ago as (639.f - q) << 9, and later with a smaller constant which partially contained a proper bit pattern (sign and 3-4 bits of the exponent), mitigating the amount of truncation.
