I'm implementing 32-bit signed integer fixed-point arithmetic. The scale is from 1 to -1, with INT32_MAX corresponding to 1. I'm not sure whether to make INT32_MIN or -INT32_MAX correspond to -1, but that's an aside for now.
I've made some operations to multiply and round, as follows:
#define mul(a, b) ((int64_t)(a) * (b))
#define round(x) (int32_t)((x + (1 << 30)) >> 31)

The product of two numbers can then be found using round(mul(a, b)).
The issue comes up when I check the identity. The main problem is that 1x1 is not 1; it's INT32_MAX-1. That's obviously not desired, as I would like bit accuracy. I suppose this affects other nearby numbers too, so the fix isn't just a case of adding 1 when both operands are INT32_MAX. Additionally, -1x1 is not -1, 1x-1 is not -1, and -1x-1 = -1. So none of the identities hold.
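For anyone who wants to reproduce this, here is a minimal sketch of the same arithmetic written as a function. The name fx_mul_asis is made up for illustration, and the code assumes the compiler implements >> on a negative int64_t as an arithmetic shift, which is implementation-defined in C but is what the common compilers do.

```c
#include <stdint.h>

/* Same arithmetic as round(mul(a, b)) from the question, as a function.
   fx_mul_asis is a hypothetical name used only in this sketch.
   Assumes >> on a negative int64_t is an arithmetic (sign-extending) shift. */
int32_t fx_mul_asis(int32_t a, int32_t b)
{
    int64_t p = (int64_t)a * b;                /* exact 64-bit product */
    return (int32_t)((p + (1 << 30)) >> 31);   /* add the rounding term, drop 31 bits */
}
```

With INT32_MAX standing for 1, fx_mul_asis(INT32_MAX, INT32_MAX) comes out as INT32_MAX - 1, and fx_mul_asis(INT32_MIN, INT32_MAX) comes out as -INT32_MAX rather than INT32_MIN.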
Is there a simple fix to this, or is this just a symptom of using fixed point arithmetic?
x + (1 << 30) is very likely to cause an integer overflow bug. How do you protect against that?

Re "INT32_MIN or -INT32_MAX correspond to -1": it must be -INT32_MAX, otherwise you will have a different 'scale' for positive and negative numbers and won't be able to do any operations that involve a positive and a negative number.

After computing 2147483647 * 2147483647 you need to divide by 2147483647 to realign. You can't do that by shifting. You can only do that if the 'scale' is a power of 2, so that 1 is represented by, for example, 2**30. Then the product of two positive values could be ((int64_t)a * b) >> 30, with some rounding before shifting if you like.

Either 1) use a range of [-1, +1) using 2**31 as the scale factor, or 2) use a range of [-2, +2) with a scale factor of 2**30. Those are half-open intervals.
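A rough sketch of the second option (scale factor 2**30, range [-2, +2)) might look like the following. The names Q30_ONE and q30_mul are invented for this example, the saturation step is an added assumption rather than something the comments above ask for, and the code again assumes >> on a negative int64_t is an arithmetic shift.

```c
#include <stdint.h>

/* Hypothetical names for this sketch: +1.0 is stored as 2**30. */
#define Q30_ONE ((int32_t)1 << 30)

/* Multiply two Q30 values, rounding to nearest (ties go toward +infinity).
   Assumes >> on a negative int64_t is an arithmetic shift. */
int32_t q30_mul(int32_t a, int32_t b)
{
    int64_t p = (int64_t)a * b;     /* Q60 product, always fits in 64 bits */
    p += (int64_t)1 << 29;          /* add half of one output LSB */
    p >>= 30;                       /* realign from Q60 back to Q30 */

    /* Saturate: e.g. (-2) * (-2) = +4 lies outside [-2, +2). */
    if (p > INT32_MAX) p = INT32_MAX;
    if (p < INT32_MIN) p = INT32_MIN;
    return (int32_t)p;
}
```

Because the scale is a power of two, multiplying by Q30_ONE or -Q30_ONE shifts back out exactly, so q30_mul(Q30_ONE, x) returns x and q30_mul(-Q30_ONE, x) returns -x for any x whose negation is representable. The price is one bit of fractional resolution compared with the 2**31 scale.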