I am new to DSP and I am trying to understand how to work with fixed point operations and in particular the Q31 format.
In floating point (full range) I am doing some multiplications, for example 6.0 * 10.0 = 60.0. Doing this in Q31 requires first scaling each value down to [-1,1) and then converting it to an int32_t (q31).
However when doing the same multiplication in Q31 we get a new Q31 result (after bit shifting properly), that can be taken back to float in the range [-1,1). But if I then scale that back up by reversing the original downscaling I do not end up with 60.
In summary:
- Downscale the full-range float to a float in the range [-1,1) so that it can be represented in Q31
- Convert float in the proper range to Q31
- Perform fixed point multiplication (and possibly other unknown operations)
- Convert Q31 back to float
- Scale this float back up to the original range?
Is it possible to scale back to the original range somehow as in my last point - or is it a one-way conversion?
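The conversion and multiplication steps above can be sketched in C roughly as follows. This is a minimal illustration, not a library implementation. The helper names `float_to_q31`, `q31_to_float`, and `q31_mul` are made up for this post, the conversion truncates rather than rounds, and the code assumes that `>>` on a negative `int64_t` is an arithmetic shift (true on common compilers, but implementation-defined in C).

```c
#include <stdint.h>

/* Hypothetical helpers for illustration; not taken from any library. */

/* Convert a value in [-1, 1) to Q31, saturating outside that range. */
static int32_t float_to_q31(double x)
{
    double scaled = x * 2147483648.0;            /* x * 2^31 */
    if (scaled >= 2147483647.0)  return INT32_MAX;
    if (scaled <= -2147483648.0) return INT32_MIN;
    return (int32_t)scaled;                      /* truncates toward zero */
}

/* Convert Q31 back to a value in [-1, 1). */
static double q31_to_float(int32_t q)
{
    return (double)q / 2147483648.0;             /* q / 2^31 */
}

/* Q31 * Q31: the 64-bit product is Q62, so shift right by 31 to get Q31. */
static int32_t q31_mul(int32_t a, int32_t b)
{
    int64_t shifted = ((int64_t)a * (int64_t)b) >> 31;  /* assumes arithmetic shift */
    if (shifted > INT32_MAX) shifted = INT32_MAX;       /* only hit by (-1) * (-1) */
    return (int32_t)shifted;
}
```

With these helpers, `q31_to_float(q31_mul(float_to_q31(0.0012), float_to_q31(0.002)))` gives a value close to 2.4e-06. The last bit or two can differ from a library routine depending on how it shifts and rounds.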
An example:
- Downscale: Input floats in an original range [0-5000] are scaled down to [-1,1). I tried the scaling function `sign(val) * (abs(val) - min)/(max-min)`, where min is 0 and max is 5000. This takes the values 6 and 10 to `0.0012` and `0.002`.
- FloatToQ31: 0.0012 -> `2576980`, 0.002 -> `4294967`
- Multiply in Q31: 2576980*4294967 >> 31 = `5152`
- Q31ToFloat: 5152 -> `2.39909e-06`
- Upscale: Reversed downscale function `val * (max - min) + min` (again min=0, max=5000). This takes the value back to `0.0119954`, which is not close to 60.
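For reference, here are the same steps written in real numbers, ignoring the small Q31 rounding error. This is only a sketch of the arithmetic in the example, with the scale factor 5000 taken from the max above:

$$
\frac{6}{5000}\cdot\frac{10}{5000} = \frac{60}{5000^{2}} = 2.4\times10^{-6},
\qquad
2.4\times10^{-6}\cdot 5000 = 0.012
$$

So the Q31 result agrees with the product of the two scaled floats. The scaled product carries the factor 1/5000 twice, while the upscale step multiplies by 5000 only once.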