Tweeted twitter.com/StackSignals/status/1604219899605286916

occurred Dec 17, 2022 at 21:00

Became Hot Network Question

occurred Dec 17, 2022 at 20:33

correct mistake


edited Dec 17, 2022 at 18:57

Zitrax


I am new to DSP and I am trying to understand how to work with fixed point operations and in particular the Q31 format.

In floating point (full range) I am doing some multiplications, it could be for example 6.0 * 10.0 = 60.0. Converting this to Q31 would require first scaling it down to [-1,1) and then converting that to an int32_t (q31).
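As an illustration of the conversion step, here is a minimal Python sketch. The helper names `float_to_q31` and `q31_to_float` are made up for this example, and truncation toward zero plus saturation at the int32_t limits are assumptions; a real library may round differently.

```python
Q31_ONE = 1 << 31          # 2**31, the Q31 scale factor
Q31_MAX = Q31_ONE - 1      # largest int32_t value
Q31_MIN = -Q31_ONE         # smallest int32_t value


def float_to_q31(x: float) -> int:
    """Convert a float in [-1, 1) to a Q31 integer, saturating out-of-range input."""
    q = int(x * Q31_ONE)   # assumption: truncate toward zero
    return max(Q31_MIN, min(Q31_MAX, q))


def q31_to_float(q: int) -> float:
    """Convert a Q31 integer back to a float in [-1, 1)."""
    return q / Q31_ONE


print(float_to_q31(0.5))          # 1073741824
print(q31_to_float(1073741824))   # 0.5
print(float_to_q31(1.0))          # 2147483647 (saturated, since 1.0 is not representable)
```

Note that +1.0 itself cannot be represented, which is why the range is half-open.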

However when doing the same multiplication in Q31 we get a new Q31 result (after bit shifting properly), that can be taken back to float in the range [-1,1). But if I then scale that back up by reversing the original downscaling I do not end up with 60.
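For reference, the multiplication with "bit shifting properly" can be sketched like this. `q31_mul` is a hypothetical helper, not a library function; it assumes a plain 64-bit product followed by an arithmetic right shift of 31, with saturation for the single overflow case.

```python
def q31_mul(a: int, b: int) -> int:
    """Multiply two Q31 values: form the wide Q62 product, then shift back to Q31."""
    product = a * b                 # Q62 intermediate; would need int64_t in C
    result = product >> 31          # arithmetic shift, rounds toward negative infinity
    # Only (-1) * (-1) overflows Q31; saturate it to the largest Q31 value.
    return min(result, (1 << 31) - 1)


half = 1 << 30                      # 0.5 in Q31
print(q31_mul(half, half))          # 536870912, which is 0.25 in Q31
```

Some implementations drop one more low bit (shift right by 32, then left by 1), so results can differ by one LSB from this sketch.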

In summary:

1. Downscale full range float to float in the range [-1,1) to be able to be represented in Q31
2. Convert float in the proper range to Q31
3. Perform fixed point multiplication (and possibly other unknown operations)
4. Convert Q31 back to float
5. Scale this float back up to the original range?

Is it possible to scale back to the original range somehow as in my last point - or is it a one-way conversion?

An example:

1. Downscale: Input floats in an original range [0-5000] are scaled down to [-1,1). I tried with scaling function sign(val) * (abs(val) - min)/(max-min) where min is 0 and max 5000. This takes the values 6 and 10 to 0.0012 and 0.002.
2. FloatToQ31: 0.0012 -> 2576980, 0.002 -> 4294967
3. Multiply in Q31: 2576980*4294967 >> 31 = 5152
4. Q31ToFloat: 5152 -> 2.39909e-06
5. Upscale: Reversed downscale function val * (max - min) + min (again min=0, max=5000). This takes the value back to 0.0119954 which is not close to 60.
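The five steps above can be reproduced with a short Python sketch. All helper names are hypothetical, and it assumes min = 0 so that the downscale is a pure division by `SCALE`. Under that assumption each input carries a factor 1/SCALE, so the product carries 1/SCALE twice; the sketch prints the result of upscaling once and twice to make that visible. A plain `>> 31` gives 5153 here; the 5152 quoted above differs by one LSB, which would be consistent with a multiply that shifts right by 32 and then left by 1 (that is a guess).

```python
Q31_ONE = 1 << 31   # 2**31


def float_to_q31(x: float) -> int:
    return int(x * Q31_ONE)          # assumption: truncate, no saturation needed here


def q31_to_float(q: int) -> float:
    return q / Q31_ONE


def q31_mul(a: int, b: int) -> int:
    return (a * b) >> 31             # wide product shifted back to Q31


SCALE = 5000.0                       # max - min, with min = 0

qa = float_to_q31(6.0 / SCALE)       # 2576980
qb = float_to_q31(10.0 / SCALE)      # 4294967
qp = q31_mul(qa, qb)                 # 5153
p = q31_to_float(qp)                 # about 2.4e-06

once = p * SCALE                     # upscaled once: about 0.012
twice = p * SCALE * SCALE            # upscaled twice: close to 60

print(qa, qb, qp)
print(f"{once:.6f} {twice:.3f}")
```

The remaining gap to exactly 60 comes from quantization: the product only occupies about 13 of the 31 fractional bits, so inputs this small relative to the range lose precision. With a nonzero min the offset does not factor out of a product this simply.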


added 585 characters in body


edited Dec 17, 2022 at 18:49

Zitrax


I am new to DSP and I am trying to understand how to work with fixed point operations and in particular the Q31 format.

In floating point (full range) I am doing some multiplications, it could be for example 6.0 * 10.0 = 60.0. Converting this to Q31 would require first scaling it down to [-1,1) and then converting that to an int32_t (q31).

However when doing the same multiplication in Q31 we get a new Q31 result (after bit shifting properly), that can be taken back to float in the range [-1,1). But if I then scale that back up by reversing the original downscaling I do not end up with 60.

In summary:

1. Downscale full range float to float in the range [-1,1) to be able to be represented in Q31
2. Convert float in the proper range to Q31
3. Perform fixed point multiplication (and possibly other unknown operations)
4. Convert Q31 back to float
5. Scale this float back up to the original range?

Is it possible to scale back to the original range somehow as in my last point - or is it a one-way conversion?

An example:

Downscale: Input floats in an original range [0-5000] are scaled down to [-1,1). I tried with scaling function sign(val) * (abs(val) - min)/(max-min) where min is 0 and max 5000. This takes the values 6 and 10 to 0.0012 and 0.002.

FloatToQ31: 0.0012 -> 2576980, 0.002 -> 4294967

Multiply in Q31: 2576980*4294967 >> 31 = 5152

Q31ToFloat: 5152 -> 2.39909e-06

Upscale: Reversed downscale function val * (max - min) + min (again min=0, max=10000). This takes the value back to 0.0119954 which is not close to 60.



asked Dec 17, 2022 at 12:33

Zitrax


Fixed point scaling; float -> Q31 -> float

I am new to DSP and I am trying to understand how to work with fixed point operations and in particular the Q31 format.

In floating point (full range) I am doing some multiplications, it could be for example 6.0 * 10.0 = 60.0. Converting this to Q31 would require first scaling it down to [-1,1) and then converting that to an int32_t (q31).

However when doing the same multiplication in Q31 we get a new Q31 result (after bit shifting properly), that can be taken back to float in the range [-1,1). But if I then scale that back up by reversing the original downscaling I do not end up with 60.

In summary:

1. Downscale full range float to float in the range [-1,1) to be able to be represented in Q31
2. Convert float in the proper range to Q31
3. Perform fixed point multiplication (and possibly other unknown operations)
4. Convert Q31 back to float
5. Scale this float back up to the original range?

Is it possible to scale back to the original range somehow as in my last point - or is it a one-way conversion?
