Fixed point multiplication in assembly (x86)

Question

I want to multiply and divide an unsigned 8.8 fixed-point number in the ax register with 1.00125 and store the result in ax as well.

I know that fixed point multiplication/division requires some extra steps but I have no idea how to implement those in assembly.

Help is greatly appreciated.

Start by writing pseudocode, or higher-level language. Figure out the sequence of mathematical operations that will be needed (multiply, divide, add, subtract, shift, mask). Once you have the algorithm clearly defined, then think about how to implement it in assembly. Look up what instruction or sequence can be used to accomplish each step of the algorithm. — Nate Eldredge
– Nate Eldredge, Commented Oct 31, 2021 at 17:30
Note that x86 has widening multiply, and similarly division with a dividend twice as wide as the quotient and divisor, so if you use those instead of shift-and-add you can take advantage of that to get the high-half product easily. 0.00125 is less than 1/256, and if you tried to represent it as an 8.8 fixed-point number it would round to 0. And it's not a 2^-n binary fraction so it's not just a simple right-shift and add. — Peter Cordes
– Peter Cordes, Commented Oct 31, 2021 at 19:26
The easiest way is probably to use floating point numbers and then to just convert the result back to 8.8 fixed point. — fuz
– fuz, Commented Oct 31, 2021 at 23:26

Peter Cordes · Accepted Answer · 2021-11-01 04:42:18Z

If you care about accuracy, 1.00125 can't be stored exactly in any integer format or in any floating point format because it's a recursive fraction in binary (in binary it's 1.000000000101000111101011100001010001111010111...b where that 00001010001111010111 sequence repeats forever). For this reason I'd convert it into the rational number 801/800; and then do x * 1.00125 = (x * 801) / 800 (possibly with "round to nearest" on the division).

If you don't care about accuracy, then the more bits you can use for "1.00125" the closer the result will be to the correct answer. With 8 bits ("1.7 fixed point") the closest you can get is 1.0000000b, which means you can just skip the multiplication (x * 1.00125 = x). With 16 bits ("1.15 fixed point") the closest you can get is 1.000000000101001b (or 1.001220703125 in decimal).

However, you can cheat more. Specifically, you can significantly increase accuracy with the same number of bits by doing (x * 1) + (x * 0.00125). E.g. instead of having a 16 bit constant like 1.000000000101001b (where 9 bits are zeros), you can have a 16-bit constant like 0.0000000001010001111010111b (where the 16 bits are the last 16 bits without any of the leading zeros). In this case the constant is very close (like 0.00124999880) rather than "less close" (like 1.001220703125 was).

Ironically; with only 16 bits, this "0.00125" is more accurate than a 32-bit floating point representation of 1.00125 can be.

So.. in assembly (assuming everything is unsigned) it might look like:

 ;ax = x << 8 (or x as an 8.8 fixed point number) mov cx,ax ;cx = x << 8 mov bx,41943 ;bx = 41943 = 0.00124999880 << 25 mul bx ;dx:ax = (x << 8) * (0.00124999880 << 25) = x * 0.00124999880 << 33 ;dx = x * 0.00124999880 << 17 shr dx,9 ;dx = x * 0.00124999880 << 17 >> 9 = x * 0.00124999880 << 8, carry flag = last bit shifted out adc dx,0 ;Round up to nearest (add 1 if last bit shifted out was set) lea ax,[dx+cx] ;ax = x << 8 + x * 0.00124999880 << 8 = x * 1.00124999880 << 8

Of course the problem here is that converting it back to "8.8 fixed point" ruins most of the accuracy anyway. To keep most of the accuracy, you could use a 32-bit result ("8.24 fixed point") instead. This might look like:

 ;ax = x << 8 (or x as an 8.8 fixed point number) mov cx,ax ;cx = x << 8 mov bx,41943 ;bx = 41943 = 0.00124999880 << 25 mul bx ;dx:ax = (x << 8) * (0.00124999880 << 25) = x * 0.00124999880 << 33 add ax,1 << 8 ;To cause the following shift to round to nearest adc dx,0 shrd ax,dx,9 shr dx,9 ;dx:ax = x * 0.00124999880 << 33 >> 0 = x * 0.00124999880 << 24 ;cx:0 = x << 24 add dx,cx ;dx:ax = x << 24 + x * 0.00124999880 << 24 = x * 1.00124999880 << 24

The other problem is that there's potential overflow. E.g. if x was 0xFF.FF (or about 255.996) the result would be something like 256.32 which is too big to fit in an "8.8" or "8.24" or "8.anything" fixed point format. To avoid that problem you can just increase the number of integer bits (and reduce the accuracy by 1 bit) - e.g. make the result "9.7 fixed point", or "9.23 fixed point".

The important points here are:

a) For "fixed point" calculations, every operation (multiplication, division, addition, ...) causes the decimal point to move.

b) Because the decimal point keeps moving, it's best to adopt a standard notation for where the decimal point is at each step. My way is to include an "explicit shift" in the comments (e.g. "x << 8" rather than just "x"). This "explicit shift documented in the comments" makes it easy to determine where the decimal point moves, and if/how much you need to shift left/right to convert to a different fixed point format.

c) For good code, you need to pay attention to accuracy and overflow, and this causes the decimal point to move around even more (and makes the use of a "standard notation for where the decimal point is" more important).

Addition / subtraction doesn't cause the decimal point (actually binary point or "radix point") to move, unless you're changing formats to make overflow impossible. add ax, cx is all you need to add two 8.8 numbers into an 8.8 result. But yes, to get 9.7 fixed point, you'd rcr ax, 1 after to shift the carry-out back into the top, and discard the low bit instead. (Then optionally adc ax, 0 to round up if that discarded bit was a 1, or maybe inc ax before rcr; inc preserves CF. The sum of two numbers can't be 0xFFFF and have a carry-out so that's fine)
@PeterCordes: Yes, assuming you're adding numbers in the same format. Otherwise (if number of fractional bits differs) shifting of at least one operand is necessary before addition or subtraction.
Right, therefore I think it's possibly misleading to say "every operation (..., addition, ...) causes the decimal point to move" in point (a). It's worth at least noting that it's only due to format-change there, to avoid confusing fixed-point beginners about something they thought they understood which it sounds like you're contradicting.

fuz · Accepted Answer · 2021-10-31 23:40:41Z

An easy solution is to just use the x87 floating point unit to do the multiplication. Assuming real mode with nasm (untested):

example: push bp mov sp, bp ; establish stack frame push ax push ax ; make space for quotient fild word [bp-2] ; load number fld st0 ; duplicate top of stack fmul dword [factor] ; compute product fistp word [bp-2] fmul dword [invfac] ; compute quotient fistp word [bp-4] pop dx ; quotient pop ax ; product pop bp ; tear down stack framt ret factor dd 1.00125 invfac dd 0.999875 ; 1/1.00125

This leaves the quotient in dx and the product in ax. Rounding is done according to the rounding mode configured in the x87 FPU (which should be rounding to nearest by default).

Erik Eidt · Accepted Answer · 2021-11-01 02:13:53Z

One thing to understand about fixed point multiplication that the point of rhe result is the point of operand 1 plus the point of operand 2.

Thus, when multiplying two numbers with fixed point of zero, we get a result with fixed point zero.

And when multiplying two numbers with fixed point at 8 places (binary) we get a number with fixed point at 16 places.

So, need to scale down such result as needed.

Collectives™ on Stack Overflow

Fixed point multiplication in assembly (x86)

3 Answers 3

3 Comments

Comments

Comments

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

3 Comments

Comments

Comments

Related