
I am implementing division of IEEE-754 single-precision floating-point values with the Newton-Raphson algorithm, using only single-precision hardware operations.

I am using the method described at these two links:

  1. Wikipedia Newton-Raphson Division
  2. Newton-Raphson Method I'm Using

However, despite iterating from X_0 through X_3 (i.e. using 3 iterations), my answers are still off by a bit, and I'm wondering why. I'm checking my results against MATLAB's own division.

Here is an example of an incorrect result from my output:

===== DIVISION RESULT =====
Dividend: 94C5D0C9
Divisor: D3ACDF9A
My answer: 009277C0
My answer: +0.00000
Correct answer: 009277C2
Correct answer: +0.00000
error = 2.384186e-007
Error: 34800000
k = 212823

Here is my MATLAB code:

% Testing Newton-Raphson Division Algorithm
% Clear window and variables
clc; clear;
% Open file for reading
fid = fopen('generatedfloats_for_fpdiv.txt','r');
% Put all data into an array
data = fscanf(fid,'%x');
% Loop through data array 2 values at a time
for k = 1:2:length(data)
    % Reinterpret the 32-bit words as single-precision values
    dividend = typecast(uint32(data(k)),'single');
    divisor  = typecast(uint32(data(k+1)),'single');
    % Get upper 16 bits of dividend and divisor
    a_inword1 = bitshift(data(k),-16);
    b_inword1 = bitshift(data(k+1),-16);
    % Get exponents
    a_exponent = bitshift(a_inword1,-7);              % Shift exponent right
    a_exponent = bitand(a_exponent,hex2dec('00ff'));  % Remove sign bit from exponent
    b_exponent = bitshift(b_inword1,-7);              % Shift exponent right
    b_exponent = bitand(b_exponent,hex2dec('00ff'));  % Remove sign bit from exponent
    % Create scaled divisor and dividend
    divisor_scaled  = abs(divisor)  * 2^((-1) - (b_exponent-127));
    dividend_scaled = abs(dividend) * 2^((a_exponent-b_exponent-1) - (a_exponent-127));
    % Solve for X0 = 48/17 - 32/17*D'
    % Perform multiplication 32/17*D'
    mult1 = single(single(32/17) * single(divisor_scaled));
    % Perform subtraction 48/17 - (32/17*D')
    x0 = single(single(48/17) - single(mult1));
    xi = x0;
    % Solve for Xi + Xi(1-D'Xi)
    for i = 1:3
        % Multiply D'Xi
        mult2 = single(single(divisor_scaled)*single(xi));
        % Subtract 1 - (D'Xi)
        sub2 = single(single(1) - single(mult2));
        % Multiply Xi * (1-D'Xi)
        mult3 = single(single(xi) * single(sub2));
        % Add Xi + (Xi(1-D'Xi))
        xi = single(single(xi) + single(mult3));
    end
    % Perform final multiplication N'X3
    myresult = single(single(dividend_scaled) * single(xi));
    % Invert sign if result is supposed to be negative
    if (divisor * dividend < 0)
        myresult = -myresult;
    end
    fprintf('===== DIVISION RESULT =====\n');
    fprintf('Dividend: %08X\n', data(k));
    fprintf('Divisor: %08X\n', data(k+1));
    fprintf('My answer: %08X\n', hex2dec(num2hex(single(myresult))));
    fprintf('My answer: %+1.5f\n', single(myresult));
    fprintf('Correct answer: %08X\n', hex2dec(num2hex(single(single(dividend)/single(divisor)))));
    fprintf('Correct answer: %+1.5f\n\n', single(single(dividend)/single(divisor)));
    % Calculate error
    if (dividend == 0)
        error = abs(single(0 - myresult));
    else
        error = abs(single(single(single(dividend)/single(divisor))/single(myresult) - 1));
    end
    % Pause if there is an error greater than 1 ulp
    if (error > 2^-23)
        error
        fprintf('Error: %08X\n', hex2dec(num2hex(single(error))));
        k
        pause
    end
end
% Close input file
fclose(fid);

Here are the test inputs (they should be stored in a file called generatedfloats_for_fpdiv.txt):

7A0C006C FAA1654F FF7FFFFF FF7FFFFF 00000000 FF7FFFFF 7F7FFFFF FF7FFFFF 00000000 D9E6C924 80800000 80800000 BF800000 80800000 00000000 80800000 3F800000 80800000 00800000 80800000 FF7FFFFF BF800000 80F329D0 BF800000 80800000 BF800000 BF800000 BF800000 00000000 BF800000 3F800000 BF800000 00800000 BF800000 4D2798AD BF800000 7F7FFFFF BF800000 FF7FFFFF 3F800000 C5F4B01A 3F800000 80800000 3F800000 BF800000 3F800000 00000000 3F800000 3F800000 3F800000 00800000 3F800000 02A8DEF8 3F800000 7F7FFFFF 3F800000 A2C4DB20 00800000 80800000 00800000 BF800000 00800000 00000000 00800000 3F800000 00800000 00800000 00800000 00000000 6F301E6E FF7FFFFF 7F7FFFFF 00000000 7F7FFFFF 7F7FFFFF 7F7FFFFF CFFFFFFF 10000000 E7DCD51B 3F800000 3DD8AD6C FCD8AD6C 0CB82AD0 8CB82AD0 8A856DBC 8A856DBC ADE5A620 ECE5A620 4F7FFFFF 0F800000 C1023D04 C1023CF4 8F6A27BC 8F6A2782 76E0BCED 76D9ED81 B59F5AE2 F49F5A43 12AC77FD 51AC6271 3520744A 740EA042 5D594D83 1D5B038B AC104C35 2C104C11 9CB2D620 5BB2D61D AB1AC0B0 6A1AC0AB 1F7C4A96 DE7C4A77 95AD8847 54AD5CF0 1C22AD4A DB225C1C 7BED0479 BBED047B C094DEEC 0094DEF6 4DC9E15F 8DC9E1C5 FBC3003F 3BC30103 FEC93614 3EC9686F 436A5FC8 836B4B14 D52DD147 153967C4 A6666AC4 29800000 F98E9ABA FB000000 36FBF9DF EF000000 402409D5 7A800000 C81946D2 7C915A00 EC1D18DD B2E597A5 709EA505 321F01DC FE51BE2A 4DB9A1BC 6F69A411 60F5814E 08CCC56E B0A84889 643A641A 6EAE2570 AC0C952C 57000000 D534A524 FE800000 B05CD51B 58000000 8F4D8AD7 95800000 FD2ECBC8 51B61A06 FCCB65E4 CFA4FECB 6EAE153B BFE119AC C1A97E44 913647B0 1F730C8E 8EAA321D C3BC9DEC A51D0BC8 30A4350D 9058EA7E 9CE2C0D1 9AF8176B CA3FFFFD 8A400000 F008F561 B008F563 F0754048 B075404B 7D7FFFFD 3D800000 52B209A5 12B209A7 7EFE1868 3EFE186A C5FFFFFE 86000000 E39A4BC8 A39A4BC9 455EDB0A 055EDB0B 677FFFFF 27800000 C44D976B 044D976C ED126E29 2D126E2A DFFFFFFE 20000000 4ED6F4EB 8ED6F4ED 58397C68 98397C6A C2FFFFFD 03000000 EDED365A 2DED365D 6D3FFFFD AD400000 E460628E 26000000 E821B21E 29978BE6 5749CDF0 97D2299C 55BF2E4A 95E1CA69 3F50B86B 00D65D71 61DC531D 23000000 FB52DDC0 BC4BE0E8 F6744295 B6B52B80 BDAD17D0 FCAD17D0 03C572DD 42C572DC 90800001 CF800000 8AE965BE C9E965BC 846E5BBB C36E5BB8 30800002 6F800000 BC512F61 FB512F5D BA15F623 F915F620 30800003 6F800000 91800003 50800000 27F87CF2 E6F87CED 1CDDA7B7 DBDDA7B3 88800002 47800000 1AA27658 D9A27656 A1C7E937 60C7E935 16800001 D5800000 35F35105 F4F35105 1F45A1EB 5DE7B505 0E20D62E CC4E27A5 ADAA8A46 6B800000 B9D2BA3D F7000000 B87ACA34 F5000000 087E87BE C4C7E789 FF0C0958 5F1010C0 CF808800 AA2060A1 AAAAAAAA BFDAFA00 F1199999 7DC06112 5DE66666 EB9295A3 56FFFDB9 AD400241 F0FEEFFF 421859B2 7D800289 F3A82C04 0A11D000 93041140 03800000 BF844014 88100400 AE100000 412AAAAA 7F000010 48199999 63820000 41CE400D 40002000 5AE13FFF 22040000 15000744 1B000002 2E7FFFFF 1FC00000 BA800000 8C080000 DDFD5AB9 AC200080 D8AAAAAA 5B820002 45555555 F9809000 ABE66666 00A00100 857FFF30 10040004 D2302FFF B5800003 AC0003FF 4A004001 12000000 88000280 702AAAAA 3997DBFD A9145FFF 1EF56DE9 038000BB 08EE3FFF B0687000 A25FD21D 43C20004 E5FFFFDF 11810040 447FDFFF 4DFB7FFF A1FFFFFB C72AAAAA 8AFFBFFF 54555555 77FFFDFF 274F8658 48FFBFFF B47FF8A6 6BFFEFFF 474C2000 117FFFFB 45800200 DCDF7FFF 5D808001 6CF5FFFF 40FD5AF7 A1FF3FFF FC7FEFFF C27DF7FF BEAAAAAA B7FF3FFF EB555555 59FBEFFF 58199999 D7EFFFFB 31E66666 CFEFFFFE BE9EAB13 017EFFEF 2B7FFD61 80BFFFFD 6B7FFFFF 5F7DFFFB D7800000 F7FFBFBF 71832006 72AAAAAA CA800001 A7AAAAAA 3F904000 AC2AAAAA 84D7DF6F C1AAAAAA 827FFFFB 36AAAAAA 15AAAAAA 9A2AAAAA 2F199999 63AAAAAA BBE66666 1E2AAAAA 15ED4FCE 8EAAAAAA 82FFFA1D B02AAAAA 6F004FFF 
4AAAAAAA 487DB000 CD2AAAAA C2FFFFFF 1DAAAAAA 4E800000 D8AAAAAA F1900000 B7D55555 C9080200 9CD55555 A86FF7CF D1D55555 BBFFDFDF 8C555555 ACAAAAAA 3A555555 86199999 BCD55555 52E66666 77555555 19FFFED8 0FD55555 A6B99FFF 50D55555 5300038D 2BD55555 59FFFFFF BDD55555 FAE0C1AD E2199999 8F800400 87999999 AE024000 6C999999 A1DF7DFF A1999999 CFDFFFFF 36999999 736EFFFF 6C199999 442AAAAA 99999999 D0D55555 DA999999 1D999999 1C199999 E9E66666 D6999999 641FF15D 27199999 F0FFFBAC 68999999 7D9DDFFF 69999999 29800061 44199999 571B3000 EC199999 BD800000 D7999999 D3080080 7CE66666 5F7FFEFF C7666666 3ADFFF7F 3C666666 DB2AAAAA 39666666 68555555 7A666666 D4999999 9BE66666 41E66666 BCE66666 CE02CDE5 77666666 487FF867 C8666666 D4EC7FFF C9666666 8100051C A3E66666 BAFFFFFF 06666666 B3800024 0CBBF6B5 91E7F3FF 318D3102 416FFFFF 66B808BC 207EFF7F DBE3E077 52AAAAAA B89D573A FF555555 D9DE7340 6C199999 3B1F0F46 D8E66666 1C60AB4C 655172A1 D6F63010 3F7FFD22 47A7E55D 6C3A9FFF 68E90163 388001D7 237E0628 A54DD000 44BFA22E 127FFFFF A600BE34 AC000000 D6B2F380 441C889C 627FFA61 65800040 B77FFA1C 35840100 1CFFF82A 15F779FF 01FFFA77 6FFEFFFF 56FFFA32 0377FEFF ABFFF9EC 03199999 13FFFAC9 50666666 9BFFFE10 FCA0175C 767FFAD5 9000067A 09FFF82D 3C9C8000 E47FFCF2 A97FFFFF 05FFF8F8 23800000 567FFE45 12464A93 32494FFF 71880000 47746FFF 2CFFBFEB D1AB8FFF 7DFFFEFF 66D69FFF 08FFF6FF BC023FFF 53AAAAAA A83B5FFF 9A199999 B3824FFF 93845430 8F596FFF 27800336 A938FFFF B3EB6000 63CE7FFF 00FFFFFF A50F5FFF 8D000000 A650AFFF 303024C9 B18007DF BB810000 57800507 EFEFF7FF 6C0004D8 EB2AAAAA C7800641 B7D55555 C2000305 31999999 13000052 BE666666 54000458 B7FFFB39 30000522 11BC6FFF A080026F BE000009 82000675 4ACF2000 BC800339 97FFFFFF 04800681 24800000 05800287 DF872000 F267D000 C0300000 2C643000 FEEFFFF7 7C20C000 42AAAAAA 6749E000 CF555555 A88B4000 A9199999 9290A000 0EFFFFF5 8FA8D000 BB9F4FFF F0EA7000 558004C5 219BB000 E21DC000 DC313000 4EFFFFFF 3D725000 FB800000 5EB3E000 EFD80480 C27FFFFF 82080008 3C7FFFFF AD3FFDDB 51FFFFFF 497FFFFE A6FFFFFF A16DFFFF DBFFFFFF 392AAAAA A07FFFFF AD666666 537FFFFF 39DAEA49 54FFFFFF E67FFCB0 2F7FFFFF 12EDEFFF 507FFFFF CD000180 A17FFFFF 79D75000 C27FFFFF 52800000 FE7FFFFF EE800080 B7000000 90100200 CC800000 BE7DFBFD 61800000 DD7FDFFF C7000000 90AAAAAA 3F800000 3DD55555 60800000 C9999999 1B800000 64666666 4C000000 1DFFF953 35000000 EA3C0FFF EF800000 D1267000 61800000 CA000000 7D800000 FF0C0958 5F1010C0 CF808800 AA2060A1 AAAAAAAA BFDAFA00 F1199999 7DC06112 5DE66666 EB9295A3 56FFFDB9 AD400241 F0FEEFFF 421859B2 7D800289 F3A82C04 0A11D000 93041140 03800000 BF844014 88100400 AE100000 412AAAAA 7F000010 48199999 63820000 41CE400D 40002000 5AE13FFF 22040000 15000744 1B000002 2E7FFFFF 1FC00000 BA800000 8C080000 DDFD5AB9 AC200080 D8AAAAAA 5B820002 45555555 F9809000 ABE66666 00A00100 857FFF30 10040004 D2302FFF B5800003 AC0003FF 4A004001 12000000 88000280 702AAAAA 3997DBFD A9145FFF 1EF56DE9 038000BB 08EE3FFF B0687000 A25FD21D 43C20004 E5FFFFDF 11810040 447FDFFF 4DFB7FFF A1FFFFFB C72AAAAA 8AFFBFFF 54555555 77FFFDFF 274F8658 48FFBFFF B47FF8A6 6BFFEFFF 474C2000 117FFFFB 45800200 DCDF7FFF 5D808001 6CF5FFFF 40FD5AF7 A1FF3FFF FC7FEFFF C27DF7FF BEAAAAAA B7FF3FFF EB555555 59FBEFFF 58199999 D7EFFFFB 31E66666 CFEFFFFE BE9EAB13 017EFFEF 2B7FFD61 80BFFFFD 6B7FFFFF 5F7DFFFB D7800000 F7FFBFBF 71832006 72AAAAAA CA800001 A7AAAAAA 3F904000 AC2AAAAA 84D7DF6F C1AAAAAA 827FFFFB 36AAAAAA 15AAAAAA 9A2AAAAA 2F199999 63AAAAAA BBE66666 1E2AAAAA 15ED4FCE 8EAAAAAA 82FFFA1D B02AAAAA 6F004FFF 4AAAAAAA 487DB000 CD2AAAAA C2FFFFFF 1DAAAAAA 4E800000 D8AAAAAA F1900000 B7D55555 
C9080200 9CD55555 A86FF7CF D1D55555 BBFFDFDF 8C555555 ACAAAAAA 3A555555 86199999 BCD55555 52E66666 77555555 19FFFED8 0FD55555 A6B99FFF 50D55555 5300038D 2BD55555 59FFFFFF BDD55555 FAE0C1AD E2199999 8F800400 87999999 AE024000 6C999999 A1DF7DFF A1999999 CFDFFFFF 36999999 736EFFFF 6C199999 442AAAAA 99999999 D0D55555 DA999999 1D999999 1C199999 E9E66666 D6999999 641FF15D 27199999 F0FFFBAC 68999999 7D9DDFFF 69999999 29800061 44199999 571B3000 EC199999 BD800000 D7999999 D3080080 7CE66666 5F7FFEFF C7666666 3ADFFF7F 3C666666 DB2AAAAA 39666666 68555555 7A666666 D4999999 9BE66666 41E66666 BCE66666 CE02CDE5 77666666 487FF867 C8666666 D4EC7FFF C9666666 8100051C A3E66666 BAFFFFFF 06666666 B3800024 0CBBF6B5 91E7F3FF 318D3102 416FFFFF 66B808BC 207EFF7F DBE3E077 52AAAAAA B89D573A FF555555 D9DE7340 6C199999 3B1F0F46 D8E66666 1C60AB4C 655172A1 D6F63010 3F7FFD22 47A7E55D 6C3A9FFF 68E90163 388001D7 237E0628 A54DD000 44BFA22E 127FFFFF A600BE34 AC000000 D6B2F380 441C889C 627FFA61 65800040 B77FFA1C 35840100 1CFFF82A 15F779FF 01FFFA77 6FFEFFFF 56FFFA32 0377FEFF ABFFF9EC 03199999 13FFFAC9 50666666 9BFFFE10 FCA0175C 767FFAD5 9000067A 09FFF82D 3C9C8000 E47FFCF2 A97FFFFF 05FFF8F8 23800000 567FFE45 12464A93 32494FFF 71880000 47746FFF 2CFFBFEB D1AB8FFF 7DFFFEFF 66D69FFF 08FFF6FF BC023FFF 53AAAAAA A83B5FFF 9A199999 B3824FFF 93845430 8F596FFF 27800336 A938FFFF B3EB6000 63CE7FFF 00FFFFFF A50F5FFF 8D000000 A650AFFF 303024C9 B18007DF BB810000 57800507 EFEFF7FF 6C0004D8 EB2AAAAA C7800641 B7D55555 C2000305 31999999 13000052 BE666666 54000458 B7FFFB39 30000522 11BC6FFF A080026F BE000009 82000675 4ACF2000 BC800339 97FFFFFF 04800681 24800000 05800287 DF872000 F267D000 C0300000 2C643000 FEEFFFF7 7C20C000 42AAAAAA 6749E000 CF555555 A88B4000 A9199999 9290A000 0EFFFFF5 8FA8D000 BB9F4FFF F0EA7000 558004C5 219BB000 E21DC000 DC313000 4EFFFFFF 3D725000 FB800000 5EB3E000 EFD80480 C27FFFFF 82080008 3C7FFFFF AD3FFDDB 51FFFFFF 497FFFFE A6FFFFFF A16DFFFF DBFFFFFF 392AAAAA A07FFFFF AD666666 537FFFFF 39DAEA49 54FFFFFF E67FFCB0 2F7FFFFF 12EDEFFF 507FFFFF CD000180 A17FFFFF 79D75000 C27FFFFF 52800000 FE7FFFFF EE800080 B7000000 90100200 CC800000 BE7DFBFD 61800000 DD7FDFFF C7000000 90AAAAAA 3F800000 3DD55555 60800000 C9999999 1B800000 64666666 4C000000 1DFFF953 35000000 EA3C0FFF EF800000 D1267000 61800000 CA000000 7D800000 1A6AA41C B42C29BF 6E11A944 D480452D 105640A2 87959BD1 7422DC0D 6D0E43AA 55ACBB94 1D06D383 C80AA52A 5CE3851F 099B7C06 0397C72F 89D5073F A25265A2 4476B3FE 1704BD92 4581E9FD 3461D127 3B815934 1E1EA9BE D74C0685 CCE34EAC 3E52F9BF 30420D4A B9EA85EB 8C3D45DE A162F6DA 9459E167 9C29EA20 C8F796B1 C5BD9F02 B7168D05 F736D700 3946C8CE E20B8F25 BC967BD2 61BAC3AA 5205DABA 257338C7 AC30E694 13C5DD3F 2307A194 0B9317AE AC534B2B 99271710 47103D91 E5F8F6BF C7800000 F145F346 63400000 C91F2994 97C00000 C1B8E7D5 18D40000 F5844BAD 6AFA0000 ACF90D5F 5D794000 DE3632EC 49804000 DDBE13CC 6F5B4000 204C6402 4E962400 EC2773F0 34935000 75FBB2EC DCAE2600 76F79C0E BADC18F0 00000000 A01F7AF0 1FACEE1B 30D0EFBD 
  • You have single-precision hardware that doesn't support single-precision division? Commented Jul 18, 2014 at 22:14
  • @TonyK, That is correct. Commented Jul 18, 2014 at 23:12

3 Answers

2

The algorithm you are showing evaluates the reciprocal of the divisor D and then multiplies by the dividend N; in other words, it computes single(N * single(1/D)).

This will inevitably fail for some pairs of operands. For example, take N=3 and D=7 and look at these results:

0.14285714924335479736328125 = float(1/7)          = closest 32-bit float to 1/7
0.42857144773006439208984375 = 3*float(1/7)        = 3 times the closest float to 1/7
0.4285714626312255859375     = float(3*float(1/7)) = closest float to the number above
0.4285714328289031982421875  = float(3/7)          = closest float to 3/7

By virtue of IEEE 754 properties, float(float(3)/float(7)) is the closest float to 3/7.
So even if your inner loop perfectly finds the closest float to the inverse of 7, you can't simply reconstruct the closest float to 3/7 by multiplying it by 3 in single precision...

We have an overestimate here:

float(float(3)*float(1/7)) > float(3/7) > (3/7) 

But if we take the representable float just below float(1/7), we have an underestimate:

float(3/7) > (3/7) > float(float(3)*predecessor(float(1/7))) 

There is more: we have just shown that there is no float Q at all such that float(float(3)*Q) = float(float(3)/float(7))...

So even if your inner reciprocal loop were perfect, the outer single(N*reciprocal(D)) would still fail...
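To make this concrete, here is a minimal MATLAB check of the 3/7 example (the variable names are illustrative, not taken from the question's code):

recip = single(1) / single(7);   % closest single to 1/7
naive = single(3) * recip;       % single(3 * float(1/7)): reciprocal, then multiply
exact = single(3) / single(7);   % correctly rounded single(3/7)
fprintf('naive: %s\nexact: %s\n', num2hex(naive), num2hex(exact));
% naive comes out as 3edb6db8 and exact as 3edb6db7: one ulp apart,
% even though the reciprocal used here is the best possible single.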

EDIT: How to proceed then?

Once you get a reasonable approximation of the reciprocal, one possible way to compute the division correctly is to compute the reconstruction error and compensate for it, as demonstrated in this pseudo-code; note that you'll need to emulate the fused multiply-add:

z := float(1/y);        // Estimate of the reciprocal - i.e. with your inner loop
q := float(x * z);      // Estimate of the quotient
r := float(x - y*q);    // Estimate of the error, using an FMA
return float(q + r*z);  // Correct the error, using another FMA

That's how I did it in my Smalltalk implementation of arbitrary precision floating point (https://code.google.com/p/arbitrary-precision-float/).
You'll find other algorithms on the Internet. One reference I like about division is:

Nicolas Brisebarre, Jean-Michel Muller, and Saurabh Kumar Raina, "Accelerating Correctly Rounded Floating-Point Division when the Divisor Is Known in Advance", http://perso.ens-lyon.fr/jean-michel.muller/DivIEEETC-aug04.pdf
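For illustration, here is a minimal MATLAB sketch of the pseudo-code above, with the FMA emulated in double precision; the helper name fma_emul and the 3/7 test operands are my own additions, and in rare cases the double rounding in this emulation can differ from a true FMA:

fma_emul = @(a, b, c) single(double(a).*double(b) + double(c));  % emulated single-precision FMA

x = single(3);               % dividend
y = single(7);               % divisor
z = single(1) / y;           % stand-in for the Newton-Raphson reciprocal estimate
q = single(x .* z);          % estimate of the quotient (one ulp too high for 3/7)
r = fma_emul(-y, q, x);      % r = x - y*q, the residual, via the emulated FMA
result = fma_emul(r, z, q);  % q + r*z, the corrected quotient, via another emulated FMA
fprintf('%s vs %s\n', num2hex(result), num2hex(single(x/y)));  % both should come out as 3edb6db7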


1 Comment

I tried running your algorithm in MATLAB, performing the calculations for r and q+r*z as "double" data types so I wouldn't lose precision. However, I still ended up with wrong answers. To get within 1 ulp, I calculate the result, the result + 1 ulp, and the result - 1 ulp, then choose the one with the smallest error. I calculate the error as divisor * myresult - dividend.
1

I think your answer is within the computational accuracy of single-precision floating-point numbers. The maximum precision for single-precision floating point is only about 7 decimal digits (for details, check the Wikipedia article or the IEEE-754 standard).

Single precision

http://en.wikipedia.org/wiki/Single_precision

You can check the accuracy by dividing the answers (not subtracting them): accuracy is measured not in absolute magnitude but in significant digits.

-340282346638528860000000000000000000000.00000 / -340282306073709650000000000000000000000.00000 = 1.000000119209311

An error of about 1e-7 is thus within the range of machine accuracy for single-precision floating-point numbers.

Or look at the number of matching leading digits, which is 7, the limit for single-precision floating point:

3402823 46638528860000000000000000000000.00000

3402823 06073709650000000000000000000000.00000

Double precision

If you go to double precision (64-bit doubles) then the accuracy increases to approximately 16 significant digits.

http://en.wikipedia.org/wiki/Double-precision_floating-point_format
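For reference, MATLAB reports these machine epsilons directly:

eps('single')   % 2^-23, about 1.1921e-07 -> roughly 7 significant decimal digits
eps('double')   % 2^-52, about 2.2204e-16 -> roughly 16 significant decimal digits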

Accumulated machine accuracy (AMA)

Accumulated machine accuracy is a term we used to describe the effect of round-off errors stacking up. Assume that every single-precision number equals the exact value plus some small error eps (epsilon, the machine accuracy for a single operation). Now consider the sum A/B + C/D. First of all, the numbers A, B, C, and D are often not represented exactly in floating point and can each carry an error as large as eps. Secondly, each / operation can introduce an error of eps. Now let E be the answer to A/B, again plus some small error eps, and F the answer to C/D plus some small error eps. This means that A/B + C/D = E+eps + F+eps = E+F+2eps.

Of course, sometimes the intermediate answers (such as E and F) are a mix of overestimates and underestimates, which cancels the errors a bit. I always use the rule of thumb that in iterative schemes (especially in FEM, FVM, or VOF), the maximum accumulated error (fairly reliable, though no guarantee) is the number of elements in your sum times eps.
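As a small MATLAB illustration of round-off accumulating (my own example, not from the answer above):

n = 1e6;
x = single(0.1);           % 0.1 is not exactly representable, and each addition rounds again
s = single(0);
for k = 1:n
    s = s + x;             % a relative rounding error of up to ~eps('single') at every step
end
true_sum = n * double(x);  % error-free sum of the stored single value, computed in double
fprintf('relative error: %g (n*eps(single) = %g)\n', abs(double(s) - true_sum)/true_sum, n*eps('single'));

The measured relative error should stay below n times eps('single'), in line with the rule of thumb above.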

In other words: your code looks fine ;)

9 Comments

You're getting a result that's only 1 ULP (unit in the last place) away from the best possible result. That's pretty darn good for a multi-stage floating-point calculation! Producing correctly rounded results is hard, and typically involves using extended precision for intermediate calculations - in some cases (e.g. computing the cosine of a very large number) a lot of extended precision is necessary. As @EJG89 says, your code is working just fine; you need to make your expectations more reasonable.
Sorry; above comment was aimed at @starbox, just in case that wasn't clear. :-)
@starbox: Probably not with plain Newton-Raphson. It wouldn't surprise me if there are simple tricks to get the last bits correct in most or all cases. But the most obvious way would be to use double precision for the intermediate calculation, and only round back to single precision at the end. That would get you correct rounding almost all of the time, but you'd likely still be able to find corner cases where you end up a little over 0.5 ulps out.
Well okay. If you happen to come across something, please let me know. I know @StephenCanon would be a great person to answer this question.
One last comment: note that the method you're using computes an approximation to the reciprocal of the divisor, and then as the last step multiplies that by the dividend. Even if your reciprocal is correctly rounded, that last multiplication step can introduce an error. There's really no way around that without using extra intermediate precision to represent the dividend in the first place.
-1

Doesn't Wikipedia suggest using a pair of fused multiply-adds?

You might try to emulate the fused multiply-add with double precision:

function y = fmadd(a, x, b)
    % Compute a*x + b in double so there is only one rounding back to single
    y = single(double(a)*double(x) + double(b));
end

And use it with:

divisor_scaled = single(divisor_scaled); % put this outside the loop
% ...snip...
xi = fmadd( fmadd(-divisor_scaled, xi, 1), xi, xi );

EDIT

As explained in my other answer, the Newton-Raphson iteration gets you a close approximation of the reciprocal 1/D, but that is not sufficient to produce an exactly rounded division N/D by simply multiplying this reciprocal approximation by N; further steps are required.

So the FMA I proposed certainly won't make the whole division work as it is programmed now.

But it would be worth testing whether it indeed helps to get an exactly rounded approximation of the reciprocal, because this is required by some division algorithms, like the one I proposed in my other answer.

If it does, then there are ways to emulate the FMA with only single-precision +, - and *.
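One classical building block for such an emulation is Dekker's exact product, which recovers the rounding error of a single-precision multiply using only +, - and *. Here is a sketch (not from this answer; it assumes single inputs, round-to-nearest, and no overflow or underflow):

function [p, e] = two_product(a, b)
    % Dekker's exact product: p + e equals a*b exactly
    p = a .* b;
    [ah, al] = split_single(a);
    [bh, bl] = split_single(b);
    e = al.*bl - (((p - ah.*bh) - al.*bh) - ah.*bl);
end

function [hi, lo] = split_single(a)
    % Veltkamp splitting for binary32: factor = 2^12 + 1 because the significand has 24 bits
    c  = single(4097) .* a;
    hi = c - (c - a);
    lo = a - hi;
end

Error-free transformations like this one, together with an analogous error-free addition (TwoSum), are the usual building blocks for emulating an FMA in pure single precision.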

3 Comments

My hardware doesn't have a fused multiply-add (FMA); it only has single-precision add/sub and single-precision multiply. This isn't an off-the-shelf processor.
@starbox Oh, but knowing that fmadd is mentioned in the Wikipedia article and knowing that you can test it in MATLAB, aren't you still curious enough to find out whether it solves your problem?
@starbox Before you try it, my other answer shows that such an FMA would not necessarily improve things. The problem is that the outer single(N*single(reciprocal(D))) will fail for some (N,D) regardless of the quality of the reciprocal(D) approximation. So trust Wikipedia, but not too much ;)
