
I'm creating a program in Fortran that takes in an x for sin(x) in radians, then the number of terms to compute.

This is my program:

! Sine value using MacLaurin series
program SineApprox
  implicit none
  integer :: numTerms, denom, i
  double precision :: x, temp, summ
  ! Read Angle in Radians, and numTerms
  read(*,*) x, numTerms
  ! Computing MacLaurin series
  denom = (2*numTerms)-1   ! Gives denominator
  temp = x                 ! Temp calculates each term
  summ = x
  do i = 3, denom, 2
    temp = (((-temp)*(x**2))/(i*(i-1)))
    summ = summ + temp
  end do
  print *, summ
end program SineApprox

However, I'm not getting the same value that my prof requires for the input: 5 30

The output of my code is:

-0.95892427466314001 

But, the required output is:

-0.95892427466313568
                ^^^^

I can't figure out where the error is.


4 Answers


I can't figure out where the error is.

A high-precision sine(5.0) is -0.95892427466313846889...

OP’s result is better than Prof’s.

OP's result is within 14 ULP of the best answer whereas Prof's is 25 ULP off.

So no problem on OP’s part. To get an exact match to the Prof's answer, you would have to code an inferior approach.

A simple possible reason for the Prof's inferior answer is that the Prof's code looped only while i <= 30 (15 terms, rather than 30) - which would just about explain the difference. Try your code with fewer iterations and see which iteration count most closely matches the Prof's answer.
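For instance, a minimal Fortran sketch (reusing the OP's own loop; the Prof's actual code is unknown) that prints the partial sum for every term count from 1 to 30, so you can see which count lands on the Prof's value:

program find_term_count
  implicit none
  integer :: numTerms, i
  double precision :: x, temp, summ
  x = 5.0d0
  do numTerms = 1, 30
    ! rerun the OP's recurrence truncated after numTerms terms
    temp = x
    summ = x
    do i = 3, 2*numTerms - 1, 2
      temp = (-temp)*(x**2)/(i*(i-1))
      summ = summ + temp
    end do
    print '(I3, F22.17)', numTerms, summ
  end do
end program find_term_count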


// sine(5.0)                  Delta from correct   Source
//
//-0.95892427466313846889...     0 000000          calc
//-0.95892427466313823          -23 889...         chux (via some C code)
//-0.95892427466314001         +154 111...         OP
//-0.95892427466313568         -278 889...         Prof
// 0.00000000000000089        +/-89 ...            ulp(5.0)
// 0.00000000000000011        +/-11 ...            ulp(-0.9589)
// 12345678901234567 (digit count)

Notes:
Near x == 5.0, after about the 17th term, the temp term is too small to significantly affect the result. So certainly enough terms are used.

With common FP representation, the unit in the last place of 5 is ~89e-17. The ULP of -0.9589 is ~11e-17.
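In Fortran these ULP values can be queried directly with the spacing() intrinsic; a quick sanity check of the two numbers quoted above:

program ulp_check
  implicit none
  ! spacing(x) returns the ULP of x for its kind (binary64 here)
  print *, spacing(5.0d0)                  ! ~8.9e-16, i.e. ~89e-17
  print *, spacing(-0.95892427466313568d0) ! ~1.1e-16, i.e. ~11e-17
end program ulp_check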



I will emulate the two algorithms with some ArbitraryPrecisionFloat to illustrate how much worse the factorial solution is numerically. I'll use Squeak Smalltalk here, but the language does not really matter; you could do it in Maple or Python, as long as you have an arbitrary-precision library available...

The closest binary64 floating point number to the exact result of sin(5) is -0.9589242746631385.

We'll see how well the two algorithms approximate that ideal value for different precisions (ranging from single precision, 24 bits, to long double precision, 64 bits).

p := 24 to: 64.
e1 := p collect: [:n | | x denom temp summ closest |
    closest := (5 asArbitraryPrecisionFloatNumBits: 100) sin asFloat.
    x := 5 asArbitraryPrecisionFloatNumBits: n.
    numTerms := 30.
    denom := (2*numTerms)-1.
    temp := x.
    summ := x.
    3 to: denom by: 2 do: [:i |
        temp := (((0-temp)*(x**2))/(i*(i-1))).
        summ := summ + temp ].
    (summ asFloat - closest) abs].

Then the factorial rewrite:

e2 := p collect: [:n | | x denom temp summ closest fact |
    closest := (5 asArbitraryPrecisionFloatNumBits: 100) sin asFloat.
    x := 5 asArbitraryPrecisionFloatNumBits: n.
    numTerms := 30.
    denom := (2*numTerms)-1.
    temp := x.
    summ := x.
    3 to: denom by: 2 do: [:i |
        fact := ((1 to: i) collect: [:k | k asArbitraryPrecisionFloatNumBits: n]) product.
        temp := ((x ** i)*(-1 ** (i//2)))/fact.
        summ := summ + temp ].
    (summ asFloat - closest) abs].

We can then plot the result in whatever language (here Matlab):

p=24:64;
e1=[1.8854927952283163e-8 4.8657250339978475e-8 2.5848555629259806e-8 6.355841153382613e-8 3.953766758435506e-9 2.071757310151412e-8 2.0911216092045493e-9 6.941377472813315e-10 4.700154709880167e-10 9.269683909352011e-10 6.256184459374481e-11 3.1578795134379334e-10 2.4749646776456302e-11 3.202560439063973e-11 1.526812010155254e-11 8.378742144543594e-12 3.444688978504473e-12 6.105005390111273e-12 9.435785486289205e-13 7.617240171953199e-13 2.275957200481571e-14 1.6486811915683575e-13 2.275957200481571e-14 5.1181281435219717e-14 1.27675647831893e-14 1.2101430968414206e-14 1.2212453270876722e-15 2.7755575615628914e-15 5.551115123125783e-16 1.5543122344752192e-15 1.1102230246251565e-16 1.1102230246251565e-16 0.0 1.1102230246251565e-16 0.0 0.0 0.0 0.0 0.0 0.0 0.0];
e2=[9.725292443585332e-7 4.281799078631465e-7 2.721746682476933e-7 1.823107481646602e-7 9.336073392152144e-8 5.1925587718493205e-8 1.6992282803052206e-8 6.756442849642497e-9 5.1179199767048544e-9 3.0311525511805826e-9 1.2180066955025382e-9 6.155346232716852e-10 2.8668412088705963e-10 6.983780220792823e-11 6.476741365446514e-11 3.8914982347648674e-11 1.7473689162272876e-11 1.2084888645347291e-11 4.513389662008649e-12 1.7393864126802328e-12 1.273314786942592e-12 5.172529071728604e-13 2.5013324744804777e-13 1.6198153929281034e-13 6.894484982922222e-14 2.8754776337791554e-14 1.6542323066914832e-14 8.770761894538737e-15 4.773959005888173e-15 2.7755575615628914e-15 7.771561172376096e-16 3.3306690738754696e-16 3.3306690738754696e-16 1.1102230246251565e-16 1.1102230246251565e-16 0.0 0.0 0.0 0.0 0.0 0.0];
figure;
semilogy(p,e1,p,e2,'-.');
legend('your algorithm','factorial algorithm');
xlabel('float precision');
ylabel('error')

Your algorithm performs better: about one order of magnitude less error than the factorial variant:

[Plot: error vs. float precision (semilog-y) for the two algorithms]

The worst thing about the factorial variant is that it depends on the intrinsic power function x**power. This function does not necessarily return the floating-point number nearest to the exact result, and it may vary with the underlying mathematical library implementation. So asking for a bit-identical result that depends not only on strict IEEE 754 compliance but also on implementation-defined accuracy is a really dumb thing - unless all students have exactly the same hardware and software. But even then, what kind of lesson is that with respect to what every scientist should know about floating point?
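To see how this can bite, here is a hedged sketch (it makes no claim about any particular libm): it compares the real-exponent power, which typically goes through the libm pow, against plain repeated multiplication, where each step is correctly rounded:

program pow_check
  implicit none
  integer :: k
  double precision :: x, y, p
  x = 5.0d0
  y = 23.0d0
  p = 1.0d0
  do k = 1, 23
    p = p * x            ! 23 correctly rounded multiplications
  end do
  ! x**y with a real exponent usually maps to the libm pow function;
  ! whether it matches the repeated product is implementation defined
  print *, x**y, p, x**y - p
end program pow_check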

3 Comments

Be aware that this result is just for x=5. It would be interesting to see it for various x. That would show that for some x the factorial algorithm wins, while for other x the original OP algorithm is more accurate. However, on average, for IEEE double precision the OP's accuracy is 4.2224E-15 while the factorial's is 1.09253E-14.
Yes, that would be interesting to generalize. However, MacLaurin series are more often used near zero, where convergence is good. To that effect, the argument can be reduced to [-pi/4, pi/4] by using symmetry, and a formula like sin(5x) can be used to further reduce the argument. So we'd better study those small numbers...
And for such small numbers, there is no practical accuracy difference between the two algorithms (except that the original is shorter and more efficient than the factorial variant)

I ended up talking to a classmate who got the exact answer. He told me that he built a factorial function, then suggested that I compute the terms using an if/else statement: if odd, add; else if even, subtract. So I followed his suggestion and ended up with the correct output.

For reference, these are the test cases of my prof:

5 30 -> -0.95892427466313568

4 100 -> -0.75680249530792754

This is my code:

! Sine value using MacLaurin series
recursive function factorial(n) result (f)
  double precision :: f
  double precision, intent(in) :: n
  if (n <= 0) then
    f = 1
  else
    f = n * factorial(n-1)
  end if
end function factorial

program SineApprox
  implicit none
  integer :: numTerms, i, oddeven
  double precision :: x, summ, factorial, odd, even, power
  ! Read Angle in Radians, and numTerms
  read(*,*) x, numTerms
  ! Computing MacLaurin series
  summ = 0
  power = 1
  do i = 1, numTerms, 1
    oddeven = modulo(i,2)
    if (oddeven == 1) then
      odd = (x**power)/factorial(power)
      summ = summ + odd
    else if (oddeven == 0) then
      even = (x**(power))/factorial(power)
      summ = summ - even
    end if
    power = power + 2
  end do
  print *, summ
end program SineApprox

5 Comments

OK, but writing it using a factorial function is silly. Your idea of computing the denominator incrementally was better in terms of efficiency.
There is absolutely no guarantee that a different rewrite leads to the same float result. factorial(23) is not representable in IEEE 64-bit double precision... Same for 5**23: it is not representable in double precision. There is no guarantee that the power function returns the nearest float to the exact result, so this solution is highly dependent on the underlying libm. I much, much prefer your original version.
Something to take into account. I think this solution is worse than your previous one because here, from 22! onwards, you lose accuracy in your factorial as it cannot be represented correctly anymore. That is one of the reasons why your previous result is much better. (21! = 5.1090942171709440E+019 and 23! = 2.5852016738884978E+022; the latter already misses one digit.)
@kvantour note that 22! is the last factorial representable in double precision (23! is not). Same for 5^22 (still representable) and 5^23 (not representable). IMO the worst thing preventing reproducibility is the usage of the power function: we don't have any guarantee of accuracy; it will vary from one libm implementation to another...
@Katrina Recursion is a poor way to calculate a factorial. Just do the multiplication in a loop.
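For what it's worth, a loop-based sketch of the factorial along the lines of that last comment (hypothetical; it keeps the double precision interface of the recursive version above):

function factorial(n) result(f)
  implicit none
  double precision, intent(in) :: n
  double precision :: f, k
  ! multiply 2, 3, ..., n in a loop instead of recursing
  f = 1.0d0
  k = 2.0d0
  do while (k <= n)
    f = f * k
    k = k + 1.0d0
  end do
end function factorial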

In contrast to what was demonstrated in both chux's and aka.nice's answers and mentioned in various comments, I believe we jumped to conclusions by focusing on a single computation, i.e. sin(5). Even though the OP's answer is closer to the true value, it must be said that both the OP's algorithm and the factorial algorithm are equally bad for computing sin(5); but in the end, the factorial algorithm is the better one.

This answer will not go into too much detail about floating-point arithmetic. An excellent monograph on this topic is What Every Computer Scientist Should Know About Floating-Point Arithmetic. Neither will I open the can of worms of Intermediate Floating-Point Precision.

disclaimer: I don't claim that what I write here is 100% correct, and I am certainly not an authority in this field. I just found the question extremely interesting and tried to get as much out of it as I could. I obviously welcome any comments!


Some initial concepts

  • floating point representation (FPR): in a finite floating point representation, a number is written as ±0.d_1 d_2 ... d_p × b^e. It has a precision p and an even base b, where all digits d_i are integers with 0 ≤ d_i < b. It is straightforward that such a representation cannot represent all real numbers. For example, if b = 10 and p = 3, the number 1/3 is approximated as 0.333 × 10^0, and if b = 2 and p = 24 one has to approximate 0.1 as 0.110011001100110011001101 × 2^-3.

  • absolute error (AE): if a real number x has an AE of Δx, then we know that the true value x̃ lies in x - Δx ≤ x̃ ≤ x + Δx. If x is the closest FPR to the true value, then we know that its AE is (b/2) × b^(-p-1) × b^e. For example, if b = 10 and p = 3, the number 1/3 is approximated as 0.333 × 10^0 and has an absolute error of 0.0005 × 10^0, indicating that 0.3325 × 10^0 ≤ 1/3 ≤ 0.3335 × 10^0. (A short sketch after this list makes the binary64 case concrete.)
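As promised, a small Fortran sketch (assuming IEEE binary64 double precision) that prints the value actually stored for 0.1 and its half-ULP error bound:

program fpr_demo
  implicit none
  double precision :: x
  x = 0.1d0
  ! the nearest binary64 to 1/10, printed with more digits than it carries
  print '(F22.20)', x        ! 0.10000000000000000555...
  ! half a ULP bounds the representation error |x - 1/10|
  print *, spacing(x) / 2    ! ~6.9e-18
end program fpr_demo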

What does this all mean for the Taylor-MacLaurin series?

For a standard IEEE binary64 (double precision) computation (b = 2 and p = 53) of the Taylor-MacLaurin series of sin(5), one quickly sees that the horse has already bolted at the 2nd and 3rd terms. Here, the FPR is only accurate up to 2^-49, as can be seen in the table below (showing the nearest binary64 representations of the true fractions):

j  term         FPR                       e  AE
                0.12345678901234567
1   5^1/1!       5.00000000000000000E+00  3  2^-51  4.4E-16
3  -5^3/3!      -2.08333333333333321E+01  5  2^-49  1.8E-15
5   5^5/5!       2.60416666666666679E+01  5  2^-49  1.8E-15
7  -5^7/7!      -1.55009920634920633E+01  4  2^-50  8.9E-16

j  sum          FPR                       e  AE
                0.12345678901234567
1   5            5.00000000000000000E+00  3  2^-51  4.4E-16
3  -95/6        -1.58333333333333339E+01  4  2^-50  8.9E-16
5   245/24       1.02083333333333339E+01  4  2^-50  8.9E-16
7  -5335/1008   -5.29265873015873023E+00  3  2^-51  4.4E-16

The accuracy limit of 2^-49 can be understood in the following way. If you look at the term 5^3/3!, then its nearest FPR is the fraction (5864062014805333 / 9007199254740992) * 2^5. As you can see, we are missing a part here, namely:

5^3/3! - (5864062014805333 / 9007199254740992) * 2^5 = 1/844424930131968 ~ 1.1842378929335002E-15 < 2^-49

So no matter what we try to do, we always miss this part, and it is not possible to compute sin(5) with an accuracy better than 2^-49, since that is the accuracy of the biggest term (in absolute value) added to the sum. However, the other terms also introduce errors, and all these errors accumulate. So after the first 30 terms, we know that the AE for sin(5) accumulates to:

2^-51 + 2^-49 + 2^-49 + 2^-50 + ... = 5.45594...E-15

This number is, however, too idealistic, as there is further loss of precision due to loss of significance.

Which algorithm is better?

The two presented algorithms are (assuming all variables are IEEE binary64 (double precision) variables, with the exception of the integers i and j):

  • algorithm 1: This is a slightly adapted version of the algorithm presented here

    fact = 1.0_dp; pow = x; sum = x; xx = x*x
    do i=2,30
      j=2*i-1
      fact = fact*j*(j-1)
      pow  = -pow * xx
      term = pow/fact
      sum  = sum + term
    end do
  • algorithm 2: This is a slightly adapted version of the algorithm presented here

    term = x; sum = x; xx = x*x
    do i=2,30
      j=2*i-1
      term = (-term*xx)/(j*(j-1))
      sum  = sum + term
    end do

While both algorithms are mathematically the same, there is a subtle numerical difference. In floating-point arithmetic, divisions and multiplications are considered safe: they always return the nearest FPR of the true result of that single operation, given its (already rounded) inputs. This, however, does not mean that a chain of divisions and multiplications leads to the nearest FPR of the full computation.

This is why algorithm 2 is slightly worse than algorithm 1. The computation term = (-term*xx)/(j*(j-1)) contains multiple multiplications and divisions and already makes use of an approximated version of term. This is in contrast to algorithm 1, where term = pow/fact is a single operation.
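The difference can be reproduced with a short sketch that runs both recurrences side by side (double precision assumed, x = 5):

program term_compare
  implicit none
  integer :: i, j
  double precision :: x, xx, fact, pow, term1, term2
  x = 5.0d0; xx = x*x
  fact = 1.0d0; pow = x; term1 = x; term2 = x
  print '(I3, 2ES26.17)', 1, term1, term2
  do i = 2, 9
    j = 2*i - 1
    fact  = fact*j*(j-1)            ! algorithm 1: one division per term
    pow   = -pow*xx
    term1 = pow/fact
    term2 = (-term2*xx)/(j*(j-1))   ! algorithm 2: reuses the rounded term
    print '(I3, 2ES26.17)', j, term1, term2
  end do
end program term_compare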

The table below shows the differences in term starting from j=5:

 j  term(alg-1)               term(alg-2)
    0.12345678901234567       0.12345678901234567
 1   5.00000000000000000E+00   5.00000000000000000E+00
 3  -2.08333333333333321E+01  -2.08333333333333321E+01
 5   2.60416666666666679E+01   2.60416666666666643E+01
 7  -1.55009920634920633E+01  -1.55009920634920633E+01
 9   5.38228891093474449E+00   5.38228891093474360E+00
11  -1.22324747975789649E+00  -1.22324747975789627E+00
13   1.96033249961201361E-01   1.96033249961201306E-01
15  -2.33372916620477808E-02  -2.33372916620477738E-02
17   2.14497166011468543E-03   2.14497166011468499E-03

The size of the error created at each step is of the magnitude:

AE(term(j-2)) * xx / (j*(j-1)) 

Can we still make improvements?

The answer is clearly yes, but we need to use some 'tricks'. The easiest way would be to use a higher precision and then convert everything back to double precision, but this is cheating!

Earlier it was mentioned that loss of significance introduces further errors, and in a comment it was mentioned that from 23! onwards you lose accuracy in the factorial, as it can no longer be represented exactly as a binary64 number.

We will try to improve the answer by keeping track of an error term and making use of what is known as compensated summation, or Kahan summation. As was indicated at the beginning of this post, it is essentially the inverse factorial that leads to the loss of precision.
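For reference, the bare compensated-summation step looks as follows; a minimal sketch (not yet the combined algorithm shown further down) that adds 1e-16 ten times to 1.0, a contribution which plain summation loses entirely:

program kahan_demo
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer :: i
  real(dp) :: naive, sum, err, t, tmp
  naive = 1.0_dp
  sum = 1.0_dp; err = 0.0_dp
  do i = 1, 10
    naive = naive + 1.0e-16_dp  ! rounds back to 1.0 every time
    t   = 1.0e-16_dp - err      ! re-inject the error of the previous step
    tmp = sum + t               ! low-order bits of t may be lost here
    err = (tmp - sum) - t       ! recover exactly what was lost
    sum = tmp
  end do
  print *, naive   ! 1.0000000000000000
  print *, sum     ! ~1.0000000000000010 (true value is 1 + 1e-15)
end program kahan_demo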

computing the inverse factorial: instead of computing the factorial, we will compute the inverse of the factorial in the following way. Imagine we have computed f ~ 1/(n-1)! with a corresponding error term q such that f + q ~ 1/(n-1)!. We know that the FPR of f can be written as f = a × b^(e-p) with a, b, e, p integers. So it is possible to use integer division to compute 1/n! and its corresponding error term using the following set of operations:

f = (a/n) * b^(e-p)
q = (q + mod(a,n) * b^(e-p)) / n

In Fortran this leads to the following code:

f = 1.0_dp; q = 0.0_dp
do i = 2,10
  ! retrieving the integer from 1/(i-1)!
  a = int(scale(fraction(f),digits(f)),kind=INT64)
  ! computing the error on f while keeping track of the
  ! previous error
  q = (q + scale(real(mod(a,i),kind=dp),exponent(f)-digits(f)))/i
  ! computing f/i resembling 1/i!
  f = scale(real(a/i,kind=dp),exponent(f)-digits(f))
  ! rebalancing the terms
  t = f; f = f + q; q = q - (f - t)
end do

Here, the last line rebalances f with its error q, a trick stemming from Kahan summation.

Kahan summation with inverse factorial ... the new sin:

It is now possible to combine this trick with algorithm 2 to make it a bit better:

pow = x; sum = x; xx = x*x; err = 0.0_dp; f = 1.0_dp; q = 0.0_dp
do i=2,30
  j=2*i-1
  ! The factorial part
  a = int(scale(fraction(f),digits(f)),kind=INT64)
  q = (q + scale(real(mod(a,j*(j-1)),kind=dp),exponent(f)-digits(f)))/(j*(j-1))
  f = scale(real(a/(j*(j-1)),kind=dp),exponent(f)-digits(f))
  t = f; f = f + q; q = q - (f - t)
  pow = -pow*xx   ! computes x^j
  ! Kahan summation
  t = pow*f; err = err + (t - ((sum + t) - sum)); sum = sum + t
  t = pow*q; err = err + (t - ((sum + t) - sum)); sum = sum + t
  t = sum; sum = sum + err; err = err - (sum - t)
end do

This algorithm leads to remarkably good results: excellent accuracy in quadrants 1 and 2, slightly worse in the third quadrant, and still okay in the fourth. It has, however, horrible accuracy near pi and 2*pi.

7 Comments

Well, these are just very small effects. My biggest objection to the factorial as presented was that computing the whole factorial from the beginning in every iteration is simply nonsense from the efficiency point of view when we already have the previous factorial. And both of your algorithms do that.
@VladimirF I was not aiming for speed but just for accuracy. So the idea was to present where some of the numerical errors sneak in and how you can minimize them in a very ugly way. The final presented code does not recompute the full factorial over and over.
A good method for both speed and accuracy: en.wikipedia.org/wiki/Horner%27s_method
@kvantour I know it does not compute it over and over. Alas, I even tried to stress that I know it in my previous comment.
Use the polynomial x - x^3*(1/6. - x^2*(1/120. - ...)); even better if using trig identities to reduce the range to e.g. -pi/4 < x < pi/4, and some method to optimize the coefficients, e.g. Chebyshev economization. Maybe a point of the exercise is to show the limitations of a series without range reduction.
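For illustration, a Horner-form evaluation of the first few MacLaurin terms (a sketch only: plain series coefficients, no range reduction, no Chebyshev economization):

program horner_sin
  implicit none
  double precision :: x, x2, s
  read(*,*) x
  x2 = x*x
  ! x - x^3/3! + x^5/5! - x^7/7! + x^9/9!, evaluated inner-to-outer
  s = x*(1.0d0 - x2*(1.0d0/6.0d0 - x2*(1.0d0/120.0d0 &
        - x2*(1.0d0/5040.0d0 - x2/362880.0d0))))
  print *, s
end program horner_sin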
