Implementing Table-Lookup-Based Trig Functions

Question

For a videogame I'm implementing in my spare time, I've tried implementing my own versions of sinf(), cosf(), and atan2f(), using lookup tables. The intent is to have implementations that are faster, although with less accuracy.

My initial implementation is below. The functions work, and return good approximate values. The only problem is that they are slower than calling the standard sinf(), cosf(), and atan2f() functions.

So, what am I doing wrong?

// Geometry.h includes definitions of PI, TWO_PI, etc., as // well as the prototypes for the public functions #include "Geometry.h" namespace { // Number of entries in the sin/cos lookup table const int SinTableCount = 512; // Angle covered by each table entry const float SinTableDelta = TWO_PI / (float)SinTableCount; // Lookup table for Sin() results float SinTable[SinTableCount]; // This object initializes the contents of the SinTable array exactly once class SinTableInitializer { public: SinTableInitializer() { for (int i = 0; i < SinTableCount; ++i) { SinTable[i] = sinf((float)i * SinTableDelta); } } }; static SinTableInitializer sinTableInitializer; // Number of entries in the atan lookup table const int AtanTableCount = 512; // Interval covered by each Atan table entry const float AtanTableDelta = 1.0f / (float)AtanTableCount; // Lookup table for Atan() results float AtanTable[AtanTableCount]; // This object initializes the contents of the AtanTable array exactly once class AtanTableInitializer { public: AtanTableInitializer() { for (int i = 0; i < AtanTableCount; ++i) { AtanTable[i] = atanf((float)i * AtanTableDelta); } } }; static AtanTableInitializer atanTableInitializer; // Lookup result in table. // Preconditions: y > 0, x > 0, y < x static float AtanLookup2(float y, float x) { assert(y > 0.0f); assert(x > 0.0f); assert(y < x); const float ratio = y / x; const int index = (int)(ratio / AtanTableDelta); return AtanTable[index]; } } float Sin(float angle) { // If angle is negative, reflect around X-axis and negate result bool mustNegateResult = false; if (angle < 0.0f) { mustNegateResult = true; angle = -angle; } // Normalize angle so that it is in the interval (0.0, PI) while (angle >= TWO_PI) { angle -= TWO_PI; } const int index = (int)(angle / SinTableDelta); const float result = SinTable[index]; return mustNegateResult? (-result) : result; } float Cos(float angle) { return Sin(angle + PI_2); } float Atan2(float y, float x) { // Handle x == 0 or x == -0 // (See atan2(3) for specification of sign-bit handling.) if (x == 0.0f) { if (y > 0.0f) { return PI_2; } else if (y < 0.0f) { return -PI_2; } else if (signbit(x)) { return signbit(y)? -PI : PI; } else { return signbit(y)? -0.0f : 0.0f; } } // Handle y == 0, x != 0 if (y == 0.0f) { return (x > 0.0f)? 0.0f : PI; } // Handle y == x if (y == x) { return (x > 0.0f)? PI_4 : -(3.0f * PI_4); } // Handle y == -x if (y == -x) { return (x > 0.0f)? -PI_4 : (3.0f * PI_4); } // For other cases, determine quadrant and do appropriate lookup and calculation bool right = (x > 0.0f); bool top = (y > 0.0f); if (right && top) { // First quadrant if (y < x) { return AtanLookup2(y, x); } else { return PI_2 - AtanLookup2(x, y); } } else if (!right && top) { // Second quadrant const float posx = fabsf(x); if (y < posx) { return PI - AtanLookup2(y, posx); } else { return PI_2 + AtanLookup2(posx, y); } } else if (!right && !top) { // Third quadrant const float posx = fabsf(x); const float posy = fabsf(y); if (posy < posx) { return -PI + AtanLookup2(posy, posx); } else { return -PI_2 - AtanLookup2(posx, posy); } } else { // right && !top // Fourth quadrant const float posy = fabsf(y); if (posy < x) { return -AtanLookup2(posy, x); } else { return -PI_2 + AtanLookup2(x, posy); } } return 0.0f; }

well, presumably the inbuilt ones are hand optimised, and prob. as fast as is possible! — Mitch Wheat
– Mitch Wheat, Commented Mar 16, 2009 at 15:08
have you actually determined that the inbuilt ones are a bottleneck?? — Mitch Wheat
– Mitch Wheat, Commented Mar 16, 2009 at 15:09

fbonnet · Accepted Answer · 2009-03-16 15:12:20Z

"Premature optimization is the root of all evil" - Donald Knuth

Nowadays compilers provide very efficient intrinsics for trigonometric functions that get the best from modern processors (SSE etc.), which explains why you can hardly beat the built-in functions. Don't lose too much time on these parts and instead concentrate on the real bottlenecks that you can spot with a profiler.

Quote: Wrongly attributed to Knuth. Hoare seems to be the originator though Hoare disclaims it according to the Wikipedia article -- en.wikipedia.org/wiki/C._A._R._Hoare
Knuth was the first one to put it in print (in his seminal "Structured Programming with GOTOs"). Quoting Wikipedia as a source is like saying, "some random guy on teh interwebs says...."

Rich · Accepted Answer · 2009-03-16 16:04:57Z

Remember you have a co-processor ... you would have seen an increase in speed if it were 1993 ... however today you will struggle to beat native intrinsics.

Try viewing the disassebly to sinf.

John Rasch · Accepted Answer · 2009-03-16 15:15:08Z

Someone has already benchmarked this, and it looks as though the Trig.Math functions are already optimized, and will be faster than any lookup table you can come up with:

http://www.tommti-systems.de/go.html?http://www.tommti-systems.de/main-Dateien/reviews/languages/benchmarks.html

(They didn't use anchors on the page so you have to scroll about 1/3 of the way down)

bayda · Accepted Answer · 2009-03-16 15:11:34Z

I'm worried by this place:

// Normalize angle so that it is in the interval (0.0, PI) while (angle >= TWO_PI) { angle -= TWO_PI; }

But you can: Add time-meters to all functions, write special performance tests, run performance tests, print report of time test.. I think you will know answer after this tests.

Also you could use some profiling tools such as AQTime.

Eric Petroelje · Accepted Answer · 2009-03-17 12:14:16Z

The built-in functions are very well optimized already, so it's going to be REALLY tough to beat them. Personally, I'd look elsewhere for places to gain performance.

That said, one optimization I can see in your code:

// Normalize angle so that it is in the interval (0.0, PI) while (angle >= TWO_PI) { angle -= TWO_PI; }

Could be replaced with:

angle = fmod(angle, TWO_PI);

Collectives™ on Stack Overflow

Implementing Table-Lookup-Based Trig Functions

5 Answers 5

2 Comments

Comments

Comments

Comments

Comments

Linked

Hot Network Questions

Collectives™ on Stack Overflow

5 Answers 5

2 Comments

Comments

Comments

Comments

Comments

Linked

Related