Optimizing the Pythagoras SQRT()

Question

After some latency measurement tests, I figured out I need to optimize an pythagoras triangle calculation done on an embedded CPU with a pretty slow FPU.

The problem is if these calculations occur, they come in numbers and this messes up the timing. I cannot reduce the absolute number of calculations. But somehow they need to get faster ... by at least factor 5. :-/

I'm currently thinking of pre-processing these calculations since the input range of distinct values is somehow limited to about 300-500 permutations and interpolation between two table entry should suffice. But I was also wondering if using some conditions to the problem it might be possible to also speed up this code:

float h = 0.f, v=0.f; /// ... float const d = std::sqrt( (h*h) + (v*v) );

This I haven't used yet:

The accuracy of result d is very limited to not more as 3 fractional digits are required
The legs of the triangle (h,v) are always at aspect ratio of 4:3 or 16:9

I don't know if some integer fixed point calculation are available fort square root or if the function can be substituted to a one with less accuracy or somehow using the aspect ratio.

Any ideas?

Thank you!

Could you perhaps use that at 4:3, the diagonal is long side*1.25 and at 16:9 it's about long side*1.147? — Joachim Isaksson
– Joachim Isaksson, Commented Aug 9, 2013 at 13:33
Asking about a micro-optimization, it helps to mention what the CPU is. -1 — Potatoswatter
– Potatoswatter, Commented Aug 9, 2013 at 13:35
In some situations you may be able to avoid taking the root at all. E.g. when interested in comparisons only. Finding the minimum length can be replaced with finding the minimum squared length. Possibly not useful in this case however. — user2656151
– user2656151, Commented Aug 9, 2013 at 13:50

Vaughn Cato · Accepted Answer · 2013-08-09 13:42:25Z

6

If you know the ratio is 16:9, you can do a little algebra:

h = 16*x v = 9*x x = h/16 sqrt((h*h) + (v*v)) = sqrt((16*16*x*x) + (9*9*x*x)) = sqrt((16*16+9*9)*x*x) = sqrt(16*16+9*9)*x = sqrt(16*16+9*9)*h/16 = sqrt(16*16+9*9)/16 * h

pre-compute sqrt(16*16+9*9)/16:

static float const multiplier = std::sqrt(16*16+9*9)/16.0;

and use

float const d = multiplier*h;

edited Aug 9, 2013 at 13:42

answered Aug 9, 2013 at 13:32

Vaughn Cato

64.5k5 gold badges87 silver badges135 bronze badges

Sign up to request clarification or add additional context in comments.

4 Comments

user529758 Over a year ago

Yap, this seems to be the solution.

Joachim Isaksson Over a year ago

So if h goes up, d goes down?

Vaughn Cato Over a year ago

@JoachimIsaksson: oops -- fixed

Hhut Over a year ago

Wow. This looks very good... I thought the aspect ratio might be usable, interesting. I will test this ASAP!

Michael Goldshteyn · Accepted Answer · 2013-08-09 13:29:20Z

I don't know how much RAM/ROM you have in this embedded system, but if you have some, pre-computing squares and square roots is probably the way to go. This assumes that your embedded CPU has the ability to cache and/or RAM/ROM access is faster than the floating point computations.

That said, there are numerical methods to calculate square root, but I am not sure that they would in practice be faster than the sqrt call, no matter how slow.

Collectives™ on Stack Overflow

Optimizing the Pythagoras SQRT()

2 Answers 2

4 Comments

Comments

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

4 Comments

Comments

Related