
I've seen various answers to the ball collision detection question explaining why sqrt operations are slow, why absolute value operations are fast on floating points, etc. How can I find out which operations are expensive and which are not?

Basically, I'm looking for a resource where I can learn how all the Python library functions are implemented. When people say those things, they aren't actually thinking about how that stuff is implemented in C, are they?

Thanks, looking to learn more about the implementation details of this slick language.

  • I think you're asking specifically about floating-point functions/subroutines, not algorithmic efficiency (e.g. don't take sqrt when not necessary, just compare squared-distances directly), big-O or bytecode in general? "looking for a resource... about the implementation of all the python library functions" is way too general. Also, the bytecode compiler is getting improvements every version, as of 2024 the current release is 3.12; see the whatsnew. As to optimizations of floating-point esp. for 2D/3D games, there are tons of discussions on that since the 1970s, see those. Commented Mar 4, 2024 at 22:44
  • e.g. for collision detection, bounding-boxes in 3D were developed to use octrees as a data structure that can quickly be searched. There's about half a century of literature on that, going back to C and assembler. See also e.g. gamedev.stackexchange.com/questions/tagged/collision-detection . Always look first for high-level algorithmic and data-structure optimizations like octrees, not low-level stuff. Commented Mar 4, 2024 at 23:05
  • One legendary eighties compilation of efficient F-P algorithms for graphics is the "Graphics Gems" series of books, free PDFs and repo online. Commented Mar 5, 2024 at 0:06
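As the first comment suggests, the cheapest collision test skips sqrt entirely and compares squared distances. A minimal sketch of that idea (function names are my own, not from the original question):

```python
import math

def circles_collide(x1, y1, r1, x2, y2, r2):
    """Return True if two circles overlap, comparing squared
    distances so no square root is ever taken."""
    dx = x2 - x1
    dy = y2 - y1
    rsum = r1 + r2
    return dx * dx + dy * dy <= rsum * rsum

def circles_collide_sqrt(x1, y1, r1, x2, y2, r2):
    """Equivalent check that does take a root, for comparison."""
    return math.hypot(x2 - x1, y2 - y1) <= r1 + r2

# distance 1.5, combined radius 2.0 -> overlap
print(circles_collide(0.0, 0.0, 1.0, 1.5, 0.0, 1.0))  # True
```

Both versions give the same answer; the squared-distance form just avoids a comparatively expensive operation in the inner loop.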

4 Answers

3

Python isn't translated directly to machine code: CPython compiles your source to bytecode and interprets it, and built-in functions like math.sqrt are implemented in C. So yes, when people talk about the cost of these operations, they may well be talking about low-level implementation details.

Anyway, the best way to learn how the various Python functions are implemented (and why some are faster than others) is to take a look at the source code.

Have fun!

EDIT: This tip actually applies to any open source project I've worked with: got a problem? Take a look at the source code and you should be fine.
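For the pure-Python parts of the standard library you don't even need to download a source tarball; the inspect module can point you at the right file. A small sketch (note the caveat: C-implemented builtins have no Python source, so inspect raises TypeError for them, which itself tells you to look in the CPython C sources instead):

```python
import inspect
import random
import math

# Pure-Python stdlib modules expose their source location directly:
print(inspect.getsourcefile(random))  # .../random.py in your install

# C builtins like math.sqrt have no Python source; inspect raises
# TypeError, so you know to look in the CPython tree instead
# (e.g. Modules/mathmodule.c):
try:
    inspect.getsourcefile(math.sqrt)
except TypeError as exc:
    print("no Python source:", exc)
```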


2 Comments

Ok so I downloaded the source code and unzipped it. Now I have a folder called Python-3.1.2 and it is confusing! Any tips on sifting through the source and finding what I'm really looking for? Thanks
I took a quick look at the source, and if my guesses are correct, the source file you should look at is cmathmodule.c. There is a function called c_sqrt that does the job of finding the square root (of a complex number; the real-valued math.sqrt lives in mathmodule.c), and it is written in C. So it really comes down to the implementation of sqrt in C after all.
2

You find out which operations are slow and which ones are fast by using the timeit module.

For example, let's compare the various methods of checking if a point falls within a circle, from the command line:

python -m timeit -s 'import math; x = 42.5; y = 17.2; r = 50.0' 'math.sqrt(x**2 + y**2) <= r'
python -m timeit -s 'import math; x = 42.5; y = 17.2; r = 50.0' 'math.hypot(x, y) <= r'
python -m timeit -s 'import math; x = 42.5; y = 17.2; r = 50.0' 'x**2 + y**2 <= r**2'

On my machine the results are:

$ python -m timeit -s 'import math; x = 42.5; y = 17.2; r = 50.0' 'math.sqrt(x**2 + y**2) <= r'
1000000 loops, best of 3: 0.744 usec per loop
$ python -m timeit -s 'import math; x = 42.5; y = 17.2; r = 50.0' 'math.hypot(x, y) <= r'
1000000 loops, best of 3: 0.374 usec per loop
$ python -m timeit -s 'import math; x = 42.5; y = 17.2; r = 50.0' 'x**2 + y**2 <= r**2'
1000000 loops, best of 3: 0.724 usec per loop

so math.hypot wins! Incidentally, if you remove the dotted name lookup from the inner loop, you get slightly better results:

$ python -m timeit -s 'from math import hypot; x = 42.5; y = 17.2; r = 50.0' 'hypot(x, y) <= r'
1000000 loops, best of 3: 0.334 usec per loop
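The same comparison can also be scripted with timeit's Python API rather than the command line; here's a sketch (the loop count is an arbitrary choice, and absolute timings will of course differ per machine):

```python
import timeit

SETUP = "import math; x = 42.5; y = 17.2; r = 50.0"
CANDIDATES = {
    "sqrt":    "math.sqrt(x**2 + y**2) <= r",
    "hypot":   "math.hypot(x, y) <= r",
    "squared": "x**2 + y**2 <= r**2",
}

for name, stmt in CANDIDATES.items():
    # repeat() returns one total time per run; the minimum is the
    # most reliable estimate (least interference from other processes).
    best = min(timeit.repeat(stmt, setup=SETUP, number=100_000, repeat=3))
    print(f"{name:8s} {best / 100_000 * 1e6:.3f} usec per loop")
```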


1

If you're looking for tips on Python speed in general, these performance tips are a good resource.

1 Comment

90% of the stuff in that article only applies to ~Python 2.1 (that was 8 years ago) and is just plain false for today's Python implementation. It can show you how much Python has improved, but that's about it.
0

The Python standard library includes a disassembler:

>>> def foo(): print
...
>>> import dis
>>> dis.dis(foo)
  1           0 PRINT_NEWLINE
              1 LOAD_CONST               0 (None)
              4 RETURN_VALUE

It doesn't work on everything, though:

>>> from math import sqrt
>>> import dis
>>> dis.dis(sqrt)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.6/dis.py", line 48, in dis
    type(x).__name__
TypeError: don't know how to disassemble builtin_function_or_method objects

