Why is the float * int multiplication faster than int * float in CPython?

Question

Basically, when a is an integer, the expression 0.4 * a is consistently, and surprisingly, faster than a * 0.4 (by roughly 10% in the benchmarks below). And I have no idea why.

I speculated that it is a case of the LOAD_CONST LOAD_FAST bytecode pair being "more specialized" than the LOAD_FAST LOAD_CONST pair, and I would be entirely satisfied with this explanation, except that this quirk seems to apply only to multiplications where the operand types differ. (By the way, I can no longer find the link to a "bytecode instruction pair popularity ranking" I once found on GitHub. Does anyone have a link?)
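Whether operand order matters at the bytecode level can be checked with the standard dis module. This is a minimal sketch assuming function-local variables (so a is loaded with LOAD_FAST, as in the question); the function names are illustrative only. Both orders should compile to the same set of instructions with only the two loads swapped, and the multiplication itself is one generic opcode (BINARY_MULTIPLY on 3.10, BINARY_OP on 3.11 and later).

```python
import dis


def float_first(a):
    return 0.4 * a


def int_first(a):
    return a * 0.4


# Print (opcode name, argument) pairs for each function.
for func in (float_first, int_first):
    print(func.__name__, [(ins.opname, ins.argrepr) for ins in dis.get_instructions(func)])
```

Since the opcode itself carries no type information, any speed difference has to come from what happens at run time inside that opcode.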

Anyway, here are the micro benchmarks:

$ python3.10 -m pyperf timeit -s"a = 9" "a * 0.4"
Mean +- std dev: 34.2 ns +- 0.2 ns

$ python3.10 -m pyperf timeit -s"a = 9" "0.4 * a"
Mean +- std dev: 30.8 ns +- 0.1 ns

$ python3.10 -m pyperf timeit -s"a = 0.4" "a * 9"
Mean +- std dev: 30.3 ns +- 0.3 ns

$ python3.10 -m pyperf timeit -s"a = 0.4" "9 * a"
Mean +- std dev: 33.6 ns +- 0.3 ns

As you can see, the runs where the float comes first (the 2nd and 3rd) are faster.
So my question is: where does this behavior come from? I'm 90% sure that it is an implementation detail of CPython, but I'm not familiar enough with low-level instructions to say so for sure.
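For readers without pyperf, a rough version of the same comparison can be run with the standard timeit module. This is a sketch, not the original measurement: best_ns_per_op is a hypothetical helper, absolute numbers will vary by machine and Python version, and it reports the fastest run rather than pyperf's mean, since the minimum is usually the least noisy figure for micro-benchmarks.

```python
import timeit

# The four statement/value pairs from the pyperf runs above.
CASES = [
    ("a * 0.4", 9),
    ("0.4 * a", 9),
    ("a * 9", 0.4),
    ("9 * a", 0.4),
]


def best_ns_per_op(stmt, value, number=200_000, repeat=5):
    """Fastest of `repeat` timing runs, in nanoseconds per evaluation of `stmt`."""
    # Putting the assignment in `setup` makes `a` a local of timeit's inner
    # function, so it is loaded with LOAD_FAST as described in the question.
    times = timeit.repeat(stmt, setup=f"a = {value!r}", number=number, repeat=repeat)
    return min(times) / number * 1e9


if __name__ == "__main__":
    for stmt, value in CASES:
        print(f"a = {value!r:<4} {stmt:<8} {best_ns_per_op(stmt, value):6.1f} ns")
```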

My guess: float.__mul__ immediately converts the integer to a float, whereas int.__mul__ returns NotImplemented, forcing float.__rmul__ to be called.
– chepner, Commented Aug 11, 2022 at 20:54
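The guess can be checked by calling the special methods by hand. This is a sketch: calling dunders directly skips the interpreter's dispatch logic, but the return values are the same ones the interpreter sees.

```python
a = 9

# int.__mul__ only handles ints, so it declines by returning NotImplemented.
print(a.__mul__(0.4))     # NotImplemented

# float's methods accept an int operand directly.
print((0.4).__mul__(a))   # all of the work for 0.4 * a
print((0.4).__rmul__(a))  # the fallback the interpreter reaches for a * 0.4
```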

Tim Peters · Accepted Answer · 2022-08-11 20:59:16Z

It's CPython's implementation of the BINARY_MULTIPLY opcode (BINARY_OP in Python 3.11 and later). It has no idea what the types are at compile time, so everything has to be figured out at run time. For a * b, regardless of what a and b may be, BINARY_MULTIPLY ends up invoking a.__mul__(b) first.

When a is of int type, int.__mul__(a, b) has no idea what to do unless b is also of int type. It returns NotImplemented (in C, via the Py_RETURN_NOTIMPLEMENTED macro, which returns the Py_NotImplemented singleton). This is in longobject.c's CHECK_BINOP macro. The interpreter sees that, and effectively says "OK, a.__mul__ has no idea what to do, so let's give b.__rmul__ a shot at it". None of that is free; it all takes time.
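The dispatch described above can be modeled in pure Python. This is a simplified sketch, not CPython's code: binary_mul is a hypothetical name, the real interpreter works through C type slots rather than attribute lookups, and the model leaves out the rule that a right operand whose type is a proper subclass of the left operand's type gets its __rmul__ tried first.

```python
def binary_mul(a, b):
    """Simplified model of how `a * b` is resolved (hypothetical helper)."""
    left, right = type(a), type(b)

    # Step 1: ask the left operand's type.
    mul = getattr(left, "__mul__", None)
    if mul is not None:
        result = mul(a, b)
        if result is not NotImplemented:
            return result

    # Step 2: the left type declined, so give the right operand's reflected
    # method a shot (skipped when both operands have the same type).
    if right is not left:
        rmul = getattr(right, "__rmul__", None)
        if rmul is not None:
            result = rmul(b, a)
            if result is not NotImplemented:
                return result

    raise TypeError(
        f"unsupported operand type(s) for *: {left.__name__!r} and {right.__name__!r}"
    )
```

For binary_mul(0.4, 9) the function returns after step 1; for binary_mul(9, 0.4) it has to run int.__mul__, notice NotImplemented, and then call float.__rmul__, which is the extra work the answer describes.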

float.__mul__(b, a) (which gives the same result as float.__rmul__(b, a)) does know what to do with an int: it converts it to a float first, so that call succeeds.
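The int-to-float conversion can be made concrete. This is an illustrative sketch: float_times_int is a hypothetical helper that mimics the conversion with float(n), and the oversized int simply shows that the conversion step really happens, since an int too large for a double cannot take part in the multiplication at all.

```python
def float_times_int(x: float, n: int) -> float:
    """Hypothetical helper mimicking float.__mul__ with an int operand."""
    # The int is converted to a double first; float(n) raises OverflowError
    # when n is outside the double range.
    return x * float(n)


x = 0.4
print(float.__mul__(x, 9), float.__rmul__(x, 9), float_times_int(x, 9))

# The conversion step becomes visible when the int is too big for a double.
try:
    x * 10**400
except OverflowError as exc:
    print("OverflowError:", exc)
```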

But when a is of float type to begin with, we go to float.__mul__ first, and that's the end of it. No time burned figuring out that the int type doesn't know what to do.
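The difference in the number of methods tried can be made visible with tracing subclasses. This is an illustration only: TracedInt and TracedFloat are hypothetical names, and because they are Python-level subclasses every call goes through Python method wrappers, so they show the order of attempts, not the C-level cost measured in the question.

```python
calls = []


class TracedInt(int):
    def __mul__(self, other):
        calls.append("TracedInt.__mul__")
        return super().__mul__(other)  # NotImplemented for a float operand


class TracedFloat(float):
    def __mul__(self, other):
        calls.append("TracedFloat.__mul__")
        return super().__mul__(other)

    def __rmul__(self, other):
        calls.append("TracedFloat.__rmul__")
        return super().__rmul__(other)


def trace(expr):
    """Evaluate a zero-argument callable and return (result, methods called)."""
    calls.clear()
    result = expr()
    return result, list(calls)


i, f = TracedInt(9), TracedFloat(0.4)
print(trace(lambda: i * f))  # calls: ['TracedInt.__mul__', 'TracedFloat.__rmul__']
print(trace(lambda: f * i))  # calls: ['TracedFloat.__mul__']
```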

The actual code is quite a bit more involved than the above pretends, but that's the gist of it.
