Python: why partition(sep) is faster than split(sep, maxsplit=1)

Question

I found something interesting: partition is faster than split when getting the whole substring after the separator. I tested this in Python 3.5 and 3.6 (CPython).

In [1]: s = 'validate_field_name'

In [2]: s.partition('_')[-1]
Out[2]: 'field_name'

In [3]: s.split('_', maxsplit=1)[-1]
Out[3]: 'field_name'

In [4]: %timeit s.partition('_')[-1]
220 ns ± 1.12 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [5]: %timeit s.split('_', maxsplit=1)[-1]
745 ns ± 48.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [6]: %timeit s[s.find('_')+1:]
340 ns ± 1.44 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

I looked through the CPython source code and found that partition uses the FASTSEARCH algorithm (see here), while split only uses FASTSEARCH when the separator string is longer than one character (see here). But I also tested with longer separator strings and got the same result.

My guess is that the reason is that partition returns a three-element tuple instead of a list.
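
To make the tuple-versus-list guess concrete, here is a minimal sketch of what the two calls return. It only illustrates the return types and one behavioral difference; it is not a measurement and does not by itself show where the time goes. The variable names (head, sep, tail, parts, no_sep) are chosen for illustration.

```python
s = 'validate_field_name'

# partition always returns a tuple of exactly three strings:
# (text before the separator, the separator, text after the separator).
head, sep, tail = s.partition('_')
print(type(s.partition('_')).__name__, s.partition('_'))

# split with a limit of 1 returns a list holding one or two strings.
parts = s.split('_', 1)
print(type(parts).__name__, parts)

# The expressions are not interchangeable when the separator is missing:
# partition yields an empty tail, while split and find+slice yield the
# whole string.
no_sep = 'validate'
print(repr(no_sep.partition('_')[-1]))
print(repr(no_sep.split('_', 1)[-1]))
print(repr(no_sep[no_sep.find('_') + 1:]))
```

So the three expressions in the timing session agree only when the separator is actually present in the string.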

I would like to understand this in more detail.

Yes, part of the reason is that building a fixed-length tuple is more efficient than building an arbitrary-length list.
– PM 2Ring, Commented Dec 20, 2017 at 3:56
You are also calling split with a keyword argument, i.e. s.split('_', maxsplit=1) instead of a simple s.split('_', 1).
– Cristian Ciupitu, Commented Sep 29, 2019 at 14:00

C. Yduqoli · Accepted Answer · 2017-12-20 10:00:56Z

Microbenchmarks can be misleading

py -m timeit "'validate_field_name'.split('_', maxsplit=1)[-1]"
1000000 loops, best of 3: 0.568 usec per loop

py -m timeit "'validate_field_name'.split('_', 1)[-1]"
1000000 loops, best of 3: 0.317 usec per loop

Merely passing the argument as a keyword instead of positionally changes the time significantly. So I would guess that another reason partition is faster is that it does not need a second argument at all...
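
To check this on your own interpreter, the comparison can be scripted with the standard timeit module. This is a minimal sketch: the helper best_ns and the labels in candidates are made up for illustration, and the absolute numbers (and even the ordering) may differ from the ones above depending on the Python version and machine.

```python
import timeit

s = 'validate_field_name'

# The four expressions under comparison; each extracts the text after
# the first underscore.
candidates = {
    "partition": "s.partition('_')[-1]",
    "split, positional": "s.split('_', 1)[-1]",
    "split, keyword": "s.split('_', maxsplit=1)[-1]",
    "find + slice": "s[s.find('_') + 1:]",
}


def best_ns(stmt, number=200000, repeat=5):
    """Return the best per-call time in nanoseconds over several repeats."""
    times = timeit.repeat(stmt, globals={"s": s}, number=number, repeat=repeat)
    return min(times) / number * 1e9


# Sanity check first: all variants must produce the same substring.
results = {label: eval(stmt, {"s": s}) for label, stmt in candidates.items()}

timings = {label: best_ns(stmt) for label, stmt in candidates.items()}
for label, ns in timings.items():
    print("{:<18} {:8.1f} ns per call".format(label, ns))
```

Taking the minimum over several repeats reduces noise from other processes, but it is still a microbenchmark: differences of this size rarely matter outside a tight loop.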

That's amazing. I agree with what you said, but I tested a simple function that does nothing and got the following result:

In [16]: def func(a, b):
    ...:     pass
    ...:

In [17]: %timeit func(1, 2)
95.8 ns ± 1.72 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

In [18]: %timeit func(1, b=2)
123 ns ± 2.3 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

Here the difference is much smaller. I think built-in functions are special and are handled with some other optimization.
