
To learn how to create C extensions, I decided to copy a built-in .c file (in this case itertoolsmodule.c) into my package. I only changed the names inside the module from itertools to mypkg.

Then I compiled it (Windows 10, MSVC Community 14) as a setuptools.Extension:

    from setuptools import setup, Extension

    itertools_module = Extension('mypkg.itertoolscopy',
                                 sources=['src/itertoolsmodulecopy.c'])

    setup(...
          ext_modules=[itertools_module])

By default this uses the compiler flags /c /nologo /Ox /W3 /GL /DNDEBUG /MD, and I read somewhere that these defaults equal the settings Python itself was compiled with. However, I use conda (64-bit setup), so this might not necessarily be true.

It all went well, but a benchmark for filterfalse showed that it's almost a factor of 2 slower than the built-in:
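One way to probe that assumption is to ask the interpreter which flags it reports for its own build. This is only a minimal sketch, assuming CPython: on Windows builds these config variables are often unset (None), in which case the version banner with its compiler tag is about all the interpreter reveals. The helper name reported_build_flags is made up for illustration.

```python
import sys
import sysconfig


# Hypothetical helper: collect the build-related config variables
# that the running interpreter reports about itself.
def reported_build_flags(keys=("CC", "CFLAGS", "OPT", "LDFLAGS")):
    # get_config_var returns None for names the build does not define.
    return {key: sysconfig.get_config_var(key) for key in keys}


print(sys.version)  # the banner includes the compiler that built the interpreter
for key, value in reported_build_flags().items():
    print("{:8} {}".format(key, value))
```

Running this in both the conda environment and a stock installation and comparing the output would show whether the two interpreters even claim the same settings.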

    import mypkg
    import itertools
    import random

    a = [random.random() for _ in range(500000)]
    func = None

    %timeit list(filter(func, a))
    100 loops, best of 3: 3.42 ms per loop

    %timeit list(itertools.filterfalse(func, a))
    100 loops, best of 3: 3.41 ms per loop

    %timeit list(mypkg.filterfalse(func, a))
    100 loops, best of 3: 6.77 ms per loop

However, for smaller iterables the discrepancy also becomes smaller:

    a = [random.random() for _ in range(500)]  # 1 / 1000 of the elements

    %timeit list(filter(func, a))
    100000 loops, best of 3: 9.66 µs per loop

    %timeit list(itertools.filterfalse(func, a))
    100000 loops, best of 3: 10.8 µs per loop

    %timeit list(mypkg.filterfalse(func, a))
    100000 loops, best of 3: 14.4 µs per loop

I wasn't able to explain this difference in speed, but I have to admit that I'm not too familiar with compiling C code. I'm at a loss as to what actually makes it slower.
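For anyone who wants to repeat the measurement outside IPython, the comparison can be sketched with the standard timeit module. This assumes Python 3 (for itertools.filterfalse); the helper names are made up for illustration, and the mypkg entry is left commented out because that package only exists on my machine.

```python
import itertools
import random
import timeit


def best_seconds_per_call(func, data, repeat=3, number=10):
    # Time `number` calls per run and keep the fastest run,
    # similar in spirit to %timeit's "best of 3".
    runs = timeit.repeat(lambda: list(func(None, data)),
                         repeat=repeat, number=number)
    return min(runs) / number


def compare(candidates, size):
    # Every candidate filters the same random data of the given size.
    data = [random.random() for _ in range(size)]
    return {name: best_seconds_per_call(func, data)
            for name, func in candidates.items()}


candidates = {
    "filter": filter,
    "itertools.filterfalse": itertools.filterfalse,
    # "mypkg.filterfalse": mypkg.filterfalse,  # assumption: add after `import mypkg`
}

for size in (500, 500000):
    for name, seconds in compare(candidates, size).items():
        print("{:>7} elements  {:24} {:.3e} s per call".format(size, name, seconds))
```

Measuring both sizes in one run makes it easier to see whether the gap really grows with the length of the iterable.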

The results are the same on python 2.7 with ifilter and ifilterfalse and the 2.7 version of the itertoolsmodule.c file.

Does anyone know what makes the code perform worse than the built-ins and how one could speed it up?

  • Since I'm trying to reproduce your results, what version of python are you targeting and on what platform (x86 or x86_64)? Commented Dec 17, 2016 at 2:21
  • The timings were done on 64bit py35 and 64bit py27 (both conda). Commented Dec 17, 2016 at 2:28
  • In python2.7, did you do list(itertools.ifilterfalse(...))? There is no itertools.filterfalse in python2, and ifilterfalse returns an iterator. Commented Dec 17, 2016 at 2:29
  • @AnthonySottile Yes, as stated in the question "The results are the same on python 2.7 with ifilter and ifilterfalse and the 2.7 version of the itertoolsmodule.c file.". Commented Dec 17, 2016 at 2:32
  • @MSeifert Did you happen to find out any more about this? I compared the built-in min function with a function from an extension module that uses identical code: 731 µs for the built-in min and 1.07 ms for the extension module min (for some input iterable). This is quite concerning for me. Commented May 6, 2022 at 9:23

1 Answer


Curious about this problem myself, I set out to reproduce the findings. Though the OP is on Windows, it was slightly easier for me to attempt this on Linux first. I did eventually try it on Windows too, but I'll walk you through what I did!

Setup

I made a little test harness. It's a shell script, but it makes it easier for someone else to try what I'm trying :D

test.sh

    #!/usr/bin/env bash
    set -euxo pipefail

    rm -rf itertoolsmodule.c setup.py venv

    PYTHON=3.5
    FUNCTION=filterfalse
    INIT=PyInit_
    #PYTHON=2.7
    #FUNCTION=ifilterfalse
    #INIT=init

    wget "https://raw.githubusercontent.com/python/cpython/$PYTHON/Modules/itertoolsmodule.c"
    sed -i "s/${INIT}itertools/${INIT}_myitertools/" itertoolsmodule.c
    sed -i 's/"itertools"/"_myitertools"/' itertoolsmodule.c

    cat > setup.py << EOF
    from setuptools import setup, Extension
    mod = Extension('_myitertools', ['itertoolsmodule.c'])
    setup(name='foo', ext_modules=[mod])
    EOF

    virtualenv venv -ppython"$PYTHON"
    venv/bin/pip install . -v

    cat > test.py << EOF
    import _myitertools
    import itertools
    import random
    import time

    a = [random.random() for _ in range(500000)]
    iterations = range(10)
    # each benchmark runs for this many seconds
    seconds = 50

    def builtins_filter():
        for _ in iterations:
            list(filter(None, a))

    _itertools_filterfalse = itertools.$FUNCTION
    def itertools_filterfalse():
        for _ in iterations:
            list(_itertools_filterfalse(None, a))

    _myitertools_filterfalse = _myitertools.$FUNCTION
    def myitertools_filterfalse():
        for _ in iterations:
            list(_myitertools_filterfalse(None, a))

    def runbench(func):
        # count how many calls of func() fit into the time window
        start = time.time()
        end = start + seconds
        iterations = 0
        while time.time() < end:
            func()
            iterations += 1
        return iterations

    for func in (builtins_filter, itertools_filterfalse, myitertools_filterfalse):
        print('*' * 79)
        print(func.__name__)
        print('{} iterations in {} seconds'.format(runbench(func), seconds))
    EOF

ubuntu16.04 x86_64 python3.5.2 (stock, apt)

(I cut out the (imo) unimportant parts):

    $ ./test.sh
    + rm -rf itertoolsmodule.c setup.py venv
    + PYTHON=3.5
    + FUNCTION=filterfalse
    + INIT=PyInit_
    ...
    + venv/bin/pip install . -v
    ...
    x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.5m -I/tmp/foo/venv/include/python3.5m -c itertoolsmodule.c -o build/temp.linux-x86_64-3.5/itertoolsmodule.o
    creating build/lib.linux-x86_64-3.5
    x86_64-linux-gnu-gcc -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 build/temp.linux-x86_64-3.5/itertoolsmodule.o -o build/lib.linux-x86_64-3.5/_myitertools.cpython-35m-x86_64-linux-gnu.so
    ...
    + venv/bin/python test.py
    *******************************************************************************
    builtins_filter
    1401 iterations in 50 seconds
    *******************************************************************************
    itertools_filterfalse
    1977 iterations in 50 seconds
    *******************************************************************************
    myitertools_filterfalse
    1981 iterations in 50 seconds

ubuntu16.04 x86_64 python2.7.12 (stock, apt)

    + rm -rf itertoolsmodule.c setup.py venv
    + PYTHON=2.7
    + FUNCTION=ifilterfalse
    + INIT=init
    ...
    + venv/bin/pip install . -v
    ...
    x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fno-strict-aliasing -Wdate-time -D_FORTIFY_SOURCE=2 -g -fstack-protector-strong -Wformat -Werror=format-security -fPIC -I/usr/include/python2.7 -c itertoolsmodule.c -o build/temp.linux-x86_64-2.7/itertoolsmodule.o
    creating build/lib.linux-x86_64-2.7
    x86_64-linux-gnu-gcc -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -Wdate-time -D_FORTIFY_SOURCE=2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wl,-Bsymbolic-functions -Wl,-z,relro -Wdate-time -D_FORTIFY_SOURCE=2 -g -fstack-protector-strong -Wformat -Werror=format-security build/temp.linux-x86_64-2.7/itertoolsmodule.o -o build/lib.linux-x86_64-2.7/_myitertools.so
    ...
    + venv/bin/python test.py
    *******************************************************************************
    builtins_filter
    871 iterations in 50 seconds
    *******************************************************************************
    itertools_filterfalse
    1918 iterations in 50 seconds
    *******************************************************************************
    myitertools_filterfalse
    1863 iterations in 50 seconds

Windows!

For Windows, I changed the script slightly so it built virtualenvs using C:\Python##\python.exe (I used msysgit so that I have some amount of a Unix toolset: bash, etc.). I also changed things from bin to Scripts (for virtualenv), etc. I don't have/use conda, so these will just be stock Python on Windows 10.

windows 10 python 2.7.9 (stock, msi installer)

    + rm -rf itertoolsmodule.c setup.py venv
    + PYTHON=2.7
    + FUNCTION=ifilterfalse
    + INIT=init
    ...
    + venv/Scripts/pip install . -v
    ...
    C:\Users\Anthony\AppData\Local\Programs\Common\Microsoft\Visual C++ for Python\9.0\VC\Bin\cl.exe /c /nologo /Ox /MD /W3 /GS- /DNDEBUG -IC:\Python27\include -Ic:\users\anthony\appdata\local\temp\foo\venv\PC /Tcitertoolsmodule.c /Fobuild\temp.win32-2.7\Release\itertoolsmodule.obj
    itertoolsmodule.c
    creating build\lib.win32-2.7
    C:\Users\Anthony\AppData\Local\Programs\Common\Microsoft\Visual C++ for Python\9.0\VC\Bin\link.exe /DLL /nologo /INCREMENTAL:NO /LIBPATH:C:\Python27\Libs /LIBPATH:c:\users\anthony\appdata\local\temp\foo\venv\libs /LIBPATH:c:\users\anthony\appdata\local\temp\foo\venv\PCbuild /EXPORT:init_myitertools build\temp.win32-2.7\Release\itertoolsmodule.obj /OUT:build\lib.win32-2.7\_myitertools.pyd /IMPLIB:build\temp.win32-2.7\Release\_myitertools.lib /MANIFESTFILE:build\temp.win32-2.7\Release\_myitertools.pyd.manifest
    ...
    + venv/Scripts/python test.py
    *******************************************************************************
    builtins_filter
    914 iterations in 50 seconds
    *******************************************************************************
    itertools_filterfalse
    2352 iterations in 50 seconds
    *******************************************************************************
    myitertools_filterfalse
    2266 iterations in 50 seconds

windows 10 python3.5.1 (stock, msi installer)

    + rm -rf itertoolsmodule.c setup.py venv
    + PYTHON=3.5
    + FUNCTION=filterfalse
    + INIT=PyInit_
    ...
    + venv/Scripts/pip install . -v
    ...
    D:\Programs\VS2015\VC\BIN\amd64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -IC:\Python35\include -IC:\Python35\include -ID:\Programs\VS2015\VC\INCLUDE -ID:\Programs\VS2015\VC\ATLMFC\INCLUDE "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.10240.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.6.1\include\um" "-IC:\Program Files (x86)\Windows Kits\8.1\include\\shared" "-IC:\Program Files (x86)\Windows Kits\8.1\include\\um" "-IC:\Program Files (x86)\Windows Kits\8.1\include\\winrt" /Tcitertoolsmodule.c /Fobuild\temp.win-amd64-3.5\Release\itertoolsmodule.obj
    itertoolsmodule.c
    creating C:\Temp\pip-1fnf27jo-build\build\lib.win-amd64-3.5
    D:\Programs\VS2015\VC\BIN\amd64\link.exe /nologo /INCREMENTAL:NO /LTCG /DLL /MANIFEST:EMBED,ID=2 /MANIFESTUAC:NO /LIBPATH:C:\Python35\Libs /LIBPATH:c:\users\anthony\appdata\local\temp\foo\venv\libs /LIBPATH:c:\users\anthony\appdata\local\temp\foo\venv\PCbuild\amd64 /LIBPATH:D:\Programs\VS2015\VC\LIB\amd64 /LIBPATH:D:\Programs\VS2015\VC\ATLMFC\LIB\amd64 "/LIBPATH:C:\Program Files (x86)\Windows Kits\10\lib\10.0.10240.0\ucrt\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\NETFXSDK\4.6.1\lib\um\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\8.1\lib\winv6.3\um\x64" /EXPORT:PyInit__myitertools build\temp.win-amd64-3.5\Release\itertoolsmodule.obj /OUT:build\lib.win-amd64-3.5\_myitertools.cp35-win_amd64.pyd /IMPLIB:build\temp.win-amd64-3.5\Release\_myitertools.cp35-win_amd64.lib
    ...
    + venv/Scripts/python test.py
    *******************************************************************************
    builtins_filter
    658 iterations in 50 seconds
    *******************************************************************************
    itertools_filterfalse
    2601 iterations in 50 seconds
    *******************************************************************************
    myitertools_filterfalse
    2715 iterations in 50 seconds

Conclusion

At the very least, my tests with stock python show that the extension module does not exhibit different performance characteristics.
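To put a number on that, the iteration counts reported in the four runs can be turned into a ratio of copied-extension throughput to built-in itertools throughput. A small sketch using only the figures from the logs; the helper name relative_throughput is made up for illustration.

```python
# (built-in itertools, copied extension) iteration counts from the four runs
results = {
    "ubuntu16.04 python3.5.2": (1977, 1981),
    "ubuntu16.04 python2.7.12": (1918, 1863),
    "windows 10 python2.7.9": (2352, 2266),
    "windows 10 python3.5.1": (2601, 2715),
}


def relative_throughput(builtin_iterations, extension_iterations):
    # A value above 1.0 means the copied extension completed more iterations.
    return extension_iterations / float(builtin_iterations)


for platform_name, (builtin, extension) in results.items():
    print("{:26} {:.3f}".format(platform_name,
                                relative_throughput(builtin, extension)))
```

Every ratio lands within about 5% of 1.0, in both directions, which is nowhere near the factor of 2 reported in the question.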

Welp, I spent half an hour on this and didn't manage to reproduce the slowdown. Hopefully this is helpful for the next poor soul who attempts this. I can only guess that conda is doing some additional optimization and then shipping a pyconfig.h file which lies about the flags used to compile. Though to be honest, I haven't yet ventured into the conda space, so I don't know how their ecosystem works.
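If someone with a conda install wants to chase that guess, a first step could be to print what each interpreter says about itself and diff the output between conda and stock Python. This is only a rough diagnostic sketch, not proof of anything: it shows the compiler banner, the pointer width, and whether itertools is compiled into the interpreter or loaded from a separate file. The helper name describe_module is made up for illustration.

```python
import importlib
import platform
import struct
import sys


def describe_module(name):
    # Built-in modules are compiled into the interpreter and have no __file__;
    # extension modules are loaded from a separate .so / .pyd file.
    module = importlib.import_module(name)
    return {
        "name": name,
        "builtin": name in sys.builtin_module_names,
        "file": getattr(module, "__file__", None),
    }


print(platform.python_compiler())                 # compiler banner of this build
print("{}-bit".format(struct.calcsize("P") * 8))  # pointer width of the interpreter
print(describe_module("itertools"))
```

A copied extension is always loaded from its own file, so if the interpreter's own itertools turns out to be built in, the two are not built or linked the same way even when the compiler flags look identical.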


2 Comments

One slight comment that might help others reproduce my findings: my versions of Python on Windows were too old to handle the source on the 2.7 / 3.5 branches, so I had to choose versions of the source from before Py_SETREF was introduced.
Thank you for taking the time to dig through this. If the issue is not reproducible, that would be very good news indeed! My appveyor tests seem to indicate that you're right. I also created a 32-bit conda environment locally and I can't see any timeit differences there either. However, it still bugs me why it's slower on my 64-bit conda environment.
