Improve speed of passing data from Python to C(++) via ctypes

Question

I need to optimize a function call that is in a loop, for a time-critical robotics application. My script is in python, which interfaces via ctypes with a C++ library I wrote, which then calls a microcontroller library.

The bottleneck is adding position-velocity-time points to the microcontroller buffer. According to my timing checks, calling the C++ function via ctypes takes about 0.45 seconds, while on the C++ side the called function takes 0.17 seconds. I need to reduce this difference somehow.

Here is the relevant python code, where data is a 2D array of points and clibrary is loaded via ctypes:

    data_np = np.vstack([nodes, positions, velocities, times]).transpose().astype(np.long)
    data = ((c_long * 4) * N)()
    for i in range(N):
        data[i] = (c_long * 4)(*data_np[i])

    timer = time()
    clibrary.addPvtAll(N, data)
    print("clibrary.addPvtAll() call: %f" % (time() - timer))

And here is the called C++ function:

    void addPvtAll(int N, long data[][4])
    {
        clock_t t0, t1;
        t0 = clock();
        for(int i = 0; i < N; i++)
        {
            unsigned short node = (unsigned short)data[i][0];
            long p = data[i][1];
            long v = data[i][2];
            unsigned char t = (unsigned char)data[i][3];
            VCS_AddPvtValueToIpmBuffer(device(node), node, p, v, t, &errorCode);
        }
        t1 = clock();
        printf("addPvtAll() call: %f \n", (double(t1 - t0) / CLOCKS_PER_SEC));
    }

I don't absolutely need to use ctypes but I don't want to have to compile the Python code every time I run it.

Is the main program in C++, which calls the Python code, or in Python, which calls the C++ code? If the main program is in Python, you should call the C++ via an extension library, like the Python/C API, SWIG, PyCXX, or Boost.Python. You could also use Cython, which lets you call C/C++ code from Python.
– kirbyfan64sos, Commented Apr 21, 2013 at 21:29
The main program is in Python. However, I don't want to have to compile the Python every time I run it, and ideally won't have to rewrite the entire C++ library. Which one do you suggest looking into?
– Hayk Martiros, Commented Apr 21, 2013 at 21:32
Boost.Python and SWIG involve writing wrappers, not rewriting the code. SWIG sometimes generates ugly code, while Boost.Python has trouble on 64-bit platforms. If you do use Boost.Python, create a 32-bit Linux virtual machine to run it in. SWIG automatically generates a wrapper using a provided interface file. PyCXX looks kind of odd, but promising. Your best bets are SWIG and Boost.Python. They both require no rewriting whatsoever.
– kirbyfan64sos, Commented Apr 21, 2013 at 21:43

Raymond Hettinger · Accepted Answer · 2013-04-21 22:39:00Z

The round-trip between Python and C++ can be expensive, especially when using ctypes (which is like an interpreted version of a normal C/Python wrapper).

Your goal should be to minimize the number of trips and do the most work possible per trip.

It looks to me like your code has too fine of a granularity (i.e. doing too many trips and doing too little work on each trip).

The numpy package can expose its data directly to C/C++. That will let you avoid the expensive boxing and unboxing of Python objects (with their attendant memory allocations) and it will let you pass a range of data points rather than a point at a time.
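As a rough sketch of that idea with ctypes (not necessarily how the asker's library should be wired up), the array's own buffer can be handed over as a pointer instead of rebuilding every row as a ctypes object. The sample values below are made up for illustration, and the call to `clibrary.addPvtAll` appears only in a comment because the library itself is not available here. The dtype is derived from `ctypes.c_long` on the assumption that the C side really takes `long data[][4]`.

```python
import ctypes
import numpy as np

# Hypothetical sample data standing in for nodes, positions, velocities, times.
nodes = [1, 2, 3]
positions = [100, 200, 300]
velocities = [10, 20, 30]
times = [5, 5, 5]
N = len(nodes)

# Match the platform's C long so the memory layout agrees with long data[][4].
long_dtype = np.dtype(ctypes.c_long)

# Row-major (C-contiguous) N x 4 array: one row per point.
data_np = np.ascontiguousarray(
    np.vstack([nodes, positions, velocities, times]).transpose(),
    dtype=long_dtype,
)

# Pointer to the array's existing buffer; no per-row Python objects are created.
data_ptr = data_np.ctypes.data_as(ctypes.POINTER(ctypes.c_long))

# Reading back through the pointer shows it aliases the NumPy memory.
flat_view = [data_ptr[i] for i in range(N * 4)]

# With the real library loaded, the call would be along the lines of:
#     clibrary.addPvtAll(N, data_ptr)
# data_np must stay alive for as long as the C code uses the pointer.
```

The array has to be C-contiguous for this to work, which is why `np.ascontiguousarray` is applied after the transpose.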

Modify your C++ code to process many points at a time rather than once per call (much like the sqlite3 module does with execute vs. executemany).
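For readers unfamiliar with the sqlite3 analogy, here is a small illustration of the two granularities. The table name, column names, and row values are invented for the example and have nothing to do with the robotics library.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pvt (node INTEGER, p INTEGER, v INTEGER, t INTEGER)")

# Hypothetical rows: (node, position, velocity, time)
rows = [(1, 100, 10, 5), (2, 200, 20, 5), (3, 300, 30, 5)]

# Fine-grained: one call into the library per row.
for row in rows:
    conn.execute("INSERT INTO pvt VALUES (?, ?, ?, ?)", row)

# Coarse-grained: one call for the whole batch; the loop runs inside the library.
conn.executemany("INSERT INTO pvt VALUES (?, ?, ?, ?)", rows)

row_count = conn.execute("SELECT COUNT(*) FROM pvt").fetchone()[0]
conn.close()
```

Both forms insert the same rows. The batched form simply crosses the Python/C boundary once instead of once per row.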

Hayk Martiros · Accepted Answer · 2013-04-22 21:03:33Z

Here is my solution, which effectively eliminates the measured time difference between Python and C. Credit to kirbyfan64sos for suggesting SWIG and Raymond Hettinger for C-arrays in numpy. I use a numpy array in Python which is sent to C purely as a pointer - the same memory block is accessed in both languages.

The C function remains identical except using gettimeofday() instead of clock(), which was giving inaccurate times:

    void addPvtFrame(int pvt[6][4])
    {
        timeval start, stop, result;
        gettimeofday(&start, NULL);
        for(int i = 0; i < 6; i++)
        {
            unsigned short node = (unsigned short)pvt[i][0];
            long p = (long)pvt[i][1];
            long v = (long)pvt[i][2];
            unsigned char t = (unsigned char)pvt[i][3];
            VCS_AddPvtValueToIpmBuffer(device(node), node, p, v, t, &errorCode);
        }
        gettimeofday(&stop, NULL);
        timersub(&start, &stop, &result);
        printf("Add PVT time in C code: %fs\n", -(result.tv_sec + result.tv_usec/1000000.0));
    }

In addition, I installed SWIG and included the following in my interfaces file:

    %include "numpy.i"
    %init %{
    import_array();
    %}
    %apply ( int INPLACE_ARRAY2[ANY][ANY] ) {(int pvt[6][4])}

Finally, my Python code constructs pvt as a contiguous array via numpy:

    pvt = np.vstack([nodes, positions, velocities, times])
    pvt = np.ascontiguousarray(pvt.transpose().astype(int))

    timer = time()
    xjus.addPvtFrame(pvt)
    print("Add PVT time to C code: %fs" % (time() - timer))

The measured times now differ by about 1% on my machine.
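One caveat that may matter on other machines: `astype(int)` produces NumPy's default integer type, which is not always the same width as a C `int`. The sketch below is a possible pre-flight check, assuming the wrapped function really expects `int pvt[6][4]`. It builds the frame with `np.intc` (NumPy's alias for C `int`) and verifies shape, layout, and element size before the array would be handed to the wrapper. The sample values are hypothetical.

```python
import ctypes
import numpy as np

# Hypothetical sample frame: six points, one per node.
nodes = [1, 2, 3, 4, 5, 6]
positions = [100, 200, 300, 400, 500, 600]
velocities = [10, 20, 30, 40, 50, 60]
times = [5, 5, 5, 5, 5, 5]

pvt = np.vstack([nodes, positions, velocities, times])
# np.intc matches the platform's C int; ascontiguousarray gives row-major layout.
pvt = np.ascontiguousarray(pvt.transpose(), dtype=np.intc)

# Checks that mirror the C signature int pvt[6][4].
frame_ok = (
    pvt.shape == (6, 4)
    and pvt.flags["C_CONTIGUOUS"]
    and pvt.dtype.itemsize == ctypes.sizeof(ctypes.c_int)
)
```

If `frame_ok` is true, the buffer has the layout that a C function taking `int pvt[6][4]` would read.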

Mark Tolonen · Accepted Answer · 2013-04-27 23:39:52Z

You can just use data_np.data.tobytes():

    data_np = np.vstack([nodes, positions, velocities, times]).transpose().astype(np.long)

    timer = time()
    clibrary.addPvtAll(N, data_np.data.tobytes())
    print("clibrary.addPvtAll() call: %f" % (time() - timer))
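To see what the C side would receive, here is a small self-contained check. It uses `ndarray.tobytes()` rather than `data_np.data.tobytes()`, and made-up sample values. ctypes passes a `bytes` object to a foreign function as a `char *` pointing at its data, so the layout of these bytes is what matters. Note that `tobytes()` makes a copy of the array in row-major order, so writes made by the C code would not show up in `data_np`.

```python
import ctypes
import numpy as np

# dtype matching the platform's C long, as expected by long data[][4].
long_dtype = np.dtype(ctypes.c_long)

# Hypothetical sample points: (node, position, velocity, time)
data_np = np.array([[1, 100, 10, 5], [2, 200, 20, 5]], dtype=long_dtype)

# Row-major copy of the array contents as an immutable bytes object.
payload = data_np.tobytes()

# Reinterpreting the bytes recovers the original rows, confirming the layout.
round_trip = np.frombuffer(payload, dtype=long_dtype).reshape(data_np.shape)
```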
