I need to optimize a function call that sits in a loop, for a time-critical robotics application. My script is in Python and interfaces via ctypes with a C++ library I wrote, which in turn calls a microcontroller library.
The bottleneck is adding position-velocity-time points to the microcontroller buffer. According to my timing checks, the ctypes call into the C++ function takes about 0.45 seconds, while the function body on the C++ side takes about 0.17 seconds. I need to reduce this difference somehow.
Here is the relevant Python code, where `data` is a 2D array of points and `clibrary` is loaded via ctypes:
```python
data_np = np.vstack([nodes, positions, velocities, times]).transpose().astype(np.long)
data = ((c_long * 4) * N)()
for i in range(N):
    data[i] = (c_long * 4)(*data_np[i])

timer = time()
clibrary.addPvtAll(N, data)
print("clibrary.addPvtAll() call: %f" % (time() - timer))
```
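For reference, here is a minimal sketch of an alternative way I could marshal the same data, declaring the argument types once and handing the numpy array to the call directly, so the per-row `(c_long * 4)` copies go away. This is untested against my setup: it assumes the C++ signature stays `void addPvtAll(int N, long data[][4])` (shown below) and that `clibrary` is the handle already loaded via ctypes.

```python
import ctypes
import numpy as np
from numpy.ctypeslib import ndpointer

# Match the platform's C long exactly (4 bytes on Windows, 8 on most Linux/macOS).
c_long_dtype = np.dtype(ctypes.c_long)

# Declare the signature once so ctypes can pass the numpy buffer directly.
clibrary.addPvtAll.argtypes = [
    ctypes.c_int,
    ndpointer(dtype=c_long_dtype, ndim=2, flags="C_CONTIGUOUS"),
]
clibrary.addPvtAll.restype = None

# Build the (N, 4) table as one contiguous array; no per-row ctypes copies.
data_np = np.ascontiguousarray(
    np.vstack([nodes, positions, velocities, times]).T, dtype=c_long_dtype
)
clibrary.addPvtAll(len(data_np), data_np)
```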
And here is the called C++ function:

```cpp
void addPvtAll(int N, long data[][4])
{
    clock_t t0, t1;
    t0 = clock();
    for (int i = 0; i < N; i++)
    {
        unsigned short node = (unsigned short)data[i][0];
        long p = data[i][1];
        long v = data[i][2];
        unsigned char t = (unsigned char)data[i][3];
        VCS_AddPvtValueToIpmBuffer(device(node), node, p, v, t, &errorCode);
    }
    t1 = clock();
    printf("addPvtAll() call: %f \n", (double(t1 - t0) / CLOCKS_PER_SEC));
}
```

I don't absolutely need to use ctypes, but I don't want to have to compile the Python code every time I run it.
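For what it's worth, one ctypes-free option that still needs no compile step is cffi in ABI mode. The following is just a sketch: the library filename is a placeholder, and it relies on the fact that `long data[][4]` is, at the binary level, a pointer to contiguous longs, so a flat `long *` declaration is call-compatible.

```python
import ctypes
import numpy as np
from cffi import FFI

ffi = FFI()
# ABI mode: only the symbol is described, nothing is compiled.
# long data[][4] is a pointer to contiguous longs at the binary level,
# so a flat long* declaration calls the same function correctly.
ffi.cdef("void addPvtAll(int N, long *data);")
lib = ffi.dlopen("./libpvt.so")  # placeholder filename

data_np = np.ascontiguousarray(
    np.vstack([nodes, positions, velocities, times]).T,
    dtype=np.dtype(ctypes.c_long),  # match the platform's C long
)
# from_buffer shares the numpy memory with C; nothing is copied.
lib.addPvtAll(len(data_np), ffi.cast("long *", ffi.from_buffer(data_np)))
```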