
I am attempting to speed up a Python script by using ctypes to outsource some of the heavy lifting to C++.

I have this up and running quite nicely with a small example (returning x^2) but now it is time to set up my function in this configuration.

My question is: how would one write this Python function in C++ so that it is as quick as possible? I would hate to get no speed increase simply because of my sub-par C++.

    def shortTermEnergy(frame):
        return sum( [ abs(x)**2 for x in frame ] ) / len(frame)

I will be passing frame as an array, using arr = (ctypes.c_int * len(frame))(*frame) to convert it from a Python list into a C array that the C++ code can read.

I hope this is the best practice and I am not missing anything glaringly obvious? It's been a long time since I wrote any C++.

Thanks

EDIT

I have gone with this C++ code for the moment. Please let me know if there are ways to improve it.

    #include <cmath>

    extern "C" int square(int size, int array[])
    {
        int sum = 0;
        for (int i = 0; i < size; i++)
        {
            int number = array[i];
            int value = (number * number);
            sum = sum + value;
        }
        return floor(sum / size);
    }

Where size is the len() of the array passed from Python.

8
  • The implementation depends on how frame is represented in your C++ code (native array / STL vector). I'm not sure the efficiency will vary that much from one option to another, but there are different (shorter / longer) ways of implementing it depending on frame's type. Commented Jun 24, 2015 at 9:56
  • It would be a native array, I think (I assume that would be best). Commented Jun 24, 2015 at 9:57
  • Why do you use abs(x)? Isn't squaring x sufficient to get a non-negative value? Commented Jun 24, 2015 at 10:59
  • Very good point, I will remove that. I am not the original author of that function and overlooked it. Commented Jun 24, 2015 at 11:08
  • A suggestion: I believe the Python module numpy was created to speed up math operations in Python. In your place I would give it a try before rewriting things in C++. Commented Jun 24, 2015 at 11:17

2 Answers

2

I would go with this:

    template<class MeanT, class AccumT = MeanT, class IterT>
    MeanT mean_squares(IterT start, IterT end) {
        AccumT accum = 0;
        for (IterT it = start; it != end; ++it) {
            accum += *it * *it;
        }
        return accum / (end - start);
    }

I left out the abs since it's not necessary. But it could be that the compiler is able to optimise unsigned multiplication better.

Use it like this:

    double result = mean_squares<double, unsigned long>(array, array + length);
    // std::begin(vect), std::end(vect) in case of an STL vector

I hope this helps.
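One detail worth checking when instantiating mean_squares with an integral AccumT: the final division is then carried out in integer arithmetic and only converted to MeanT afterwards, so the fractional part is dropped. Also, the product *it * *it is computed in the element type before it is added to the accumulator. The following is a minimal sketch of a variation that widens each element first and converts before dividing. The name mean_squares_exact is hypothetical, and the sketch assumes a non-empty range and an AccumT wide enough to hold the sum.

```cpp
#include <iterator>

// Hypothetical variation on mean_squares; not the answerer's original code.
// Assumes the range [start, end) is non-empty.
template <class MeanT, class AccumT = MeanT, class IterT>
MeanT mean_squares_exact(IterT start, IterT end) {
    AccumT accum = 0;
    for (IterT it = start; it != end; ++it) {
        // Widen to the accumulator type before multiplying, so the
        // square is not computed in the (possibly narrower) element type.
        const AccumT x = static_cast<AccumT>(*it);
        accum += x * x;
    }
    // Convert both operands to MeanT so the division is not truncated
    // when AccumT is an integer type.
    return static_cast<MeanT>(accum) /
           static_cast<MeanT>(std::distance(start, end));
}
```

With a signed 64-bit accumulator, a call could look like mean_squares_exact<double, long long>(array, array + length).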

Concerning your code: it's probably OK, though I would make the sum and i unsigned. You can add const to the array parameter type, but the compiler is almost certainly able to figure that out on its own. Oh, and I think you should remove that floor. Integer division already truncates, which gives the same result for a non-negative sum.
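As a rough illustration of these suggestions, here is a sketch of the exported function with a const parameter and no floor. Two things in it are assumptions rather than part of the advice above: it accumulates in a signed 64-bit integer to reduce the risk of overflow, and it returns a double so the mean keeps its fractional part. The name short_term_energy and the handling of an empty frame are also hypothetical.

```cpp
#include <cstdint>

// Hypothetical name. C linkage keeps the symbol unmangled so that
// ctypes can look it up by name.
extern "C" double short_term_energy(int size, const int* frame) {
    if (size <= 0) {
        return 0.0;  // assumption: an empty frame has zero energy
    }
    std::int64_t sum = 0;  // assumption: 64-bit accumulator against overflow
    for (int i = 0; i < size; ++i) {
        const std::int64_t x = frame[i];  // widen before squaring
        sum += x * x;
    }
    // Floating-point division, so no floor() or truncation is involved.
    return static_cast<double>(sum) / size;
}
```

Because ctypes assumes an int return value by default, the Python side would need to set the function's restype to ctypes.c_double before calling a function like this one.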


2 Comments

I tried several approaches, including the OP's (just out of curiosity), and this one was by far the fastest. According to relative measurements on my machine, the OP's code took about 160-165 µs overall for 10,000 elements, and this example takes about 50! I'd be glad if anyone could explain the huge difference, given that the implementations are not so different after template instantiation.
Hi daniel, I will look into using your function above. I did take your comments on my own code on board, and all of them worked nicely except making the sum unsigned, which strangely gave me different outputs.
2

Sorry for not answering your question explicitly, but I think a numpy solution would be a lot easier to realise and can improve the speed almost as much as a C++ snippet:

    import numpy as np

    frame = np.random.random_sample(10000)

    def shortTermEnergy(frame):
        return sum( [ abs(x)**2 for x in frame ] ) / len(frame)

    >> %timeit shortTermEnergy(frame)
    >> 100 loops, best of 3: 4.11 ms per loop

    def dot_product(frame):
        return np.dot(frame, frame)/frame.size

    >> %timeit dot_product(frame)
    >> 10000 loops, best of 3: 19.3 µs per loop
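For comparison with the C++ route discussed in the question, the same dot-product formulation can be sketched with the standard library's std::inner_product. This is only an illustration: the function name dot_product_energy is hypothetical, the frame is assumed to hold doubles as in the numpy example, and no timing claim is made for it.

```cpp
#include <numeric>
#include <vector>

// Hypothetical helper mirroring np.dot(frame, frame) / frame.size.
double dot_product_energy(const std::vector<double>& frame) {
    if (frame.empty()) {
        return 0.0;  // assumption: an empty frame has zero energy
    }
    // Dot product of the frame with itself, i.e. the sum of squares.
    const double dot = std::inner_product(frame.begin(), frame.end(),
                                          frame.begin(), 0.0);
    return dot / static_cast<double>(frame.size());
}
```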

1 Comment

Hey dlenz, this actually made a massive improvement, which is great for such a simple change. C++ still outperformed this method as expected, but it's good to know anyway, cheers
