0

I am trying to solve a relatively large ODE system with scipy.integrate.odeint module. I already implemented the code and I can solve the equation correctly. But the process is very slow. I profile the code and I realized that almost most of the computing time spent on computing or generating the ODE system itself, the sigmoid function is expensive too but I have to accept it I guess. Here is a piece of code that I'm using:

def __sigmoid(self, u): # return .5 * ( u / np.sqrt(u**2 + 1) + 1) return 0.5 + np.arctan(u) / np.pi def __connectionistModel(self, g, t): """ Returning the ODE system """ g_ia_s = np.zeros(self.nGenes * self.nCells) for i in xrange(0, self.nCells): for a in xrange(0, self.nGenes): g_ia = self.Params["R"][a] *\ self.__sigmoid( sum([ self.Params["W"][b + a*self.nGenes]*g[self.nGenes*i + b] for b in xrange(0, self.nGenes) ]) +\ self.Params["Wm"][a]*self.mData[i] +\ self.Params["h"][a] ) -\ self.Params["l"][a] * g[self.nGenes*i + a] # Uncomment this part for including the diffusion if i == 0: g_ia += self.Params["D"][a] * ( - 2*g[self.nGenes*i + a] + g[self.nGenes*(i+1) + a] ) elif i == self.nCells-1: g_ia += self.Params["D"][a] * ( g[self.nGenes*(i-1) + a] - 2*g[self.nGenes*i + a] ) else: g_ia += self.Params["D"][a] * ( g[self.nGenes*(i-1) + a] - 2*g[self.nGenes*i + a] + g[self.nGenes*(i+1) + a] ) g_ia_s[self.nGenes*i + a] = g_ia return g_ia_s def solve(self, inp): g0 = np.zeros(self.nGenes * self.nCells) t = np.arange(self.t0, self.t1, self.dt) self.integratedExpression = odeint(self.__connectionistModel, g0, t, full_output=0) return self.integratedExpression 

As you could see in each iteration, I should generate nCells*nGenes (100*3=300) equations and pass it to the odeint. Although I'm not sure but I guess generating the equations is very expensive in comparison to solving them. In my experiment, solving the whole system takes 7sec which consists of 1sec of odeint and 6secs of __ConnectionistModel.

I was wondering if there is a way that I could improve this or not? I tried to use SymPy to define a symbolic ODE system and pass the symbolic equations to the odeint but it didn't works properly since you couldn't really define an array of symbols which later you could access like an array.

In the worst case, I have to deal with it or use a Cython to speed up the solving process, but I wanted to make sure that I'm doing it right and there is no way to improve it.

Thanks for your help in advance.

[Update]: profiling result,

 ncalls tottime percall cumtime percall filename:lineno(function) 1 0.000 0.000 7.915 7.915 grnTest.py:1(<module>) 1 0.000 0.000 7.554 7.554 grn.py:83(solve) 1 0.000 0.000 7.554 7.554 odepack.py:18(odeint) 1 0.027 0.027 7.554 7.554 {scipy.integrate._odepack.odeint} 1597 5.506 0.003 7.527 0.005 grn.py:55(__connectionistModel) 479100 1.434 0.000 1.434 0.000 grn.py:48(__sigmoid) 479102 0.585 0.000 0.585 0.000 {sum} 1 0.001 0.001 0.358 0.358 grn.py:4(<module>) 2 0.001 0.001 0.207 0.104 __init__.py:10(<module>) 27 0.014 0.001 0.185 0.007 __init__.py:1(<module>) 7 0.006 0.001 0.106 0.015 __init__.py:2(<module>) 

[Update 2]: I made the code publicly available: pyStGRN

4
  • 1
    Profile your code better, be specific. Exactly where inside __connectionistModel you spend most time? Commented Jun 24, 2014 at 10:06
  • could you provide the inp input which gives this 7 second run? Commented Jun 24, 2014 at 10:39
  • @flebool I don't know how to profile inside a function to see which part is more expensive. But I included the profiling result. Commented Jun 24, 2014 at 11:45
  • I upload the code on GitHub - @will Commented Jun 24, 2014 at 14:01

1 Answer 1

4

Vectorize, vectorize, then vectorize some more. And use data structures that facilitate vectorization.

The function __connectionistModel uses a lot of the access pattern A[i*m+j], which is equivalent to an access to row i and column j in a 2D array with a total of m columns. This suggests that a 2D array is the right way to store the data. We can eliminate the loop over i from the function by using the NumPy slicing notation and vectorizing as follows:

def __connectionistModel_vec(self, g, t): """ Returning the ODE system """ g_ia_s = np.zeros(self.nGenes * self.nCells) g_2d = g.reshape((self.nCells, self.nGenes)) W = np.array(self.Params["W"]) mData = np.array(self.mData) g_ia_s = np.zeros((self.nCells, self.nGenes)) for a in xrange(0, self.nGenes): g_ia = self.Params["R"][a] *\ self.__sigmoid( (W[a*self.nGenes:(a+1)*self.nGenes]*g_2d).sum(axis=1) +\ self.Params["Wm"][a]*mData +\ self.Params["h"][a] ) -\ self.Params["l"][a] * g_2d[:,a] g_ia[0] += self.Params["D"][a] * ( - 2*g_2d[0,a] + g_2d[1,a] ) g_ia[-1] += self.Params["D"][a] * ( g_2d[-2,a] - 2*g_2d[-1,a] ) g_ia[1:-1] += self.Params["D"][a] * ( g_2d[:-2,a] - 2*g_2d[1:-1,a] + g_2d[2:,a] ) g_ia_s[:,a] = g_ia return g_ia_s.ravel() 

As far as I can see, this returns the same values as the original __connectionistModel. As a bonus, the function is now more compact. I only optimized this function so that it has the same inputs and outputs as the original, but for extra performance, you might want to organize your data in NumPy arrays instead of lists so as to avoid the conversion from lists to arrays on every call. And I'm sure there are other minor performance tweaks as well.

Anyway, the original code gave me these profiling results (insert mandatory "my computer is faster than yours" brag here):

ncalls tottime percall cumtime percall filename:lineno(function) 1597 3.648 0.002 5.250 0.003 grn.py:52(__connectionistModel) 479100 0.965 0.000 0.965 0.000 grn.py:48(__sigmoid) 479100 0.635 0.000 0.635 0.000 {sum} 1 0.017 0.017 5.267 5.267 {scipy.integrate._odepack.odeint} 1598 0.002 0.000 0.002 0.000 {numpy.core.multiarray.zeros} 

With __connectionistModel_vec, I get:

ncalls tottime percall cumtime percall filename:lineno(function) 1597 0.175 0.000 0.247 0.000 grn.py:79(__connectionistModel_vec) 4791 0.031 0.000 0.031 0.000 grn.py:48(__sigmoid) 4800 0.021 0.000 0.021 0.000 {method 'reduce' of 'numpy.ufunc' objects} 1 0.018 0.018 0.265 0.265 {scipy.integrate._odepack.odeint} 3197 0.013 0.000 0.013 0.000 {numpy.core.multiarray.array} 
Sign up to request clarification or add additional context in comments.

2 Comments

That's great! I knew that I should do something like that but I didn't know how to do it. I even tried to use np.vectorize to vectorize the function but it didn't work as I was expected. -- @Jsl
From the vectorize doc: The vectorize function is provided primarily for convenience, not for performance. The implementation is essentially a for loop.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.