2

I have two large multidimensional arrays: Y carries three measurements of half a million objects (e.g. shape=(500000,3)) and X has same shape, but contains position of Y measurements.

At first, I would like for each row, containing an object, to fit a polynomial equation. I know that iterating over arrays are quite slow, but what I'm doing for the moment is:

fit = array([polyfit(X[i],Y[i],deg) for i in xrange(obs.shape[0])]) 

My question is: is there any possibility of fitting each row of both arrays without explicitly iterating over them?

3
  • Did you try something like apply along axis? Commented Mar 19, 2014 at 14:37
  • With 3 points you can only fit a linear or quadratic (exact fit) function. It is possible to calculate an explicit solution in both cases, which then can be vectorized to calculate for all rows at the same time. Commented Mar 20, 2014 at 2:58
  • Actually, three points is just for exemplification, when I really apply the method the dimension will be larger. Commented Mar 20, 2014 at 10:26

2 Answers 2

1

It is possible to do so without iterate along the first axis. However, your second axis is rather short (being just 3), you can really fit no more than 2 coefficients.

In [67]: import numpy as np import scipy.optimize as so In [68]: def MD_ployError(p, x, y): '''if x has the shape of (n,m), y must be (n,m), p must be (n*p, ), where p is degree''' #d is no. of degree p_rshp=p.reshape((x.shape[0], -1)) f=y*1. for i in range(p_rshp.shape[1]): f-=p_rshp[:,i][:,np.newaxis]*(x**i) return (f**2).sum() In [69]: X=np.random.random((100, 6)) Y=4+2*X+3*X*X P=(np.zeros((100,3))+[1,1,1]).ravel() In [70]: MD_ployError(P, X, Y) Out[70]: 11012.2067606684 In [71]: R=so.fmin_slsqp(MD_ployError, P, args=(X, Y)) Iteration limit exceeded (Exit mode 9) #you can increase iteration limit, but the result is already good enough. Current function value: 0.00243784856039 Iterations: 101 Function evaluations: 30590 Gradient evaluations: 101 In [72]: R.reshape((100, -1)) Out[72]: array([[ 3.94488512, 2.25402422, 2.74773571], [ 4.00474864, 1.97966551, 3.02010015], [ 3.99919559, 2.0032741 , 2.99753804], ..............................................) 
Sign up to request clarification or add additional context in comments.

2 Comments

Thanks Zhu. However, is this method faster than my single line implementation?
It dependents. If your can provide a function to calculate the derivative of the error function it will be faster. If the initial guess parameters are 'good-guesses', it will be faster. A different optimizer may be faster. But overall I think it may be an overkill, especially if you don't set any constraints (say the coeff's for x^2 have to be equal). I have been in the same situation and I choose to multiprocessing (but I was fitting a very complex model to each row in that case). After all, the fitting for any one row is completely independent of the others.
0

Yes, if you use the new numpy polyfit from np.polynomial, not the old np.polyfit:

X = np.arange(3) Y = np.random.rand(10000, 3) fit = np.array([np.polyfit(X, y, 2) for y in Y]) fits = np.polynomial.polynomial.polyfit(X, Y.T, 2) assert np.allclose(fit.T[::-1], fits) 

Timing:

In [692]: timeit fit = np.array([np.polyfit(X, y, 2) for y in Y]) 1 loops, best of 3: 2.22 s per loop In [693]: timeit fits = np.polynomial.polynomial.polyfit(X, Y.T, 2) 100 loops, best of 3: 3.63 ms per loop 

3 Comments

I think the OP meant the X also has the shape of (10000,3) and the result need to be (10000,d) for d degrees (coefficients): fit each row of x by each row of y
Actually, the X array must be multidimensional too. As I comment above, I want more general solution, indeed. Dimension 3 was chosen just to formulate the question.
The dimensionality 3 is not the limit here, it's the shape of the arrays. That is, you have different x values for each row in y. It is common to have many sets of measurements (y) for one set of locations (x), but you have completely independent sets of measurements and locations. I don't think there is a simple way to fit them all at once using builtin polynomial fitting, you have to define your own error function as @CT did.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.