Numpy Polyfit or any fitting to X and Y multidimensional arrays

Question

I have two large multidimensional arrays: Y carries three measurements of half a million objects (e.g. shape=(500000,3)) and X has same shape, but contains position of Y measurements.

At first, I would like for each row, containing an object, to fit a polynomial equation. I know that iterating over arrays are quite slow, but what I'm doing for the moment is:

fit = array([polyfit(X[i],Y[i],deg) for i in xrange(obs.shape[0])])

My question is: is there any possibility of fitting each row of both arrays without explicitly iterating over them?

With 3 points you can only fit a linear or quadratic (exact fit) function. It is possible to calculate an explicit solution in both cases, which then can be vectorized to calculate for all rows at the same time. — Josef
– Josef, Commented Mar 20, 2014 at 2:58
Actually, three points is just for exemplification, when I really apply the method the dimension will be larger. — phasselmann
– phasselmann, Commented Mar 20, 2014 at 10:26

CT Zhu · Accepted Answer · 2014-03-19 16:06:49Z

It is possible to do so without iterate along the first axis. However, your second axis is rather short (being just 3), you can really fit no more than 2 coefficients.

In [67]: import numpy as np import scipy.optimize as so In [68]: def MD_ployError(p, x, y): '''if x has the shape of (n,m), y must be (n,m), p must be (n*p, ), where p is degree''' #d is no. of degree p_rshp=p.reshape((x.shape[0], -1)) f=y*1. for i in range(p_rshp.shape[1]): f-=p_rshp[:,i][:,np.newaxis]*(x**i) return (f**2).sum() In [69]: X=np.random.random((100, 6)) Y=4+2*X+3*X*X P=(np.zeros((100,3))+[1,1,1]).ravel() In [70]: MD_ployError(P, X, Y) Out[70]: 11012.2067606684 In [71]: R=so.fmin_slsqp(MD_ployError, P, args=(X, Y)) Iteration limit exceeded (Exit mode 9) #you can increase iteration limit, but the result is already good enough. Current function value: 0.00243784856039 Iterations: 101 Function evaluations: 30590 Gradient evaluations: 101 In [72]: R.reshape((100, -1)) Out[72]: array([[ 3.94488512, 2.25402422, 2.74773571], [ 4.00474864, 1.97966551, 3.02010015], [ 3.99919559, 2.0032741 , 2.99753804], ..............................................)

Thanks Zhu. However, is this method faster than my single line implementation?
It dependents. If your can provide a function to calculate the derivative of the error function it will be faster. If the initial guess parameters are 'good-guesses', it will be faster. A different optimizer may be faster. But overall I think it may be an overkill, especially if you don't set any constraints (say the coeff's for x^2 have to be equal). I have been in the same situation and I choose to multiprocessing (but I was fitting a very complex model to each row in that case). After all, the fitting for any one row is completely independent of the others.

Community · Accepted Answer · 2017-05-23 12:20:52Z

0

Yes, if you use the new numpy polyfit from np.polynomial, not the old np.polyfit:

X = np.arange(3) Y = np.random.rand(10000, 3) fit = np.array([np.polyfit(X, y, 2) for y in Y]) fits = np.polynomial.polynomial.polyfit(X, Y.T, 2) assert np.allclose(fit.T[::-1], fits)

Timing:

In [692]: timeit fit = np.array([np.polyfit(X, y, 2) for y in Y]) 1 loops, best of 3: 2.22 s per loop In [693]: timeit fits = np.polynomial.polynomial.polyfit(X, Y.T, 2) 100 loops, best of 3: 3.63 ms per loop

edited May 23, 2017 at 12:20

CommunityBot

11 silver badge

answered Mar 19, 2014 at 16:09

askewchan

46.7k18 gold badges125 silver badges135 bronze badges

3 Comments

CT Zhu Over a year ago

I think the OP meant the X also has the shape of (10000,3) and the result need to be (10000,d) for d degrees (coefficients): fit each row of x by each row of y

phasselmann Over a year ago

Actually, the X array must be multidimensional too. As I comment above, I want more general solution, indeed. Dimension 3 was chosen just to formulate the question.

askewchan Over a year ago

The dimensionality 3 is not the limit here, it's the shape of the arrays. That is, you have different x values for each row in y. It is common to have many sets of measurements (y) for one set of locations (x), but you have completely independent sets of measurements and locations. I don't think there is a simple way to fit them all at once using builtin polynomial fitting, you have to define your own error function as @CT did.

Collectives™ on Stack Overflow

Numpy Polyfit or any fitting to X and Y multidimensional arrays

2 Answers 2

2 Comments

3 Comments

Linked

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

2 Comments

3 Comments

Linked

Related