8

I have a set (at least 3) of curves (xy-data). For each curve the parameters E and T are constant but different. I'm searching the coefficients a,n and m for the best fit over all curves.

y= x/E + (a/n+1)*T^(n+1)*x^m 

I tried curve_fit, but I have no idea how to get the parameters E and T into the function f (see curve_fit documentation). Furthermore I'm not sure if I understand xdata correctly. Doc says: An M-length sequence or an (k,M)-shaped array for functions with k predictors. What's a predictor? As ydata has only one dimension I obviously can't feed multiple curves into the routine.

So curve_fit might be the wrong approach but I don't even know the magic words to search for the right one. I can't be the first one dealing with this problem.

2
  • Are x and y the same or different for each curve? Commented Oct 7, 2014 at 18:44
  • All curves have been measured in the same x interval. Although the original x-values are not identical I could create a set of common x-values for all curves. y-values are all different. Think of them as stacked in y-direction. Commented Oct 7, 2014 at 19:32

3 Answers 3

6

One way to do this is use scipy.optimize.leastsq instead (curve_fit is a convenience wrapper around leastsq).

Stack the x data in one dimension; ditto for the y data. The lengths of the 3 individual datasets don't even matter; let's call them n1, n2 and n3, so your new x and y will have a shape (n1+n2+n3,).

Inside the function to optimize, you can split up the data at your convenience. It will not be the nicest function, but this could work:

def function(x, E, T, a, n, m): return x/E + (a/n+1)*T^(n+1)*x^m def leastsq_function(params, *args): a = params[0] n = params[1] m = params[2] x = args[0] y = args[1] E = args[2] T = args[3] n1, n2 = args[2] yfit = np.empty(x.shape) yfit[:n1] = function(x[:n1], E[0], T[0], a, n, m) yfit[n1:n2] = function(x[n1:n2], E[1], T[1], a, n, m) yfit[n2:] = function(x[n2:], E[2], T[2], a, n, m) return y - yfit params0 = [a0, n0, m0] args = (x, y, (E0, E1, E2), (T0, T1, T2), (n1, n1+n2)) result = scipy.optimize.leastsq(leastsq_function, params0, args=args) 

I have not tested this, but this is the principle. You're now splitting up the data into 3 different calls inside the function that is to be optimized.

Note that scipy.optimize.leastsq simply requires a function that returns whatever value you'd like to be minized, in this case the difference between your actual y data and the fitted function data. The actual important variables in leastsq are the parameters you want to fit for, not the x and y data. The latter are passed as extra arguments, together with the sizes of three separate datasets (I'm not using n3, and I've done some juggling with the n1+n2 for convenience; keep in mind that the n1 and n2 inside leastsq_function are local variables, not the original ones).

Since this is an awkward function to fit (it probably won't have a smooth derivative, for example), it is quite essential to

  • provide good starting values (params0, so all the ...0 values).

  • don't have data or parameters which span orders of magnitude. The closer everything is around 1 (a few orders of magnitude is certainly ok), the better.

Sign up to request clarification or add additional context in comments.

Comments

4

thanks Evert for the reply.

Exactly what I needed to know!!

I have simplyfied the function as far as possible, as you suggested. However, the task was to find ONE set of A,m,n to fit all curves. So my code look like this:

import numpy import math from scipy.optimize import leastsq #+++++++++++++++++++++++++++++++++++++++++++++ def fit(x,T,A,n,m): return A/(n+1.0)*math.pow(T,(n+1.0))*numpy.power(x,m) #+++++++++++++++++++++++++++++++++++++++++++++ def leastsq_func(params, *args): cc=args[0] #number of curves incs=args[1] #number of points x=args[2] y=args[3] T=args[4:] A=params[0] n=params[1] m=params[2] yfit=numpy.empty(x.shape) for i in range(cc): v=i*incs b=(i+1)*incs if b<cc: yfit[v:b]=fit(x[v:b],T[i],A,n,m) else: yfit[v:]=fit(x[v:],T[i],A,n,m) return y-yfit #+++++++++++++++++++++++++++++++++++++++++++++ Ts =[10,100,1000,10000] #4 T-values for 4 curves incs=10 #10 datapoints in each curve x=["measured data"] #all 40 x-values y=["measrued data"] #all 40 y-values x=numpy.array(x) y=numpy.array(y) params0=[0.001,1.01,-0.8] #parameter guess args=[len(Ts),incs,x,y] for c in Ts: args.append(c) args=tuple(args) #doesn't work if args is a list!! result=leastsq(leastsq_func, params0, args=args) 

Works like clockwork.

At first I put the Ts in the params0 list and they were modified during iteration leading to nonsense results. Obvious, if you think about it. Afterwards ;-)

So, Vielen Dank! J.

1 Comment

ah, yes, sorry, read that just the wrong way 'round: I had E and T common for all datasets, and a, m & n different per dataset. Just too quick reading on my side of the question. I'll update my answer in due time, before I sow confusion among future readers. Of course, the principle is the same, which shows in your answer.
2

Thank you guys, I found this very useful. In case someone wants a general solution to this problem, I wrote one that is heavily inspired by the snippets above:

import numpy as np from scipy.optimize import leastsq def multiple_reg(x, y, f, const, params0, **kwargs): """Do same non-linear regression on multiple curves """ def leastsq_func(params, *args): x, y = args[:2] const = args[2:] yfit = [] for i in range(len(x)): yfit = np.append(yfit, f(x[i],*const[i],*params)) return y-yfit # turn const into 2d-array if 1d is given const = np.asarray(const) if len(const.shape) < 2: const = np.atleast_2d(const).T # ensure that y is flat and x is nested if hasattr(y[0], "__len__"): y = [item for sublist in y for item in sublist] if not hasattr(x[0], "__len__"): x = np.tile(x, (len(const), 1)) x_ = [item for sublist in x for item in sublist] assert len(x_) == len(y) # collect all arguments in a tuple y = np.asarray(y) args=[x,y] + list(const) args=tuple(args) #doesn't work if args is a list!! return leastsq(leastsq_func, params0, args=args, **kwargs) 

This function accepts xs and ys of various length, as they are stored in nested lists rather than numpy ndarrays. For the particular case presented in this thread, the function can be used like this:

def fit(x,T,A,n,m): return A/(n+1.0)*np.power(T,(n+1.0))*np.power(x,m) # prepare dataset with some noise params0 = [0.001, 1.01, -0.8] Ts = [10, 50] x = np.linspace(10, 100, 100) y = np.empty((len(Ts), len(x))) for i in range(len(Ts)): y[i] = fit(x, Ts[i], *params) + np.random.uniform(0, 0.01, size=len(x)) # do regression opt_params, _ = multiple_reg(x, y, fit, Ts, params0) 

Verify regression by plotting the data and regression lines

import matplotlib.pyplot as plt for i in range(len(Ts)): plt.scatter(x, y[i], label=f"T={Ts[i]}") plt.plot(x, fit(x, Ts[i], *opt_params), '--k') plt.legend(loc='best') plt.show() 

Multiple regression

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.