Return to Answer

emphasis on the breakpoint positions estimation of the method

edited Nov 27, 2018 at 14:25

The method proposed by Vito M. R. Muggeo[1] is relatively simple and efficient. It works for a specified number of segments, and for a continuous function. The positions of the breakpoints are iteratively estimatedThe positions of the breakpoints are iteratively estimated by performing, for each iteration, a segmented linear regression allowing jumps at the breakpoints. From the values of the jumps, the next breakpoint positions are deduced, until there are no more discontinuity (jumps).

Source Link

answered Jun 8, 2018 at 13:44

xdze2

The method proposed by Vito M. R. Muggeo[1] is relatively simple and efficient. It works for a specified number of segments, and for a continuous function. The positions of the breakpoints are iteratively estimated by performing, for each iteration, a segmented linear regression allowing jumps at the breakpoints. From the values of the jumps, the next breakpoint positions are deduced, until there are no more discontinuity (jumps).

"the process is iterated until possible convergence, which is not, in general, guaranteed"

In particular, the convergence or the result may depends on the first estimation of the breakpoints.

This is the method used in the R Segmented package.

Here is an implementation in python:

import numpy as np from numpy.linalg import lstsq ramp = lambda u: np.maximum( u, 0 ) step = lambda u: ( u > 0 ).astype(float) def SegmentedLinearReg( X, Y, breakpoints ): nIterationMax = 10 breakpoints = np.sort( np.array(breakpoints) ) dt = np.min( np.diff(X) ) ones = np.ones_like(X) for i in range( nIterationMax ): # Linear regression: solve A*p = Y Rk = [ramp( X - xk ) for xk in breakpoints ] Sk = [step( X - xk ) for xk in breakpoints ] A = np.array([ ones, X ] + Rk + Sk ) p = lstsq(A.transpose(), Y, rcond=None)[0] # Parameters identification: a, b = p[0:2] ck = p[ 2:2+len(breakpoints) ] dk = p[ 2+len(breakpoints): ] # Estimation of the next break-points: newBreakpoints = breakpoints - dk/ck # Stop condition if np.max(np.abs(newBreakpoints - breakpoints)) < dt/5: break breakpoints = newBreakpoints else: print( 'maximum iteration reached' ) # Compute the final segmented fit: Xsolution = np.insert( np.append( breakpoints, max(X) ), 0, min(X) ) ones = np.ones_like(Xsolution) Rk = [ c*ramp( Xsolution - x0 ) for x0, c in zip(breakpoints, ck) ] Ysolution = a*ones + b*Xsolution + np.sum( Rk, axis=0 ) return Xsolution, Ysolution

Example:

import matplotlib.pyplot as plt X = np.linspace( 0, 10, 27 ) Y = 0.2*X - 0.3* ramp(X-2) + 0.3*ramp(X-6) + 0.05*np.random.randn(len(X)) plt.plot( X, Y, 'ok' ); initialBreakpoints = [1, 7] plt.plot( *SegmentedLinearReg( X, Y, initialBreakpoints ), '-r' ); plt.xlabel('X'); plt.ylabel('Y');

graph

[1]: Muggeo, V. M. (2003). Estimating regression models with unknown breakpoints. Statistics in medicine, 22(19), 3055-3071.