Skip to main content
emphasis on the breakpoint positions estimation of the method
Source Link
xdze2
  • 201
  • 2
  • 4

The method proposed by Vito M. R. Muggeo[1] is relatively simple and efficient. It works for a specified number of segments, and for a continuous function. The positions of the breakpoints are iteratively estimatedThe positions of the breakpoints are iteratively estimated by performing, for each iteration, a segmented linear regression allowing jumps at the breakpoints. From the values of the jumps, the next breakpoint positions are deduced, until there are no more discontinuity (jumps).

The method proposed by Vito M. R. Muggeo[1] is relatively simple and efficient. It works for a specified number of segments, and for a continuous function. The positions of the breakpoints are iteratively estimated by performing, for each iteration, a segmented linear regression allowing jumps at the breakpoints. From the values of the jumps, the next breakpoint positions are deduced, until there are no more discontinuity (jumps).

The method proposed by Vito M. R. Muggeo[1] is relatively simple and efficient. It works for a specified number of segments, and for a continuous function. The positions of the breakpoints are iteratively estimated by performing, for each iteration, a segmented linear regression allowing jumps at the breakpoints. From the values of the jumps, the next breakpoint positions are deduced, until there are no more discontinuity (jumps).

Source Link
xdze2
  • 201
  • 2
  • 4

The method proposed by Vito M. R. Muggeo[1] is relatively simple and efficient. It works for a specified number of segments, and for a continuous function. The positions of the breakpoints are iteratively estimated by performing, for each iteration, a segmented linear regression allowing jumps at the breakpoints. From the values of the jumps, the next breakpoint positions are deduced, until there are no more discontinuity (jumps).

"the process is iterated until possible convergence, which is not, in general, guaranteed"

In particular, the convergence or the result may depends on the first estimation of the breakpoints.

This is the method used in the R Segmented package.

Here is an implementation in python:

import numpy as np from numpy.linalg import lstsq ramp = lambda u: np.maximum( u, 0 ) step = lambda u: ( u > 0 ).astype(float) def SegmentedLinearReg( X, Y, breakpoints ): nIterationMax = 10 breakpoints = np.sort( np.array(breakpoints) ) dt = np.min( np.diff(X) ) ones = np.ones_like(X) for i in range( nIterationMax ): # Linear regression: solve A*p = Y Rk = [ramp( X - xk ) for xk in breakpoints ] Sk = [step( X - xk ) for xk in breakpoints ] A = np.array([ ones, X ] + Rk + Sk ) p = lstsq(A.transpose(), Y, rcond=None)[0] # Parameters identification: a, b = p[0:2] ck = p[ 2:2+len(breakpoints) ] dk = p[ 2+len(breakpoints): ] # Estimation of the next break-points: newBreakpoints = breakpoints - dk/ck # Stop condition if np.max(np.abs(newBreakpoints - breakpoints)) < dt/5: break breakpoints = newBreakpoints else: print( 'maximum iteration reached' ) # Compute the final segmented fit: Xsolution = np.insert( np.append( breakpoints, max(X) ), 0, min(X) ) ones = np.ones_like(Xsolution) Rk = [ c*ramp( Xsolution - x0 ) for x0, c in zip(breakpoints, ck) ] Ysolution = a*ones + b*Xsolution + np.sum( Rk, axis=0 ) return Xsolution, Ysolution 

Example:

import matplotlib.pyplot as plt X = np.linspace( 0, 10, 27 ) Y = 0.2*X - 0.3* ramp(X-2) + 0.3*ramp(X-6) + 0.05*np.random.randn(len(X)) plt.plot( X, Y, 'ok' ); initialBreakpoints = [1, 7] plt.plot( *SegmentedLinearReg( X, Y, initialBreakpoints ), '-r' ); plt.xlabel('X'); plt.ylabel('Y'); 

graph

[1]: Muggeo, V. M. (2003). Estimating regression models with unknown breakpoints. Statistics in medicine, 22(19), 3055-3071.