1
$\begingroup$

I have fitted a Linear Regression Model using one dataset. Now, I have another smaller dataset that I want to refine the model with. Can I use Ridge regression to update the estimated coefficients for this new dataset? Or do you recommend a more appropriate approach?

Edit: following the helpful answer by John Madden, I have implemented his approach in Python as follows. Do you see any issue with how this code reflects that logic?

    import numpy as np
    from sklearn.linear_model import Ridge

    # Step 1: Compute coefficients on dataset (X1, y1)
    beta1 = np.linalg.pinv(X1) @ y1
    print("Original coefficients:", beta1)

    # Step 2: Compute residuals on dataset (X2, y2)
    y_hat2 = X2 @ beta1
    residual2 = y2 - y_hat2

    # Step 3: Perform ridge regression on residuals
    # (no separate intercept, so delta lines up with beta1; alpha is set beforehand)
    ridge_reg = Ridge(alpha=alpha, fit_intercept=False)
    ridge_reg.fit(X2, residual2)
    delta = ridge_reg.coef_

    # Step 4: Adjust coefficients
    beta2 = beta1 + delta
    print("Adjusted coefficients:", beta2)
$\endgroup$
10
  • $\begingroup$ Do you mean that you want to know what the regression on both data sets combined would have been if you calculated on all data when you had the chance? $\endgroup$ Commented Apr 16, 2024 at 21:14
  • $\begingroup$ I want to update the coefficients of the first linear regression model using dataset 2. $\endgroup$ Commented Apr 16, 2024 at 21:16
  • $\begingroup$ What do you mean by "update" the coefficients? $\endgroup$ Commented Apr 16, 2024 at 21:17
  • $\begingroup$ Using the coefficients from the first model as the start point of the second regression $\endgroup$ Commented Apr 16, 2024 at 21:20
  • 1
    $\begingroup$ @John The question does not refer to "transfer" explicitly: it uses the phrase "refine the model with." $\endgroup$ Commented Apr 17, 2024 at 14:47

1 Answer

2
$\begingroup$

I think that what you mean is that you have some coefficients estimated from a first dataset $(\mathbf{X}_1,\mathbf{y}_1)$ denoted as $\hat{\beta}_1$. Then, for dataset $(\mathbf{X}_2, \mathbf{y}_2)$, you want to do something like ridge regression, but instead of penalizing the $\ell_2$ norm of $\hat\beta_2$, you want to penalize the deviation between $\hat\beta_2$ and $\hat\beta_1$.

Or mathematically, $$ \underset{\beta}{\min} \Vert \mathbf{y}_2-\mathbf{X}_2\beta\Vert_2^2 + \lambda\Vert\beta-\hat\beta_1\Vert_2^2 \, . $$

If that's right, you can accomplish this by substituting $\beta = \hat\beta_1 + \delta$ and noting that: $$ \begin{aligned} \underset{\beta}{\min}\; \Vert \mathbf{y}_2-\mathbf{X}_2\beta\Vert_2^2 + \lambda\Vert\beta-\hat\beta_1\Vert_2^2 &\iff \underset{\delta}{\min}\; \Vert \mathbf{y}_2-\mathbf{X}_2(\hat\beta_1+\delta)\Vert_2^2 + \lambda\Vert\delta\Vert_2^2 \\ &\iff \underset{\delta}{\min}\; \Vert (\mathbf{y}_2-\mathbf{X}_2\hat\beta_1) -\mathbf{X}_2\delta\Vert_2^2 + \lambda\Vert\delta\Vert_2^2 \, . \end{aligned} $$
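
As a sketch of why this is an ordinary ridge problem, assuming $\lambda > 0$ so that the matrix below is invertible: setting the gradient of the objective in $\delta$ to zero gives $$ \hat\delta = (\mathbf{X}_2^\top\mathbf{X}_2 + \lambda\mathbf{I})^{-1}\mathbf{X}_2^\top(\mathbf{y}_2-\mathbf{X}_2\hat\beta_1) \, , $$ which is the usual ridge solution with the response replaced by the residuals $\mathbf{y}_2-\mathbf{X}_2\hat\beta_1$.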

This suggests the following procedure:

  1. Compute an estimate of $\hat\beta_1$ on the dataset $(\mathbf{X}_1,\mathbf{y}_1)$ using a procedure of your choice.
  2. Compute the residuals using the coefficients learnt from $(\mathbf{X}_1,\mathbf{y}_1)$ as $\tilde{\mathbf{y}}_2 = \mathbf{y}_2 - \mathbf{X}_2\hat\beta_1$.
  3. Do a standard ridge regression on $\tilde{\mathbf{y}}_2$ against $\mathbf{X}_2$ to get deviation coefficients $\delta$.
  4. Set $\hat\beta_2 = \hat\beta_1 + \delta$.
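
The steps above can be sketched in NumPy and checked against the original objective. This is a minimal illustration, not a definitive implementation: the function names, the simulated data, and the value `lam=5.0` are assumptions made for the example, and no intercept is fitted (add a column of ones to both design matrices if you need one). The check relies on the fact that the objective $\Vert \mathbf{y}_2-\mathbf{X}_2\beta\Vert_2^2 + \lambda\Vert\beta-\hat\beta_1\Vert_2^2$ has normal equations $(\mathbf{X}_2^\top\mathbf{X}_2 + \lambda\mathbf{I})\beta = \mathbf{X}_2^\top\mathbf{y}_2 + \lambda\hat\beta_1$.

```python
import numpy as np


def ridge_toward_prior(X2, y2, beta1, lam):
    """Steps 2-4: ridge on the residuals, then shift back by beta1 (no intercept)."""
    residual = y2 - X2 @ beta1                      # step 2: residuals under beta1
    p = X2.shape[1]
    delta = np.linalg.solve(X2.T @ X2 + lam * np.eye(p), X2.T @ residual)  # step 3
    return beta1 + delta                            # step 4


def direct_solution(X2, y2, beta1, lam):
    """Minimizer of ||y2 - X2 b||^2 + lam * ||b - beta1||^2 via its normal equations."""
    p = X2.shape[1]
    return np.linalg.solve(X2.T @ X2 + lam * np.eye(p), X2.T @ y2 + lam * beta1)


# Simulated data (hypothetical): a large first dataset and a small, shifted second one.
rng = np.random.default_rng(0)
X1 = rng.normal(size=(200, 3))
y1 = X1 @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)
X2 = rng.normal(size=(20, 3))
y2 = X2 @ np.array([1.2, -1.8, 0.5]) + rng.normal(scale=0.1, size=20)

beta1 = np.linalg.lstsq(X1, y1, rcond=None)[0]      # step 1: any estimator works here
beta2 = ridge_toward_prior(X2, y2, beta1, lam=5.0)

print("beta1:", beta1)
print("beta2:", beta2)
print("matches direct solution:", np.allclose(beta2, direct_solution(X2, y2, beta1, 5.0)))
```

The two limits are a useful sanity check: as `lam` grows, `beta2` stays at `beta1`, and at `lam=0` it reduces to least squares on the second dataset alone. With scikit-learn, `Ridge(alpha=lam, fit_intercept=False)` fitted on the residuals should give the same `delta`, since its `alpha` multiplies the squared $\ell_2$ norm in the same unscaled objective.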

For the properties of such a procedure in the context of Lasso rather than Ridge regression, see <Li 2022>. Regrettably, I don't know of the analysis for the Ridge regression case.

$\endgroup$
5
  • $\begingroup$ Thanks a lot. I think your solution is similar to this one: adapt-python.github.io/adapt/generated/… $\endgroup$ Commented Apr 16, 2024 at 21:48
  • $\begingroup$ @AdhamEnaya yes looks similar. $\endgroup$ Commented Apr 16, 2024 at 22:13
  • 1
    $\begingroup$ Thanks. I have implemented your logic in Python; could you please have a look? $\endgroup$ Commented Apr 17, 2024 at 9:02
  • 1
    $\begingroup$ I have a follow-up question, please. My regression process is GLM-based. Steps 1 and 2 of your proposed procedure are clear, but I am not sure about step 3: should I use ordinary ridge regression, or a regularized version of my GLM? $\endgroup$ Commented Apr 24, 2024 at 12:20
  • 1
    $\begingroup$ @AdhamEnaya good question. Ideally, we would use a regularized version of your GLM. But the hard part is that the residuals from dataset 2 may no longer follow the appropriate distribution for your GLM. This paper discusses the situation <arxiv.org/pdf/2105.14328.pdf>, but I think implementing it will require writing custom GLM code that allows for nonstandard regularization. $\endgroup$ Commented Apr 24, 2024 at 13:37
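
One way to sketch the "regularized GLM" idea is to keep the substitution $\beta = \hat\beta_1 + \delta$ from the answer. The linear predictor becomes $\mathbf{X}_2\hat\beta_1 + \mathbf{X}_2\delta$, so $\mathbf{X}_2\hat\beta_1$ acts as a fixed offset and only $\delta$ is penalized. The example below assumes a logistic GLM purely for illustration (the family in the question is not stated). It uses a hand-written Newton iteration rather than any library's API, and the function name, data, and `lam` value are hypothetical. Note that `lam` here is not on the same scale as in the squared-error case, and this is not the method of the linked paper.

```python
import numpy as np


def penalized_logistic_shift(X2, y2, beta1, lam, n_iter=50, tol=1e-10):
    """Newton's method for the shift delta in a logistic GLM (illustrative sketch).

    Minimizes  sum(log(1 + exp(eta)) - y2 * eta) + lam * ||delta||^2,
    where eta = offset + X2 @ delta and offset = X2 @ beta1 is held fixed.
    """
    offset = X2 @ beta1
    p = X2.shape[1]
    delta = np.zeros(p)                      # start at beta1, i.e. no shift
    for _ in range(n_iter):
        eta = offset + X2 @ delta
        mu = 1.0 / (1.0 + np.exp(-eta))      # fitted probabilities
        grad = X2.T @ (mu - y2) + 2.0 * lam * delta
        hess = X2.T @ (X2 * (mu * (1.0 - mu))[:, None]) + 2.0 * lam * np.eye(p)
        step = np.linalg.solve(hess, grad)
        delta = delta - step
        if np.max(np.abs(step)) < tol:
            break
    return beta1 + delta


# Hypothetical small second dataset with binary responses.
rng = np.random.default_rng(1)
X2 = rng.normal(size=(60, 3))
p_true = 1.0 / (1.0 + np.exp(-(X2 @ np.array([0.8, -1.0, 0.3]))))
y2 = (rng.uniform(size=60) < p_true).astype(float)

beta1 = np.array([1.0, -1.5, 0.5])           # stand-in for coefficients from dataset 1
lam = 1.0
beta2 = penalized_logistic_shift(X2, y2, beta1, lam)
print("beta2:", beta2)
```

Working with the offset avoids fitting a GLM to residuals, which, as noted in the comment above, may no longer follow the distribution the GLM assumes.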
