Since this question is about re-writing a model specification, I'll focus on that aspect. Without knowing more about the data or precise inferential goal, I can't comment on whether either model is the right tool. I'm also unsure why the OP needs to rewrite a functioning model to use a different function within the same software package.
Let's start from the OP's setup.
library(nlme)

math <- read.csv('https://raw.githubusercontent.com/rnorouzian/v/main/mlgrp1.csv')
Now I'm going to take the OP's nonlinear model specification, remove some extra parentheses, and add some whitespace for readability.
nonlinear <- nlme(
  math ~ nb_wght * ( (beta_1_N + d_1i_N) + (beta_2_N + d_2i_N) * grade) +
         lb_wght * ( (beta_1_L + d_1i_L) + (beta_2_L + d_2i_L) * grade),
  data = math,
  fixed = beta_1_N + beta_2_N + beta_1_L + beta_2_L ~ 1,
  random = pdBlocked(list(d_1i_N + d_2i_N ~ 1,
                          d_1i_L + d_2i_L ~ 1)),
  group = ~ id,
  start = c(35, 4, 35, 4),
  weights = varIdent(form = ~ 1 | factor(lb_wght)),
  control = list(returnObject = TRUE))

summary(nonlinear)
Next, I'll rename the variables to more meaningful names so that I can better understand what's going on.
nonlinear <- nlme(
  math ~ nb_wght * ( (beta_nb_wght + blup_nb_wght) +
                     (beta_nb_wght_grade + blup_nb_wght_grade) * grade) +
         lb_wght * ( (beta_lb_wght + blup_lb_wght) +
                     (beta_lb_wght_grade + blup_lb_wght_grade) * grade),
  data = math,
  fixed = beta_nb_wght + beta_nb_wght_grade + beta_lb_wght + beta_lb_wght_grade ~ 1,
  random = pdBlocked(list(blup_nb_wght + blup_nb_wght_grade ~ 1,
                          blup_lb_wght + blup_lb_wght_grade ~ 1)),
  group = ~ id,
  start = c(35, 4, 35, 4),
  weights = varIdent(form = ~ 1 | factor(lb_wght)),
  control = list(returnObject = TRUE))

summary(nonlinear)
Okay, now I can see how the pieces fit together a bit better. Time to rewrite it as something closer to a classical linear model by re-ordering the terms.
nonlinear <- nlme(
  math ~ (beta_nb_wght + blup_nb_wght) * nb_wght +
         (beta_nb_wght_grade + blup_nb_wght_grade) * I(grade * nb_wght) +
         (beta_lb_wght + blup_lb_wght) * lb_wght +
         (beta_lb_wght_grade + blup_lb_wght_grade) * I(grade * lb_wght),
  data = math,
  # sort the coefficients in order of interaction level:
  # - this will make comparisons with the linear model's fixed effects easier
  #   later, since the sort order will match
  # - note that the predictor for a two-way interaction of continuous predictors
  #   is just the (element-wise) product of the two predictors
  fixed = beta_nb_wght + beta_lb_wght + beta_nb_wght_grade + beta_lb_wght_grade ~ 1,
  # we use the list syntax for `random` instead of depending on the separate
  # `group` argument, again bringing us closer to the usual way of
  # specifying the linear model
  random = list(id = pdBlocked(
    list(blup_nb_wght + blup_nb_wght_grade ~ 1,
         blup_lb_wght + blup_lb_wght_grade ~ 1))),
  # start values are matched positionally to `fixed`; after the re-ordering
  # above, c(35, 35, 4, 4) would line up better, but the fit converges to the
  # same optimum here either way
  start = c(35, 4, 35, 4),
  weights = varIdent(form = ~ 1 | factor(lb_wght)),
  control = list(returnObject = TRUE))

summary(nonlinear)
Using the linear-model formula syntax, we now have fixed effects
math ~ 0 + nb_wght + lb_wght + grade:nb_wght + grade:lb_wght # Note the lack of intercept.
and random effects:
list(~ 0 + nb_wght + grade:nb_wght, ~ 0 + lb_wght + grade:lb_wght)
The big change in both cases is to make the betas/BLUPs variables implicit and only specify the data columns.
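As an optional sanity check (not part of the original post), you can convince yourself that the interaction columns really are just element-wise products by building the fixed-effects design matrix directly with base R's model.matrix(). The column name "nb_wght:grade" below is what the fitted model reports for that term, so I'm assuming it matches here.

# build the fixed-effects design matrix implied by the linear-model formula
X <- model.matrix(~ 0 + nb_wght + lb_wght + grade:nb_wght + grade:lb_wght,
                  data = math)
colnames(X)

# the interaction column is just the element-wise product of its two predictors
all.equal(unname(X[, "nb_wght:grade"]), math$nb_wght * math$grade)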
That brings us to the linear model:
linear <- lme(
  math ~ 0 + nb_wght + lb_wght + grade:nb_wght + grade:lb_wght,
  data = math,
  random = list(id = pdBlocked(
    list(~ 0 + nb_wght + grade:nb_wght,
         ~ 0 + lb_wght + grade:lb_wght))),
  weights = varIdent(form = ~ 1 | factor(lb_wght)),
  method = "ML",  # default for nlme(), but lme() and lmer() both default to REML
  control = list(returnObject = TRUE))
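A small aside on that method argument: if you later want the REML fit (lme()'s default), a sketch like the following should work; it assumes update() can rebuild the call in your session, and keep in mind that ML and REML log-likelihoods are not directly comparable.

# refit the same model by REML (lme()'s default) for comparison;
# the variance estimates will generally differ a little from the ML fit
linear_reml <- update(linear, method = "REML")
logLik(linear_reml)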
Let's compare them now.
They make the same predictions:
> all.equal(fitted(linear), fitted(nonlinear))
[1] TRUE
The estimates of the fixed effects are close (this comparison is why we re-ordered the coefficients in the nonlinear model):
> all.equal(unname(fixed.effects(linear)),
+           unname(fixed.effects(nonlinear)))
[1] TRUE
as are the variance parameters:
> all.equal(unname(as.numeric(VarCorr(linear)[, "StdDev"])),
+           unname(as.numeric(VarCorr(nonlinear)[, "StdDev"])))
[1] TRUE
So these models are identical, and their log likelihoods reflect that as well:
> logLik(nonlinear)
'log Lik.' -7963.061 (df=12)
> logLik(linear)
'log Lik.' -7963.061 (df=12)
In other cases, you might find that one method or the other converges to a slightly better optimum (i.e. a better log likelihood).
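A quick way to put such comparisons side by side (just a convenience sketch using the two fits above) is to pass both models to AIC()/BIC(), which accept multiple fitted objects:

# identical fits produce identical rows; a genuinely better optimum
# would show up as a lower AIC/BIC (equivalently, a higher log likelihood)
AIC(nonlinear, linear)
BIC(nonlinear, linear)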
If you look at the full model summaries, you'll see that they're the same model:
Nonlinear
> summary(nonlinear)
Nonlinear mixed-effects model fit by maximum likelihood
  Model: math ~ (beta_nb_wght + blup_nb_wght) * nb_wght + (beta_nb_wght_grade +
    blup_nb_wght_grade) * I(grade * nb_wght) + (beta_lb_wght +
    blup_lb_wght) * lb_wght + (beta_lb_wght_grade + blup_lb_wght_grade) *
    I(grade * lb_wght)
  Data: math
       AIC      BIC    logLik
  15950.12 16018.59 -7963.061

Random effects:
 Composite Structure: Blocked

 Block 1: blup_nb_wght, blup_nb_wght_grade
 Formula: list(blup_nb_wght ~ 1, blup_nb_wght_grade ~ 1)
 Level: id
 Structure: General positive-definite
                   StdDev    Corr
blup_nb_wght       7.8921980 blp_n_
blup_nb_wght_grade 0.8795756 -0.009

 Block 2: blup_lb_wght, blup_lb_wght_grade
 Formula: list(blup_lb_wght ~ 1, blup_lb_wght_grade ~ 1)
 Level: id
 Structure: General positive-definite
                   StdDev     Corr
blup_lb_wght       8.99415931 blp_l_
blup_lb_wght_grade 0.02853591 0.986
Residual           5.93062152

Variance function:
 Structure: Different standard deviations per stratum
 Formula: ~1 | factor(lb_wght)
 Parameter estimates:
       0        1
1.000000 1.170217
Fixed effects: beta_nb_wght + beta_lb_wght + beta_nb_wght_grade + beta_lb_wght_grade ~ 1
                      Value Std.Error   DF  t-value p-value
beta_nb_wght       35.48131 0.3647427  931 97.27765       0
beta_lb_wght       32.79780 1.4062538 1287 23.32282       0
beta_nb_wght_grade  4.29731 0.0901739 1287 47.65582       0
beta_lb_wght_grade  4.87806 0.3382250 1287 14.42252       0
 Correlation:
                   bt_nb_ bt_lb_ bt_n__
beta_lb_wght        0.000
beta_nb_wght_grade -0.527  0.000
beta_lb_wght_grade  0.000 -0.549  0.000

Standardized Within-Group Residuals:
         Min           Q1          Med           Q3          Max
-3.080667594 -0.533467953 -0.002345929  0.537432306  2.566342070

Number of Observations: 2221
Number of Groups: 932
Linear
> summary(linear)
Linear mixed-effects model fit by maximum likelihood
  Data: math
       AIC      BIC    logLik
  15950.12 16018.59 -7963.061

Random effects:
 Composite Structure: Blocked

 Block 1: nb_wght, nb_wght:grade
 Formula: ~0 + nb_wght + grade:nb_wght | id
 Structure: General positive-definite
              StdDev    Corr
nb_wght       7.8921980 nb_wgh
nb_wght:grade 0.8795756 -0.009

 Block 2: lb_wght, lb_wght:grade
 Formula: ~0 + lb_wght + grade:lb_wght | id
 Structure: General positive-definite
              StdDev     Corr
lb_wght       8.99415931 lb_wgh
lb_wght:grade 0.02853591 0.986
Residual      5.93062152

Variance function:
 Structure: Different standard deviations per stratum
 Formula: ~1 | factor(lb_wght)
 Parameter estimates:
       0        1
1.000000 1.170217
Fixed effects: math ~ 0 + nb_wght + lb_wght + grade:nb_wght + grade:lb_wght
                 Value Std.Error   DF  t-value p-value
nb_wght       35.48131 0.3647427  930 97.27765       0
lb_wght       32.79780 1.4062538  930 23.32282       0
nb_wght:grade  4.29731 0.0901739 1288 47.65582       0
lb_wght:grade  4.87806 0.3382250 1288 14.42252       0
 Correlation:
              nb_wgh lb_wgh nb_wg:
lb_wght        0.000
nb_wght:grade -0.527  0.000
lb_wght:grade  0.000 -0.549  0.000

Standardized Within-Group Residuals:
         Min           Q1          Med           Q3          Max
-3.080667594 -0.533467953 -0.002345929  0.537432305  2.566342070

Number of Observations: 2221
Number of Groups: 932