Skip to main content
edited In response to comments
Source Link
EdM
  • 112.7k
  • 11
  • 120
  • 360

In response to comments

The above provides estimates at values of Light_uE values of 15 and 25, chosen to be at midpoints of two ranges of interest. It evaluates the differences in CellspermL levels between the two Species at those particular Light_uE values.

Once you have a regression model, you can evaluate estimated outcome differences among any linear combinations of predictor values, by applying the formula for the variance of a weighted sum of random variables to get the standard errors. The (co)variances of the parameter estimates are the model's variance-covariance matrix; the weights are the (differences among) the hypothesized predictor values.

That's what the contrast() function above does. If you use R functions outside the rms package, the emmeans package has tools that can handle a wide range of model types.

As an example continuing with the rms package, the following double-difference contrast evaluates the difference in linear slope of Light_uE values from 10 and 20, between the two Species (slope for "A" minus slope for "B"):

contrast(ols1, a=list(Light_uE=20,Species="A"), b=list(Light_uE=10,Species="A"), a2=list(Light_uE=20,Species="B"), b2=list(Light_uE=10,Species="B")) # Contrast S.E. Lower Upper t Pr(>|t|) # 11 352120.1 120417.6 93850.07 610390.2 2.92 0.0111 # # Error d.f.= 14 # # Confidence intervals are 0.95 individual intervals 

That uses CellspermL values estimated from the restricted cubic spline fit. If you instead use a segmented linear spline as in your question, you could proceed similarly.

With some effort you could even go to more complex estimates of the "shape" of the fit. Calling Function(ols1) shows the formula describing the spline. Section 2.4.5 of Frank Harrell's Regression Modelin Strategies shows how the spline coefficients of the model are represented in that formula; there's some scaling involved "[f]or numerical behavior and to put all basis functions ... on the same scale." In principle you could use that information to evaluate differences in slope at various Light_uE values, taking derivatives with respect to Light_uE as functions of the spline coefficients and applying the formula for the variance of a weighted sum of random variables to get standard errors.

In response to comments

The above provides estimates at values of Light_uE values of 15 and 25, chosen to be at midpoints of two ranges of interest. It evaluates the differences in CellspermL levels between the two Species at those particular Light_uE values.

Once you have a regression model, you can evaluate estimated outcome differences among any linear combinations of predictor values, by applying the formula for the variance of a weighted sum of random variables to get the standard errors. The (co)variances of the parameter estimates are the model's variance-covariance matrix; the weights are the (differences among) the hypothesized predictor values.

That's what the contrast() function above does. If you use R functions outside the rms package, the emmeans package has tools that can handle a wide range of model types.

As an example continuing with the rms package, the following double-difference contrast evaluates the difference in linear slope of Light_uE values from 10 and 20, between the two Species (slope for "A" minus slope for "B"):

contrast(ols1, a=list(Light_uE=20,Species="A"), b=list(Light_uE=10,Species="A"), a2=list(Light_uE=20,Species="B"), b2=list(Light_uE=10,Species="B")) # Contrast S.E. Lower Upper t Pr(>|t|) # 11 352120.1 120417.6 93850.07 610390.2 2.92 0.0111 # # Error d.f.= 14 # # Confidence intervals are 0.95 individual intervals 

That uses CellspermL values estimated from the restricted cubic spline fit. If you instead use a segmented linear spline as in your question, you could proceed similarly.

With some effort you could even go to more complex estimates of the "shape" of the fit. Calling Function(ols1) shows the formula describing the spline. Section 2.4.5 of Frank Harrell's Regression Modelin Strategies shows how the spline coefficients of the model are represented in that formula; there's some scaling involved "[f]or numerical behavior and to put all basis functions ... on the same scale." In principle you could use that information to evaluate differences in slope at various Light_uE values, taking derivatives with respect to Light_uE as functions of the spline coefficients and applying the formula for the variance of a weighted sum of random variables to get standard errors.

Source Link
EdM
  • 112.7k
  • 11
  • 120
  • 360

One approach would be to fit an appropriate model of CellspermL versus Light_uE, including an interaction with Species to allow for curves that differ between the Species, and then interrogate the model for Species differences at selected levels of Light_uE.

In this biological context, there's no reason to expect a segmented linear association between CellspermL and Light_uE. A smooth cubic regression spline can let the data show the form of the association.

Here's an example, using tools from Frank Harrell's rms package in R. It starts with your example_curves data frame.

library(rms) ## the next 2 lines provide information to the package's post-modeling tools ddec <- datadist(example_curves) options(datadist="ddec") ## use a restricted cubic spline with 3 knots for the smooth curve ## interact with Species to let the curves differ ols1 <- ols(CellspermL ~ rcs(Light_uE, 3)*Species, data = example_curves) ## examine modeled differences between Species A and B ## the following gives estimates for Species="A" minus Species="B" ## at Light_uE values of 15 and 25 contrast(ols1, a=list(Light_uE=c(15,25), Species="A"), b=list(Light_uE=c(15,25), Species="B")) # Light_uE Contrast S.E. Lower Upper t Pr(>|t|) # 1 15 -747336.7 287810.9 -1364629.7 -130043.58 -2.60 0.0211 # 2 25 -395216.5 209702.7 -844984.1 54551.06 -1.88 0.0804 

The anova() function from the rms package applied to the ols1 model provides useful information about overall significance and that of the nonlinear and interaction terms (not shown).

Make sure to have pre-specified levels of Light_uE at which to compare the Species. If you look at the results to help choose the levels of Light_uE, you can severely bias the results.

I'd recommend that you do the analysis as close to the original observed data as possible. Your estimates of cells per milliliter presumably are based on smaller numbers of counts that were then adjusted for the sampling volume. If all sampling volumes are the same, work with the number of counts. If not, use an offset term to account for differences in sampling volumes while working with the observed counts as the outcome. If the observed counts are low, consider working with a Poisson or negative-binomial model appropriate to counts.