$\begingroup$

I am looking for the best way to depict a concave, quadratic association. I'm using logistic regression to measure the association between affect and military advancement (yes/no).

The primary predictor, centered at its mean and then squared, was significant, and it is graphically clear that Y increases as X increases only to a point, before leveling off and decreasing. I was able to determine the level of X at which Y reaches its maximum as the mean minus the linear coefficient (for the centered term) divided by two times the quadratic coefficient, i.e. $\bar{X} - b_1/(2 b_2)$; since $b_2$ is negative here, this maximum lies above the mean.
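Concretely (writing $b_1$ and $b_2$ for the estimated coefficients of the centered linear and quadratic terms), the turning point follows from setting the derivative of the log odds to zero:

$$\frac{\partial}{\partial X}\left[b_1 (X - \bar{X}) + b_2 (X - \bar{X})^2\right] = b_1 + 2 b_2 (X - \bar{X}) = 0 \quad\Longrightarrow\quad X_{\max} = \bar{X} - \frac{b_1}{2 b_2}.$$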

Are results from logistic regression uninterpretable if there is a curvilinear/quadratic association between X and Y?

What would be the best way to determine the slope at different levels of X for a binary outcome? I would like to compare how the slope differs at varying levels of X (low vs. medium vs. high), and specifically how it decreases at higher levels of X.

Thanks!

Emily

$\endgroup$
  • $\begingroup$ Hi Emily, Welcome to CV. Note that your username, identicon, & a link to your user page are automatically added to every post you make, so there is no need to sign your posts. In fact, we prefer you don't. Also, there's no need to say "thank you" at the end of your post - it might seem rude at first, but it's part of the philosophy of this site (tour) to "Ask questions, get answers, no distractions" and it means future readers of your question don't need to read through the pleasantries. $\endgroup$ Commented Mar 13, 2018 at 15:23

3 Answers

$\begingroup$

Logistic regression estimates how the log odds of a binary response $Y$ varies with the values of the explanatory variables. It does so by estimating the coefficients of a linear combination of terms. Those coefficients are not what the question means by "slope," though, so we should take some care with the terminology. I hope the following account will help convince you of the value in a logistic regression context of continuing to think and work in terms of log odds rather than attempting to convert them into some other way of expressing random binary results (such as odds or probabilities).

Notation and analysis

Let the original explanatory variables (considered fixed, not random), before any transformations, interactions, or "feature engineering" have been applied, be $\mathbf X = (X_1, X_2, \ldots, X_k).$ The variables used in the model might include these original ones, or transformations of them, or some kind of combinations (linear or nonlinear) of them. To highlight what matters, let us call the variables used in the model $f_0(\mathbf X),$ $f_1(\mathbf X),$ and so on: they are all functions of the original variables. In these terms the logistic model can be expressed as

$$\text{log odds Y}(\mathbf X) = \beta_0 f_0(\mathbf X) + \beta_1 f_1(\mathbf X) + \cdots + \beta_p f_p(\mathbf X).$$

(Writing $E[Y(\mathbf X)] = P(\mathbf X),$ the log odds is by definition simply $\log(P(\mathbf X):[1-P(\mathbf X)])$ $= \log(P(\mathbf X)) - \log(1-P(\mathbf X)).$) Often $f_0(\mathbf X) = 1$ is a constant "intercept" function and, not infrequently, the $f_i$ are the projection functions $f_i(\mathbf X) = X_i.$
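As a quick check on this definition, base R's `qlogis` and `plogis` implement exactly these maps between probabilities and log odds (a sketch; the value $p = 0.75$ is arbitrary):

```r
# Log odds <-> probability, using base R's qlogis/plogis (the logistic
# quantile and distribution functions, i.e. exactly the maps defined above).
p <- 0.75
log.odds <- log(p / (1 - p))               # log(P : (1 - P))
stopifnot(all.equal(log.odds, qlogis(p)))  # qlogis computes the same log odds
stopifnot(all.equal(plogis(log.odds), p))  # plogis maps back to the probability
```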

The data used by the logistic regression are a sequence of $n$ values of $\mathbf X$ paired with their associated random responses $y_i.$ The result of the logistic regression is to obtain simultaneous estimates of the model coefficients, written $\hat\beta_j.$ Accompanying those estimates will be some estimate of their $(p+1)\times (p+1)$ variance-covariance matrix $\hat\Sigma.$

In this setup the slope of any original variable $X_j$ is the instantaneous rate of change of the log odds,

$$\begin{aligned} \operatorname{slope}(X_j) & = \frac{\partial}{\partial X_j}\left(\beta_0 f_0(\mathbf X) + \cdots + \beta_p f_p(\mathbf X)\right)\\ &= \beta_0\frac{\partial f_0(\mathbf X)}{\partial X_j} + \cdots + \beta_p\frac{\partial f_p(\mathbf X)}{\partial X_j}. \end{aligned}$$

This follows from basic rules of differentiation and the fact that the hypothesized coefficients $\beta_j$ are constants: they do not vary with $\mathbf X.$

The estimated slopes are obtained by plugging the estimated coefficients into this expression:

$$\widehat{\operatorname{slope}}(X_j) = \hat\beta_0\frac{\partial f_0(\mathbf X)}{\partial X_j} + \cdots + \hat\beta_p\frac{\partial f_p(\mathbf X)}{\partial X_j}.$$

More generally, the same two formulas for slope and estimated slope apply to any $X$ that depends on the original variables: just take the partial derivative of the log odds and plug in the estimated coefficients.

Application to the question

The "quadratic association" of the question's title means there is some quantity $X,$ depending only on the explanatory variables $\mathbf X,$ for which independently varying $X$ makes the log odds a quadratic function of $X.$ Equivalently, the partial derivative of the log odds with respect to $X$ must be a linear function of $X,$ meaning there are functions $\xi_0$ and $\xi_1$ for which

$$\operatorname{slope}(X) = \frac{\partial \text{ log odds}(\mathbf X)}{\partial X} = \xi_0(\mathbf X) + \xi_1(\mathbf X)X$$

where neither of the $\xi$ depends on $X$ itself (but might vary with some of the other original variables).

In the simplest case the $\xi$ are constant. This enables us to compare the estimated slopes at different levels of $X$ without concern for the levels of any of the other variables: we only have to plug each hypothesized value of $X$ into the slope formula.
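For the question's quadratic case, with model terms $X$ and $X^2,$ the estimated slope is $\hat\beta_1 + 2\hat\beta_2 X$ and can be evaluated at any chosen levels of $X.$ A minimal sketch (the simulated data-generating coefficients $1, 2, -4$ are assumptions chosen only to produce a concave log odds):

```r
# Sketch: slope of the log odds at low/medium/high levels of x
# in a quadratic logistic model.
set.seed(1)
x <- rnorm(500)
y <- rbinom(500, 1, plogis(1 + 2 * x - 4 * x^2))  # assumed concave truth
fit <- glm(y ~ x + I(x^2), family = binomial)
b <- coef(fit)
slope <- function(x0) unname(b["x"] + 2 * b["I(x^2)"] * x0)
sapply(c(low = -1, medium = 0, high = 1), slope)  # slope declines as x increases
```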

Finally, standard errors of these slope estimates (or any linear combinations of them) can be estimated from the estimated variance-covariance matrix of the $\hat\beta.$ Writing $\mathbf v = (\frac{\partial f_0(\mathbf X)}{\partial X_j}, \ldots , \frac{\partial f_p(\mathbf X)}{\partial X_j})$ for the coefficients, the standard error is

$$\operatorname{se}(\widehat{\operatorname{slope}}(X)) = \sqrt{\mathbf v^\prime \hat\Sigma \mathbf v}.$$

This is usually less onerous than it might look because only the components of $\mathbf v$ corresponding to nonzero partial derivatives contribute to the result.
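Continuing the quadratic case, $\mathbf v = (0, 1, 2X)$ holds the partial derivatives of the terms $(1, X, X^2)$ with respect to $X,$ and the estimated $\hat\Sigma$ comes from `vcov`. A sketch under the same assumed data-generating model:

```r
# Sketch: delta-method standard error of the slope at x0 = 1,
# using v' Sigma v with v = (0, 1, 2 * x0).
set.seed(2)
x <- rnorm(500)
y <- rbinom(500, 1, plogis(1 + 2 * x - 4 * x^2))  # assumed truth
fit <- glm(y ~ x + I(x^2), family = binomial)
x0 <- 1
v <- c(0, 1, 2 * x0)                       # gradient of the log odds in the betas
slope.hat <- sum(v * coef(fit))            # estimated slope at x0
slope.se <- sqrt(drop(t(v) %*% vcov(fit) %*% v))  # sqrt(v' Sigma v)
c(slope = slope.hat, se = slope.se)
```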


Example

To illustrate some of the complications, let's generate the responses to have expected log odds of $3 + 2X_1 - 4X_1^2 + \exp(X_2).$ Here, the original variables are $X_1$ and $X_2$ and we are interested in the quadratic association determined by $3 + 2X_1 - 4X_1^2.$ (That is, $\xi_0 = 2$ and $\xi_1 = -8,$ because the slope with respect to $X_1$ is $2 - 8X_1.$) However, we will estimate this model in a robust but indirect way using a degree-two orthogonal polynomial. In addition to a constant term, the model will have two other terms $f_1(X_1)$ and $f_2(X_1)$ that are computed using a black-box function.

The R code below exploits this black-box function to compute the derivatives numerically as

$$\frac{\partial f_j(X_1)}{\partial X_1} \approx \frac{f_j(X_1+\epsilon) - f_j(X_1-\epsilon)}{2\epsilon}$$

for an $\epsilon$ that is small enough to approximate the derivative but large enough that the resulting changes in the values of $f_j$ are computed accurately in floating point arithmetic.
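A central difference like this can be checked against a function with a known derivative; for $f(x) = x^2$ the derivative at $x = 1$ is $2,$ and central differences are exact for quadratics up to floating point:

```r
# Central-difference approximation to a derivative, as in the formula above.
num.deriv <- function(f, x, eps = 1e-3) (f(x + eps) - f(x - eps)) / (2 * eps)
num.deriv(\(x) x^2, 1)  # 2, up to floating point error
```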

The first simulated dataset looks like this:

[Figure: scatterplot matrix of the first simulated dataset, showing the joint distributions of $X_1,$ $X_2,$ and $Y.$]

Not much of an association between the response $Y$ and the variables $X_j$ is evident, even with 1,000 observations. This looks like a tough modeling problem. However, there are more than enough observations to estimate the coefficients fairly well, as the summary of the logistic fit shows:

    Call:
    glm(formula = Y ~ poly(X1, 2) + exp(X2), family = binomial, data = X)

    Coefficients:
                 Estimate Std. Error z value Pr(>|z|)    
    (Intercept)    1.8990     0.3027   6.273 3.53e-10 ***
    poly(X1, 2)1  30.5043     3.6547   8.347  < 2e-16 ***
    poly(X1, 2)2 -45.9388     4.3581 -10.541  < 2e-16 ***
    exp(X2)        1.0402     0.2813   3.699 0.000217 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    (Dispersion parameter for binomial family taken to be 1)

        Null deviance: 654.55  on 999  degrees of freedom
    Residual deviance: 398.74  on 996  degrees of freedom

All individual terms have small p-values and the change in deviance (from 654 to 398) is enormous compared to the cost of just three degrees of freedom.

The true slope at $X_1 = 1$ is $\frac{\partial (3 + 2X_1 - 4X_1^2)}{\partial X_1}\bigg|_{X_1 = 1} = 2 - 8(1) = -6.$ Numerical differentiation yields an estimate of $-5.16$ with a standard error of $0.644,$ indicating the estimate is comfortably close to the true value.

Upon iterating this experiment a thousand times, the resulting Z-scores (se-standardized deviations between the estimated slopes and the true values) ought to approximate a standard Normal distribution, assuming all the calculations (estimates and estimated variances) go well. Here is the histogram with the standard Normal density superimposed:

[Figure: histogram of the Z-scores with the standard Normal density superimposed.]

Everything is working as intended. See the R code for more details.

    n <- 1e3      # Number of observations.
    d <- 2        # Number of original variables.
    expit <- \(y) exp(y) / (1 + exp(y)) # Link function (log odds --> prob)
    eps <- 1e-3   # Infinitesimal for numerical derivatives
    x0 <- 1       # Specified value of X1
    a1 <- 2       # Linear coefficient of X1
    a2 <- -4      # Quadratic coefficient of X1
    vars <- c("poly(X1, 2)1", "poly(X1, 2)2") # Names of vars involving X1

    set.seed(17)
    Sim <- replicate(1e3, {
      #
      # Create the original variables according to a specific logistic model.
      #
      X <- as.data.frame(matrix(rnorm(d * n, 0, 1/2), n,
                                dimnames = list(NULL, X = c("X1", "X2"))))
      X$Y <- with(X, rbinom(nrow(X), 1, expit(3 + a1 * X1 + a2 * X1^2 + exp(X2))))
      # pairs(X)
      fit <- glm(Y ~ poly(X1, 2) + exp(X2), family = binomial, data = X)
      # summary(fit)
      #
      # Prepare to compute numerical derivatives.
      #
      X0 <- data.frame(X1 = x0 + c(-1, 0, 1) * eps, X2 = 0)
      P <- predict(with(X, poly(X1, 2)), X0$X1)
      colnames(P) <- vars
      X0 <- cbind(X0, P)
      X0$log.odds <- predict(fit, newdata = X0)
      #
      # Numerical calculation of the slope.
      #
      slope.hat <- (X0$log.odds[3] - X0$log.odds[1]) / (X0$X1[3] - X0$X1[1])
      #
      # Compute the se of the slope via numerical derivatives of the variables.
      #
      dP <- (P[3, ] - P[1, ]) / (2 * eps)
      Sigma.hat <- vcov(fit)[vars, vars]
      slope.se <- sqrt(dP %*% Sigma.hat %*% dP)
      slope <- a1 + a2 * 2 * X0$X1[2] # True slope
      c(slope = slope, `numerical estimate` = slope.hat, se = slope.se)
    })
    #
    # Plot the Z-scores comparing the numerical estimates of slope to the true value.
    #
    Z <- (Sim["numerical estimate", ] - Sim["slope", ]) / Sim["se", ]
    hist(Z, freq = FALSE, breaks = 15, ylim = c(0, 0.45), main = "Z Scores for Slope")
    curve(dnorm(x), add = TRUE, col = "Red", lwd = 2)
$\endgroup$
$\begingroup$

The coefficients of a logistic regression are difficult to interpret in general. The coefficient of an unsquared term is the change in log odds for every 1-unit increase in your IV, and a squared term is interpreted in much the same way: its coefficient is the change in log odds for every 1-unit increase in the squared IV (controlling for the other predictors in your model, including the linear version of the IV).

Because logistic regression follows a logistic curve, interpreting the slope as-is would be strange. I think your best bet would be to transform to predicted probabilities, so that it's more interpretable, and then use your low/medium/high method. Either that or just graph it; that will provide the most detail.
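One way to carry out the graphing suggestion is to plot fitted probabilities over a grid of $X$ values (a sketch; the simulated model with coefficients $1, 2, -4$ is an assumption for illustration only):

```r
# Sketch: fitted probabilities over a grid of x, for graphing.
set.seed(3)
x <- rnorm(300)
y <- rbinom(300, 1, plogis(1 + 2 * x - 4 * x^2))  # assumed concave truth
fit <- glm(y ~ x + I(x^2), family = binomial)
grid <- data.frame(x = seq(-2, 2, length.out = 101))
grid$p.hat <- predict(fit, newdata = grid, type = "response")  # probability scale
plot(p.hat ~ x, data = grid, type = "l", ylab = "Fitted probability")
```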

$\endgroup$
$\begingroup$

I would want to know what the $X$ variable is, and maybe how the sample was taken, before saying very much about this.

You say that as $X$ increases the probability that $Y=1$ (where I assume you mean every $Y$ value is either $0$ or $1$) increases and then decreases. The first thing I think of when you say that is that I might try including both $X$ and $X^2$ among the predictors.
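In R that model can be specified directly in the formula (a sketch with simulated data; the data-generating coefficients are assumptions for illustration only):

```r
# Sketch of the suggestion: include both X and X^2 as predictors.
set.seed(4)
x <- rnorm(200)
y <- rbinom(200, 1, plogis(0.5 + x - x^2))        # assumed concave truth
fit <- glm(y ~ x + I(x^2), family = binomial)     # X and X^2 in the model
summary(fit)$coefficients  # one row each for the intercept, X, and X^2
```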

$\endgroup$
  • $\begingroup$ The question describes a "quadratic association," which (although the description is a little unclear) strongly suggests both $X$ and $X^2$ are in the model space. The question concerns how to estimate the slopes at specified values of $X,$ which calculus tells us is a simple linear combination of the coefficients of $X$ and $X^2.$ $\endgroup$ Commented Jul 6 at 18:57
