
Suppose I have a data set with three variables: $y$, an outcome I want to understand; $x$, a variable which covaries with $y$ and which we can assume is measured perfectly; and $g$, a grouping variable that identifies which of several clusters (e.g., subjects) a measurement comes from. I fit the data with a multilevel (hierarchical) regression model of the form $$y_i = (\beta_0 + b_{0,g[i]}) + (\beta_1 + b_{1, g[i]})x_i + \varepsilon_i$$ where $i = 1, \ldots, n$ indexes observations in my dataset and $g[i]$ is the value of the variable $g$ for observation $i$.

In lme4 notation this model would be y ~ 1 + x + (1 + x | g), allowing every cluster to have its own slope and intercept, while also computing an overall slope and intercept.

The question I want to answer with this model is which is more important for explaining the outcome variable: the effect of $x$ or the between-cluster variability?

I think a practical way to answer this question might be to partition the variance (it is ok to assume the error distribution is Normal if that matters), but I don't really know how to do this. I know that if I fit a non-hierarchical model, i.e., y ~ 1 + x, the variance can be partitioned into what is explained by the regression and what isn't.
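(For context, the decomposition I have in mind for the non-hierarchical model is the usual sum-of-squares partition,

$$\underbrace{\sum_{i=1}^n (y_i - \bar{y})^2}_{\mathrm{SS}_{\mathrm{tot}}} = \underbrace{\sum_{i=1}^n (\hat{y}_i - \bar{y})^2}_{\mathrm{SS}_{\mathrm{reg}}} + \underbrace{\sum_{i=1}^n (y_i - \hat{y}_i)^2}_{\mathrm{SS}_{\mathrm{res}}}, \qquad R^2 = \frac{\mathrm{SS}_{\mathrm{reg}}}{\mathrm{SS}_{\mathrm{tot}}},$$

and I'd like something analogous for the multilevel model.)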

I also know that when I fit the model y ~ 1 + x + (1 + x | g), I estimate a variance component explained by the random effects grouped by g. From this model, is it possible to decompose the variance into the portion explained by the fixed effects, the portion explained by the random effects, and the residual variability? If so, how do I do that?

Is there a better way to answer this question? I think partitioning the explained variance is an intuitive way to say one variable is "more important" than another, but if there's a better metric for this, that is also ok.

For a concrete example, I simulated a dataset in R that follows this model, and all of the model parameters are estimable.

set.seed(101)
n_subjects <- 20
x_values_per_subject <- c(0.1, 0.25, 0.5, 0.75)
x <- rep(x_values_per_subject, times = n_subjects)
g <- rep(1:n_subjects, each = length(x_values_per_subject))

global_b0 <- 5
individual_b0 <- rnorm(n_subjects, 0, 5)  # cluster-level intercept offsets b_0
global_b1 <- -2
individual_b1 <- rnorm(n_subjects, 0, 3)  # cluster-level slope offsets b_1
residual_sd <- 1                          # rnorm() takes a standard deviation

mu <- (global_b0 + individual_b0[g]) + (global_b1 + individual_b1[g]) * x
error_variate <- rnorm(length(mu), 0, residual_sd)
y <- mu + error_variate
observed_data <- data.frame(x, g, y)

Additionally, here's some simple code to visualize the group-level trajectories.

plot(
  NULL, NULL,
  xlim = range(x_values_per_subject),
  ylim = range(observed_data$y),
  xlab = "x", ylab = "y"
)
for (i in unique(observed_data$g)) {
  this_g <- subset(observed_data, g == i)
  lines(this_g$x, this_g$y, type = "b")
}

And fitting both models is easy.

library(lme4)
simple_model <- lm(y ~ 1 + x, data = observed_data)
multilevel_model <- lme4::lmer(
  y ~ 1 + x + (1 + x | g),
  data = observed_data, REML = FALSE
)
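(As an aside, here is a minimal sketch of how the estimated variance components of the multilevel fit can be inspected directly; VarCorr() and sigma() are standard lme4 accessors.)

# Random-effect variances/covariance and the residual variance, as a data frame
as.data.frame(lme4::VarCorr(multilevel_model))
sigma(multilevel_model)^2  # residual variance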

When I compared the models, I saw that both estimate essentially the same point estimates for the fixed effects, although the standard error of the fixed effect of x is much smaller once the extra between-cluster variation is accounted for. If we use a likelihood-ratio test (via anova) or AIC to compare the models, the multilevel model is clearly much better.

summary(simple_model)
summary(multilevel_model)
anova(multilevel_model, simple_model)

So given these two models, how do I determine if the fixed effect of x is more important than the between-cluster variability?

  • This may be a very stupid suggestion, but wouldn't fitting an lm with both x and g as fixed effects work? That would leave out the random slope, though, but then you could separately fit lmer models with and without the random slope to see how important the slope is. Commented Jan 14 at 8:37
  • @sointu I guess that would work, although I specifically don't want g as a fixed effect because I want to generalize the results to new clusters. Comparing the models with and without random effects has a few problems but should work. The main problem is that I was really hoping for a quantitative measure, and the p-value for such a test is (A) conservative and (B) scales with sample size, so it doesn't really provide useful quantitative information. I've found some approaches based on prediction variance and variance decomposition, and I plan to answer my own question when I have time. Commented Jan 14 at 15:30
  • I've deleted my answer, but I stand by the assertion that comparing models with the same fixed effects but different random effects should be done with REML, not ML, and that gls() is better than lm() in this case. Commented Jan 15 at 0:57
  • Yeah, I realized you probably want an answer preserving g as a random effect, but this just came to mind as a practical solution. I look forward to your answer! Commented Jan 15 at 8:53

1 Answer


As it turns out, this is not the first time this question has been asked on Cross Validated; see, i.a., here and here. To my knowledge, though, neither specifically answers the question of determining "how important" the random effects are compared to the fixed effects.

The most common method for computing a "fixed effects $R^2$" is implemented in the performance package for R via its icc() function, whose documentation includes the relevant citations. The same documentation page also describes a variance_decomposition() function, which seems to directly answer my question, so I'll provide a brief explanation of their method in the context of my model. The method for Bayesian models appears similar to the underlying computations used for frequentist models, although it doesn't estimate separate $R^2$ values for each model component (this is possible, and has been described in detail in several of Andrew Gelman's writings, for example this paper).
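For example, the relevant calls look something like this (a sketch assuming a reasonably recent version of the performance package; the exact output format may differ by version):

# install.packages("performance")  # if not already installed
library(performance)

# Intraclass correlation: proportion of variance attributable to the grouping
icc(multilevel_model)

# Marginal R2 (fixed effects only) and conditional R2 (fixed + random effects)
r2(multilevel_model)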

The approach used by the performance package is to calculate the variance of the model's predictions with and without taking the random effects into account. I think this is beneficial because it uses estimates that are conditional on the random effects, so we avoid comparing two separate models, which has a number of theoretical issues as noted by Rick Hass. As long as the random effects are estimable (in a Bayesian framework they typically will be, since boundary and convergence problems are much less of a concern when estimating the parameters with something like Stan), we can compute these variances.

The performance package structures the ratio in a particular way and reports one minus some of the variance ratios for interpretability, but I'll ignore that for now. Instead, we want to calculate two quantities of interest. First, the variance due to the fixed effects only,

$$\sigma^2_{\mathrm{FE}} = \text{V}_{i=1}^n\left( \left( \hat{\beta}_0 + 0 \right) + \left(\hat{\beta}_1 + 0\right)x_i \right)$$

where $\text{V}_{i=1}^n$ denotes the variance operator over all of the observations $i = 1, \ldots, n$, following Gelman's notation. The $x_i$ here are the actual observed values of the predictor in the sample. This corresponds to taking the sample variance of the predictions made with re.form = NA in lme4 language. We then also need the variance when the random effects are included,

$$\sigma^2_{\mathrm{ME}} = \text{V}_{i=1}^n\left( \left( \hat{\beta}_0 + \hat{b}_{0, g[i]} \right) + \left(\hat{\beta}_1 + \hat{b}_{1, g[i]}\right)x_i \right).$$

Likewise, this corresponds to the sample variance of the predictions made with re.form = NULL in lme4. The ratio $$\sigma^2_{\mathrm{FE}} / \sigma^2_{\mathrm{ME}}$$ then gives an interpretable effect-size measure that helps answer the question "how much of the model-implied variance is explained by the fixed effects alone?".
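In lme4 terms, a minimal sketch of this calculation for the multilevel_model fit above (using predict() with and without the random effects) might look like:

# Variance of predictions from the fixed effects only (re.form = NA)
var_fe <- var(predict(multilevel_model, re.form = NA))

# Variance of predictions including the estimated random effects (re.form = NULL)
var_me <- var(predict(multilevel_model, re.form = NULL))

# Proportion of the model-implied variance attributable to the fixed effects alone
var_fe / var_me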

Since this is a ratio of variances, I presume there is a frequentist way to construct a confidence interval based on some $F$-distribution, but that's not my area of expertise and I don't immediately see how to calculate it; one could likely use bootstrapping to get a CI instead. Alternatively, as noted in the performance documentation, if we fit the model in a Bayesian framework a CI is easy to obtain, because we get posterior samples of both variances and thus of their ratio.
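For instance, a rough sketch of a parametric bootstrap with lme4::bootMer, treating the variance ratio as the statistic of interest (the number of simulations here is arbitrary):

# Statistic: fixed-effects-only prediction variance divided by the
# variance of predictions that include the random effects
var_ratio <- function(fit) {
  var(predict(fit, re.form = NA)) / var(predict(fit, re.form = NULL))
}

# Parametric bootstrap of the ratio (some refits may throw convergence warnings)
boot_res <- lme4::bootMer(multilevel_model, var_ratio, nsim = 500, seed = 101)

# Percentile-style 95% interval from the bootstrap replicates
quantile(boot_res$t, probs = c(0.025, 0.975))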

