$\begingroup$

I'm attempting to re-learn linear mixed-effects models with lmer via some tutorials online, but I'm struggling with the concept of intraclass correlation (ICC). I'm currently working with a model that has only a single random intercept (can't share the data, but here's the model: mod1 <- lmer(CDB ~ 1 + (1|fSite), data = BA3)). The response variable "CDB" is continuous; it can take on negative, zero, and positive values. fSite, factor(site), has 35 levels (3-6 obs. per level).

One source I'm using mentioned that ICC is the correlation between observations within the same group (CDB values within the same site). This confuses me. I thought the ICC is used to determine where the largest source of variance comes from. There is 'within group' variance, the variance around a single group mean (variability of spatially or temporally close observations), and 'between group' variance, the spread of the group means. So, if ICC ≈ 1, this means large differences between groups and small differences between observations within the same group. I'm missing the connection between this and the correlation between observations within the same group.

Are these two concepts the same? Are they partly related, or is one definition flat-out wrong?

$\endgroup$

2 Answers

$\begingroup$

I'm not sure who said "ICC is the correlation between observations within the same group" but to me that is not a very helpful way of thinking about it.

The ICC is the proportion of total variance in the dependent variable that is "at the group level" or "due to differences between (as opposed to within) groups." It is calculated by dividing the group-level variance by the sum of group and individual level variances. So it can range from 0 to 1. An ICC of 10% means that 10% of the variance in the dependent variable is due to between-group differences, and the remaining 90% is due to within-group differences.
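
As a rough sketch of that calculation in R, the variance components can be pulled from the fitted model with `VarCorr()` from lme4. The data here are simulated stand-ins (the real `BA3` data were not shared), and the values of `sigma_u` and `sigma_e` are assumptions chosen only for illustration:

```r
library(lme4)

set.seed(1)
n_sites <- 35
n_per   <- sample(3:6, n_sites, replace = TRUE)  # 3-6 observations per site
sigma_u <- 2  # assumed between-site SD (hypothetical value)
sigma_e <- 3  # assumed within-site SD (hypothetical value)

# Simulated stand-in for the question's data frame
fSite <- factor(rep(seq_len(n_sites), times = n_per))
u     <- rnorm(n_sites, mean = 0, sd = sigma_u)  # one random intercept per site
CDB   <- 1 + u[as.integer(fSite)] + rnorm(length(fSite), mean = 0, sd = sigma_e)
BA3   <- data.frame(CDB = CDB, fSite = fSite)

# Same model formula as in the question
mod1 <- lmer(CDB ~ 1 + (1 | fSite), data = BA3)

# Variance components: one row for the site intercept, one for the residual
vc        <- as.data.frame(VarCorr(mod1))
var_site  <- vc$vcov[vc$grp == "fSite"]
var_resid <- vc$vcov[vc$grp == "Residual"]

# ICC = group-level variance / (group-level + individual-level variance)
icc <- var_site / (var_site + var_resid)
icc
```

With these assumed values the true ICC is 4 / (4 + 9), about 0.31. The estimate from only 35 small sites will scatter around that.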

That being said, the statement you cite is still broadly accurate because, as the ICC increases, the correlation between two observations from the same group goes up. In the limiting case, an ICC of 1 means that 100% of the variance in the DV is due to between-group differences and 0% is due to within-group differences. That in turn implies that all observations that share a group have exactly the same value for the DV. Thus, looking across groups, one observation from a group perfectly predicts any other observation from that group, and the correlation is perfect in that sense. (Inside a single fixed group the values are then constant, so a correlation computed within that one group alone is undefined.) At the other end, an ICC of zero means that 0% of the variance is due to between-group differences, which in turn implies that the average value of the DV within a group is identical no matter which group you choose, so knowing one observation tells you nothing extra about another one from the same group. So the ICC does have implications for within- and between-group correlations.

But fundamentally, the ICC tells you about the proportion of variance at different analytic levels. In my view, any definition that doesn't emphasize that risks confusing people.

$\endgroup$
  • I didn't follow the claim of "perfect" correlations, because when one variable is constant, its correlation with any other variable is undefined. Commented Sep 10, 2024 at 17:44
  • It does indeed confuse people, haha. Thank you for helping me think it through! Commented Sep 10, 2024 at 17:45
$\begingroup$

Although a good answer was already given and accepted, the following is also relevant. The simple two-level model you used is:

$y_{ij} = b_0+u_j+e_{ij}$

Here, $u_j$ denotes the random influence of "site", with mean 0 and variance $\sigma_u^2$, and $e_{ij}$ is the residual, with mean 0 and variance $\sigma_e^2$. If this model indeed completely represents the data-generating process of your $y$ variable, then we can ask how high the correlation is between two randomly chosen $y$ values, say $y_{kj}$ and $y_{mj}$ (with $k \neq m$), from the same randomly chosen site $j$. The two $y$ values have two things in common: $b_0$, which is constant (no variance), and $u_j$, which has variance! As a result of the common $u_j$ term (see proof below), the covariance $cov(y_{kj},y_{mj})$ of $y_{kj}$ and $y_{mj}$ is equal to the variance $\sigma_u^2$ of $u_j$.

Now look at the correlation. The linear correlation of two variables, here $y_{kj}$ and $y_{mj}$, is defined as their covariance divided by the product of their two standard deviations. Here, the variance of $y_{kj}$ equals $\sigma_u^2 + \sigma_e^2$, and the same is true for the variance of $y_{mj}$. So, the standard deviations of $y_{kj}$ and $y_{mj}$ are equal, namely $\sqrt{\sigma_u^2 + \sigma_e^2}$. This means that the product of these two standard deviations equals $\sigma_u^2 + \sigma_e^2$. Finally, dividing the covariance by the product of the two standard deviations results in the well-known formula for the ICC:

$ICC=\dfrac{\sigma_u^2}{\sigma_u^2 + \sigma_e^2}$

So, the ICC formula indeed expresses two things: (1) the proportion of variance "explained" by the grouping factor "site", and (2) the correlation between two random $y$ draws from a randomly chosen site.
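
Interpretation (2) can be checked with a small base-R simulation. This is only a sketch: the values of `b0`, `sigma_u` and `sigma_e` are made up, and normal distributions are assumed for convenience. Each simulated site contributes one pair of observations that share the same $u_j$:

```r
set.seed(42)
n_sites <- 100000
b0      <- 5  # hypothetical fixed intercept
sigma_u <- 2  # hypothetical between-site SD
sigma_e <- 3  # hypothetical within-site SD

u   <- rnorm(n_sites, mean = 0, sd = sigma_u)        # shared site effect u_j
y_k <- b0 + u + rnorm(n_sites, mean = 0, sd = sigma_e)  # first draw from site j
y_m <- b0 + u + rnorm(n_sites, mean = 0, sd = sigma_e)  # second draw from site j

icc_theory <- sigma_u^2 / (sigma_u^2 + sigma_e^2)  # 4 / 13
r_within   <- cor(y_k, y_m)                         # correlation across sites

c(theory = icc_theory, simulated = r_within)
```

The correlation is taken over many sites, one pair per site, which is why it is well defined even though each site has only two values here.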


Proof that $cov(y_{kj},y_{mj}) = \sigma_u^2$

First note that the expectation of $y_{kj}$ equals $E(y_{kj})=E(b_0+u_j+e_{kj})=b_0$ because $E(u_j)=E(e_{kj})=0$ by assumption.

The covariance of $y_{kj}$ and $y_{mj}$ is by definition:

$cov(y_{kj}, y_{mj}) = E[(y_{kj}-E(y_{kj}))(y_{mj}-E(y_{mj}))] = E[(y_{kj}-b_0)(y_{mj}-b_0)]$.

This can be written as:

$E[(b_0+u_j+e_{kj}-b_0)(b_0+u_j+e_{mj}-b_0)] = E[(u_j+e_{kj})(u_j+e_{mj})] =$

$E(u_j^2) + E(u_j\times{e_{mj}}) + E(e_{kj}\times{u_j}) + E(e_{kj}\times{e_{mj}})$

The last three terms in the above expression are zero. This is because $u_j$ and $e_{kj}$ are assumed to be independent, and likewise $u_j$ and $e_{mj}$, so it must hold that $E(u_j\times{e_{mj}}) = E(u_j)\times{E(e_{mj})} = 0 \times 0$. Similarly, $E({e_{kj}\times u_j}) = 0 \times 0$. Also, $e_{kj}$ and $e_{mj}$ (with $k \neq m$) are assumed to be independent, and hence $E(e_{kj}\times{e_{mj}}) =E(e_{kj})\times{E(e_{mj})}=0 \times 0$.

Finally, the variance $\sigma_u^2$ of $u_j$ is by definition:

$\sigma_u^2 = E[(u_j-E(u_j))^2] = E[(u_j-0)^2] = E(u_j^2)$

and this means that

$cov(y_{kj},y_{mj})=\sigma_u^2$
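
For anyone who prefers a numerical check, the four expectation terms from the proof can be approximated by sample means. This is a sketch under assumed values ($\sigma_u = 2$, $\sigma_e = 3$) with independent normal draws, none of which come from the question's data:

```r
set.seed(7)
n       <- 200000
sigma_u <- 2  # hypothetical between-site SD
sigma_e <- 3  # hypothetical within-site SD

u   <- rnorm(n, mean = 0, sd = sigma_u)  # site effects u_j
e_k <- rnorm(n, mean = 0, sd = sigma_e)  # residual of observation k
e_m <- rnorm(n, mean = 0, sd = sigma_e)  # residual of observation m

# Sample versions of the four terms in the expanded covariance
terms <- c(
  "E(u^2)"     = mean(u^2),
  "E(u*e_m)"   = mean(u * e_m),
  "E(e_k*u)"   = mean(e_k * u),
  "E(e_k*e_m)" = mean(e_k * e_m)
)

cov_sim <- sum(terms)  # approximates cov(y_kj, y_mj)
round(terms, 2)
```

The first term should land near $\sigma_u^2 = 4$ and the other three near 0, so their sum approximates the covariance derived above.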

$\endgroup$
