$\begingroup$

Posting here as I've been tasked with running ACE models for a twin analysis project I'm working on. ACE models are a form of structural equation model that seeks to partition phenotypic variance into three components: additive genetic variance (A), common (or shared) environmental variance (C), and specific (or nonshared) environmental variance plus measurement error (E). I'm a little rusty here, but I believe this is done by calculating the correlation on a measure between twins within a family, and then comparing the strength of that correlation between monozygotic (MZ, identical) and dizygotic (DZ, non-identical) twin pairs. MZ twins share 100% of their DNA, while DZ twins share on average 50% of their segregating DNA. Any phenotypic differences between MZ twins are assumed to be driven by the environment. Any excess in similarity between MZ twins over DZ twins is assumed to be driven by genetics. See Jöreskog (2020) for a more thorough overview.
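As a rough illustration of that MZ/DZ comparison, the classical Falconer approximation turns the two twin correlations into variance shares. This is only a back-of-the-envelope sketch and not the likelihood-based model that umx fits; the symbols $r_{MZ}$ and $r_{DZ}$ (the twin correlations in each zygosity group) and the numbers in the worked line are hypothetical, not from my data.

$$
\begin{aligned}
a^2 &= 2\,(r_{MZ} - r_{DZ}) \\
c^2 &= 2\,r_{DZ} - r_{MZ} \\
e^2 &= 1 - r_{MZ} \\[4pt]
\text{e.g. } r_{MZ} = 0.8,\ r_{DZ} = 0.5 \;&\Rightarrow\; a^2 = 0.6,\quad c^2 = 0.2,\quad e^2 = 0.2
\end{aligned}
$$

The three shares sum to 1 by construction, and a high correlation in both groups (as shared parental reporting could produce) would show up mainly as a large $c^2$.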

This analysis is new to me, and I'm really enjoying the learning curve. The umx package has been a lifesaver, as working through OpenMx directly was too much for me for now (mastering matrix algebra will be the next challenge). That said, I am very aware of my knowledge gap on this analysis, and I have a concern about the data that I would like a second opinion on.

The project I'm working on uses a 3-level ordinal variable as one of our key variables. The very useful umx::umxACE() function can be adapted to handle ordinal data by using the tryHard = "ordinal" option. We have fitted univariate models, but cannot move on to multivariate models because the fit indices increase dramatically when additional variables are included. I am worried that this is due to the data we have, and it's making me suspicious of the ordinal univariate model results.

As it turns out, ~70% of parents reported identical scores for both twins in the data. Part of the issue here is that the variable has only 3 levels, which makes matching scores very likely. Even if scores were independent and uniformly random, I would expect matching scores 1/3 of the time. But given that nearly all of the twins come from the same household, and the scores for both twins have been reported by the same parent, I am not surprised by the similarity.
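To make the 1/3 figure concrete, here is a minimal sketch of the chance-agreement arithmetic. It assumes the two ratings are independent and share the same category proportions; the skewed proportions and the function names are hypothetical, chosen only to show that chance agreement rises when one category dominates.

```python
def chance_agreement(marginals):
    """Probability that two independent ratings match,
    assuming both follow the same category proportions."""
    total = sum(marginals)
    probs = [m / total for m in marginals]
    return sum(p * p for p in probs)


def chance_corrected(observed, expected):
    """Kappa-style correction: agreement beyond chance,
    scaled by the maximum possible agreement beyond chance."""
    return (observed - expected) / (1.0 - expected)


observed = 0.70  # roughly the share of identical scores reported above

uniform = chance_agreement([1, 1, 1])        # equal use of all three levels
skewed = chance_agreement([0.7, 0.2, 0.1])   # hypothetical skewed proportions

print(f"chance agreement, uniform levels: {uniform:.3f}")
print(f"chance agreement, skewed levels:  {skewed:.3f}")
print(f"chance-corrected, uniform: {chance_corrected(observed, uniform):.3f}")
print(f"chance-corrected, skewed:  {chance_corrected(observed, skewed):.3f}")
```

The same 70% raw agreement looks much less impressive once the category proportions are skewed, so the observed marginal distribution of the 3-level variable matters when judging how surprising the matching is.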

I strongly suspect this is the source of the issue, and that it probably explains why fit indices such as AIC inflate dramatically when additional variables are included, especially as these additional variables are continuous-ish and have a greater range of potential values.

What I don't know is why this is an issue, and the extent to which it is problematic (i.e., is this bad, but still report it and make sure to cover the implications/limitations in the discussion section, or is this bad, do not report it?).

Any guidance, thoughts and recommended materials will be appreciated.

References

Jöreskog, K. G. (2020). Classical Models for Twin Data. Structural Equation Modeling: A Multidisciplinary Journal, 28(1), 121–126. https://doi.org/10.1080/10705511.2020.1789465

Bates, T. C., Maes, H., & Neale, M. C. (2019). umx: Twin and path-based structural equation modeling in R. Twin Research and Human Genetics, 22(1), 27–41. https://doi.org/10.1017/thg.2019.2

$\endgroup$
  • $\begingroup$ To me (as a statistician reading a site for statistics questions), ACE refers to "alternating conditional expectations", per Breiman & Friedman's 1985 paper in the Journal of the American Statistical Association. Given the potential confusion (indeed "ACE" comes up in a few other contexts that might show up here as well), it's probably best to be explicit about what ACE means in this (genetics-related) context that many readers will not be familiar with. $\endgroup$ Commented Mar 25 at 13:11
  • $\begingroup$ Thank you! I've clarified it at the start now. I'll add this to the list of multidisciplinary mix-ups in applied statistics. $\endgroup$ Commented Mar 25 at 15:05
  • $\begingroup$ I'm not sure it matters that this is an ACE or Cholesky model. What does the (polychoric) correlation matrix look like? Does that give you any clues? $\endgroup$ Commented Mar 25 at 18:26
  • $\begingroup$ @JeremyMiles This is true. My concern is more related to the data than the model; I'm just being thrown by my unfamiliarity with the model. Across the entire sample we have a polychoric correlation coefficient of .709. However, this is very close to the proportion of matching cases between the twins (.699), and I fear the high correlation is driven by the matching cases. I need to ask myself the extent to which the data is telling me something about the twins, and the extent to which it's telling me about the reporting tendencies of the parents who completed the questionnaires. $\endgroup$ Commented Mar 26 at 8:55
  • $\begingroup$ You can fit the model in other programs that you're more familiar with - you just can't do the last stage where you multiply the loading matrix by the transpose. $\endgroup$ Commented Mar 26 at 10:24
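
On the polychoric correlation versus the matching proportion raised in the comments above: the two are different quantities, and a small simulation can show how they relate. The sketch below assumes the standard latent-liability setup behind a polychoric correlation (a bivariate normal trait cut into three categories by thresholds). The thresholds, sample size, seed and function name are all hypothetical, and it uses only the Python standard library.

```python
import math
import random


def simulate_agreement(rho, thresholds, n_pairs=50_000, seed=1):
    """Share of twin pairs landing in the same ordinal category when their
    latent scores are bivariate normal with correlation rho."""
    rng = random.Random(seed)

    def categorize(value):
        # Category index = number of thresholds the latent score exceeds.
        return sum(value > t for t in thresholds)

    matches = 0
    for _ in range(n_pairs):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        twin1 = z1
        # Build a second score that correlates rho with the first.
        twin2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * z2
        if categorize(twin1) == categorize(twin2):
            matches += 1
    return matches / n_pairs


# Hypothetical thresholds: roughly equal thirds vs. one dominant category.
equal_thirds = simulate_agreement(0.709, thresholds=(-0.43, 0.43))
skewed = simulate_agreement(0.709, thresholds=(0.5, 1.3))
no_corr = simulate_agreement(0.0, thresholds=(-0.43, 0.43))

print(f"rho = .709, equal thirds:      {equal_thirds:.3f}")
print(f"rho = .709, skewed thresholds: {skewed:.3f}")
print(f"rho = 0,    equal thirds:      {no_corr:.3f}")
```

Under these assumptions the same latent correlation yields different matching proportions depending on where the thresholds fall, so the closeness of .709 and .699 need not mean the correlation is simply a restatement of the matching rate. It says nothing, however, about whether the latent correlation itself reflects the twins or the parent doing the reporting.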
