Posting here as I've been tasked with running ACE models for a twins analysis project I'm working on. ACE models are a form of structural equation model that seeks to partition phenotypic variance into three components: additive genetic variance (A), common (or shared) environmental variance (C), and specific (or nonshared) environmental variance plus measurement error (E). I'm a little rusty here, but as I understand it, the partition rests on the within-pair correlation between twins on a measure: the correlation for monozygotic (MZ, identical) pairs is compared with the correlation for dizygotic (DZ, non-identical) pairs. MZ twins share (nearly) 100% of their segregating genetic variants, while DZ twins share on average 50%. Any phenotypic differences between MZ twins are therefore assumed to be driven by the environment, and any excess in similarity of MZ twins over DZ twins is assumed to be driven by genetics. See Jöreskog (2020) for a more thorough overview.
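For intuition, the MZ/DZ comparison above can be sketched with Falconer's classic approximation. This is not what umxACE() fits (umx estimates the components by maximum likelihood), but it shows how the two twin correlations map onto A, C and E. The function name and the correlations in the usage line are hypothetical.

```python
def falconer_ace(r_mz: float, r_dz: float) -> dict:
    """Rough ACE shares from twin correlations (Falconer's approximation).

    Assumes MZ pairs share all additive genetic effects, DZ pairs share half,
    and both zygosities share the common environment to the same degree.
    """
    a2 = 2 * (r_mz - r_dz)  # additive genetic: the doubled MZ-DZ gap
    c2 = 2 * r_dz - r_mz    # common environment: DZ similarity not explained by A
    e2 = 1 - r_mz           # nonshared environment + error: MZ dissimilarity
    return {"A": a2, "C": c2, "E": e2}


# Hypothetical correlations, for illustration only
print({k: round(v, 2) for k, v in falconer_ace(0.8, 0.5).items()})
```

The three shares always sum to 1, but unlike a fitted ACE model nothing stops them from going negative when the correlations are noisy, which is one reason to prefer the model-based estimates.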
This analysis is new for me, and I'm really enjoying the learning curve. The umx package (Bates et al., 2019) has been a lifesaver, as working through OpenMx directly was too much for me for now (mastering matrix algebra will be the next challenge). All this being said, I am very aware of my knowledge gap on this analysis, and I have a concern about the data that I would like a second opinion on.
The project I'm working on uses a 3-level ordinal variable as one of our key variables. The very useful umx::umxACE() function handles ordinal data when the variables are coded as ordered factors (e.g., with umxFactor()), and the tryHard = "ordinal" option retries the optimization in a way suited to ordinal models. We have fitted univariate models, but cannot move on to multivariate models because fit indices increase dramatically when additional variables are included. I am worried that this is due to the data we have, and it's making me suspicious of the ordinal univariate model results.
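As I understand it, ordinal twin models in OpenMx/umx follow the liability-threshold idea: each observed category is assumed to arise from a normally distributed latent liability that is cut at k - 1 thresholds. A minimal sketch of how the category proportions of a 3-level item translate into those thresholds; the function name and the proportions are hypothetical.

```python
from itertools import accumulate
from statistics import NormalDist


def thresholds_from_proportions(props):
    """Liability thresholds implied by the category proportions of an ordinal item.

    A standard-normal liability is cut at len(props) - 1 thresholds; the observed
    category is the interval the liability falls in.
    """
    if abs(sum(props) - 1) > 1e-9:
        raise ValueError("proportions must sum to 1")
    cumulative = list(accumulate(props))[:-1]  # drop the final cumulative value of 1.0
    return [NormalDist().inv_cdf(p) for p in cumulative]


# Hypothetical proportions for a 3-level item: low, middle, high
print([round(t, 3) for t in thresholds_from_proportions([0.2, 0.5, 0.3])])
```

The thresholds only describe where the categories fall on the liability scale; the twin correlations the ACE model decomposes are then correlations between latent liabilities rather than between the raw 1/2/3 scores.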
As it turns out, ~70% of parents reported identical scores for both twins. Part of the issue is the 3-level ordinal format, which makes matching scores very likely: even if scores were assigned at random with all three levels equally likely, I would expect matching scores 1/3 of the time (and more often if one level dominates). But given that nearly all of the twins come from the same household, and the scores for both twins were reported by the same parent, I am not surprised by the similarity.
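The 1/3 figure only holds when the three levels are equally common; with independent ratings the chance of a match is the sum of the squared category proportions. A sketch for checking both how much of the ~70% agreement would be expected by chance and how much agreement a given latent twin correlation produces under the liability-threshold assumption. The function names, the proportions and the correlations are hypothetical, and the Monte Carlo estimate is approximate.

```python
import random
from statistics import NormalDist

# Thresholds that split a standard-normal liability into three equally likely levels
EQUAL_THIRDS = [NormalDist().inv_cdf(1 / 3), NormalDist().inv_cdf(2 / 3)]


def chance_agreement(props):
    """Probability that two independent ratings land in the same category."""
    return sum(p * p for p in props)


def simulated_agreement(rho, thresholds, n=100_000, seed=1):
    """Monte Carlo share of pairs with identical ordinal scores when the two
    latent liabilities are bivariate standard normal with correlation rho."""
    rng = random.Random(seed)

    def category(x):
        return sum(x > t for t in thresholds)  # 0, 1, ..., len(thresholds)

    same = 0
    for _ in range(n):
        z1 = rng.gauss(0, 1)
        # z2 has unit variance and correlation rho with z1
        z2 = rho * z1 + (1 - rho ** 2) ** 0.5 * rng.gauss(0, 1)
        same += category(z1) == category(z2)
    return same / n


print(chance_agreement([1 / 3] * 3))       # equal levels
print(chance_agreement([0.2, 0.5, 0.3]))   # hypothetical unequal levels
for rho in (0.0, 0.5, 0.8):
    print(rho, round(simulated_agreement(rho, EQUAL_THIRDS), 3))
```

Running chance_agreement() on the observed category proportions, and simulated_agreement() with thresholds from those proportions, gives a rough sense of how much of the observed agreement is beyond chance.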
I strongly suspect this is the source of the issue, and that it probably explains why fit indices such as AIC inflate dramatically when additional variables are included, especially as these additional variables are continuous-ish and have a greater range of potential values.
What I don't know is why this is an issue, and to what extent it is problematic (i.e., is this bad but still reportable, provided I cover the implications/limitations in the discussion section, or is it bad enough that it should not be reported?).
Any guidance, thoughts and recommended materials will be appreciated.
References
Bates, T. C., Maes, H., & Neale, M. C. (2019). umx: Twin and path-based structural equation modeling in R. Twin Research and Human Genetics, 22(1), 27–41. https://doi.org/10.1017/thg.2019.2
Jöreskog, K. G. (2020). Classical models for twin data. Structural Equation Modeling: A Multidisciplinary Journal, 28(1), 121–126. https://doi.org/10.1080/10705511.2020.1789465