Classical twin data comprise of phenotypic measurements on monozygotic (MZ) and dizygotic (DZ) twin pairs who were raised together. To derive estimates of behavioral genetic parameters (e.g., heritability) from such data, the ACDE model is most often used. In principle, the model provides estimates of the effects of additive genes (A), the shared environment (C), non-additive genes (D), and the unshared environment (E).

However, if only classical twin data are available, there is not enough information to estimate all four parameters, that is, the system of equations is underdetermined or underidentified. To enable parameters to be estimated, it is customary to fix either D or C to zero, leading to the ACE and ADE models which are identified. The problem with this approach is that if the influence of the omitted parameter is not actually zero, the estimates will be biased. Additional data on other types of family members, such as adoptees, would be needed for the full model but such data are usually not readily available.

Against this backdrop, Jöreskog (2021a) proposed that the full ACDE model can be estimated with only classical twin data. (A version of the ACDE model for categorical data was developed in Jöreskog [2021b], while Jöreskog [2021a] concerns only continuous data. I will discuss only the latter, but the same arguments apply to the categorical case.) This is a startling claim because the ACDE model has long been regarded as obviously impossible to estimate as there is simply not enough information in the twin variances and covariances for the full model (MZ and DZ variance-covariance matrices are sufficient statistics for the typical twin model, i.e., no other aspect of the sample data provides additional information on the parameter values). Nevertheless, Jöreskog claimed that it can be done, demonstrating it in several examples. Karl Jöreskog is not a behavioral geneticist but he is a highly influential statistician whose work on structural equation models has had a major influence on twin research. Therefore, even though his claims sounded implausible, they seemed worth investigating.

After studying Jöreskog’s model in detail I conclude that it does not deliver what it promises. It does generate a set of estimates for A, C, D, and E, but there is no reason to believe that they reflect the true population parameters. As nice as it would be to estimate the ACDE model with ordinary twin data, it just cannot be done.

This post has the following structure. I will start with a brief overview of twin models, describing some of the ways in which their parameters can be estimated. Then I will show how Jöreskog proposes to solve the ACDE identification problem, and where he goes wrong. I will end with a discussion of why I think twin models are useful despite their limitations, and why they have continuing relevance in the genomic era. The Appendix contains additional analyses related to the ACDE model.

Continue reading