Author: Dalliard (Page 1 of 3)

Links for Summer ’22


  • A Note on Jöreskog’s ACDE Twin Model: A Solution, Not the Solution by Dolan et al. This critique was published on the heels of my own recent, critical post on Jöreskog’s twin model. Using Mendelian algebra and a simple one-locus model, Dolan et al. show that Jöreskog’s estimates are biased. They also note that the combination of MZ and DZ covariances that Jöreskog proposes as an estimator of additive genetic variance does not have the correct expected value. While these arguments are true and on point, in their short article Dolan et al. do not go into what I think is the main problem with Jöreskog’s model: the absurdity of the idea that minimizing the Euclidean norm would produce meaningful behavioral genetic estimates. They note that sometimes Jöreskog’s ACDE estimates may be less biased than ACE and ADE estimates, but that would be pure happenstance because the data generating mechanism suggested by Jöreskog’s model is never realistic. In contrast, the ACE model (or its submodel, AE) is often a realistic approximation of the true data generating mechanism, and even if this is not the case, the amount of bias is usually tolerably low, while the biases of Jöreskog’s estimates can be severe in typical datasets (e.g., if AE is the true model).
  • Polygenic Health Index, General Health, and Disease Risk by Widen et al. This is a paper from people associated with Steve Hsu’s eugenics biotechnology startup. With UK Biobank data, they build an index from polygenic risk scores for twenty diseases (e.g., diabetes, heart disease, schizophrenia), and show that lower values on this index are associated with a lower risk for almost all the diseases included and a higher risk for none. The index also predicts a longer lifespan, and works, with lower accuracy, within families (between siblings) as well. Thus the index is a candidate for use in embryo selection. A common anti-eugenic argument is that by artificially selecting for something positive one may inadvertently select for something negative. The paper shows that in fact one can simultaneously decrease the risk of many diseases without increasing that of any of them. Generally, the argument about accidental adverse selection rests on the tacit assumption that the status quo where eugenic and dysgenic concerns are ignored is somehow natural, neutral, and harmless. However, every society selects for something and it seems unlikely that, say, embryo selection based on polygenic index scores would have worse consequences than the status quo. For example, selection against educational attainment and for increased criminal offending happen in some contemporary societies, but that is hardly any kind of inevitable state of affairs that should not be tampered with.

Cognitive abilities

  • Brain size and intelligence: 2022 by Emil Kirkegaard. A good review of the state of the brain size and IQ literature. It seems that the true correlation is around 0.30.
  • General or specific abilities? Evidence from 33 countries participating in the PISA assessments by Pokropek et al. Arthur Jensen coined the term specificity doctrine to refer to the notion that cognitive ability tests derive their meaning and validity from the manifest surface content of the tests (e.g., a vocabulary test must solely or primarily measure the size of one’s vocabulary or, perhaps, verbal ability). He contrasted this view with the latent variable perspective, according to which the specific content of tests is not that relevant because all tests are measures of a small number of latent abilities, most importantly the g factor, which can be assessed with any kind of cognitive tests (see Jensen, 1984). While the specificity doctrine has very little to recommend it (see also e.g., Canivez, 2013), it remains a highly popular approach to interpreting test scores. For example, in research on the PISA student achievement tests, the focus is almost always on specific tests or skills like math and reading rather than the common variance underlying the different tests. Pokropek et al. analyze the PISA tests and show that in all 33 OECD countries a g factor model fits the data much better than non-g models that have been proposed in the literature. The PISA items are close to being congeneric (i.e., with a single common factor), with the specific factors correlating with each other at close to 0.90, on average. The amount of reliable non-g variance is so low that subtests cannot be treated as measures of specific skill domains like math, reading, or science. The correct way to interpret PISA tests is at the general factor level, which is where the reliability and predictive validity of the tests are concentrated. The relevance, if any, of specific abilities is in their possible incremental validity over the g factor.
  • Training working memory for two years—No evidence of transfer to intelligence by Watrin et al. Another study showing that training cognitive skills improves the trained skills but has no effect on other skills or intelligence in general. This is another datum that supports the existence of a reflective, causal general intelligence factor, while it contradicts the idea that general intelligence is an epiphenomenon that arises from a sampling of specific abilities.

Group differences

  • How useful are national IQs? by Noah Carl. A nice defense of research on national IQs. Interesting point: “If the measured IQ in Sub-Saharan Africa is 80, this would mean the massive difference in environment between Sub-Saharan Africa and the US reduces IQ by only 5 points, yet the comparatively small difference in environment between black and white Americans somehow reduces it by 15 points.”
  • Analyzing racial disparities in socioeconomic outcomes in three NCES datasets by Jay M. A lucidly written analysis of the main drivers of racial/ethnic disparities in educational attainment, occupational prestige, and income in America, based on several large longitudinal datasets. Some stylized facts from the many models reported: Outcome gaps strongly favor whites over blacks in unconditional analyses but these gaps are eliminated or reversed after controlling for just high school test scores and grades; Asians outachieve whites to a similar degree regardless of whether analyses are adjusted for test scores and grades; Hispanics and Native Americans are as disadvantaged as blacks in unconditional analyses, and while controlling for test scores and grades typically makes them statistically indistinguishable from whites, the effect of these covariates in them is clearly weaker than in blacks; the effect of cognitive skills is larger for educational attainment and occupational prestige than for income (although this may be partly because the analysis platform used does not permit the more appropriate log-normal functional form).
  • Examination of differential effects of cognitive abilities on reading and mathematics achievement across race and ethnicity: Evidence with the WJ IV by Hajovsky & Chesnut. This study finds that scalar invariance with respect to white, black, Hispanic, and Asian Americans holds for the Woodcock-Johnson IV IQ test. For the most part, the test also predicts achievement test scores similarly across races. The achievement tests were also invariant with respect to race/ethnicity. While these results are plausible, there are several aspects of this study that make it relatively uninformative. Firstly, they fit a model with seven first-order factors, which is the test publisher’s preferred model, but, as usual with these things, it is an overfactored model and the fit is pretty marginal. Secondly, they don’t test for strict invariance. Thirdly, the white sample is much larger than the non-white samples, which means that the fit in whites contributes disproportionately to the invariance tests. Fourthly and most damagingly, they adjust all the test scores for parental education, which removes unknown amounts of genetic and environmental variance from the scores. The results reported therefore concern a poorly fitting model based on test scores part of whose variance has been removed in a way that may in itself be racially non-invariant. I would like to see a methodologically more thoughtful analysis of this dataset.
  • Race and the Mismeasure of School Quality by Angrist et al. Students in schools with larger white and Asian student shares have superior academic outcomes. This instrumental variable analysis suggests that this is not because such schools offer superior instruction but simply because of selection effects, so that if students were randomized to attend schools with different racial compositions, they would be expected to achieve at similar levels. This seems plausible enough, and Angrist et al. suggest that this information should “increase the demand for schools with lower white enrollment.” That does not seem plausible to me because, as they also note, “school choice may respond more to peer characteristics than to value-added.” A “good school” is one primarily because of the quality of its students, not the quality of its teaching.

Classical Twin Data and the ACDE Model

Classical twin data consist of phenotypic measurements on monozygotic (MZ) and dizygotic (DZ) twin pairs who were raised together. To derive estimates of behavioral genetic parameters (e.g., heritability) from such data, the ACDE model is most often used. In principle, the model provides estimates of the effects of additive genes (A), the shared environment (C), non-additive genes (D), and the unshared environment (E).

However, if only classical twin data are available, there is not enough information to estimate all four parameters, that is, the system of equations is underdetermined or underidentified. To enable parameters to be estimated, it is customary to fix either D or C to zero, leading to the ACE and ADE models which are identified. The problem with this approach is that if the influence of the omitted parameter is not actually zero, the estimates will be biased. Additional data on other types of family members, such as adoptees, would be needed for the full model but such data are usually not readily available.
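The identification problem can be made concrete with a minimal sketch. It assumes the standard textbook expectations for twins reared together under random mating (phenotypic variance = A + C + D + E, MZ covariance = A + C + D, DZ covariance = A/2 + C + D/4). The function names `rank` and `ace_estimates` are illustrative and do not come from any twin-modeling package.

```python
from fractions import Fraction as F

# Expected twin statistics as linear functions of (A, C, D, E),
# assuming the standard classical twin design expectations:
#   phenotypic variance = A + C + D + E
#   MZ covariance       = A + C + D
#   DZ covariance       = A/2 + C + D/4
ROWS = [
    [F(1), F(1), F(1), F(1)],
    [F(1), F(1), F(1), F(0)],
    [F(1, 2), F(1), F(1, 4), F(0)],
]


def rank(matrix):
    """Rank of a small matrix by Gaussian elimination over exact fractions."""
    m = [row[:] for row in matrix]
    r = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][col] / m[r][col]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r


def ace_estimates(var, cov_mz, cov_dz):
    """Solve the identified ACE submodel, i.e. with D fixed to zero."""
    a = 2 * (cov_mz - cov_dz)
    c = 2 * cov_dz - cov_mz
    e = var - cov_mz
    return a, c, e


# Three independent equations for four unknowns: the full model is underidentified.
print("rank:", rank(ROWS), "unknowns:", len(ROWS[0]))

# Hypothetical population with A = E = 1/2 and C = D = 0 (the AE model).
print(ace_estimates(F(1), F(1, 2), F(1, 4)))
```

With D fixed to zero the three statistics pin down A, C, and E exactly. If D is not actually zero, the same formulas still return numbers, and those numbers are the biased estimates described above.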

Against this backdrop, Jöreskog (2021a) proposed that the full ACDE model can be estimated with only classical twin data. (A version of the ACDE model for categorical data was developed in Jöreskog [2021b], while Jöreskog [2021a] concerns only continuous data. I will discuss only the latter, but the same arguments apply to the categorical case.) This is a startling claim because the ACDE model has long been regarded as obviously impossible to estimate as there is simply not enough information in the twin variances and covariances for the full model (MZ and DZ variance-covariance matrices are sufficient statistics for the typical twin model, i.e., no other aspect of the sample data provides additional information on the parameter values). Nevertheless, Jöreskog claimed that it can be done, demonstrating it in several examples. Karl Jöreskog is not a behavioral geneticist but he is a highly influential statistician whose work on structural equation models has had a major influence on twin research. Therefore, even though his claims sounded implausible, they seemed worth investigating.

After studying Jöreskog’s model in detail I conclude that it does not deliver what it promises. It does generate a set of estimates for A, C, D, and E, but there is no reason to believe that they reflect the true population parameters. As nice as it would be to estimate the ACDE model with ordinary twin data, it just cannot be done.
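As a rough illustration of why a unique set of numbers is not the same thing as a correct set, the sketch below picks the minimum Euclidean norm solution of the underdetermined system. It assumes the norm is taken over the four variance components and uses the standard twin expectations. Jöreskog’s actual parameterization may differ in detail, so this is a simplified stand-in and not a reimplementation of his procedure. The helper names `solve` and `min_norm_solution` are hypothetical.

```python
from fractions import Fraction as F

# Standard expectations (assumed): rows are variance, MZ covariance, DZ covariance;
# columns are the variance components A, C, D, E.
M = [
    [F(1), F(1), F(1), F(1)],
    [F(1), F(1), F(1), F(0)],
    [F(1, 2), F(1), F(1, 4), F(0)],
]


def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination (exact fractions)."""
    n = len(a)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = next(i for i in range(col, n) if aug[i][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for i in range(n):
            if i != col:
                factor = aug[i][col]
                aug[i] = [v - factor * p for v, p in zip(aug[i], aug[col])]
    return [row[-1] for row in aug]


def min_norm_solution(m, b):
    """Minimum-norm solution x = M^T (M M^T)^{-1} b of an underdetermined system."""
    gram = [[sum(x * y for x, y in zip(r1, r2)) for r2 in m] for r1 in m]
    y = solve(gram, b)
    return [sum(m[i][j] * y[i] for i in range(len(m))) for j in range(len(m[0]))]


# Hypothetical population in which AE is the true model: A = E = 1/2, C = D = 0.
true_components = [F(1, 2), F(0), F(0), F(1, 2)]
observed = [sum(c * t for c, t in zip(row, true_components)) for row in M]

estimate = min_norm_solution(M, observed)
for name, value in zip("ACDE", estimate):
    print(name, float(value))
```

In this toy case the minimum-norm answer reproduces the observed variance and covariances perfectly, yet it puts A at 5/28 (about 0.18) instead of 0.5 and spreads the remainder over C and D. Only E, which is identified anyway, comes out right.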

This post has the following structure. I will start with a brief overview of twin models, describing some of the ways in which their parameters can be estimated. Then I will show how Jöreskog proposes to solve the ACDE identification problem, and where he goes wrong. I will end with a discussion of why I think twin models are useful despite their limitations, and why they have continuing relevance in the genomic era. The Appendix contains additional analyses related to the ACDE model.

Links for May ’22

  • Investigating bias in conventional twin study estimates of genetic and environmental influence for educational attainment by Wolfram & Morris. The shared environment component in twin studies is an aggregate of effects not only of the shared environment proper but also of anything else that a twin pair shares but other individuals do not. The component captures the influence of assortative mating, age effects, and cohort effects, for example. This is another twin-family study that finds that the effect of heredity on educational attainment may have been underestimated and that of the shared environment overestimated in classical twin studies. The twin-specific environment appears to be more important than the family environment per se.
  • What the Students for Fair Admissions Cases Reveal About Racial Preferences by Arcidiacono et al. As a result of court cases regarding admissions to Harvard and UNC-Chapel Hill, lots of admissions data from those schools have been made public. Peter Arcidiacono has been an expert witness in these cases and this is another of his analyses of the data. There is nothing too surprising here. For example, black applicants to Harvard whose SAT scores and high school GPAs are at around the 30th to 40th percentile of the Harvard applicant pool distribution have the same admit rate as white and Asian applicants above the 90th percentile.
  • Genetics of cognitive performance, education and learning: from research to policy? by Peter Visscher and A Very Bad Review by Nick Patterson. There is nothing particularly insightful or original in these articles, but they are notable in that in them two of the heavyweights of today’s genetics push back against the recent anti-behavioral genetics discourse. Academia as a whole has moved leftward since the days of The Bell Curve and Arthur Jensen, but, on the other hand, behavioral genetics has moved closer to the center of genetics. Top geneticists these days cannot dismiss behavioral genetics as easily as in the days of Richard Lewontin and co. because behavioral genetics is now theoretically and methodologically tightly integrated with the rest of genetics.
  • Air Pollution and Student Performance in the U.S. by Gilraine & Zheng. Using instrumental variables related to variations in pollution levels coming from nearby power plants to control for endogeneity, this study finds some effects of air pollution on test scores. After a brief skim of the paper, the results seem plausible enough to me, mainly because they are smaller than what some other studies have claimed.
  • The Parent Trap–Review of Hilger by Alex Tabarrok. A smart response to a recent book by Nate Hilger making implausible claims about the effects of parenting on children’s outcomes and advocating for a radical enlargement of state involvement in the raising of children. A basic problem in today’s social policy thinking is that it is only concerned with what happens post conception. Even a modest shift in the human capital characteristics of parents would probably do a lot more good than anything Hilger proposes.

Links for April ’22

IQ and psychometrics

  • On the Continued Misinterpretation of Stereotype Threat as Accounting for Black-White Differences on Cognitive Tests by Tomeh & Sackett. A common misconception about stereotype threat, and a major reason for the popularity of the idea, is that in the absence of threat in the testing situation, the black-white IQ gap is eliminated. This is of course not the case but rather the experimental activation of stereotypes has (sometimes) been found to make the black-white gap larger than it normally is. In an analysis of early writings on stereotype threat, Sackett et al. (2004) reported that this misinterpretation was found in the majority of journal articles, textbooks, and popular press articles discussing the effect. In the new article, Tomeh and Sackett find that more recent textbooks and journal articles are still about equally likely to misinterpret stereotype threat in this way as to describe it correctly. I had hoped that the large multi-lab study of the effect would have put the whole idea to bed by now, but that study has unfortunately been delayed.
  • Invariance: What Does Measurement Invariance Allow us to Claim? by John Protzko. In this study people were randomized to complete either a scale aiming to measure “search for meaning in life”, or an altered nonsense version of the same scale where the words “meaning” and “purpose” had been replaced with the word “gavagai”. The respondents indicated their level of agreement or disagreement with statements such as “I am searching for meaning/gavagai in my life”. Both groups also completed an unaltered “free will” scale, and confirmatory factor models where a single factor underlay the “meaning/gavagai” items while another factor underlay the “free will” items were estimated. The two groups showed not only configural but also metric and scalar invariance for these factors. Given the usual interpretation of factorial invariance in psychometrics, this would suggest that the mean difference observed between the two groups on the “meaning/gavagai” scale reflects a mean difference on a particular latent construct. The data used were made available online, and I was able to replicate the finding of configural, metric, and scalar invariance, given the ΔCFI/RMSEA criteria (strict invariance was not supported). The paradox appears to stem from the fact that individual differences on the “meaning in life” scale mostly reflect the wording and format of the items as well as response styles rather than tapping into a specific latent attitude which may not even exist, given the vagueness of the “meaning in life” scale. I found that I could move from scalar invariance to a more constrained model where all of the “meaning/gavagai” items had the same values for loadings and intercepts without worsening the model fit. So it seems that all the items were measuring the same thing (or things) but what that is is not apparent from a surface analysis of the items.
Jordan Lasker has written a long response to Protzko, taking issue with the idea that two scales can have the same meaning without strict invariance as well as with the specific fit indices used. While I agree that strict invariance should always be pursued, Protzko’s discovery of scalar invariance using the conventional fit criteria is nevertheless interesting and requires an explanation. I think Lasker also makes a mistake in his analysis by setting the variances of the “meaning in life/gavagai” factors both to 1 even though this is not a constraint required for any level of factorial invariance. The extraneous constraint distorts his loadings estimates.
  • Effort impacts IQ test scores in a minor way: A multi-study investigation with healthy adult volunteers by Bates & Gignac. In three experiments (total N = 1201), adult participants first took a short spatial ability test (like this one) and were randomly assigned either to a treatment group or to a control group. Both groups then completed another version of the same test, with the treatment group participants promised a monetary reward if they improved their score by at least 10%. The effect of the incentives on test scores was small, d = 0.166, corresponding to 2.5 points on a standard IQ scale. This suggests that the effect size of d = 0.64 (or 9.6 points) reported in the meta-analysis by Duckworth et al. is strongly upwardly biased, as has been suspected. A limitation of the study is that the incentives were small, £10 at most. However, the participants were recruited through a crowdsourcing website and paid £1.20 for their participation (excluding the incentive bonuses), so it is possible that the rewards were substantial to them. Nevertheless, I would have liked to see if a genuinely large reward had a larger effect. Bates & Gignac also conducted a series of big observational studies (total N = 3007) where the correlation between test performance and a self-report measure of test motivation was 0.28. However, this correlation is ambiguous because self-reported motivation may be related to how easy or hard the respondent finds the test.
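The conversions between standardized effect sizes and IQ points quoted for Bates & Gignac and Duckworth et al. follow from the conventional IQ scale, whose standard deviation is 15. A minimal sketch of that arithmetic (the function name `d_to_iq_points` is just illustrative):

```python
IQ_SD = 15  # standard deviation of the conventional IQ scale


def d_to_iq_points(d, sd=IQ_SD):
    """Convert a standardized mean difference (Cohen's d) to IQ points."""
    return d * sd


for label, d in [("Bates & Gignac", 0.166), ("Duckworth et al.", 0.64)]:
    print(f"{label}: d = {d} is about {d_to_iq_points(d):.1f} IQ points")
```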


  • The Coin Flip by Spotted Toad. This is an illuminating commentary on the Tennessee Pre-K study (on which I commented here) and the difficulty of causal inference in long-term experiments.
  • Do Meta-Analyses Oversell the Longer-Term Effects of Programs? Part 1 & Part 2 by Bailey & Weiss. This analysis found that in a meta-analytic sample of postsecondary education RCTs seeking to improve student outcomes, trials that reported larger initial effects were more likely to have long-term follow-up data collected and published. While this could be innocuous, with more effective interventions being selected for further study, it could also simply mean that studies more biased to the positive direction by sampling error were selected. So when you see a study touting the long-term benefits of some educational intervention, keep in mind that the sample may have been followed up only because the initial results were more promising than in other samples subjected to the same or similar interventions.
  • An Anatomy of the Intergenerational Correlation of Educational Attainment -Learning from the Educational Attainments of Norwegian Twins and their Children by Baier et al. Using Norwegian register data on the educational attainment of twins and their children, this study finds that the intergenerational correlation for education is entirely genetically mediated in Norway. The heritability of education was about 60 percent in both parents and children, while the shared environmental variance was 16% in parents and only 2% in children. This indicates that the shared environment is much less important for educational attainment in Norway than elsewhere (cf., Silventoinen et al., 2020), although this is partly a function of how assortative mating is modeled.

Links for March ’22


  • Polygenic prediction of educational attainment within and between families from genome-wide association analyses in 3 million individuals by Okbay et al. This is the newest iteration of the educational attainment GWAS by the SSGAC consortium, now with a sample of three million people. It was published today and I have only skimmed it. The number of SNPs identified is about 4,000 now, up from 1,300 in the previous GWAS, while the $R^2$ increased from 11–13% to 12–16%, depending on the validation sample. They also conclude that there are no common SNPs with substantial dominance effects for educational attainment, underlining the validity of the additive genetic model. The within-family effect sizes are about 56% of the population effect sizes for educational attainment, while the same ratio is 82% for IQ and more than 90% for height and BMI. The discrepancy between the within-family and population estimates is probably mostly due to indirect genetic effects (“genetic nurture”) and assortative mating. Replicating SNP effects from the previous, smaller education GWAS sample, they find that 99.7% of the SNP effects have matching signs in the new data, and that 93% are significant at the 1% level or lower, which fits theoretical predictions well (it seems that the GWAS enterprise has vindicated the much-derided null hypothesis significance testing paradigm).
  • Cross-trait assortative mating is widespread and inflates genetic correlation estimates by Border et al. The genetic correlation is a statistic measuring the extent to which genetic effects on two different traits are correlated. It is easy enough to calculate, but not easy to interpret because while the simplest interpretation is pleiotropy, several other causal and non-causal explanations are possible. This paper suggests that many genetic correlations are non-causal and result from cross-assortative mating, e.g., smarter than average women preferring to have children with taller than average men, which leads to a genetic correlation between IQ and height genes in the next generation even if height genes have no effect on IQ nor IQ genes on height. Among other findings, the paper suggests that the importance of the general factor of psychopathology has been overestimated due to a failure to consider cross-trait assortative mating.
  • Modeling assortative mating and genetic similarities between partners, siblings, and in-laws by Torvik et al. This is a nice example of using psychometric methods to infer latent genetic parameters.
  • Behavioral geneticist Lindon Eaves has died. He was one of the major creative forces behind the methodology of modern twin and family studies. I did not know that he was also an ordained Anglican priest. You do not see many men of the cloth in science these days (or at least their creed is rather different now). Eric Turkheimer says that he never heard Eaves utter an illiberal word, but I do notice some forbidden literature on his bookshelf at the first link.

IQ and personality


  • More waves in longitudinal studies do not help against initial sampling error by Emil Kirkegaard. Speaking of James Heckman, he has published another one of his endless reanalyses of the Perry Preschool study. Emil has a fun takedown of this absurd enterprise.
  • Assortative Mating and the Industrial Revolution: England, 1754-2021 by Clark & Cummings. In another installment of Greg Clark’s studies into the persistence of social status across generations, he has apparently found a constant, latent status correlation of 0.80 between spouses in England over the last few centuries. This suggests that grooms and brides matched tightly on underlying educational and occupational abilities even when higher education was rare and female participation in the labor market was limited. I have previously commented briefly on Clark’s work and the role of assortative mating in it here.

Links for February ’22


  • The “Golden Age” of Behavior Genetics? by Evan Charney. The author is a political scientist best known for his anti-hereditarian screeds, of which this is the latest. He likes to discuss various random phenomena from molecular biology, describing them as inscrutably complex, a hopeless tangle in the face of which genetic analyses are futile. Unfortunately, his understanding of the statistical models designed to cut through that tangle is very limited. For example, he endorses Eric Turkheimer’s howler that genome-wide association studies are “p-hacking”, and makes a ridiculous argument about GWAS findings being non-replicable (p. 8)–he does not appear to know, among other things, that statistical power increases with sample size (Ns in the studies he cites range from ~100k to ~1000k), that the p-value is a random variable, or that SNP “hits” are subject to the winner’s curse (he cites but evidently never read Okbay et al., 2016 and Lee et al., 2018, wherein it is shown that GWAS replication rates match theoretical expectations). He seeks to identify and amplify all possible sources of bias that could inflate genetic estimates, while ignoring biases in the opposite direction (e.g., the attenuating effect of assortative mating on within-sibship genomic estimates). Often the article is weirdly disjointed, e.g., Charney first discusses how sibling models have been used to control for population stratification, and then a couple of pages later says that it is impossible to know whether differences in religious affiliation are due to heritability or stratification. All in all, the article is a good example of what Thomas Bouchard has called pseudo-analysis.
  • Neither nature nor nurture: Using extended pedigree data to elucidate the origins of indirect genetic effects on offspring educational outcomes by Nivard et al. Contra naysayers like Charney, we are in the midst of a genuine golden age in behavior genetics. The underlying reason is the abundance of genomic data, which has spurred the development of so many new methods that it is hard to keep up with them. This preprint is the latest salvo in the debate about indirect genetic effects. Previous research has found indirect parental genetic effects in models where child phenotypes are regressed on child and parent polygenic scores. This study refines the design by doing the regression on adjusted parental polygenic scores that capture the personal deviations of parents’ scores from the mean scores of the parents and their own siblings. This refined design finds scant evidence for indirect parental genetic effects on children’s test scores, suggesting instead that apparent indirect effects are grandparental or “dynastic” effects of some sort. I think assortative mating is the most likely culprit. A limitation of this study is that even with a big sibling sample, the power to discriminate between different models is not high. Moreover, the study does not actually test the difference between the βs of the sibship and personal polygenic scores, and instead reasons from differences in significance, which is bad form.
  • The genetics of specific cognitive abilities by Procopio et al. This impressively large meta-analysis finds the heritability of specific abilities to be similar to that of g. That may be the case although most measures of non-g abilities in the analysis are confounded by g. They can formally separate g and non-g only in the TEDS cohort which has psychometrically rather weak measures of g.


  • Ian Deary and Robert Sternberg answer five self-inflicted questions about human intelligence. The two interlocutors in this discussion are mismatched: Deary is the most important intelligence researcher of his generation, known for his careful, wide-ranging empirical work, while Sternberg is one of the greatest blowhards and empty suits in psychology, known for generating mountains of repetitive, grandiose verbiage and for his disdain for anything but the most perfunctory tests of the theoretical entities that proliferate in his writings. Sternberg’s entries provide little insight, but there is some comedy in first reading his bloviations and then Deary’s courteous but often quietly savage responses. Deary emphasizes the value of establishing empirical regularities before or even instead of formulating psychological theories; notes the ubiquity of the jangle fallacy in cognitive research; and argues that cognitive psychological approaches have not generated any reductionist traction in explaining intelligence. According to Deary, a hard problem in intelligence research is one of public relations, that is, getting “across all the empirical regularities known about intelligence test scores”, the establishment of which has been “a success without equal in psychology.”
  • More articles by Stephen Breuning that need retraction by Russell Warne. Stephen Breuning is an erstwhile academic psychologist who was caught fabricating data on a large scale. He received the first US criminal conviction for research fraud in 1988. Nevertheless, many of his publications have not been retracted and continue to be cited, e.g., in the influential meta-analysis of the effects of test motivation on IQ by Duckworth et al. (2011). Warne reviews four of Breuning’s unretracted studies and identifies a number of inconsistencies and implausibilities that point to data fraud. It might be useful to further analyze these studies with GRIM, SPRITE, and the like.

Sex and race

  • Sex differences in adolescents’ occupational aspirations: Variations across time and place by Stoet & Geary. More evidence for the gender equality paradox which postulates that sex differences are larger in wealthier and freer societies because heritable sex differences are suppressed in poorer societies where individuals have less choice.
  • Why Are Racial Problems in the United States So Intractable? by Joseph Heath. Most modern societies have dealt with ethnic and racial diversity either by trying to integrate minorities into the majority population, or by recognizing the separateness of minorities and devolving political power to them. Some countries have judged the success of these efforts through the lens of equal opportunity, while others have sought outcome equality. Heath argues that race problems involving African-Americans are so intractable and acrimonious because there is no agreement on whether integration or separatism should be pursued, nor on how success and failure in racial affairs are to be judged. He manages to squeeze a good deal of analytic juice from this simple model while avoiding “bad actor” explanations which attribute all racial problems either to white malevolence or black incompetence. Money quote: “[T]he best way to describe the current American approach to racial inclusion would be to say that it is attempting to achieve Singaporean outcomes using Canadian methods and legal frameworks.”

Links for January ’22

I will try to get in the habit of collecting the most interesting studies, articles, posts, etc. related to human biodiversity in a monthly post, together with some commentary. The links are not necessarily to brand-new stuff; they are just what I happened to come across recently.

The Persistence of Cognitive Inequality: Reflections on Arthur Jensen’s “Not Unreasonable Hypothesis” after Fifty Years

In 1969, Harvard Educational Review published a long, 122-page article under the title “How Much Can We Boost IQ and Scholastic Achievement?” It was authored by Arthur R. Jensen (1923–2012), a professor of educational psychology at the University of California, Berkeley. The article offered an overview of the measurement and determinants of cognitive ability and its relation to academic achievement, as well as a largely negative assessment of attempts to ameliorate intellectual and educational deficiencies through preschool and compensatory education programs. Jensen also made some suggestions on how to change educational systems to better accommodate students with disparate levels of ability.

While most of the article did not deal with race, Jensen did argue that it was “a not unreasonable hypothesis” that genetic differences between whites and blacks were an important cause of IQ and achievement gaps between the two races. This set off a huge academic controversy—Google Scholar says that the article was cited more than 1,200 times in the decade after its publication and almost 5,400 times by December 2019. The dispute about the article centered on the question of racial differences, which is understandable as Jensen’s thesis came out on the heels of the civil rights movement and its attendant controversies, such as school integration, busing of students, and affirmative action. Jensen questioned whether it is in fact possible to eliminate racial differences in socially valued outcomes through conventional policy measures, striking at the foundational assumption of liberal and radical racial politics. His floating of the racial-genetic hypothesis was what set his argument apart from the general tenor of the era’s scholarly and policy debate.

In this post, I will take a look at Jensen’s arguments and their development over time. The focus will be on the race question, but many related, more general topics will be discussed as well. The post has four parts. The first is a synopsis of Jensen’s argument as it was presented in the 1969 article. The second part offers an updated restatement of Jensen’s model of race and intelligence, while in the third part I argue, using the Bradford Hill criteria, that the model has many virtues as a causal explanation. In the fourth and concluding part I will make some more general remarks about the status and significance of racialist thinking about race and IQ.

Racial and Ethnic Differences in Cognitive Skills in Working-Age, Native-Born Americans

Given the central role that testing plays in the American educational system, most datasets that we have on racial and ethnic differences in cognitive ability include only children, adolescents, or young adults. Most of the economic and social effects of cognitive differences are, however, produced by the working-age population, so it would be useful to have test scores from older adults as well. The PIAAC survey of adult skills conducted by the OECD provides excellent data for this purpose.

Measurement Error, Regression to the Mean, and Group Differences

Regression to the mean, RTM for short, is a statistical phenomenon which occurs when a variable that is in some sense unreliable or unstable is measured on two different occasions. Another way to put it is that RTM is to be expected whenever there is a less than perfect correlation between two measurements of the same thing. The most conspicuous consequence of RTM is that individuals who are far from the mean value of the distribution on first measurement tend to be noticeably closer to the mean on second measurement. As most variables aren’t perfectly stable over time, RTM is a more or less universal phenomenon.
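
A small simulation makes the mechanism concrete. The sketch below is only an illustration under assumed conditions: each observed score is a stable true score plus independent normal error, and the function name `simulate_rtm` and all parameter values are hypothetical. It selects the top scorers on the first measurement and compares their average on both occasions.

```python
import random
import statistics


def simulate_rtm(n=100_000, true_sd=1.0, error_sd=1.0,
                 top_fraction=0.10, seed=0):
    """Simulate two noisy measurements of the same stable trait."""
    rng = random.Random(seed)
    true_scores = [rng.gauss(0.0, true_sd) for _ in range(n)]
    # Each occasion adds fresh, independent measurement error.
    first = [t + rng.gauss(0.0, error_sd) for t in true_scores]
    second = [t + rng.gauss(0.0, error_sd) for t in true_scores]

    # Select the individuals who scored highest on the FIRST occasion.
    cutoff = sorted(first)[int(n * (1 - top_fraction))]
    selected = [i for i in range(n) if first[i] >= cutoff]

    return {
        # Expected correlation between the two measurements.
        "reliability": true_sd**2 / (true_sd**2 + error_sd**2),
        "selected_mean_first": statistics.fmean([first[i] for i in selected]),
        "selected_mean_second": statistics.fmean([second[i] for i in selected]),
        # Spread of the whole population on each occasion.
        "variance_first": statistics.pvariance(first),
        "variance_second": statistics.pvariance(second),
    }


if __name__ == "__main__":
    for name, value in simulate_rtm().items():
        print(f"{name}: {value:.3f}")
```

With equal true-score and error variances the two measurements correlate about 0.5, so the selected group's second mean should land roughly halfway between its first mean and the population mean. The population variance is about the same on both occasions, so the regression of the selected group does not mean the population as a whole has become more average.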

In this post, I will attempt to explain why regression to the mean happens. I will also try to clarify certain common misconceptions about it, such as why RTM does not make people more average over time. Much of the post is devoted to demonstrating how RTM complicates group comparisons, and what can be done about it. My approach is didactic and I will repeat myself a lot, but I think that’s warranted given how often people are misled by this phenomenon.


© 2022 Human Varieties
