Measurement Error, Regression to the Mean, and Group Differences

Regression to the mean, RTM for short, is a statistical phenomenon which occurs when a variable that is in some sense unreliable or unstable is measured on two different occasions. Another way to put it is that RTM is to be expected whenever there is a less than perfect correlation between two measurements of the same thing. The most conspicuous consequence of RTM is that individuals who are far from the mean value of the distribution on first measurement tend to be noticeably closer to the mean on second measurement. As most variables aren’t perfectly stable over time, RTM is a more or less universal phenomenon.
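
A small simulation can make this concrete. The sketch below is only an illustration under assumptions of my own choosing, not an analysis from this post: each person has a stable true score, each measurement adds independent normal error, and the test-retest correlation is set to a hypothetical 0.7. It selects people who scored high on the first occasion and compares their average on both occasions.

```python
import random
import statistics


def simulate_rtm(n=100_000, reliability=0.7, seed=0):
    """Simulate two noisy measurements of the same stable trait.

    Observed score = true score + independent error. The variances are
    scaled so that observed scores have variance 1 and the correlation
    between the two measurements equals `reliability` (an assumed,
    illustrative value).
    """
    rng = random.Random(seed)
    true_sd = reliability ** 0.5
    error_sd = (1 - reliability) ** 0.5
    first, second = [], []
    for _ in range(n):
        t = rng.gauss(0, true_sd)
        first.append(t + rng.gauss(0, error_sd))
        second.append(t + rng.gauss(0, error_sd))
    return first, second


first, second = simulate_rtm()

# Select people who scored far above the mean on the first occasion.
cutoff = 1.5
selected = [(x1, x2) for x1, x2 in zip(first, second) if x1 > cutoff]
mean_first = statistics.fmean(x1 for x1, _ in selected)
mean_second = statistics.fmean(x2 for _, x2 in selected)

print(f"selected group, first measurement:  {mean_first:.2f}")
print(f"selected group, second measurement: {mean_second:.2f}")
print(f"ratio second/first: {mean_second / mean_first:.2f}")

# The spread of the whole distribution is the same on both occasions.
print(f"SD first:  {statistics.pstdev(first):.2f}")
print(f"SD second: {statistics.pstdev(second):.2f}")
```

Under these assumptions, the selected group's second-occasion mean should shrink toward the overall mean by roughly the assumed correlation, while the standard deviation of the full sample stays about the same on both occasions. The second point is why RTM does not imply that people as a whole become more average over time.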

In this post, I will attempt to explain why regression to the mean happens. I will also try to clarify certain common misconceptions about it, such as why RTM does not make people more average over time. Much of the post is devoted to demonstrating how RTM complicates group comparisons, and what can be done about it. My approach is didactic and I will repeat myself a lot, but I think that’s warranted given how often people are misled by this phenomenon.

Biogeographic ancestry and endophenotypes, etc.

There are a couple of new, well-designed, obtainable surveys out, with ancestry, MRI, and cognitive data, which should allow for the (dis)confirmation of certain conjectures of ill repute:

–Neurodevelopmental Genomics: Trajectories of Complex Phenotypes (age 8-21, N ~ 10,000)
–The Brain Genomics Superstruct Project (age 18-35, N ~ 1,500)

For example, Greg Cochran likes to go on about how major ancestry groups often differ in crude brain morphology, and how these differences probably explain a significant chunk (> 20%) of bio-ancestry related differences in cognitive ability (CA). I doubt it. But if he agrees to specify the analytic strategy, I will try to get the data and run the analyses.

I did look through the PING survey (age 3-21, N ~ 1,500), which might not be very informative owing to the age structure. Going by this, Greg seems to be more or less correct about some of the endo differences and probably about their origins. As an example, Figures 1 and 2 show the B/W diffs for intracranial and total brain volume by age. (AAs are picked out for illustration since they are the largest non-White ethnic group, showing the biggest deviation from Whites.) Figure 3 shows the relation between brain volume and ancestry in the self-identified AA group; the results were basically the same for intracranial volume, etc., and so are not shown.

Yet, as seen in Tables 1 and 2, CA was more or less uncorrelated with these particular endophenotypes (r = 0.07-0.08); unsurprisingly, CA explained virtually no endo differences, and vice versa. At the same time, CA was strongly (negatively) associated with both African and Amerindian ancestry and also, though to a lesser degree, with Oceanian ancestry.

Perhaps there is a more sound way to run the numbers? Or a better way to take into account age? Dunno, it’s not my position to defend.

Results below.

Affirmative action in Brazil: Let all apply and 23andME sort them out

Thought criminal extraordinaire Steve Sailer recently commented on Foreign Policy’s article, “Brazil’s New Problem With Blackness.” Money quotes:

These policies didn’t eliminate race, but they did affect how it came to be classified. The marker of race drifted away from a binary consideration of a person’s ancestry and became increasingly based on one’s appearance.

What ultimately binds these definitions together is an awareness that the less “black” a person looks, the better — better for securing jobs, better for social mobility. The widespread acceptance of multiracial identities in Brazil coexists with steep racial inequality — a contradiction that the sociologist Edward E. Telles has called “the enigma of Brazilian race relations.”

Eleven experts comprised the panel, among them UFPel administrators, anthropologists, and leaders in the wider black community of Pelotas. They received strict guidelines from the Public Prosecutors Office: “Phenotypical characteristics are what should be taken into account,” read the instructions. “Arguments concerning the race of one’s ancestors are therefore irrelevant.”

And this:

In Aug. 2016, after it had become clear that the law left room for fraud, the government ordered all departments to install verification committees. But it failed to provide the agencies with any guidance.

The Department of Education in Para, Brazil’s blackest state, attempted to fulfill the decree with a checklist, which leaked to the press. Among the criteria to be scored: Is the job candidate’s nose short, wide and flat? How thick are their lips? Are their gums sufficiently purple? What about their lower jaw? Does it protrude forward? Candidates were to be awarded points per item, like “hair type” and “skull shape.”

But black activists say such measures are unavoidable.

When you allow your national policies to be guided by sociological theories, like those of Telles, you are bound to run into this type of mess.

Below are regression results, based on the Pelotas Birth Cohort (n ≈ 2,850), for genetic racial ancestry, interviewer- and interviewee-reported color (corr), and three SES indicators. In this 1982 birth cohort, it can be seen that, independent of European ancestry, there is no consistent negative association between interviewer-rated “black” appearance and outcomes. That is, in Brazil, the average race of one’s ancestors is more relevant than stereotypical race-associated phenotypic characteristics. (Note: the sample sizes for “Yellow” and “Indigenous” were small, so those estimates are fairly unreliable; also, neither an East Asian nor an Amerindian ancestry component was included.)

[Image: regression results] (Source: F. Hartwig, personal communication, March 4, 2016; full results)

So, if one is interested in addressing historic race-related inequalities, it would be more efficient and just, since (dis)advantages are mostly being passed along lines of descent, to positively discriminate according to objectively determinable biogeographic ancestry, not subjectively assessed stereotypical racial appearance. And it’s hard to see how requiring 23andMe reports would be more intrusive than having a 12-member panel examine applicants for nose width, lip thickness, craniofacial morphology, etc. to see if they are sufficiently African-looking.

Of course, this isn’t going to happen any time soon, since the conclusion that ancestry with respect to major racial or descent groups is relevant to social outcome needs to be evaded, even at the expense of good science and quality social policy.

Biogeographic Ancestry and Socioeconomic Outcomes in the Americas

Kirkegaard, E. O. W., Wang, M., & Fuerst, J. (2017). Biogeographic Ancestry and Socioeconomic Outcomes in the Americas: A Meta-Analysis. Mankind Quarterly, 57(3), 398-427.

It took a particularly long time to publish, owing to the shenanigans we ran into. For example, the editorial board of Frontiers in Genetics reversed their decision (September 12, 2016; affirmed: October 12, 2016) mid-review, two to three months after deciding to accept with “moderate revision” (July 5, 2016), on the grounds that a request from a reviewer “was not satisfactorily met.” What specific request did we brazenly question?

Reviewer 1, round 1: “The discussion of cognitive ability differences across SIREs feels out of place and inappropriate. This paper makes no attempt whatsoever to investigate cognitive abilities, and this discussion should be removed.”

Reply to reviewer 1: “Following the advice of another reviewer [who approved the paper] we added a diagram (Figure 8) to clarify the relevant discussion. Since that reviewer asked for a model and since cognitive ability seems like a plausible pathway to us, we feel that it would be intellectually dishonest on our part to not include the variable. The reason for the present reviewer’s objection is not clear to us. We do not investigate colorism, yet no objection is made regarding our mentioning of this as a potential mediator of the BGA x SES associations…”

They should have let it slide, because now we feel obliged to prove the point. And prove it again and again, if needs be.

Inquiries into fake history: The retconning of Frank Livingstone’s (1962) decidedly anti-Darwinian “there are no races, there are only clines” argument

According to one popular version of the race narrative, natural scientific concepts of race were conceptualized, in the 18th to early 20th century, so as to imply discrete categories, and human “races” were, accordingly, imagined to have significant discontinuities until post-World War II anthropologists, such as Frank Livingstone (1962), came to realize that human variation was relatively continuous, a fact which, the story goes, demonstrated that human “races,” as traditionally understood, did not exist; instead, only “populations” do. Anyone sufficiently familiar with actual 18th to early 20th century discourses on the matter would find this tale outlandish. They would recall, for example, Wallace’s (1864) account, in “The origin of human races and the antiquity of man deduced from the theory of natural selection,” of the competing pre-evolutionary views about human variation:

Inquiries into fake history: Antoine Duchesne (1766) and Georg Forster (1786) on “race” in context to natural history

Science is replete with fake, whiggish histories peddled to bolster new paradigms. Biology is no exception. Two examples are The Essentialism Story (Winsor, 2006; Richards, 2010; Wilkins, 2013) and The Classic View/The Mutationism Myth (Stoltzfus, 2010; Stoltzfus and Cable, 2014). According to the former, early biologists were inexplicably caught in the thrall of Platonic-Aristotelian typological essentialism, which resulted in the failure to recognize the significance of individual variation and which consequently retarded the recognition of evolution. According to the latter, early geneticists were caught in the grip of saltationism, which resulted in the rejection of natural selection and held back for decades the synthesis between Mendelian principles and evolutionary theory. As expected, in these tellings, the actual historical views are often barely recognizable.

The most prominent, yet least discussed, example of pseudohistory of science has to be what should be called The Race Narrative. The Race Narrative is a meta-myth, composed of several related tales, which typically involve some permutation of: “‘Race’ never described a classification which had a proper place in natural history or a classification which, given how it was historically understood, was applicable to humans, but rather was a political construct imposed to oppress certain human groups, which was then back-rationalized by natural historians, who read reality through the political ideology of their times.” Often The Essentialism Story and, to a lesser extent, The Mutationism Myth are incorporated into this Narrative.

To give just a few examples:

“Communities of descent”

After some deliberation, I determined that the locution “race” is inessential. As such, I now disavow the arguably controversial view that this is necessary when it comes to understanding biological variation, human and otherwise…. At the same time, it has become evident that the concept which I refer to as “lineage population,” which I hitherto called “natural division,” a concept which corresponds with Darwin’s “communities of descent,” as understood in On the origin of species by means of natural selection, or the preservation of favoured races, is indispensable. I clarify that concept and discuss it in relation to other systematic ones.

Lineage population: A concept needed by an observer of nature? Preprint.

Abstract: The genealogy-based classificatory programs of Kant and Darwin are briefly discussed for context. It is detailed how in biology there is no unambiguous term to reference infraspecific-level descent-based divisions. The term lineage population is introduced and defined for analytic purposes: a lineage population is one of a set of divisions of intrafertile organisms into which members are arranged by propinquity of descent. It is argued that the lineage population concept avoids the ambiguities associated with related biological and anthropological concepts and polysemes such as population, ethnicity, and race. Other terms and concepts, such as form, cline, cluster, geographic population, breeding population, genetic population, breed, species, subspecies, ancestry, geographic ancestry, biogeographic ancestry, ancestral population, ancestry population, natural division, and population lineage, are discussed in relation to this concept. It is concluded that the lineage population concept is a useful analytic tool which picks out, in line with the Kantian/Darwinian perspective, an interesting class of biological variation.

New MQ paper

Kirkegaard, E. O. W. & Fuerst, J. (2016). Inequality in the United States: Ethnicity, Racial Admixture and Environmental Causes. Mankind Quarterly 56(4).

Previously, we looked at the association between overall state-level biogeographic ancestry (BGA) and overall state-level outcomes. It was found that European BGA relative to African and Amerindian BGA was associated with better outcomes. In this paper, the analysis is extended by looking at the state-level ancestry-outcome associations individually for black and Hispanic self-identified race-ethnicity (SIRE) groups. General socioeconomic factor (S) scores were calculated for US states by SIRE groups based on three indicators. The S factor loadings were generally stable across subgroup analyses and the factor scores were stable across factor analytic extraction methods (for the latter, almost all r’s ≈ 1). For Whites, Blacks and Hispanics, there were strong correlations between cognitive ability scores and S factor scores across states (r = .55 to .78; N = 28-50). This pattern also held when all data were analyzed together (r = .86, N = 115). Furthermore, the size of the Hispanic-White and Black-White S and cognitive ability gaps strongly correlated across states (r = .62 to .69; N = 36-37). Lastly, parasite prevalence did not plausibly explain SIRE gaps in cognitive ability because gaps were smaller in more parasite-rich states (combined analysis r = -.17, N = 91). We found that climatic and geospatial variables did not correlate strongly with cognitive ability and S scores when scores were decomposed by SIRE group, but did so at the total state level, even after statistically controlling for SIRE composition.

Top Ten Human Varieties Posts

In the more than three years of this blog’s existence, about 110 posts have been published here. While blogging has unfortunately been light in recent times, the upside of the data- and analysis-heavy format of our posts is that they rarely lose their relevance, making our old posts well worth perusing.

To help readers search through our archives, below is a list of what I consider to be some of the best content we’ve published. They’re not necessarily our most popular posts, but I think they offer a good dive into human biodiversity, in particular our perennial favorite topic of IQ differences between groups. The list is in the order of original publication.

Into the IQ shredder

Wang, M., Fuerst, J., Ren, J. (2016). Evidence of dysgenic fertility in China. Intelligence, 57, 15-24.

From the discussion: “We’ve seen, in Table 4, that urban populations in China exhibited a relatively high dysgenic fertility trend in the 1951–1970 birth cohort. For this same cohort, the trend was much smaller in the rural populations. It suggests that dysgenic selection is related to urbanity. This supports Pan’s (1923) observation that ‘modern urbanization has had so many dysgenic effects upon the race.’”

