Gregory Connor and I submitted the paper, “Linear and partially linear models of behavioral trait variation using admixture regression,” to MDPI’s Behavioral Sciences. This is a methodological paper explicating, and proposing some modifications to, the admixture regression method, which has been used across hundreds of papers. We illustrated this method and our proposed tweaks using the ABCD cohort. This manuscript was peer-reviewed by three reviewers, accepted, proof-edited, and paid for, but not published. Breaking with MDPI’s clearly outlined protocol, the editor of Behavioral Sciences – who I am fairly sure has now blacklisted me – sent it to a mysterious and seemingly not particularly acute 4th “reviewer”. This “reviewer” argued that the paper was “racist” and based on an “outdated” method. We were not given a chance to respond. And the opinions of the original three reviewers, to whom we patiently replied and for whom we made revisions, were discarded.

You might wonder whether this 4th “reviewer” caught a serious methodological error – or even a substantive one. Nope. Instead, he argued that admixture regression – frequently used since the early 2000s by numerous geneticists, genetic epidemiologists, medical researchers, and so on – is an “outdated approach (more of the 19th century)”. He kept repeating that the paper was about an outdated “biological concept” of race, when it actually concerned the relation between traits, genetic ancestry, and self-identified race/ethnicity. To note, typical MDPI reviews are not this ill-conceived and incoherent.

To let you judge whether this post-hoc “review” had any merit, I provide the full comment below, along with my point-by-point empty-chair reply. Since the paper already passed peer review and was accepted by MDPI, but was not published for obvious political reasons, Greg and I have decided to publish it as a chapter in a forthcoming book. I usually do not publish reviews. However, since I do not plan to have this paper peer-reviewed yet again, publishing the post-hoc commentary is warranted. Moreover, I usually do not speculate on motives, but it should be noted that, according to the editor, our post-hoc commenter was a knowledgeable geneticist. That fact, with the implication that the commenter understood the technique and literature, suggests that this was a hit job, with the goal of simply persuading the editor to cancel the paper. On the other hand, the commentary does read as if the “reviewer” was either clueless or just trying to rationalize moral outrage.

“Peer-review” #4.

R4: Connor and Fuerst (here, C&F) proposed a new test that measure how differences in racial identity affects trait variation. They apply their variable to neuropsychological data collected by the Adolescent Brain Cognitive Development (ABCD) study and report that there exists a genetic component to neuropsychological traits and that there is a variation in the performance between different racial groups.

Empty chair reply: As we clearly explained in the introduction, admixture regression is commonly used in genetic epidemiology. Over the last two decades, hundreds of papers using this technique have been published by hundreds of well-published geneticists, genetic epidemiologists, medical researchers, and so on. In this paper, we explicate the underlying statistical model and propose some improvements to this frequently used technique.

R4: I found this paper unfounded, misleading, dishonest, and outdated, i.e., racist.

Empty chair reply: Did you get your 30 pieces of silver for this hit job?

R4: The authors are missing some important advances in the field of population genetics. They used outdated terms (races) and cite no literature to support their racial perception.

Empty chair reply: You clearly did not understand the paper. We explicitly contrasted self-identified race/ethnicity (SIRE) with genetic ancestry. The former is posited as tagging environmental effects, while the latter is posited as tagging genetic effects. Thus, we note: “Admixture regression leverages these two data sources, self-identified race or ethnicity (SIRE) and genetically-measured admixture proportions, to decompose trait variation correspondingly.” In line with ASHG (2018), we contrast self-identified race/ethnicity, a social construct, with genetic ancestry, a genetic construct. As ASHG (2018) notes:

Although a person’s genetics influences their phenotypic characteristics, and self-identified race might be influenced by physical appearance, race itself is a social construct. Any attempt to use genetics to rank populations demonstrates a fundamental misunderstanding of genetics. The past decade has seen the emergence of strategies for assessing an individual’s genetic ancestry. Such analyses are providing increasingly accurate ways of helping to define individuals’ ancestral origins and enabling new ways to explore and discuss ancestries that move us beyond blunt definitions of self-identified race. [Emphasis added]
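For readers unfamiliar with the technique, the decomposition described in the reply above (regressing a trait jointly on SIRE indicators and genetic admixture proportions) can be sketched in a few lines of Python. This is only a minimal illustration on simulated data, not the code from the paper's supplemental files: the function names, the two-source admixture setup, and the coefficient values are all hypothetical, and nothing here uses ABCD data or says anything about a real population.

```python
import numpy as np

# Made-up "true" effects used only to generate the simulated trait.
TRUE_INTERCEPT = 1.0
TRUE_BETA_ANCESTRY = 0.5   # hypothetical effect tagged by admixture proportion
TRUE_BETA_SIRE = -0.3      # hypothetical effect tagged by group identification


def simulate(n=20000, seed=0):
    """Simulate a two-source admixed sample (illustrative assumption)."""
    rng = np.random.default_rng(seed)
    # Proportion of ancestry from source "A"; the remainder is source "B",
    # which is omitted from the regression as the reference category.
    ancestry_a = rng.beta(2.0, 2.0, size=n)
    # Self-identification correlates with ancestry, but imperfectly.
    p_identify = 0.1 + 0.8 * ancestry_a
    sire_g1 = (rng.random(n) < p_identify).astype(float)
    noise = rng.normal(0.0, 1.0, size=n)
    trait = (TRUE_INTERCEPT
             + TRUE_BETA_ANCESTRY * ancestry_a
             + TRUE_BETA_SIRE * sire_g1
             + noise)
    return ancestry_a, sire_g1, trait


def admixture_regression(ancestry, sire, trait):
    """Ordinary least squares of trait on [1, ancestry, SIRE indicator]."""
    X = np.column_stack([np.ones_like(trait), ancestry, sire])
    coef, *_ = np.linalg.lstsq(X, trait, rcond=None)
    return {"intercept": coef[0], "ancestry": coef[1], "sire": coef[2]}


if __name__ == "__main__":
    anc, sire, trait = simulate()
    print(admixture_regression(anc, sire, trait))
```

Because the SIRE indicator and the admixture proportion are correlated but not identical, entering both in one regression lets each coefficient be estimated while holding the other variable fixed. That contrast is the whole point of the method; the sketch leaves out covariates, multiple ancestry components, and the partially linear extension.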

R4: Their assumptions about human races are from the previous century. They consistently imply that their usage of racial categories used in social sciences have genetic merit, that’s racism and, of course, wrong. It is not surprise that they cannot find papers to support their genetic model, because it is unfounded.

Empty chair reply: See above. Also, we cited a plethora of examples of papers using admixture regression in the introduction and conclusion.

R4: The authors model individuals as races + admixture, but the emphasis is on races, as admixture is simply defined as combination of more than 1 race. This is a very ignorant modelling of human populations that ignores the vast literature on the subject. The genetic analyses results are skewed to reproduce their perceived racist model.

Empty chair reply: No. Genetic ancestry is not a combination of more than one SIRE group. And there are literally hundreds of papers which employ admixture regression analysis using the same major ancestry groups we used. The ABCD consortium itself even has its own genetic ancestry variables (European, African, Amerindian, and East Asian ancestry). We only recomputed these so as to include South Asian ancestry.

R4: Throughout the manuscript, the authors omit results (i.e., graphs and code) necessary to evaluate their code.

Empty chair reply: We provided the code in the supplemental files. Either you did not check or the editors did not forward this to you.

R4: 1. Where is the support to: “Many diverse national populations descend demographically from isolated continental groups within a few hundred years.”? where did you get it from? where is the scientific reference? ancient DNA study show that mixture is the norm rather than the exception.

Empty chair reply: Admixture within continental groups obviously doesn’t preclude isolation between them.

R4: 2. “Modern genetic technology can measure with high accuracy the proportion of an individual’s ancestry associated with these continental groups.” – yes, modern tests can predict continental origins with high accuracy, but where is the citation?

Empty chair reply: This is from ASHG’s position statement on this topic.

R4: 3. “In many culturally diverse nations, most individuals can reliably self-identify as members of one or more racial or ethnic groups.” – nonsense. All self-reports are biased. No serious study uses self report ancestry. Of course, the authors must believe in that, because their entire method rests on this connection, but it is untrue. Unlike this unsupported claim of the authors, there are plenty of papers that prove otherwise :
https://academic.oup.com/aje/article/163/5/486/61161?login=true
Self-reported ancestry may not be a reliable method to reduce the possible impact of population stratification in genetic association studies of outbred populations, such as in the United States.
https://pubmed.ncbi.nlm.nih.gov/8761246/
https://pubmed.ncbi.nlm.nih.gov/10797159/
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2350912/
Read: https://www.nature.com/articles/s41408-018-0132-1 to see the differences between self-reported ancestry and genomic ancestry, calculated very accurately.

Empty chair reply: We did not say that SIRE is a reliable index of genetic ancestry – after all, the whole method is based on the contrast between SIRE and genetic ancestry. Rather, we said that SIRE is a reliable index of itself, in the sense that people who identify as a particular SIRE group at one time identify the same way at another. Thus, it reliably tracks a cultural environment.

R4: 4. Poor modeling: How can self-identified people report their % of ancestry? Hardly anyone mixed is 50%:50%.

Empty chair reply: How much did you bother to read beyond the abstract?

R4: 5. “The genotyped DNA samples are carefully decomposed into admixture proportions of geographic ancestry” – no. they are decomposed into a mixture of racial groups that the authors created after forcing the genetic data to show races. Races and admixture are two different concepts.

Empty chair reply: Translation: “The authors computed genetic ancestry in a standard way and entered this in a regression model with SIRE as have so many other researchers. This is bad: Reasons.”

R4: 6. “In most applications of admixture regression, individuals’ racial or ethnic group identities will have statistical relationships with individuals’ genetically identified geographic ancestries” – No! where is the evidence? Why this paper is completely devoid of reference for any fundamental assumption of the model. What does it mean “statistical relationships”?

Empty chair reply: Yes! Self-identified race/ethnicity generally, but imperfectly, correlates with genetic ancestry. This just restates ASHG’s (2018) position statement. But since you don’t even understand the meaning of “statistical relationship”, what can one expect?

R4: 7. “The objective of admixture regression is to decompose trait variation into linear components due to genetic ancestries and linear components due to racial/ethnic group related effects” – unlike admixture mapping techniques, which the author misleading cite as a parallel method, their method is not designed to link a loci with a trait, but rather link conditions with races with a biological support to the racial concept.

Empty chair reply: Whew!… admixture regression analysis is not ‘our’ method. And this frequently used method is not “designed” to provide “biological support to the racial concept”: it explicitly takes advantage of social constructive aspects of racial identification in admixed populations. Do you need this point illustrated with a crayon?

R4: 8. “We show that the admixture regression model can be viewed as a statistically feasible simplification of this linear polygenic index model, in which proportional ancestries serve as statistical proxies for ancestry-related genetic differences.” – proportional ancestries serve as statistical proxies for ancestry-related genetic differences? You calculate ancestries from genetics, this statement means nothing. This is a tautology.

Empty chair reply: So now you finally realize that we used genetic ancestry. But, of course, you are still wrong, since local ancestry is a subset of global ancestry. The statement reads: in our model, [global] ancestries serve as statistical proxies for [local] ancestry-related genetic differences.

R4: 9. “an assumption of random mating across ancestral populations” – really? where is the reference for this assumption?

Empty chair reply: Unsurprisingly, no other reviewers had a problem interpreting this statement. To spell it out: It is an assumption made by the theoretical model – thus a limitation – not an assumption about the world.

R4: 10. “A key assumption of the admixture regression model is that admixture arises from recent random mating between the previously geographically-isolated ancestral groups.” – of course no reference, because it is untrue. Your key assumption is not supported by reality.

Empty chair reply:… we restate that random mating is a theoretical assumption of the commonly used admixture regression model which may or may not be violated to a practically significant extent in the real world.

R4: 11. “Many individuals self-identify as belonging to two or more racial or ethnic groups” – you of course model those groups as RACES, by the biological definition, i.e., groups that are completely separate from one another and didn’t mix. Again, where is your evidence (from this century)? Surely you realize that the racial groups that you used do not satisfy this condition, south and east Asians are closer to each other than to Africans, but you ignore that. There are relationships between those groups, it’s not a star topology.

Empty chair reply: We explicitly do not model self-identified “racial or ethnic groups” as “groups that are completely separate from one another and didn’t mix”! If they didn’t mix, we wouldn’t have admixture for our admixture regression! Nowhere in this paper do we talk about “biological races”. We talk about “genetic ancestry” and SIRE. Perhaps you could try reading our actual paper…

R4: 12. The author removed 80% of the genetic data. They claim that they follow the instruction of ADMIXTURE, but there are no such instruction or recommendation.

Empty chair reply: 100,000 random SNPs…. 100,000 random SNPs…

R4: 13. They force the genetic data into 5 racial categories to fit their made up racial categories. They never show a single result of the genetic analyses. we don’t see the STRUCTURE analysis, nor the PCA. We don’t see the scripts that they used. They through populations because they are “overly admixed”?? what does it mean? You think that Hispanics are less admixed than Druze? Where is the evidence? Why everything in this manuscript is made up BS?

Empty chair reply: You mean: we use K=5 (European, Amerindian, African, East Asian, & South Asian) instead of the K=4 (European, African, Amerindian, & East Asian) provided by the National Institutes of Health for the ABCD dataset… Yes, only “racists” would use these ancestry components.

R4: 14. The authors don’t report their results. Are they afraid? Where are the findings of the model (blacks are poor and uneducated, bla bla). What is the point of this paper if the authors don’t stand behind their results? Why should anyone believe in it?

Empty chair reply: So you missed the part where this was a methodological paper, one which then illustrated the methodology using the ABCD sample.

R4: 15. Where is the null hypothesis?

Empty chair reply: Whew!

R4: 16. I have major ethical concerns due to the extensive use of races, biologically defined. I think that it is wrong and unsupported by the data nor literature.

Empty chair reply: …so, again, we used SIRE vs. genetic ancestry. Which one, exactly, is the “wrong and unsupported” “races, biologically defined”?

R4: Minor comments 1. “It has particular value in the case of complex behavioral traits where reliably identifying genetic loci associated with trait variation is beyond the current reach of science” – so it is not beyond the reach of science?

Empty chair reply: Would you like it to be?

R4: I have a few more comments, but I think that the trend here is pretty obvious. It is an outdated approach (more of the 19th century).

Empty chair reply: Well, maybe you should tell that to the hundreds of research teams that currently use this method.