The Post-hoc 4th Review

Gregory Connor and I submitted the paper, “Linear and partially linear models of behavioral trait variation using admixture regression,” to MDPI’s Behavioral Sciences. This is a methodological paper explicating, and proposing some modifications to, the admixture regression method, which has been used in hundreds of papers. We illustrated this method and our proposed tweaks using the ABCD cohort. This manuscript was peer-reviewed by three reviewers, accepted, proof-edited, and paid for, but not published. Breaking with MDPI’s clearly outlined protocol, the editor of Behavioral Sciences – who I am fairly sure has now blacklisted me – sent it to a mysterious and seemingly not particularly acute 4th “reviewer”. This “reviewer” argued that the paper was “racist” and based on an “outdated” method. We were not given a chance to respond. And the opinions of the original three reviewers, to whom we patiently replied and for whom we made revisions, were discarded.

You might wonder whether this 4th “reviewer” caught a serious methodological error – or even a substantive one. Nope. Instead, he argued that admixture regression – frequently used since the early 2000s by numerous geneticists, genetic epidemiologists, medical researchers, and so on – is an “outdated approach (more of the 19th century)”. He kept repeating that the paper was about an outdated “biological concept” of race, when it concerned the relation between traits, genetic ancestry, and self-identified race/ethnicity. To note, typical MDPI reviews are not this ill-conceived and incoherent.

To let you judge whether this post-hoc “review” had any merit, I provide the full comment below along with my point-by-point empty-chair reply. Since the paper already passed peer review and was accepted by MDPI, but not published for obvious political reasons, Greg and I have decided to publish it as a chapter in a forthcoming book. I usually do not publish reviews. However, since I do not plan to have this paper peer-reviewed yet again, publishing the post-hoc commentary is warranted. Moreover, I usually do not speculate on motives, but it should be noted that, according to the editor, our post-hoc commenter was a knowledgeable geneticist. That fact, with the implication that the commenter understood the technique and literature, suggests that this was a hit job, with the goal of simply persuading the editor to cancel the paper. On the other hand, the commentary does read as if the “reviewer” was either clueless or was just trying to rationalize moral outrage.

“Peer-review” #4.

R4: Connor and Fuerst (here, C&F) proposed a new test that measure how differences in racial identity affects trait variation. They apply their variable to neuropsychological data collected by the Adolescent Brain Cognitive Development (ABCD) study and report that there exists a genetic component to neuropsychological traits and that there is a variation in the performance between different racial groups.

Empty chair reply: As we clearly explained in the introduction, admixture regression is commonly used in genetic epidemiology. Over the last two decades, hundreds of papers using this technique have been published by hundreds of well-published geneticists, genetic epidemiologists, medical researchers, and so on. In this paper, we explicate the underlying statistical model and propose some improvements to this frequently used technique.

R4: I found this paper unfounded, misleading, dishonest, and outdated, i.e., racist.

Empty chair reply: Did you get your 30 pieces of silver for this hit job?

R4: The authors are missing some important advances in the field of population genetics. They used outdated terms (races) and cite no literature to support their racial perception.

Empty chair reply: You clearly did not understand the paper. We explicitly contrasted self-identified race/ethnicity (SIRE) with genetic ancestry. The former is posited as tagging environmental effects while the latter is posited as tagging genetic effects. Thus, we note: “Admixture regression leverages these two data sources, self-identified race or ethnicity (SIRE) and genetically-measured admixture proportions, to decompose trait variation correspondingly.” In line with ASHG (2018), we contrast self-identified race/ethnicity, a social construct, with genetic ancestry, a genetic construct. As ASHG (2018) notes:

Although a person’s genetics influences their phenotypic characteristics, and self-identified race might be influenced by physical appearance, race itself is a social construct. Any attempt to use genetics to rank populations demonstrates a fundamental misunderstanding of genetics. The past decade has seen the emergence of strategies for assessing an individual’s genetic ancestry. Such analyses are providing increasingly accurate ways of helping to define individuals’ ancestral origins and enabling new ways to explore and discuss ancestries that move us beyond blunt definitions of self-identified race. [Emphasis added]

R4: Their assumptions about human races are from the previous century. They consistently imply that their usage of racial categories used in social sciences have genetic merit, that’s racism and, of course, wrong. It is not surprise that they cannot find papers to support their genetic model, because it is unfounded.

Empty chair reply: See above. Also, we cited a plethora of examples of papers using admixture regression in the introduction and conclusion.

R4: The authors model individuals as races + admixture, but the emphasis is on races, as admixture is simply defined as combination of more than 1 race. This is a very ignorant modelling of human populations that ignores the vast literature on the subject. The genetic analyses results are skewed to reproduce their perceived racist model.

Empty chair reply: No. Genetic ancestry is not a combination of more than one SIRE group. And there are literally hundreds of papers which employ admixture regression analysis using the same major ancestry groups we used. The ABCD consortium itself even has its own genetic ancestry variables (European, African, Amerindian, and East Asian ancestry). We only recomputed these so as to include South Asian ancestry.

R4: Throughout the manuscript, the authors omit results (i.e., graphs and code) necessary to evaluate their code.

Empty chair reply: We provided the code in the supplemental files. Either you did not check or the editors did not forward this to you.

R4: 1. Where is the support to: “Many diverse national populations descend demographically from isolated continental groups within a few hundred years.”? where did you get it from? where is the scientific reference? ancient DNA study show that mixture is the norm rather than the exception.

Empty chair reply: Admixture within continental groups obviously doesn’t preclude isolation between them.

R4: 2. “Modern genetic technology can measure with high accuracy the proportion of an individual’s ancestry associated with these continental groups.” – yes, modern tests can predict continental origins with high accuracy, but where is the citation?

Empty chair reply: This is from ASHG’s position statement on this topic.

R4: 3. “In many culturally diverse nations, most individuals can reliably self-identify as members of one or more racial or ethnic groups.” – nonsense. All self-reports are biased. No serious study uses self report ancestry. Of course, the authors must believe in that, because their entire method rests on this connection, but it is untrue. Unlike this unsupported claim of the authors, there are plenty of papers that prove otherwise :
https://academic.oup.com/aje/article/163/5/486/61161?login=true
Self-reported ancestry may not be a reliable method to reduce the possible impact of population stratification in genetic association studies of outbred populations, such as in the United States.
https://pubmed.ncbi.nlm.nih.gov/8761246/
https://pubmed.ncbi.nlm.nih.gov/10797159/
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2350912/
Read: https://www.nature.com/articles/s41408-018-0132-1 to see the differences between self-reported ancestry and genomic ancestry, calculated very accurately.

Empty chair reply: We did not say that SIRE is a reliable index of genetic ancestry – after all, the whole method is based on the contrast between SIRE and genetic ancestry. Rather, we said that SIRE is a reliable index of itself, in the sense that people who identify as a particular SIRE group at one time identify the same way at another. Thus it reliably tracks a cultural environment.

R4: 4. Poor modeling: How can self-identified people report their % of ancestry? Hardly anyone mixed is 50%:50%.

Empty chair reply: How much did you bother to read beyond the abstract?

R4: 5. “The genotyped DNA samples are carefully decomposed into admixture proportions of geographic ancestry” – no. they are decomposed into a mixture of racial groups that the authors created after forcing the genetic data to show races. Races and admixture are two different concepts.

Empty chair reply: Translation: “The authors computed genetic ancestry in a standard way and entered this in a regression model with SIRE as have so many other researchers. This is bad: Reasons.”

R4: 6. “In most applications of admixture regression, individuals’ racial or ethnic group identities will have statistical relationships with individuals’ genetically identified geographic ancestries” – No! where is the evidence? Why this paper is completely devoid of reference for any fundamental assumption of the model. What does it mean “statistical relationships”?

Empty chair reply: Yes! Self-identified race/ethnicity generally, but imperfectly, correlates with genetic ancestry. This just restates ASHG’s (2018) position statement. But since you don’t even understand the meaning of “statistical relationship”, what can one expect?

R4: 7. “The objective of admixture regression is to decompose trait variation into linear components due to genetic ancestries and linear components due to racial/ethnic group related effects” – unlike admixture mapping techniques, which the author misleading cite as a parallel method, their method is not designed to link a loci with a trait, but rather link conditions with races with a biological support to the racial concept.

Empty chair reply: Whew!… admixture regression analysis is not ‘our’ method. And this frequently used method is not “designed” to provide “biological support to the racial concept”: it explicitly takes advantage of social constructive aspects of racial identification in admixed populations. Do you need this point illustrated with a crayon?

R4: 8. “We show that the admixture regression model can be viewed as a statistically feasible simplification of this linear polygenic index model, in which proportional ancestries serve as statistical proxies for ancestry-related genetic differences.” – proportional ancestries serve as statistical proxies for ancestry-related genetic differences? You calculate ancestries from genetics, this statement means nothing. This is a tautology.

Empty chair reply: So now you finally realize that we used genetic ancestry. But, of course you are still wrong, since local ancestry is a subset of global ancestry. The statement reads: in our model, [global] ancestries serve as statistical proxies for [local] ancestry-related genetic differences.

R4: 9. “an assumption of random mating across ancestral populations” – really? where is the reference for this assumption?

Empty chair reply: Unsurprisingly, no other reviewers had a problem interpreting this statement. To spell it out: It is an assumption made by the theoretical model – thus a limitation – not an assumption about the world.

R4: 10. “A key assumption of the admixture regression model is that admixture arises from recent random mating between the previously geographically-isolated ancestral groups.” – of course no reference, because it is untrue. Your key assumption is not supported by reality.

Empty chair reply:… we restate that random mating is a theoretical assumption of the commonly used admixture regression model which may or may not be violated to a practically significant extent in the real world.

R4: 11. “Many individuals self-identify as belonging to two or more racial or ethnic groups” – you of course model those groups as RACES, by the biological definition, i.e., groups that are completely separate from one another and didn’t mix. Again, where is your evidence (from this century)? Surely you realize that the racial groups that you used do not satisfy this condition, south and east Asians are closer to each other than to Africans, but you ignore that. There are relationships between those groups, it’s not a star topology.

Empty chair reply: We explicitly do not model self-identified “racial or ethnic groups” as “groups that are completely separate from one another and didn’t mix”! If they didn’t mix, we wouldn’t have admixture for our admixture regression! Nowhere in this paper do we talk about “biological races”. We talk about “genetic ancestry” and SIRE. Perhaps you could try reading our actual paper…

R4: 12. The author removed 80% of the genetic data. They claim that they follow the instruction of ADMIXTURE, but there are no such instruction or recommendation.

Empty chair reply: 100,000 random SNPs…. 100,000 random SNPs…

R4: 13. They force the genetic data into 5 racial categories to fit their made up racial categories. They never show a single result of the genetic analyses. we don’t see the STRUCTURE analysis, nor the PCA. We don’t see the scripts that they used. They through populations because they are “overly admixed”?? what does it mean? You think that Hispanics are less admixed than Druze? Where is the evidence? Why everything in this manuscript is made up BS?

Empty chair reply: You mean: we use K=5 (European, Amerindian, African, East Asian, & South Asian) instead of the K=4 (European, African, Amerindian, & East Asian) provided by the National Institutes of Health for the ABCD dataset… Yes, only “racists” would use these ancestry components.

R4: 14. The authors don’t report their results. Are they afraid? Where are the findings of the model (blacks are poor and uneducated, bla bla). What is the point of this paper if the authors don’t stand behind their results? Why should anyone believe in it?

Empty chair reply: So you missed the part that this was a methodological paper which then illustrated the methodology using the ABCD sample.

R4: 15. Where is the null hypothesis?

Empty chair reply: Whew!

R4: 16. I have major ethical concerns due to the extensive use of races, biologically defined. I think that it is wrong and unsupported by the data nor literature.

Empty chair reply: …so, again, we used SIRE vs. genetic ancestry. Which one, exactly, is the “wrong and unsupported” “races, biologically defined”?

R4: Minor comments 1. “It has particular value in the case of complex behavioral traits where reliably identifying genetic loci associated with trait variation is beyond the current reach of science” – so it is not beyond the reach of science?

Empty chair reply: Would you like it to be?

R4: I have a few more comments, but I think that the trend here is pretty obvious. It is an outdated approach (more of the 19th century).

Empty chair reply: Well, maybe you should tell that to the hundreds of research teams that currently use this method.
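For readers unfamiliar with the technique argued over above, here is a minimal sketch of the general idea of admixture regression: regress a trait jointly on a genetically estimated ancestry proportion and a SIRE indicator, so that each coefficient is estimated while holding the other variable fixed. This is not the model from the paper. The two-ancestry setup, the simulated data, the coefficient values, and the function name `fit_admixture_regression` are all illustrative assumptions.

```python
import numpy as np

# Hypothetical "true" coefficients, invented purely for this illustration.
TRUE_INTERCEPT = 1.0
TRUE_ANCESTRY_EFFECT = 0.5   # effect tagged by genetic ancestry proportion
TRUE_SIRE_EFFECT = -0.3      # effect tagged by self-identified group


def fit_admixture_regression(n=20000, seed=0):
    """Simulate an admixed sample and fit trait ~ ancestry + SIRE by OLS."""
    rng = np.random.default_rng(seed)

    # Ancestry proportion for one of two ancestral components (0..1).
    ancestry = rng.beta(0.5, 0.5, size=n)

    # SIRE is correlated with ancestry but not determined by it:
    # the probability of identifying with group 1 rises with ancestry.
    sire = (rng.random(n) < ancestry).astype(float)

    # Trait is a linear function of both, plus noise.
    trait = (TRUE_INTERCEPT
             + TRUE_ANCESTRY_EFFECT * ancestry
             + TRUE_SIRE_EFFECT * sire
             + rng.normal(0.0, 1.0, size=n))

    # Design matrix: intercept, ancestry proportion, SIRE indicator.
    X = np.column_stack([np.ones(n), ancestry, sire])
    coef, *_ = np.linalg.lstsq(X, trait, rcond=None)
    return coef  # [intercept, ancestry effect, SIRE effect]


if __name__ == "__main__":
    print(fit_admixture_regression())
```

Because SIRE and ancestry are correlated but not identical in an admixed sample, the regression can separate the two coefficients. If the two were perfectly collinear, the design matrix would be rank-deficient and the decomposition would not be identified.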

6 Comments

  1. WalterS

    Been following your articles for a while, and came across this paper regarding the use of racial groupings -paper arguing against them-
    https://onlinelibrary.wiley.com/doi/10.1002/bies.202100204
    “Lewontin did not commit Lewontin’s fallacy, his critics do: Why racial taxonomy is not useful for the scientific study of human variation” by C. Roseman (2021)

    Unsure where else to post, so doing so here. Wondering what you make of its arguments.

    • Chuck

      You would have to post the pdf if you want an evaluation. Right off the bat, though, I see an error in the abstract e.g., “Lewontin concluded that it follows from the fact that the large majority of human genetic variation (≈ 85%) is among individuals within local populations…” Lewontin concluded that 85% of the genetic variance was within populations; however, since we are diploids, only roughly half of the within-population variance is between individuals within populations. Needless to say, obvious errors like this, in the abstract no less, don’t give me confidence in the paper or the journal in which it was published.

      That said, I can’t think of a passage in which Lewontin made the “Lewontin Fallacy” as it is commonly construed. (Though he made others.) However, many have while citing him. Thus, I think it is not unreasonable to refer to it as the “Lewontin Fallacy” (as opposed to “Lewontin’s fallacy”). I commented on this in NofR pp. 133-134.

      • WalterS

        Thanks for the reply, it is hard to post the pdf unfortunately but I found this segment from the paper online:
        Edwards proposed a scenario in which there are two groups with different frequencies of two alleles at varying numbers of loci. At each locus, a focal allele is set to a frequency of 0.3 in one population and 0.7 in the other. He did not specify how the allele frequencies in his example populations evolved from a common ancestral population. Suppose the ancestral population of Edwards’ two groups had frequencies of the two alleles set to 0.5 at each locus and all loci evolved in lockstep to their frequencies in either population. The evolutionary trajectories of the groups are anti-correlated because the groups are evolving at the same rate in opposite directions. The evolutionary trajectories of alleles within each population are correlated as they all evolve in lockstep from one common ancestral state to one uniform state in each lineage as though they were under strong natural selection.

        This does not resemble Lewontin’s study in the least. Lewontin’s loci are spread out over the genome. Some are within a few dozens of millions of base pairs on the same chromosome, but few loci seem to be all that close to others and all are probably evolutionarily independent enough for his purposes. Moreover, while natural selection has undoubtedly caused evolution in at least some of Lewontin’s loci, it is exceedingly unlikely that they were being evolved by selection in the same way as implied in Edwards’ example. Lewontin’s data more closely approximate the assumptions of our model of randomly evolving independent loci that have correlations among their trajectories only insofar as populations might share common ancestry than they do Edwards’ extreme example.

        Thus

        It is easy to find statistically robust intraclass correlations using groups that bear no resemblance to the patterns of gene flow, random genetic drift, and common ancestry that structured the observed variation. Traditional racial groupings are arbitrary and merely coincide with a small aspect of human variation.[3, 4] They do not reflect the true patterns of gene flow and common descent among populations that structured the observed variation.

        I am sorry in advance for the wall of text but I don’t quite understand Roseman’s point here. I feel it is biased but is distributed a lot and apparently many people are taking it seriously. It seems the author is basically implying that Edwards was bringing up a theoretical example while Lewontin was reporting on the “reality” so Edwards’ critique does not hold up in practice.
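The Edwards scenario quoted above (a focal allele at frequency 0.3 in one group and 0.7 in the other, at a varying number of loci) can be explored with a small simulation. This is only a sketch under stated assumptions: loci are drawn independently, each individual carries one allele per locus (a haploid simplification), and individuals are assigned to a group by a simple majority count of the focal allele. The function name and sample sizes are made up for illustration, and the sketch does not address Roseman's point about how the groups evolved.

```python
import numpy as np


def classification_accuracy(n_loci, n_per_group=2000, seed=0):
    """Share of individuals assigned to the correct group by allele count."""
    rng = np.random.default_rng(seed)

    # True where the individual carries the focal allele at that locus.
    group_a = rng.random((n_per_group, n_loci)) < 0.3
    group_b = rng.random((n_per_group, n_loci)) < 0.7

    count_a = group_a.sum(axis=1)
    count_b = group_b.sum(axis=1)

    # Majority rule: more than half focal alleles -> assign to group B.
    threshold = n_loci / 2
    correct = (count_a < threshold).sum() + (count_b > threshold).sum()
    # Exact ties (possible only for even n_loci) count as a coin flip.
    ties = (count_a == threshold).sum() + (count_b == threshold).sum()

    return (correct + 0.5 * ties) / (2 * n_per_group)


if __name__ == "__main__":
    for k in (1, 5, 21, 101):
        print(k, classification_accuracy(k))
```

With a single locus the rule is right about 70% of the time by construction. Accuracy rises as loci are added because independent small frequency differences accumulate, which is the correlation-structure point at issue between Edwards and Lewontin.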

        • Chuck

          “Traditional racial groupings are arbitrary and merely coincide with a small aspect of human variation.[3, 4] They do not reflect the true patterns of gene flow and common descent among populations that structured the observed variation.”

          I don’t know what “Traditional racial groupings” he has in mind, but what Q. Spencer calls the Blumenbach partition corresponds quite nicely with “patterns of gene flow and common descent”. And this is more or less what USA-based census races are based on. See:

          Spencer, Q. (2014). A radical solution to the race problem. Philosophy of Science, 81(5), 1025-1038.

          See also NofR, section IV, “The Races of Man.” Compare clarity of exposition.

          So then the question is as to the amount of average differences. I discussed that in NofR and more recently in Suppl. File 6 here: https://www.researchgate.net/publication/354767078_More_Research_Needed_Suppl. file 6 (see also 5). Lewontin was clearly wrong insofar as he meant that the between-group variance is of little social significance. (You can just plug 15% variance into equation (1) to see.)

          • Chuck

            A simple way to think about this is that a 15% between groups variance is a large effect size in the social sciences. See: https://en.wikiversity.org/wiki/Eta-squared
            This general notion is what Sewall Wright based his often cited Fst criteria on (which only applied to bi-allelic variants):

            “0 to 0.05 indicates little genetic differentiation.
            0.05 to 0.15 indicates moderate genetic differentiation.
            0.15 to 0.25 indicates great genetic differentiation.
            Values above 0.25 indicate very great genetic differentiation.”
            (Wright, S. 1978. Evolution and the genetics of populations)
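To make the effect-size comparison concrete, a proportion of between-group variance can be converted to a standardized mean difference. The sketch below uses the standard conversion for two equal-sized groups with equal within-group variance, eta² = d² / (d² + 4). It is an assumption that this matches the setting being discussed, and it is not necessarily the equation (1) referred to in the supplementary file. The function name is illustrative.

```python
import math


def eta_squared_to_cohens_d(eta_sq):
    """Convert between-group variance share to Cohen's d.

    Assumes exactly two groups of equal size with equal within-group
    variance, where eta^2 = d^2 / (d^2 + 4).
    """
    if not 0.0 <= eta_sq < 1.0:
        raise ValueError("eta_sq must be in [0, 1)")
    return 2.0 * math.sqrt(eta_sq / (1.0 - eta_sq))


if __name__ == "__main__":
    print(round(eta_squared_to_cohens_d(0.15), 2))
```

Under those assumptions, a 15% between-group share corresponds to d of roughly 0.84, which is above Cohen's conventional benchmark of 0.8 for a large effect.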

  2. WalterS

    Interesting, thank you. I will follow up with the NofR also.

© 2021 Human Varieties
