Racial Ancestry in the Americas. Part 1: National Genomic Racial Admixture: Estimates and Validation

It has been noted that in the Americas racial identification and genomic racial ancestry frequently do not correspond well. In Latin America, the association seems to be modest on the individual level. For example, Ruiz-Linares et al. (2014) found a correlation of 0.48 between self-identified European and Amerindian racial identity and genomic ancestry in a five-country sample. In principle, the same could hold on the aggregate national level, and in some instances there is a clear discordance. While the Argentinean and Brazilian national populations have roughly the same degree of pre-1500 European ancestry, Argentina has a White European national image while Brazil has a multiracial one. One might wonder, then, to what extent average racial self-identification concords with average racial admixture on the national level. This is an interesting question, and others in a similar vein can be asked. For example: to what degree are differences in national racial identification related to particular outcomes independent of genomic ancestry? Perhaps members of countries with a more European identity act, in aggregate, differently from those that have developed, net of genotype, a less European one (an “acting White” effect on the national level). Ruiz-Linares et al. (2014) found that, on the individual level, White identity was associated with wealth (but not educational attainment) net of European ancestry (see note 1). If such a pattern can exist on the level of the individual, it could also do so on the level of the nation.

Here, the first matter will be explored. I first present several indexes of national ancestry for the Americas; these include national genomic percents, aggregate self-identified race percents, Putterman’s ancestry percents, and national skin reflectance scores. For comparability, these values are expressed in terms of major racial categories (e.g., White European, Black African, and Amerindian) plus an “Other” group. I then use correlation analysis to validate these estimates.

Genomic Ancestry (variables: Eugenomic, Afrgenomic, Amergenomic): Average genomic ancestry percents were created for the 36 American nations for which admixture data was available. Most admixture studies decomposed geographic ancestry into three components (European, African, and Amerindian). For some nations, a significant fraction of the population had another regional ancestral component (e.g., South Asian, East Asian, or Oceanian). As such, an “Other” category was included. Not all possible studies were used in creating the averages; rather, estimates from the most methodologically sound and nationally representative studies were used. Roughly 70 different estimates were employed in creating the 36 national ones. For some countries up to four sets of estimates were averaged, while for others only one was available. The results are shown in Table 1.

Several nations required special handling. For Belize and Paraguay no regional or national level data was available; estimates were instead calculated based on those of the surrounding nations, which was justified given the migration histories of the countries and ancillary facts. For Trinidad and Tobago data was available only for the Black population, which constitutes approximately 40% of the total. Estimates were made based on self-reported ethnicity and reasonable assumptions given the known admixture in the Black population. For the Virgin Islands, again, admixture data was only available for the Black population, which constitutes 76% of the population. National level estimates were made on the assumption that the White population, which comprises 16% of the total, was fully European and that the mixed/other ethnic population, which comprises 8% of the total, was half European and half African. For the U.S., data was only available for specific ethnic populations (e.g., African Americans, Hispanics, Whites, and Native Americans). National level estimates were created by weighting these by the percent of individuals who identified with each ethnic category. Asians (~4.5% of the population) were treated as 100% Other. Pacific Islanders and mixed-race individuals (~1.5%) were not counted. For Canada, the national estimate was made using U.S. ethnic admixture percents in conjunction with Canadian ethnic identity percents. Computations and sources are provided in the Excel file.

To make rates more comparable across countries, national admixture was also expressed in terms of the three main source populations: European/West Caucasian, African, and Amerindian. Note that Middle Eastern and North African ancestry components were generally lumped with the “European” one.
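The population-share weighting described above can be sketched in a few lines. This is a minimal illustration, not the computation in the Excel file: the shares and the White and mixed/other rows follow the Virgin Islands assumptions stated in the text, while the admixture of the Black population is a hypothetical placeholder, and the function names `national_ancestry` and `three_way` are made up for the example.

```python
# Population shares for the Virgin Islands, as stated in the text.
SHARES = {"Black": 0.76, "White": 0.16, "Mixed/other": 0.08}

# Ancestry fractions within each group. The White and Mixed/other rows follow
# the assumptions in the text. The Black row is a HYPOTHETICAL placeholder,
# not a published admixture estimate.
GROUP_ANCESTRY = {
    "Black": {"European": 0.20, "African": 0.80},
    "White": {"European": 1.00},
    "Mixed/other": {"European": 0.50, "African": 0.50},
}

COMPONENTS = ("European", "African", "Amerindian", "Other")


def national_ancestry(shares, group_ancestry):
    """Weight each group's ancestry fractions by its population share."""
    return {
        c: sum(shares[g] * group_ancestry[g].get(c, 0.0) for g in shares)
        for c in COMPONENTS
    }


def three_way(ancestry):
    """Drop 'Other' and rescale the three main components to sum to 1."""
    main = ("European", "African", "Amerindian")
    total = sum(ancestry[c] for c in main)
    return {c: ancestry[c] / total for c in main}


estimate = national_ancestry(SHARES, GROUP_ANCESTRY)
print(estimate)
print(three_way(estimate))
```

The second function shows one plausible reading of "expressed in terms of the three main source populations": the Other component is dropped and the remaining three are rescaled proportionally. The text does not spell out the exact rescaling, so treat that step as an assumption.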

Table 1: Genomic Ancestry Estimates by Nation



(In black, estimates were reported in sources and averaged; in blue, average estimates were partially estimated based on self identified ethnic rates in conjunction with admixture results; in red, ancestry was estimated based on that of adjacent nations.)

Self-Identified Race (variables: IDCIAWhite, IDCIABlack, IDCIAAmer): Percent self-reported ethnicity and race as given by the CIA World Factbook was used to create national racial identification (ID) averages, except for Canada, for which 2011 Canadian census data was used. As with genomic ancestry, European, African, Amerindian, and Other percents were computed. Specific ethnic groups such as “Spanish” or “Aymara” were grouped into regional racial identities. For hybrid identities such as Mestizo and Mulatto, percents were split by parental group (e.g., one half European and one half Amerindian). For tribrid identities such as Montubio, percents were split three ways. Assumptions had to be made for a number of nations. For example, Costa Rica was said to be 83.6% “White and Mestizo”; this was treated as 83.6% Mestizo (that is, as 41.8% European and 41.8% Amerindian). St. Lucia was said to be 85.3% Black, 3.9% White, and 10.9% Mixed; it was assumed that the “Mixed” group was mixed Black and White (i.e., Mulatto). Thus, the African identity component was 85.3% + 1/2 × 10.9% = 90.75% and the European component was 3.9% + 1/2 × 10.9% = 9.35%. Judgment calls such as these were noted in the Excel file. Again, to make estimates more comparable across countries, national racial identities were also expressed in terms of the three main races: European/West Caucasian, Amerindian, and African.
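The splitting rule for hybrid and tribrid identities can be written out as a small lookup. This is a sketch under stated assumptions: the `SPLITS` table and the function name `identity_components` are illustrative, the three-way Montubio split is assumed to be European/African/Amerindian, and any label not in the table is assumed to fall into Other.

```python
# How each reported label is divided among the broad components.
# The Montubio row ASSUMES an equal European/African/Amerindian split.
SPLITS = {
    "White": {"European": 1.0},
    "Black": {"African": 1.0},
    "Amerindian": {"Amerindian": 1.0},
    "Mestizo": {"European": 0.5, "Amerindian": 0.5},
    "Mulatto": {"European": 0.5, "African": 0.5},
    "Montubio": {"European": 1 / 3, "African": 1 / 3, "Amerindian": 1 / 3},
}


def identity_components(reported, splits=SPLITS):
    """Convert reported identity percents into broad component percents.

    Labels missing from `splits` are counted as Other (an assumption).
    """
    totals = {"European": 0.0, "African": 0.0, "Amerindian": 0.0, "Other": 0.0}
    for label, percent in reported.items():
        for component, fraction in splits.get(label, {"Other": 1.0}).items():
            totals[component] += percent * fraction
    return totals


# St. Lucia, with the "Mixed" group treated as Mulatto, as in the text.
st_lucia = identity_components({"Black": 85.3, "White": 3.9, "Mulatto": 10.9})
print({name: round(value, 2) for name, value in st_lucia.items()})
```

For the St. Lucia figures this reproduces the arithmetic in the paragraph above: half of the 10.9% mixed group is added to each of the Black and White percents.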

Table 2: Racial Identity by Nation


Putterman and Weil’s World Migration Matrix (variables: PuttermanEU, PuttermanAfr, PuttermanAmer): Ancestry components were also computed based on Putterman and Weil’s ancestry matrix for 165 countries. For each nation, the matrix gives the percent of ancestors hailing from every nation in the year 1500. Putterman and Weil based their estimates on a mix of genetic studies, immigration data, and other sources. As above, four ancestral components were created: European, African, Amerindian, and Other (here including Middle Eastern and North African). This was done by summing the year 1500 national ancestry components into the four broad categories. Again, to make scores more comparable across countries, ancestral components were also expressed in terms of the three main racial groups.
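Summing one row of the matrix into the four broad categories might look like the sketch below. The source-country shares are invented for illustration and are not taken from Putterman and Weil's data; the `REGION_OF` mapping and the function name `aggregate_row` are likewise assumptions.

```python
# HYPOTHETICAL mapping from year-1500 source country to broad category.
# Middle Eastern and North African sources go to Other, as in the text.
REGION_OF = {
    "Spain": "European",
    "Portugal": "European",
    "Nigeria": "African",
    "Ghana": "African",
    "Mexico": "Amerindian",
    "Peru": "Amerindian",
    "Lebanon": "Other",
    "China": "Other",
}


def aggregate_row(row, region_of=REGION_OF):
    """Sum one nation's year-1500 source shares into four broad categories.

    Source countries missing from the mapping are counted as Other.
    """
    totals = dict.fromkeys(("European", "African", "Amerindian", "Other"), 0.0)
    for source, share in row.items():
        totals[region_of.get(source, "Other")] += share
    return totals


# An invented matrix row for one nation (shares sum to 1).
example_row = {"Spain": 0.50, "Nigeria": 0.20, "Mexico": 0.25, "China": 0.05}
print(aggregate_row(example_row))
```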

Table 3: Putterman’s Ancestral Components by Nation


Skin Reflectance (variable: SkinRefl): National skin reflectance data was provided by Gerhard Meisenberg (personal communication, 2014). It has previously been used in a number of analyses (e.g., Meisenberg and Woodley, 2013). For this variable, higher values correspond to lighter skin color.

Table 4: National Skin Reflectance Scores


The (most recent) data file can be found here.

Method: Correlation analyses were run. Since nations differed widely in population size (e.g., Cayman Islands Pop = 56,732; Brazil Pop = 202,656,788), and since the oddity of comparing countries that vary by nearly four orders of magnitude in size has been pointed out (e.g., Hunt and Sternberg, 2006), weights were created by taking the square root of the population size. Weighted correlations are presented in the figures below the diagonal (in blue), and unweighted correlations above it. All data is made available in case some wish to employ alternative methods.
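A weighted Pearson correlation with square-root-of-population weights can be sketched as follows. This is one standard definition of a weighted correlation, not necessarily the routine used to produce the figures; the function names are illustrative and the ancestry and identity values in the example are toy numbers, not entries from the data file.

```python
import math


def sqrt_population_weights(populations):
    """Weight each nation by the square root of its population."""
    return [math.sqrt(p) for p in populations]


def weighted_pearson(x, y, w):
    """Pearson correlation of x and y using observation weights w."""
    total = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / total
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / total
    cov = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y)) / total
    var_x = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x)) / total
    var_y = sum(wi * (yi - mean_y) ** 2 for wi, yi in zip(w, y)) / total
    return cov / math.sqrt(var_x * var_y)


# Toy values for four nations (HYPOTHETICAL, for illustration only).
genomic_european = [0.80, 0.60, 0.35, 0.10]
identified_european = [0.85, 0.50, 0.40, 0.05]
populations = [40_000_000, 200_000_000, 10_000_000, 60_000]

weights = sqrt_population_weights(populations)
print(weighted_pearson(genomic_european, identified_european, weights))
# Passing equal weights gives the ordinary (unweighted) Pearson correlation.
print(weighted_pearson(genomic_european, identified_european, [1, 1, 1, 1]))
```

With square-root weights, Brazil counts about 60 times as much as the Cayman Islands rather than roughly 3,600 times as it would under raw population weights, so large nations still dominate but small ones are not ignored.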

Results: Results are shown in Figures 1 through 3 below. European, African, and Amerindian genomic estimates strongly correlate with estimates based on racial identification and on Putterman and Weil’s ancestry matrix. As expected, White/European ancestry is a strong positive predictor of national skin reflectance, while Black/African ancestry is a strong negative one.

Figure 1: Correlations for European/White


Figure 2: Correlations for African/Black


Figure 3: Correlations for Amerindian


Overall, the results establish the validity of the ancestry estimates. They also establish that there is a high correspondence between genomic ancestry and average racial identification on the national level.


1. In their supplementary file (s2), Ruiz-Linares et al. (2014) report a highly significant association between European ancestry and both wealth and education (r = 0.12, p-value < 2.2×10^-16). Net of genotype, wealth and education were not associated with African or Amerindian racial identity. Wealth, however, was significantly (but apparently weakly) associated with European/White identity; the authors report a regression coefficient of 0.00291, p-value 6.1×10^-4.


Hunt, E., & Sternberg, R. J. (2006). Sorry, wrong numbers: An analysis of a study of a correlation between skin color and IQ. Intelligence, 34(2), 131-137.

Meisenberg, G., & Woodley, M. A. (2013). Global behavioral variation: A test of differential-K. Personality and Individual Differences, 55(3), 273-278.

Ruiz-Linares, A., et al. (2014). Admixture in Latin America: Geographic structure, phenotypic diversity and self-perception of ancestry based on 7,342 individuals. PLoS Genetics, 10(9), e1004572.

8 thoughts on “Racial Ancestry in the Americas. Part 1: National Genomic Racial Admixture: Estimates and Validation”

  1. Chuck,
    I checked the average 2012 PISA scores (combining math, reading, and science) for the following countries: Chile, Mexico, Uruguay, Costa Rica, Brazil, Argentina, Colombia, and Peru. If I compare the scores to the “European” proportion given toward the right-hand side of your Table 1, the Pearson correlation coefficient is +0.31. The correlation is positive, but not terribly impressive. The PISA scores of Uruguay, Brazil, and Argentina (the most European of the countries) are really disgraceful. Lynn has reported an average IQ for Uruguay of 96 (the single most European country), which would be respectable (if the score is reliable).

    If I take the National IQs from Lynn and Vanhanen (2012) for all of the above countries, the correlation with the countries’ European proportion is 0.67, which looks pretty impressive. As you know, IQ and achievement are not identical concepts; ethnic Europeans in Latin America are deficient in terms of achievement more so than IQ.

    • Hi Greg,
      Thanks for the comment. I’m almost done with my data set, which was built to test parasite load, cold stress, and ancestry as competing explanatory accounts. If you’re interested, I’ll write up a description of the variables and send the file. I don’t like messy analyses where there’s no clear answer; this is one, so I’m losing my appetite for it. But maybe you would be interested?

      Generally, the correlations depend on which outcomes and countries one selects (and which method is used). Here’s a pic, limited to the same 8 countries, showing the Pearson correlation between % European ancestry and GMAT, GRE (math), TOEFL, Lynn’s 2015 IQ (in progress), Altinok’s 2013 ACH, Hanushek and Woessmann’s 2012 Latin American PERCE and SERCE (table A1 column 8 here), the English proficiency test, age heaping (birth cohort 1880 to 1930), and Gerhard Meisenberg’s 2015 AQ (in progress) scores. Generally, there’s a non-trivial positive correlation. I think it’s a “true” one because it remains when I include more nations, aggregate scores, control for latitude, parasite load (disentangled from STD rates), and regional effects (e.g., Anglo versus Latin, or percent Spanish ancestry), weight by the square root of national population or not, etc. (See here for all nations, minus the U.S. and Canada, for which I have estimates.)

      But, yes, Latin American scores are pretty dismal; the under-performance (relative to Europe, the US, and Canada) largely can’t be (statistically) explained by continental race (in the form of genes or transmitted culture). This is obvious when comparing the US to Brazil or Argentina in terms of outcomes and (continental) ancestry. Moreover, race is a rather poor predictor of HDI (human development index) and social progress differences, etc. What are other explanations, then? Latitude, net of parasite load, seems to be a robust predictor, at least for the broader sweep of nations. Parasite load/infectious disease rate actually isn’t a good predictor of cognitive ability (though it is of other, more social/developmental variables) when one adjusts for volitional aspects in the form of HIV rate differences (which clearly are consequent to national IQs, as IQ differences predate the spread of HIV throughout the Americas). I don’t get why latitude or cold weather per se would explain differences, though, so latitude as such doesn’t strike me as a good causal account.

      • Chuck,
        I would be interested in your data file. The ancestry data in your Table 1 should be helpful in conjunction with Piffer’s work on intelligence-enhancing allele frequencies in various countries.
        –Greg Christainsen

  2. Interesting data, Chuck. Great work.

    The latitude thing: could that be because the countries closest to Mexico have a lot of social problems due to the US-caused war on drugs? Perhaps you could add in the murder rate per capita and see if controlling for that changes things. I don’t know.

    • Former Mexican foreign secretary Jorge Castaneda notes that the impoverished Indian south of Mexico “continues to provide much of Mexico’s personality”. In contrast, the wealthier “north is industrious, modernizing, violent, lighter-skinned, and devoid of charm …” In short, the north sounds a lot like Los Angeles or Texas.

      My vague impression is that the north of Mexico, even before Mexico got big in the drug business about 25 years ago, tended to be pretty violent in a Wild Bunch kind of way.

      Durango in the northwest attracted Hollywood, especially John Wayne, because it was so much like the Old West of the cowboy shoot-em-ups, both topographically and culturally.


  3. Pingback: Racial Ancestry in the Americas. Part 2: Cognitive Variation between Nations: Parasite Load, Climate, and Ancestry | Human Varieties

  4. Altitude is an interesting variable.

    Paul Theroux’s travel book on Latin America, The Old Patagonian Express, quotes a lady telling him the nicest people in South America are at about 4,000 feet.
