I was planning to publish this article after my paper on the black-white vocabulary gap in the GSS is released, but I have changed my mind. So, here, I will explain how to use the so-called “Yhat” or predicted values of Y when doing regression (OLS, logistic and multilevel).
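As a minimal sketch of what "Yhat" means in the OLS case, the snippet below fits a line by least squares and computes the predicted values. The helper names `ols_fit` and `predict` and the four data points are made up for illustration; they are not taken from the GSS analysis.

```python
import numpy as np

def ols_fit(X, y):
    """Return OLS coefficients (intercept first) for predictors X and outcome y."""
    X1 = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta

def predict(X, beta):
    """Yhat: the fitted values implied by the coefficients."""
    X1 = np.column_stack([np.ones(len(X)), X])
    return X1 @ beta

# Made-up illustrative data lying exactly on y = 1 + 2x
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([3.0, 5.0, 7.0, 9.0])

beta = ols_fit(X, y)
yhat = predict(X, beta)
residuals = y - yhat  # zero here because the toy data are noise-free
```

For logistic or multilevel models the predicted values would come from the corresponding fitted model rather than from `lstsq`; this sketch covers only the OLS case.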
Following up on a previous analysis, I examined the cognitive variation across the whole of the Americas using a newly constructed data set. Files can be found here and here, with the latest versions provided on request. The analysis was restricted to sovereign nations, excluding, e.g., departments such as Martinique and territories such as the Virgin Islands. Non-sovereign regions were excluded so as to avoid an inter-national x intra-national interaction and because international exam data were not available for these regions. The following 35 countries were included: Argentina, Antigua and Barbuda, Bahamas The, Belize, Bolivia, Brazil, Barbados, Chile, Colombia, Costa Rica, Cuba, Dominica, Dominican Republic, Ecuador, Guyana, Grenada, Honduras, Haiti, Jamaica, St. Kitts and Nevis, St. Lucia, Mexico, Nicaragua, Panama, Peru, Paraguay, El Salvador, Suriname, Trinidad and Tobago, Uruguay, United States, St. Vincent, Venezuela RB, and Canada. Eight regression analyses were run, using the following dependent variables:
- (Skinrefl) Skin reflectance.
- (AchQ) National Achievement Scores – this was an updated set provided by Gerhard Meisenberg during October of 2014.
- (NIQ) National IQ scores – these were based on Richard Lynn’s 2014 (work in progress) results and Jason Malloy’s 2013 to 2014 estimates, with adjustments.
- (AHQ) 1880 to 1930 birth cohort age heaping scores — this is a measure of education/numeracy.
- (logSciresearch) Log of scientific researchers from 2005 to 2012.
- (logGDP) Average of 1990, 2000, and 2010 log of World Bank per capita GDP.
- (Crimes) Violent Crime rates.
- (HDI2012) 2012 Human Development Index scores.
The following independents were included:
- (relativeEu) European Ancestry percent — the percent of European ancestry out of the percent of European + Amerindian + African ancestry. (For a discussion of this variable, refer here.)
- (notUSCanada) Not US or Canada — whether the region was not US or Canada.
- (logparasiteload) Log Parasite load — the log of the 2004 WHO parasite infections per 100,000 for each country.
- (logColddemand) Log Cold demand — the log of Van de Vliert’s (2013) cold stress scores.
- (PopUnder1million) Population under 1 million — whether the country’s population was under one million.
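The variable definitions above can be sketched in code. The snippet below computes the relative European ancestry percent and stacks the five independents into a design matrix for least squares. The function names, the use of natural logs (the text only says "log"), and all numbers are assumptions for illustration; the data are synthetic placeholders, not values from the data file.

```python
import numpy as np

def relative_eu(eu, amer, afr):
    """European ancestry as a percent of European + Amerindian + African ancestry."""
    eu, amer, afr = (np.asarray(v, dtype=float) for v in (eu, amer, afr))
    return 100.0 * eu / (eu + amer + afr)

def design_matrix(rel_eu, not_us_canada, parasite_load, cold_demand, pop_under_1m):
    """Stack an intercept and the five independents into one matrix.
    Natural logs are an assumption; the text does not state the base."""
    return np.column_stack([
        np.ones(len(rel_eu)), rel_eu, not_us_canada,
        np.log(parasite_load), np.log(cold_demand), pop_under_1m,
    ])

# Synthetic placeholder data for 12 hypothetical countries (not real values).
rng = np.random.default_rng(0)
n = 12
rel = relative_eu(rng.uniform(5, 90, n), rng.uniform(1, 50, n), rng.uniform(1, 50, n))
not_us_canada = np.array([0, 0] + [1] * 10)   # dummy: 1 = not US or Canada
pop_under_1m = np.array([0, 1] * 6)           # dummy: 1 = population under 1 million
X = design_matrix(rel, not_us_canada,
                  rng.uniform(1, 100, n), rng.uniform(1, 100, n), pop_under_1m)

# Arbitrary coefficients, used only to check that the fit recovers them.
true_beta = np.array([70.0, 0.2, -3.0, -1.5, 2.0, 1.0])
y = X @ true_beta
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
```

With real data, `y` would be one of the eight dependent variables and rows with missing values would need to be dropped or handled before fitting.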
Simple correlation analysis demonstrated that ancestry, cold weather, and parasite load were intercorrelated, which makes it difficult to isolate causal associations. To illustrate, skin reflectance was set as the dependent variable, with Eu ancestry, cold weather, parasite load, population under 1 million, and not US or Canada as independents. The correlation between Eu ancestry and skin reflectance is clearly mostly genetic in origin. To the extent that the association between ancestry and skin reflectance is mediated by other variables, this suggests that those variables co-vary with gene-related causal effects (and thus that controlling for them controls for ancestry-related causal effects). Regression results are shown in Table 1, below. Generally, parasite load and cold weather seem to partially index ancestry effects. Parasite load is a particularly problematic “environmental factor” because it significantly correlates with STD and HIV rates (at 0.47). Yet the spread of HIV throughout the Americas, in the ’70s and ’80s, came after the origin of cognitive ability differences, which, in the form of national age heaping rates, were already present in the 1800s. Thus, STD and HIV rates, and with them parasite load, are to some extent consequences of cognitive ability differences.
Results will not be discussed in detail. The data file is made freely available; readers can run the analyses as desired. Generally, European ancestry was a robust predictor of lower violent crime rates, and also of scientific activity, achievement scores, and combined achievement plus National IQ scores. (For national IQ alone, in the final model, none of the predictors were significant; this was because the NIQ sample had many missing values.) In contrast to cognitive ability and the other indexes mentioned, European ancestry was generally not significantly associated with GDP or Human Development Index scores. The results for National Achievement Scores are shown in Table 2, below; a regression plot is shown in Figure 1.
Table 1. Regression Results for Skin reflectance
Table 2. Regression Results for ACHQ2014
Figure 1. National Achievement Scores by % European Ancestry for Sovereign American Nations
It has been noted that in the Americas racial identification and genomic racial ancestry frequently do not correspond well. In Latin America, the association seems to be modest on the individual level. For example, Ruiz-Linares et al. (2014) found a correlation of 0.48 between self-identified European and Amerindian racial identity and genomic ancestry in a five country sample. In principle, the same could hold on the aggregate national level. And in some instances there’s a clear discordance. While the Argentinean and Brazilian national populations have roughly the same degree of pre-1500 European ancestry, Argentina has a White European national image while Brazil has a multiracial one. One might wonder, then, to what extent average racial self-identification concords with average racial admixture on the national level. This is an interesting question and others in a similar vein can be asked. For example: To what degree are differences in national racial identification related to such and such outcomes independent of genomic ancestry? Perhaps, for example, members of countries with a more European identity act in aggregate differently from ones that have developed, net of genotype, a less European one: an acting White effect on the national level. Ruiz-Linares et al. (2014) found that, on the individual level, White identity was associated with wealth (but not educational attainment) net of European ancestry (see note 1). If such a pattern can exist on the level of the individual, it could do so on the level of the nation. Here, the first matter will be explored. I first present several indexes of national ancestry for the Americas; these include: national genomic percents, aggregate self-identified race percents, Putterman’s ancestry percents, and national skin reflectance scores. For comparability, these values are expressed in terms of major racial categories, e.g., White European, Black African, and Amerindian, plus an “Other” group.
I then use correlation analysis to validate these estimates.
Previously, a literature review was conducted regarding continental racial admixture and educational attainment and/or socioeconomic status. Across the Americas, Amerindian and African (versus European) ancestry was found to be negatively correlated in admixed populations (e.g., Hispanic Americans) with income, educational attainment, occupational rank, and other cognitively correlated indexes of socioeconomic status. Multiple possible explanations were discussed. Some of these predict that the ancestry-outcome association will generalize spatially, such that admixture will be correlated with outcomes across regions and nations. This need not be the case and is not directly predicted by other accounts of the individual-level admixture-outcome associations, such as phenotype-based discrimination accounts, which operate on the individual level. The association between regional ancestral variation and cognitive-related outcome variation in Mexico will first be explored, since for this country reliable regional admixture estimates (at least with regard to European and Amerindian ancestry) and outcome measures are available, and since there is a good deal of spatial variation in admixture (and outcomes). Subsequently, the analysis will be generalized to the whole of the Americas. Figure 1 below depicts the Mexican spatial ancestral racial variation.
Method: Admixture estimates: Admixture estimates were taken from Salzano and Sans (2014) and Moreno-Estrada et al. (2014). Between the two sources, the Pearson correlation was 0.94 for European admixture, -0.60 for African admixture, and 0.94 for Amerindian admixture. The European and Amerindian estimates thus exhibited high reliability, justifying their combination. The African admixture estimates were unreliable due to the noisiness of the measures in conjunction with the limited range and variance in admixture. Admixture estimates were averaged for each district. Missing district data were then estimated based on the measured admixture of adjacent regions. This produced four different admixture estimates: (a) Salzano and Sans (2014), (b) Moreno-Estrada et al. (2014), (c) the average of (a) and (b), and (d) estimates based on (c) taking into account regional proximity. Descriptives are presented in Table 1. Cognitive Ability estimates: 2003, 2006, 2009, and 2012 average math and reading PISA scores were computed for each district. Regional scores were highly correlated across years, thus justifying the use of cross-year average scores; deviation scores relative to the Mexican national mean were computed and averaged across years. 2002 and 2005 average district-level Raven’s Matrices scores were also computed. Human Development Index: 2010 and 2012 Human Development Index scores were highly correlated across years, and average scores were computed. The Excel data file is attached.
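The steps described in the Method can be sketched as follows. This is a hedged illustration, not the procedure actually used: the function names are hypothetical, using a single source when only one is available is an assumption, and filling a missing district with the plain mean of its measured neighbors is only one way to "estimate from adjacent regions", since the exact rule is not stated. All numbers are made up.

```python
import numpy as np

def pearson_r(a, b):
    """Pearson correlation over districts where both sources have a value."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    ok = ~np.isnan(a) & ~np.isnan(b)
    return np.corrcoef(a[ok], b[ok])[0, 1]

def combine_sources(a, b):
    """Average the two sources per district; use the single available value
    when only one source covers a district; NaN when neither does."""
    stacked = np.vstack([a, b]).astype(float)
    counts = (~np.isnan(stacked)).sum(axis=0)
    sums = np.nansum(stacked, axis=0)
    return np.where(counts > 0, sums / np.maximum(counts, 1), np.nan)

def fill_from_neighbors(values, neighbors):
    """Fill missing districts with the mean of their measured adjacent districts
    (assumed rule). `neighbors` maps a district index to adjacent indices."""
    filled = values.copy()
    for i, adjacent in neighbors.items():
        if np.isnan(values[i]):
            measured = [values[j] for j in adjacent if not np.isnan(values[j])]
            if measured:
                filled[i] = np.mean(measured)
    return filled

def deviation_scores(scores):
    """Scores expressed relative to the mean of the available districts."""
    scores = np.asarray(scores, float)
    return scores - np.nanmean(scores)

# Made-up Amerindian admixture proportions for four hypothetical districts.
source_a = np.array([0.60, 0.40, np.nan, np.nan])
source_b = np.array([0.50, 0.40, 0.30, np.nan])

combined = combine_sources(source_a, source_b)      # estimate (c)
estimated = fill_from_neighbors(combined, {3: [1, 2]})  # estimate (d)
```

Here `combined` plays the role of estimate (c) and `estimated` the role of estimate (d); the deviation scores are taken against an unweighted district mean, whereas the national mean in the text may have been computed differently.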
Results: Since % African estimates were unreliable, these were treated as noise and this noise was partialed out in the correlation analyses. Thus, for these analyses the total ancestry was the Amerindian + European ancestry. The correlations between district-level Amerindian ancestry (with and without estimates) and district-level cognitive ability and human development are shown below. Percent Amerindian admixture was a strong negative predictor of district-level outcomes. Partialing out the noisy African admixture didn’t have a substantive effect on the correlations. Correlations using estimated and measured-only admixture were similar, as were ones using averaged admixture estimates and those provided by Salzano and Sans (2014) and Moreno-Estrada et al. (2014) independently. Results are shown in Table 2. The regression plot showing district-level admixture (without African ancestry partialed out) and district-level cognitive scores is shown in Figure 2. District-level cognitive ability substantially mediated the association between continental ancestry and HDI (with and without African ancestry controlled for). (Without African admixture controlled for: Pearson correlation (AmerAdmix x HDI) = -0.603; (AmerAdmix x HDI) correlation with PISA scores partialed out = -0.271.)
Discussion: Regional Amerindian admixture was a robust predictor of regional outcome differences. The Amerindian ancestry-outcome association found based on admixture mapping generalizes spatially, at least in Mexico. These findings are as would be predicted by an evolutionary genetic explanation. Variations of shared environmental (“cultural”) accounts are possible insofar as shared environment can be genealogically sticky.
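For readers who want to reproduce the "partialed out" correlations, here is a minimal sketch of the standard first-order partial correlation formula. The function name is hypothetical and the input values are arbitrary; the pairwise correlations with PISA scores are not reported in the text, so the example does not attempt to reproduce the -0.271 figure.

```python
import math

def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y controlling for z:
    (r_xy - r_xz * r_yz) / sqrt((1 - r_xz**2) * (1 - r_yz**2))."""
    denom = math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
    return (r_xy - r_xz * r_yz) / denom

# Arbitrary illustrative values (not from the Mexican data):
# x and y correlate 0.5, and each correlates 0.5 with the control z.
example = partial_corr(0.5, 0.5, 0.5)  # (0.5 - 0.25) / 0.75 = 1/3
```

In the analysis above, x would be Amerindian admixture, y would be HDI, and z would be the district PISA score (or African admixture, for the noise-control analyses). A large drop from the zero-order to the partial correlation is what the text describes as mediation.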
Tables and Figures:
(10/18/2014 update: data from two additional studies — Martínez et al. (2007) and Ruiz-Linares et al. (2014) — have been added.)
Over the last decade, scores of large scale admixture-mapping studies have been conducted largely in an attempt to elucidate the origin of ethnic disparities in disease rates and medical outcomes. In the simplest type of such studies, researchers determine if there is a robust association between genotypically defined continental racial ancestry (typically: African, European, and Amerindian) and relevant outcomes in admixed populations. To control for potential confounding effects, measures of educational attainment and other indexes of SES are often included in the analyses. These variables are often treated as environmental indicators, which is odd, since within populations they are found to be under non-trivial genetic influence. For example, based on a recent international meta-analysis of biometric studies involving 51,545 kinship pairs, Branigan, et al. (2013) found that educational attainment had a kinship-based heritability of 0.40, meaning that genes explained 40% of inter-individual educational differences; based on a sample involving 7,959 individuals, Rietveld et al. (2013, table S12) found a GCTA-based heritability, one which takes into account only the effects of population-wide common genetic variants, of 0.22. These results were replicated by Marioni, et al. (2014, table 3), who found a kinship-based heritability of 0.40 and a GCTA-based one of 0.21. When genes explain some of the variance in a trait within groups, they plausibly explain an indefinite portion of the variance between groups. Curious it is, then, that these external outcomes are often assumed to represent environmental influences between groups.
Earlier, I reviewed Braden’s (1994) book, Deafness, Deprivation, and IQ. A considerable number of studies have been conducted since then. The focus here is on the validity of measures of intelligence among the deaf population, including the reliability, predictive validity, and measurement properties of the tests.
Jeffery P. Braden. (1994). Deafness, deprivation, and IQ. Springer.
The book is a compilation of studies on deaf people, which concludes that cultural deprivation due to deafness lowers verbal IQ but not nonverbal IQ. Braden sought to prove Arthur Jensen wrong about his conclusions on the genetic component in racial differences in IQ. In the end, his research culminated in a trauma well known to scientific history: his perfectly good theory was ruined by his data. Being born deaf does not affect g. And genetic theories are the most powerful arguments to account for the pattern of the data.