Color Differences: Corrections and Further Analysis. Part 2

Some things never change

Nearly 100 years ago George Ferguson tested the racial genetic hypothesis of IQ differences and found the following remarkable results, as reported by Baker (1974):


Just a couple of days ago, the awesome Audacious Epigone pointed out that the GSS (2012) contains a color ratings scale. GSS (2012) gives us the following results:

Color-Wordsum, African Americans (inclusively defined: IDs as Native Non-Hispanic Black)


Color-Wordsum, African Americans (self identified mixed race individuals excluded: IDs as Native non-Hispanic Black and no other race)


Color-Education, African Americans (inclusively defined)


Color-Education, controlling for Wordsum, African Americans (inclusively defined)


Color-Income, African American (inclusively defined)


Color-Income controlling for Wordsum, African American (inclusively defined)


(To note: the color-income results are complex because there is a strong sex x color interaction. Darker colored Black women do better than lighter colored Black women after measures of Human capital are accounted for, but the reverse holds for darker colored Black men and lighter colored Black men. But as my colleague has noted: “If we accept the premise that color-based discrimination should affect all people of both sexes, at every level, colorism is not borne out….” “Colorism” can only be salvaged by transforming it into “income-color-sexism.” But of course, there is an HBD perspective on the “gendered-race” phenomena, too. More on that later.)

As for the Color-IQ association, such an association should not be surprising. The GSS shows that self-identified mixed race Black-White individuals have intermediate Wordsum scores:

RACECEN1(1); BORN(1) M=6.42, N=4700

“Black and White”
RACECEN1(2); RACECEN2(1); BORN(1) M=5.86, N=27
RACECEN1(2); RACECEN3(1); BORN(1) M=5.32, N=6
RACECEN1(1); RACECEN2(2); BORN(1) M=6.02, N=22
RACECEN1(1); RACECEN3(2); BORN(1) M=5, N=2

RACECEN1(1); RACECEN2(0); BORN(1) M=5.15, N=875

On the account of this “bi-racism“, which I have now documented in dozens of recent samples (e.g., NAEP and more NAEP, TIMSS, PIRLS, PISA, HSLS2009, NLSY97, etc.) using many different methods of identification (e.g., participant’s racial ID, parental ID of participant’s race, biological parents’ racial IDs, parents’/ individual’s reported national ancestry ID, etc.), there will inevitably be “colorism”. And the product of intermarriage, historic and recent, does in fact seem to be the major source of this when it comes to IQ. For summaries, readers are referred to Shuey’s review of mixed race IQ studies (1914-1964), my review of more recent studies (1961-2004), and my own investigations based on recent surveys (1994-2012) — 100 years of data points to one undeniable conclusion: The offspring of one Black and one White parent have IQ and color scores (in childhood, adolescence, and adulthood) that fall intermediate to those of the offspring, respectively, of two Black parents and of two White parents.


Three decades ago, the late great Arthur Jensen (1980) noted:

The positive correlation between lightness of skin pigmentation and IQ in the American black population (studies reviewed by Jensen, 1973, pp. 222-224) may or may not be an intrinsic correlation; no one has yet determined the WF and BF correlations between IQ and skin color. If the WF correlation is zero, it would rule out the hypothesis which explains the observed correlation in the black population in terms of adverse effect on IQ of social prejudice against darker skin. (Jensen, 1980. Uses of Sibling Data in Educational and Psychological Research) [Emphasis added.].

In short, results from a Between/Within family study could rule out a discriminatory explanation of the IQ-Color correlation. More recently, he noted:

The pleiotropy hypothesis makes sense in terms of evolutionary genetics. But can we empirically reject this pleiotropy hypothesis? After all, the possibility of outright empirical rejection of a hypothesis is the Popperian criterion of scientific argumentation. I do think it is possible to meet this criterion. I propose that it can be done by determining whether the IQ skin color correlation is what I have elsewhere termed an intrinsic correlation, as contrasted with an extrinsic correlation (Jensen, 1980; Jensen & Sinha, 1993). The presence of an extrinsic correlation in the absence of an intrinsic correlation rules out pleiotropy. The methodology of making this distinction has been applied to the correlation of IQ with physical stature (extrinsic), of IQ with head size (intrinsic), and of IQ with myopia (intrinsic) (Jensen, 1980; Jensen & Johnson, 1994; Cohn, Cohn, & Jensen, 1988). An extrinsic correlation between variables X and Y is one in which the absolute value of r XY > 0 in the population and is not > 0 within families (i.e., within full sibships). An intrinsic correlation is one for which the absolute value of r XY > 0 both in the population and within families. Although every pair of full siblings (including DZ twins) has exactly the same unique ancestral genealogy, the members of each pair differ in the particular selection of the parental genes they inherit at conception. An individual who inherits a pleiotropic gene manifests both of its phenotypic effects, such as lighter pigmentation and higher IQ, as would be hypothesized in the case of these two variables. All of the IQ skin color correlations reported in the literature are entirely population correlations, hence they are not informative regarding pleiotropy. But with a reasonably large sample of full sibling pairs it would be possible to rule out pleiotropy. It would be ruled out if no statistically significant within-families correlation were found between siblings’ IQs and the siblings’ values on a linear index of skin pigmentation as objectively measured by one of the standard procedures used in physical anthropology. Pleiotropy implies that within each sibling pair the individual having the higher IQ would also more frequently have the lighter skin color. If this is not found to be the case, the pleiotropic hypothesis would have to be rejected. But if, on the other hand, the IQ skin color correlation turns out to be pleiotropic, and if this result can be adequately replicated, it would constitute a key item of evidence for the co-evolution of IQ (or more specifically g) and skin color. Unless geneticists can find sufficient fault with this line of reasoning as to render the proposed study scientifically worthless or technically unfeasible, I would hope that such a study will soon be forthcoming. (Jensen, 2006. Comments on correlations of IQ with skin color and geographic–demographic variables). [Emphasis added].

In short, results from a Between/Within family study could also rule out a pleiotropic explanation of the IQ-Color correlation.

We decided to follow Jensen’s call and to decompose the IQ-color correlation within and between families. Originally, we had designed this analysis so as to disentangle the effects of additive genetics, discrimination, shared environment, and pleiotropy on IQ. Discrimination hypotheses typically propose that color associated IQ differences are consequent to color associated outcome differences which themselves are consequent to color associated discrimination. We have shown that the discriminatory hypothesis is untenable for a number of reasons, the most basic being that the differences in adulthood can be traced back to adolescence. The most plausible explanations for the IQ-color correlations are: shared environment (a between family effect, which should be unrelated to sibship), pleiotropy (a within family effect, which should show up between full sibs), and additive genetics. With regard to additive genetics, Jason Malloy recently articulated our framework:

This post discusses the expected differences in IQ between differently colored black siblings in an additive genetic model of race differences, while the colorism posts are predicated on an expected lack of IQ differences between differently colored black siblings in an additive genetic model of race differences. Contradiction?

In an important post, Dalliard explained why a lack of correlation is expected under hereditarian theory:

The significance of this family study design is that hereditarian theory predicts that skin color will be substantially associated with IQ between families, but not within families. That is, if we have two African American siblings the darker one should be approximately as likely as the lighter one to be the smarter of the two. This is because skin color is controlled by a handful of genes, scattered across different chromosomes, and each (full biological) sibling is equally likely to inherit any variant. Skin color cannot be used as a proxy for ancestry when comparing siblings. (Malloy, 2013. Cryptic Admixture, Mixed-Race Siblings, & Social Outcomes.)

We took additive genetics to be a between family only effect between full siblings, and a between and within family effect between less genetically related siblings. The crucial assumption was that color differences were under the control of a few genes of large effect. This assumption has been recently challenged, as explained by genetics researcher and pundit, Razib Khan:

Overall the biggest result out of this paper is found in the abstract: “We identify four major loci…for skin color that together account for 35% of the total variance, but the genetic component with the largest effect (~44%).” The implication, which they lay out, is that in this admixed population the genetic architecture is such as that within that 44% there may be smaller effect genes which diffused through the genome, and strongly correlated with differential ancestry (i.e., European ancestral segments have more “light” alleles, African segments the “dark” ones). This is not entirely unreasonable. If pigmentation loci are targets of selection (their results suggest that this is so) then one might see change on large effect loci first, and then graduate convergence to the adaptive peak via small effect loci. But, I also believe that the fact that the European source population is on the darker side also is having an effect. The allele frequency differences between Swedes and Yoruba, would be larger than Portuguese and Yoruba (though to be sure the Portuguese and Swedes would still be far closer). (Khan, 2013. Pigmentation: the simplest of complex traits not so simple?)

The implication of these new findings is that skin color can be used as a proxy for ancestry when comparing full siblings. Nonetheless, since mixed race siblings differ little in racial ancestry, the possible effect of ancestry differences between full siblings on outcome differences would have to be quite small even were there large ancestral differences between populations in the trait in question. (I have estimated that on account of being Whiter, lighter Black sibs could at most be 0.05 SD phenotypically smarter. My reasoning: The correlation between ancestry and color between full siblings should not be greater than it is between random individuals; given the color-ancestry correlation of rho = 0.4 in the African American population, the mean difference in White ancestry between lighter versus darker full siblings would be 0.4 times the mean difference in color in standardized units; in the NLSY97, the latter value between sibs who differed in color is 1.7 standard deviations (from section F.7, one sample t-statistic descriptives); assuming that this measure of color is reasonably reliable, lighter sibs would be about 0.7 SD more White than darker sibs (1.7 x 0.4); the standard deviation of White admixture in the Black population, based on other studies, is about 10-15%, so lighter sibs would be about 9% more White at most (this seems to be too high to me); if the mean genotypic g difference between African Americans who are ~20% White and Whites who are ~95% White is 1 SD, which is what Hereditarians typically argue, then a 9% increase in ancestry would be equal to at most a 0.12 SD increase in genotypic g. The correlation between genotypic g and phenotypic g as measured by AFQT is maybe 0.8 (i.e., heritability = 0.65). So the phenotypic differences would be at most 0.1 SD. But color as indexed by color charts is probably not a particularly reliable index of ancestry. So we have to correct down the figure, maybe to 0.05 SD or so.)
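The chain of multiplications in the parenthetical above can be laid out as a short calculation. This is only a sketch of that back-of-envelope arithmetic: the function and argument names are made up for illustration, and every input is simply the figure quoted in the paragraph, evaluated at both ends of the 10-15% admixture range.

```python
def sibling_gap_bound(color_gap_sd, color_ancestry_r, admixture_sd,
                      group_ancestry_gap, group_g_gap, geno_pheno_r):
    """Reproduce the arithmetic in the text (names are illustrative only)."""
    # Color gap (in SD units) converted to an ancestry gap (in SD units).
    ancestry_gap_sd = color_gap_sd * color_ancestry_r
    # Ancestry gap expressed as a fraction of ancestry.
    ancestry_gap = ancestry_gap_sd * admixture_sd
    # Scale by the assumed group gap in g per unit of ancestry.
    genotypic_gap = ancestry_gap / group_ancestry_gap * group_g_gap
    # Convert the genotypic gap to an expected phenotypic gap.
    phenotypic_gap = genotypic_gap * geno_pheno_r
    return ancestry_gap, genotypic_gap, phenotypic_gap


# Figures quoted in the text: 1.7 SD color gap, 0.4 correlation,
# 10-15% admixture SD, ~20% vs ~95% ancestry, 1 SD assumed gap, 0.8.
low = sibling_gap_bound(1.7, 0.4, 0.10, 0.95 - 0.20, 1.0, 0.8)
high = sibling_gap_bound(1.7, 0.4, 0.15, 0.95 - 0.20, 1.0, 0.8)

for label, (anc, geno, pheno) in (("SD=10%", low), ("SD=15%", high)):
    print(f"{label}: ancestry gap {anc:.3f}, genotypic {geno:.3f} SD, "
          f"phenotypic {pheno:.3f} SD")
```

Under these inputs the phenotypic bound comes out at roughly 0.07 to 0.11 SD before any downward adjustment for the unreliability of the color rating, which is in line with the "at most 0.1 SD" figure in the text.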

But there are more problems….if the genetics of color is more complex than hitherto thought, then it’s not just genetic ancestry that can condition a within family, between full sibling, genetic correlation. This point was made in a discussion at the 2001 Novartis Foundation symposium: The nature of intelligence [Emphases added]:

Houle: I’m interested in the point you’ve made about within- versus between- family correlations because it seems to me that you are drawing an incorrect conclusion. Assortative mating involving pairs of traits, such as height or brain size, for example, even if they are not causally related to each other at all, will cause genetic associations between these traits through linkage disequilibrium. This effect will be stronger for loci that are closely linked to each other. This will cause within-family correlation. The conclusion I would draw when you have assortative mating and a lack of within-family correlation, is that the assortative mating is actually not on the genetic component of the traits being considered, but on the environmental deviations from the breeding value.

Jensen: That’s possible, but I have been told by geneticists that the linkage disequilibrium would not account for within-family correlations beyond the first generation. This is something that washes out very quickly. In the general population, if you have a large sample and look for these correlations, very little of it would be caused by linkage. It would be more pleiotropic, meaning that one gene has two or more apparently unrelated effects.

Houle: It depends on the assumptions you make. If you assume very simple genetics (for example, one gene influencing each trait) they are very unlikely to be closely linked. This would, to a large extent, get rid of this effect, but not entirely. Since traits such as brain function and height are the product of many genes some loci are bound to be closely linked, so any association would decay slowly for these loci; it’s very unlikely that you would be able to wash that out completely. The thing about assortative mating is that it occurs every generation so those correlations are constantly being reinforced: they won’t be large, perhaps, but they won’t be zero either. So if you can confidently say there’s no within-family correlation, you’re actually making a strong statement about the genetic relationship of genes to those traits.

Jensen: That’s a good point.

In short, in the presence of cross assortative mating for color and IQ (or assortative exogamy on IQ between populations of different colors) you will get a genetic association between color and IQ between full siblings in proportion to the genetic complexity of color. That said, in the absence of pleiotropy, I would still predict a practically significantly lower color-IQ association between full siblings within families versus between families. The genetics of color is still relatively simple.

Now about our method: We have been employing a within/between family design. The basic setup is commonly used in behavioral genetics. For the within family component of the analysis, we examine the correlations between the signed sibling differences in traits. For the between family component, we examine the correlations between sibling averages in the traits. These two sets of correlations can be compared after they are corrected for reliability. For these analyses it is standard to use the raw (unranked) difference scores and average scores; the results based on these raw values are given by Pearson’s r. As a robustness check I have included ranked difference scores and ranked average scores. These results are given by Spearman’s rho.
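To make the design concrete, here is a minimal sketch of the within/between decomposition for sibling pairs, using only the Python standard library. The function names and the toy data are hypothetical, not the code used for the analyses reported here. The toy families are constructed so that the two traits are perfectly related between families and unrelated within them.

```python
from statistics import mean


def pearson(x, y):
    """Pearson's r for two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def spearman(x, y):
    """Spearman's rho computed as Pearson's r on the ranks."""
    return pearson(ranks(x), ranks(y))


def within_between(pairs):
    """pairs: one ((x1, y1), (x2, y2)) tuple per family (sib 1, sib 2)."""
    dx = [a[0] - b[0] for a, b in pairs]          # signed sib differences
    dy = [a[1] - b[1] for a, b in pairs]
    mx = [(a[0] + b[0]) / 2 for a, b in pairs]    # sib averages
    my = [(a[1] + b[1]) / 2 for a, b in pairs]
    return {
        "within_r": pearson(dx, dy), "within_rho": spearman(dx, dy),
        "between_r": pearson(mx, my), "between_rho": spearman(mx, my),
    }


# Toy data only: family means lie on a line, sib differences are unrelated.
toy = [((2, 11), (0, 9)), ((3, 19), (1, 21)),
       ((2, 31), (4, 29)), ((3, 39), (5, 41))]
result = within_between(toy)
print(result)
```

In this constructed example the between family correlations are 1 and the within family correlations are 0, which is the pattern an association carried entirely between families would produce.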

An incomplete project

I will note from the start that this is an incomplete project. I was unable to resolve the following issues:

Weights: Two well-published statisticians gave Meng Hu and me conflicting advice on whether or not to use weighted values. I have decided to include weighted and unweighted values in most instances.

Parametric assumptions: The results based on parametric and non-parametric analyses are importantly different. As such, both were included. It’s not clear to me, though, which is the more accurate description of the “true” relationship.

Correction for unreliability of measures: The (linear) associations were not corrected for unreliability because the reliability of the skin color index was not known. Because the results are uncorrected, readers are advised not to fixate on p-values. Corrections will tend to increase the within family correlations more than the between family correlations, but such corrections only make sense in the context of linear relations (i.e., r), and yet virtually no such relation exists within families. The basic formula is:

Between family
r(sib means) = (AFQT reliability + (color-IQ correlation, between)) / (1 + (color-IQ correlation, between))

Within family
r(sib differences) = (AFQT reliability – (color-IQ correlation, within)) / (1 – (color-IQ correlation, within))

Where r(sib means) is the reliability of the sib average correlation and r(sib differences) is the reliability of the sibling difference correlation. The corrected correlations are then:

corrected IQ-color, between = (color-IQ correlation, between) / r(sib means)
corrected IQ-color, within = (color-IQ correlation, within) / r(sib differences)

Take the following example. The reliability of AFQT is about 0.95, the uncorrected color-AFQT BF correlation is 0.15, and the uncorrected color-AFQT WF correlation is 0.05. The corrections then would be:

Between family
r(sib means) = (0.95 + 0.15)/(1 + 0.15)
= 0.96

Within family
r(sib differences) = (0.95 – 0.05)/(1 – 0.05)
= 0.95

corrected IQ-color, between = 0.15/0.96 = 0.157
corrected IQ-color, within = 0.05/0.95 = 0.053
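The worked example can be checked with a few lines of code. The sketch below only transcribes the formulas as they are written in this section (the function names are illustrative). As in the text, only the AFQT reliability enters, since the reliability of the color index is unknown.

```python
def reliability_of_sib_means(test_reliability, r):
    """Reliability of sib averages, per the between family formula above."""
    return (test_reliability + r) / (1 + r)


def reliability_of_sib_differences(test_reliability, r):
    """Reliability of sib differences, per the within family formula above."""
    return (test_reliability - r) / (1 - r)


def corrected(r, reliability):
    """Uncorrected correlation divided by the relevant reliability."""
    return r / reliability


afqt_reliability = 0.95   # value used in the example
r_between = 0.15          # uncorrected between family correlation
r_within = 0.05           # uncorrected within family correlation

rel_means = reliability_of_sib_means(afqt_reliability, r_between)
rel_diffs = reliability_of_sib_differences(afqt_reliability, r_within)

print(round(rel_means, 2), round(corrected(r_between, rel_means), 3))
print(round(rel_diffs, 2), round(corrected(r_within, rel_diffs), 3))
```

With the example inputs the corrections move the correlations only in the third decimal place, so uncertainty about the AFQT reliability matters little here; the unknown reliability of the color index is the larger gap.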

The reliability of the color index

It was noted above that the true reliability of the color index is not known. Nonetheless, one can get a sense of the reliability by examining the MZ twin correspondence, since the heritability of true skin color should be close to 1. Comparing identical twin 1 to identical twin 2 at time 3, for a trait with a heritability approaching 1, is similar to comparing individual 1 at time 1 to individual 1 at time 2. In general, the twin correspondence was not particularly high. I was able to locate 20 identical twin pairs with color scores. Of these, 10 pairs had corresponding color scores. (Note: the AFQT scores are percentiles; if readers wish to compare the MZ twin AFQT correspondence to the MZ twin color correspondence using the same scales, they can do so simply by dividing the AFQT scores by 10000 and then rounding up; this will give AFQT and color scores on a 10 point scale.) The results are below:
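A small sketch of the two operations mentioned in the note: rescaling AFQT scores to a 10 point scale and computing the share of twin pairs with matching scores. The helper names are hypothetical, and the example pairs are placeholders arranged to match the 10-of-20 count reported above, not the actual twin data.

```python
import math


def afqt_to_ten_point(afqt_score):
    """Rescale as the note describes: divide by 10000, then round up."""
    return math.ceil(afqt_score / 10000)


def exact_agreement(pairs):
    """Share of (twin 1, twin 2) pairs whose scores are identical."""
    same = sum(1 for a, b in pairs if a == b)
    return same / len(pairs)


# Placeholder pairs: 10 matching and 10 non-matching, as in the count above.
color_pairs = [(3, 3)] * 10 + [(3, 4)] * 10
print(exact_agreement(color_pairs))

# A hypothetical stored AFQT value of 45070 maps to 5 on the 10 point scale.
print(afqt_to_ten_point(45070))
```

Exact agreement is a blunt measure for so few pairs; it is shown here only because it is the comparison the paragraph describes.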


Excel here.


Section A. The “raw” correlations between color and outcomes for African Americans:
A.A, Color-PIAT correlations; A.B, Color-PIAT correlations if AFQT scores are missing; A.C, Color-AFQT correlations; A.D, Color-g correlation; A.E, Color-t correlations; A.G, Color-HGE correlations; A.H, n-weighted average AFQT correlations and PIAT correlations, when AFQT is missing; A.I, n-weighted average PIAT correlations and AFQT correlations, when PIAT is missing.

Section B. Color-AFQT and Color-HGE when partialling out the effect of age and sex.

The color-AFQT and color-HGE correlations are presented with age and sex differences/averages partialled out. Age and sex had little effect on AFQT but these variables had some effect on HGE. The directions of the correlations remained the same. The raw correlations are presented below. To increase sample size, I averaged the n-weighted color-PIAT difference/average and the n-weighted color-AFQT difference/average correlations when one or the other value was missing. I placed my preferred (weighted) (linear) associations in boxes.


Section C. Scatter plots between color and AFQT.

Scatter and best fitting plots are shown for the color-AFQT relation.



Section D. Cubic versus Linear fit for rank differences within families.

This was a demonstration that a non-linear model was a better fit for the differences within families based on rank values. The relevance of this is that Spearman’s rho assumes monotonicity, which seems semi questionable.


Section E. Correlations between ranked and unranked color and AFQT within families.

I tried to identify which variable was driving the parametric/non-parametric difference in correlations within families. (For this analysis I used all possible Black sibling pairs.) I correlated: AFQT differences, ranked AFQT differences, color differences, and ranked color differences. Both variables contributed to the difference. r, AFQT, rank-color = 0.03; r, rank-AFQT, color = 0.06; r, rank-AFQT, rank-color = 0.09.


Section F. 1. Descriptive statistics for AFQT, Color, and HGE showing the variance within families and between families.

Here, I further explored the associations and possible explanations for the within/between family difference and the parametric/non-parametric difference. In 1, I compared the variance within and between families in the traits in question. The variance was greater between full siblings within families (e.g., color: SD = 1.84, mean absolute difference = 1.35) than between families (e.g., color: SD = 1.62, mean absolute difference = 1.29), so restriction of range within families is not a plausible explanation for the lowered within family association.


2. More model fit exploration, AFQT-color.

I look at more models within and between families for AFQT and color. Generally, I was unable to identify a noticeably better fitting model for the within family association. A linear non-association seems to be a fair description of the within family color-AFQT relation.


3. AFQT-color correlations based on different sampling approaches: First minus second full sib; including all full sib pairs when there were multiple pairs within families; averaging full sib pairs when there were multiple pairs within families.

In 26 Black families there were multiple pairs of full siblings. In all of the previous analyses I had selected the first pair. To investigate further, I ran the correlations after including all possible full sibling pairs (so some families had multiple scores). I then averaged the multiple sibling pair differences and averages to create average family difference and average values and I then ran the correlations again. The correlations derived from these different methods are reported below. I did not apply weights here. Compare with the correlations above in section A.


4. Robust General Linear Regression.

I ran robust regression for AFQT and color using the different values discussed in (3). The program used can be located at: A concise discussion of the method can be found in Erceg-Hurn and Mirosevich (2008).

The results are shown below. Assumption violations with respect to the dependent variable (AFQT) are not causing the non association within families.



5. AFQT-Color and HGE-Color controlling for sex and age results.

These results were reported above.

6. Bootstrapping correlation results.

I looked at the confidence intervals of the correlations.
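For readers unfamiliar with the approach, a percentile bootstrap for a correlation can be sketched as follows. This is a generic illustration, not the procedure or software used for the results reported here. The function names, the number of resamples, and the toy data are all assumptions.

```python
import random
from statistics import mean


def pearson(x, y):
    """Pearson's r, or None if either variable has zero variance."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / (sxx * syy) ** 0.5


def bootstrap_ci(x, y, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for r, resampling (x, y) pairs."""
    rng = random.Random(seed)
    n = len(x)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        r = pearson([x[i] for i in idx], [y[i] for i in idx])
        if r is not None:          # skip degenerate resamples
            stats.append(r)
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


# Toy data with a strong positive relation, for illustration only.
xs = list(range(20))
ys = [2 * v + (v % 3) for v in xs]
ci_low, ci_high = bootstrap_ci(xs, ys)
print(ci_low, ci_high)
```

When the unit of analysis is a sibling pair, the resampling unit should be the pair (or the family, if several pairs per family are used), so that the dependence within families is preserved.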


7. Alternative Analyses with Dichotomous color coding.

The method here was discussed previously. Quote:

Returning to the question of whether color is associated with cognitive ability within families, we conducted a number of analyses based on dichotomously coded color. The concern was that the color scale was only a semi-interval scale. It’s possible that the treatment of color differences as interval scaled differences masks a “true” color-ability association.
Within populations: We looked at lighter siblings versus darker siblings. If the first sibling was lighter we coded them as being so (first, lighter sib =1, all else =0). If the first sibling was darker we coded them as being so (first, darker sib =1, all else =0). We then entered these two dummy variables into linear regressions. Next we created a dichotomous variable (first = darker sib =1, first = lighter sib =0) and then computed the point-biserial correlation, r(pb). The purpose here was to remove the attenuating effect of sibling pairs for which there were no color differences. Finally, we added the scores of the first siblings = lighter to those of the first siblings = darker multiplied by (-1) and conducted a simple t-test. The logic here is that if there is no significant association between color and cognitive ability then the mean AFQT difference scores, which were computed by subtracting the scores of sib1 from those of sib2, should not be significantly different from zero when dealing separately with first sibs who were lighter or first sibs who were darker, or, since we are working with a dichotomous pair, the first sibs who were lighter + first sibs who were darker*-1. The results of this t-test, of course, are the same as those of the point biserial correlation since the statistic is the same. This is just another way of presenting the results. This way, the mean score differences can be seen.
Between populations: We repeated the above analyses comparing the average scores of the lightest half of the sib pairs to the average scores of the darkest half of the sib pairs. To split the population we used median color scores (since the mean color scores were skewed).
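The sign-flipping step described in the quoted passage can be sketched as below. This is an illustration under stated assumptions rather than the original analysis code. It assumes that a lower color rating means lighter, it orients every difference as lighter minus darker (the opposite sign convention gives the same test with the sign reversed), and the names and toy pairs are hypothetical.

```python
from statistics import mean, stdev


def oriented_differences(pairs):
    """pairs: ((color1, score1), (color2, score2)) per sibling pair.

    Returns score(lighter sib) - score(darker sib) for each pair,
    dropping pairs with no color difference. Assumes lower = lighter.
    """
    out = []
    for (c1, s1), (c2, s2) in pairs:
        if c1 == c2:
            continue                      # tied on color: not informative
        out.append(s1 - s2 if c1 < c2 else s2 - s1)
    return out


def one_sample_t(values):
    """t statistic for testing whether the mean differs from zero."""
    n = len(values)
    return mean(values) / (stdev(values) / n ** 0.5)


# Toy pairs for illustration only.
toy_pairs = [((2, 60), (5, 50)), ((6, 40), (3, 55)),
             ((4, 70), (4, 10)), ((1, 30), (7, 35))]
diffs = oriented_differences(toy_pairs)
t_stat = one_sample_t(diffs)
print(diffs, round(t_stat, 3))
```

Because every retained pair contributes one oriented difference, the mean of these differences is directly readable as the average lighter-minus-darker gap, which is the advantage the passage mentions over reporting r(pb) alone.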

Generally, the association within families was weaker than that between families. None of the within family associations were significant at traditional levels of significance. In practical terms, within families lighter full siblings were, on average, 0.1 SD more intelligent than their darker brothers and sisters. Between families, lighter pairs of full siblings were, on average, 0.3 SD more intelligent than darker pairs.



8. Test for interval scaling for color by correlating the sibling difference scores with the sibling average scores.

I conducted a within/between family test for interval scaling for color using the method discussed by Jensen (1980). The results are consistent with the claim that the color scale is an interval scale.



The results for African Americans are very similar to those I reported for the full sample. I noted:

1. Simple binomial association (lighter sib, dichotomously coded, is smarter, dichotomously coded). Difference: 55% to 45%. Significant.
2. Point biserial correlation (lighter sib, dichotomously coded, is smarter). Difference: r(pb) = 0.07. Non-significant, but trending.
3. Spearman’s correlation (rank lightness associated with rank smartness). Difference: rho(unweighted) = 0.05. Not significant, but close.
4. Pearson’s correlation (lightness, interval scale, associated with smartness, interval scale). Difference: r(unweighted) = 0.02. Not significant, not even close.

One would have to change the numbers around a little, but the main effect is the same. No detectable association is found when parametric analyses are conducted. If, however, you strip the analysis down to its simplest form and look only at the frequency with which the lighter full sibling is the smarter full sibling, you see a statistically significant effect. Intermediate methods (e.g., rank correlations, t-tests using dichotomously coded Darker/Lighter) show intermediate and typically non-significant effects.
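The "simplest form" of the analysis is a sign test: count the discordant pairs in which the lighter sibling scores higher and compare that count with the 50% expected by chance. A minimal exact version is sketched below. The function name and the pair counts are hypothetical and are not the sample sizes of the analyses reported here.

```python
from math import comb


def sign_test_p(k, n):
    """Two-sided exact binomial p-value for k of n pairs, chance = 0.5."""
    tail = min(k, n - k)
    p = 2 * sum(comb(n, i) for i in range(tail + 1)) / 2 ** n
    return min(1.0, p)


# The same 55% / 45% split at two hypothetical numbers of pairs.
print(sign_test_p(55, 100))
print(sign_test_p(550, 1000))
```

The same 55% to 45% split is far from significant with 100 pairs and clearly significant with 1000, so whether such a split counts as a detectable effect depends heavily on the number of discordant pairs available. Note also that the sign test discards the size of each difference, which is why it can disagree with Pearson's r on the same data.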

Previously, I continued:

This effect could be because the underlying statistical assumptions were violated and therefore more info equals more bias; alternatively, it could be that the “true” association within families between FS is almost undetectable.

I no longer think the former. The absolute association between color and IQ between full siblings is virtually undetectable. But yet there is nonetheless some association — and this can be seen when one strips the analysis down to the most basic form. Both of these conclusions are robust. As Jensen noted: “Pleiotropy implies that within each sibling pair the individual having the higher IQ would also more frequently have the lighter skin color.” This is true. But it’s also true that r Color-AFQT is not even close to being significantly different from zero within families.

I continued:

Whatever the case, since the cognitive ability scores which we are looking at, measured in adolescence as they were, are antecedent to adult outcomes, an association within families, if substantiated, would not support “colorism”, which holds that cognitive ability differences are consequent to outcome differences, which are themselves consequent to labor market discrimination. But then what could be the cause of such differences?…. Before indulging in more speculations, though, we had better first look at “colorism” within socially defined races (e.g., Blacks).

And here we are…left with some puzzles…

Excel File here.


Baker, J. R. (1974). Race. New York: Oxford Univ. Press.

Bock, Gregory; Goode, Jamie; Webb, Kate, eds. (2000). The Nature of Intelligence. Novartis Foundation Symposium 233. Chichester: Wiley. Pages 49-51.

Erceg-Hurn, D. M., & Mirosevich, V. M. (2008). Modern robust statistical methods: an easy way to maximize the accuracy and power of your research. American Psychologist, 63(7), 591.

Jensen, A. R. (2006). Comments on correlations of IQ with skin color and geographic–demographic variables. Intelligence, 34(2), 128-131.

Jensen, A. R. (1980). Uses of sibling data in educational and psychological research. American Educational Research Journal, 17(2), 153-170.

Malloy, J. (2013, March 29). Cryptic Admixture, Mixed-Race Siblings, & Social Outcomes. Retrieved from:

Khan, R. (2013, March 24). Pigmentation: the simplest of complex traits not so simple? Retrieved from:

Shuey, A. M. (1966). The testing of Negro intelligence. New York: Social Science Press.
