A Brief Comment on Hu (2013, April, 18): The Meaning of Differential Regression

Number 4 in the social sciences’ top 10 list of “grand challenge questions that are both foundational and transformative” (Giles, 2010) is: “How do we reduce the ‘skill gap’ between black and white people in America?” Presumably, figuring out the cause of this psychometric intelligence differential would help when it comes to deciding how best to minimize it. If so, we can thank Meng Hu for his recent efforts focused on determining the cause, including his extensive exploration of differential regression.

At the end of his lengthy post on differential regression, Hu (2013, April, 18) concluded:

Discussion. If, for reasons mentioned above, the BW sibling regression gap cannot be fully interpreted in terms of environments, we may think of a combination of genetic and shared environmental differences…One cannot even begin to explain why blacks should be more environmentally depressed relative to whites at higher levels of IQ.

I wish here to comment on this point and to offer my own deliberations with regard to the interpretation of the results.

As Hu (2013, April, 18) has noted, the meaning of the differential regression results has been subject to continual debate. Mackenzie (1980), commenting on Jensen (1973), argued that the results were a statistical artifact. Brody (2002), commenting on Jensen (1998), granted their statistical reality but argued that they were consistent with “virtually any” environmental hypothesis. Kaplan (2001), citing Flynn (1980), seemed to concur with Brody (2002). Murray (1999) argued that they were consistent with either a genetic hypothesis or a non-shared environmental hypothesis, and he considered the latter to be implausible. Rushton and Jensen (2010), criticizing Nisbett (2009), argued that these results were consistent with a genetic hypothesis and not readily consistent with a culture-only hypothesis. Pinker (2012, August 6) indicated that they provided support for a genetic hypothesis of group differences.

There seems to be much confusion here. Let us try to shed some light on the issue.

Regression to the Mean and Inheritance of Deviance

In the context of behavior genetics, people not infrequently discuss regression to the mean. Regression to the mean, when it occurs, results simply from deviance not being completely inherited. The inherited portion of a trait’s deviation from the mean is the portion conditioned by additive genetics and shared environment: it is the portion of a trait deviance that biological relatives (e.g., full siblings, or biological parents and their biological children) share. Regression to the mean is simply the non-transmission of trait deviance. It occurs, for example, when very smart parents have only somewhat smart children, because intelligence is only partially inherited, environmentally and genetically. Generally speaking, we can define:

Regression to the Mean (R) = 1 − Inheritance of Deviance (I),

where I ≈ the combined shared environmental and additive genetic effect.

Mostly people discuss R and I within populations, but one can just as well discuss this dual phenomenon as it occurs between populations. If two populations differ in the inheritance of a trait, due either to differences in shared environment or to differences in additive genetics, they will show “differential regression”.
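Read as a prediction rule, R = 1 − I says that a relative keeps the share I of a matched individual's deviance and loses the share R. The sketch below assumes (this is not stated in the text) that I can be treated as the sibling-on-sibling regression slope. The population means and I = 0.5 are hypothetical round numbers, not estimates. It shows individuals matched at the same score having siblings who regress toward different population means. This is what a differential regression plot displays.

```python
def expected_sibling_score(matched_score: float, pop_mean: float, inheritance: float) -> float:
    """Expected score of the full sibling of someone who scored `matched_score`.

    `inheritance` plays the role of I, the share of deviance that is transmitted;
    the remaining share, R = 1 - I, is the regression toward `pop_mean`.
    """
    if not 0.0 <= inheritance <= 1.0:
        raise ValueError("inheritance must lie in [0, 1]")
    deviance = matched_score - pop_mean
    return pop_mean + inheritance * deviance


# Hypothetical illustration: same I = 0.5, population means of 100 and 85.
for mean in (100.0, 85.0):
    score = expected_sibling_score(130.0, mean, 0.5)
    print(f"population mean {mean}: expected sibling score {score}")
```

With these inputs, a sibling of a 130 scorer is expected at 115 in the first population and at 107.5 in the second.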

Measure and Meaning of Differential Regression

Differential regression is frequently measured by matching individuals on a trait across populations and then comparing the trait scores of their full siblings. Typically, graphs of the sibling regression lines are presented. A between-population deviation in sibling regression lines results from some set of factors causing a population deviance. This deviation, in itself, does not tell one anything more than what one already knows: that there is a population mean difference in the trait being investigated. This is the point that environmentalists such as Flynn (1980) have tried to make. However, the point that hereditarians such as Jensen (1973) seem to have tried to make is that the slopes of the regression lines do tell one something. Unfortunately for this debate, hereditarians have not spelled out precisely what they mean. We will do that below.

Differential regression tells one something about the within population distribution of a between population trait difference. To illustrate, using Meng Hu’s NLSY 1979 White sibling scores and the scores of created pseudo White siblings, I modeled four different environmental effects:

(1) The first graph shows the sibling regression lines produced by a uniformly distributed non-shared environmental difference which induces a 1.2 SD between-population difference. The regression line for White siblings is shown in blue; the regression line for pseudo White siblings is shown in purple. Since the between-population effect is non-shared, 2.4 standardized units were subtracted from one randomly chosen sibling in each pseudo White pair (2.4/2 = 1.2). The scores of the first White and first pseudo White siblings were then matched and averaged per tenth of a standard deviation, and the scores of the second siblings were compared. As can be seen, the result is a non-linear regression for the pseudo White group.
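The construction of model (1) can be sketched as a small simulation. Hu's NLSY79 sibling scores are not reproduced here. Instead, standardized sibling pairs are drawn from a bivariate normal with a placeholder correlation of r = 0.5, which is an assumption and not an NLSY estimate. The 2.4 SD shift and the tenth-of-an-SD binning follow the text. The pseudo White curve bends. It sits high at low sibling1 scores, where most observations are depressed average individuals. It sits low near the mean, where most observations are undepressed individuals whose sibling absorbed the shift.

```python
import math
import random
from collections import defaultdict


def simulate_pairs(n, r, rng):
    """Standardized full-sibling pairs drawn from a bivariate normal with correlation r."""
    pairs = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        pairs.append((z1, r * z1 + math.sqrt(1.0 - r * r) * z2))
    return pairs


def depress_one_sibling_at_random(pairs, shift, rng):
    """Model (1): subtract `shift` from one randomly chosen sibling in each pair."""
    out = []
    for s1, s2 in pairs:
        if rng.random() < 0.5:
            out.append((s1 - shift, s2))
        else:
            out.append((s1, s2 - shift))
    return out


def binned_regression(pairs, width=0.1):
    """Mean sibling-2 score per bin of sibling-1 score.

    Keys are integer bin indices k, covering sibling-1 scores near k * width.
    """
    bins = defaultdict(list)
    for s1, s2 in pairs:
        bins[round(s1 / width)].append(s2)
    return {k: sum(v) / len(v) for k, v in sorted(bins.items())}


rng = random.Random(2013)
white = simulate_pairs(100_000, r=0.5, rng=rng)  # r = 0.5 is a placeholder, not Hu's value
pseudo = depress_one_sibling_at_random(white, shift=2.4, rng=rng)
white_line, pseudo_line = binned_regression(white), binned_regression(pseudo)

for k in (-24, -12, 0, 12):
    print(f"sibling1 = {k / 10:+.1f} SD: White {white_line[k]:+.2f}, "
          f"pseudo White {pseudo_line[k]:+.2f}")
```

Each pseudo pair loses exactly 2.4 units in total, so the pseudo White mean sits 1.2 SD below the White mean by construction. Only the shape of the curve depends on the placeholder r.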


Now, these are odd results, so let me explain what’s happening. Imagine that you had a set of sibling pair scores, sibling1 and sibling2, and that you created a new set of scores, pseudosibling1 and pseudosibling2, by taking the old set and randomly subtracting 30 points from one of the two siblings in each pair. These 30 points would, on average, represent a 15 point difference per individual, and your second set of siblings would, on average, be depressed 15 points relative to your first set. Since only one sibling per pseudo pair would be affected, this would constitute an unshared family effect. An example of this is shown below:


Now imagine that you matched sibling1 and pseudosibling1 on IQ. These scores would give you a sibling1-pseudosibling1 column. If you then plotted sibling2 and pseudosibling2 scores against the sibling1-pseudosibling1 scores, you would get something similar to the graph above, non-linear curve and all. This is because you end up with a large number of low-IQ pseudosibling1 scores (e.g., 100 − 30 = 70) that show a large regression upwards. These are individuals who would otherwise have had IQs of, e.g., 100, who have siblings with IQs of around 100, and who were depressed 30 points. You might think that these scores would average out with those of pairs in which the other sibling was depressed. That is, for every depressed 70 IQ pseudosibling1 (100 − 30 = 70, with pseudosibling2 IQ = 100) you should have an undepressed 70 IQ pseudosibling1 whose sibling was depressed (pseudosibling2 IQ = 40). But you don’t, because we are dealing with a normal distribution: more 100 IQ individuals will be depressed than 70 IQ individuals, because there are more of the former. Now, we might suppose that the depressive effect is somehow systematic, such that it depresses the IQs of only our second sibling. Maybe every second-born sibling is depressed 30 points. In that case, we get a sibling depression that looks as follows:
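The base-rate argument can be checked with a few lines of arithmetic. This sketch assumes IQ ~ N(100, 15) before any depression, which is an illustrative assumption. It ignores the sibling correlation, because that correlation does not affect the marginal distribution of pseudosibling1 scores. Among observed pseudosibling1 scores of 70, it computes the share that are depressed 100s rather than undepressed 70s.

```python
from statistics import NormalDist

IQ = NormalDist(mu=100, sigma=15)  # assumed distribution of undepressed scores
SHIFT = 30  # points subtracted from one sibling per pseudo pair, as in the text


def share_depressed(observed: float) -> float:
    """Among pseudosibling1 scores equal to `observed`, the share that are
    depressed scores from `observed + SHIFT` rather than undepressed scores.

    Each sibling is depressed with probability 1/2, so that factor cancels and
    only the two normal densities matter.
    """
    from_above = IQ.pdf(observed + SHIFT)
    undepressed = IQ.pdf(observed)
    return from_above / (from_above + undepressed)


print(f"{share_depressed(70):.3f}")  # prints 0.881
```

The density at 100 is e² ≈ 7.4 times the density at 70. As a result, roughly 88% of the observed 70s are depressed average individuals, and these pull the pseudosibling2 average upward.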


The regression lines are parallel, but now there’s a two standard deviation difference, almost three times as large as the difference found. Moreover, the intraclass sibling correlations come out negative. This is because these correlations take into account the magnitude of the absolute sibling differences. To put this point into perspective, the figure immediately below shows the full sibling intraclass correlations for Blacks and Whites in the NLSY97 and NLSY79. The figure immediately after it shows the intraclass correlations that would result if the second of the White NLSY79 siblings were depressed by various amounts. As can be seen, only a “systematic” unshared depressive effect of 0.6 SD per sib pair (and therefore 0.3 SD overall) is somewhat consistent with the actual correlations found.
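The negative intraclass correlations can be reproduced with Fisher's double-entry intraclass correlation. That estimator is an assumption, because the text does not name the one it used. Take standardized pairs with correlation r, and let the second sibling lose a fixed d SD. The pooled mean then shifts by d/2, which gives ICC = (r − d²/4)/(1 + d²/4). With d = 2.4 the numerator is r − 1.44, which is negative for any possible r. The baseline r = 0.5 below is a placeholder, not an NLSY estimate.

```python
import math
import random


def simulate_pairs(n, r, rng):
    """Standardized full-sibling pairs drawn from a bivariate normal with correlation r."""
    pairs = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        pairs.append((z1, r * z1 + math.sqrt(1.0 - r * r) * z2))
    return pairs


def double_entry_icc(pairs):
    """Fisher's intraclass correlation: Pearson r over both orderings of every pair."""
    xs = [a for a, _ in pairs] + [b for _, b in pairs]
    ys = [b for _, b in pairs] + [a for a, _ in pairs]
    n = len(xs)
    mean = sum(xs) / n  # xs and ys hold the same values, so they share this mean
    var = sum((x - mean) ** 2 for x in xs) / n
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / n
    return cov / var


def icc_second_sib_depressed(r, d):
    """Closed form when every second sibling loses exactly d SD (unit variances)."""
    return (r - d * d / 4.0) / (1.0 + d * d / 4.0)


rng = random.Random(79)
base = simulate_pairs(50_000, r=0.5, rng=rng)  # r = 0.5 is a placeholder baseline
results = {}
for d in (0.0, 0.6, 2.4):
    shifted = [(a, b - d) for a, b in base]
    results[d] = double_entry_icc(shifted)
    print(f"d = {d}: simulated {results[d]:.3f}, "
          f"closed form {icc_second_sib_depressed(0.5, d):.3f}")
```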


In the more realistic situation where the 0.6 SD per sib pair difference is normally distributed (SD = 0.3), the intraclass correlation drops to 0.405. Generally speaking, the Black full sibling intraclass correlations place a limit on the amount of between-population variance that could possibly be explained by unshared environmental effects.
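One way to make that limit concrete is to let the second sibling lose D ~ N(μ, σ²), drawn independently of both sibling scores. The double-entry intraclass correlation then works out to (r − μ²/4)/(1 + σ²/2 + μ²/4). This falls as μ grows, so an observed correlation caps μ. The sketch inverts the formula to find that cap. It assumes two things: that the affected group's sibling correlation before depression equals the unaffected group's, and that the baseline r and observed ICC below are hypothetical inputs.

```python
import math


def icc_with_random_depression(r, mu, sigma):
    """Double-entry ICC when sibling 2 loses D ~ N(mu, sigma^2), independent of both scores.

    Sibling scores are standardized with correlation r before the depression is applied.
    """
    return (r - mu * mu / 4.0) / (1.0 + sigma * sigma / 2.0 + mu * mu / 4.0)


def max_mean_depression(r, icc_observed, sigma=0.0):
    """Largest mean unshared depression mu compatible with an observed ICC.

    Solves icc_with_random_depression(r, mu, sigma) == icc_observed for mu >= 0;
    returns 0.0 when the ICC at mu = 0 is already at or below the observed value.
    """
    numerator = r - icc_observed * (1.0 + sigma * sigma / 2.0)
    if numerator <= 0.0:
        return 0.0
    return 2.0 * math.sqrt(numerator / (1.0 + icc_observed))


# Hypothetical inputs: baseline sibling correlation 0.5, observed ICC 0.4.
mu_max = max_mean_depression(r=0.5, icc_observed=0.4, sigma=0.0)
print(f"max per-pair depression {mu_max:.3f} SD, i.e. {mu_max / 2:.3f} SD of mean gap")
```

Because only one sibling per pair is depressed, the implied between-population gap is μ/2, which the last line reports.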

(2) The second graph shows the sibling regression lines produced by a partially uniformly distributed shared environmental difference which induces a 1.2 SD between-population difference, where 10% of the depressed population is unaffected. Here we see that the regression line for the pseudo Whites is linear and that it converges with that of the Whites as IQ increases. This is because the 10% of sib pairs who escape the effect have, being unaffected, higher IQs on average, and unaffected pairs show no differential regression.


(3) The third graph shows the sibling regression lines produced by a normally distributed shared environmental difference which induces a 1.2 SD between-population difference, where the standard deviation of the depressive effect is 0.6. In this case, ~2.2% of the pseudo White group is completely unaffected. Here, again, we see that the regression line for the pseudo Whites is linear and that it converges with that of the Whites as IQ increases. This is, again, because less affected sib pairs have higher IQs. Higher-IQ individuals therefore tend to be less affected, and so they show less differential regression.


(4) The fourth graph shows the sibling regression lines produced by a normally distributed shared environmental difference which induces a 1.2 SD between-population difference, where the standard deviation of the depressive effect is 0.3. In this case, only ~0.003% of the pseudo White group is completely unaffected, since escaping requires a draw four SDs below the mean effect. Here, again, we see that the regression line for the pseudo Whites is linear, but it shows little convergence. This is because the variability in the depressive effect is minimal.
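Models (2) to (4) can be compared with one number each: the slope of the pseudo White regression line. Suppose a shared depression D hits both siblings equally and is independent of their scores. The pseudo White slope is then (r + Var D)/(1 + Var D), which exceeds the White slope r whenever Var D > 0. A line that starts lower and rises more steeply is a line that converges on the White one as IQ increases. The sketch makes several assumptions. Model (2) depresses 90% of families by 1.2/0.9 ≈ 1.33 SD, so the mean effect stays at 1.2. Negative normal draws in models (3) and (4) are clipped to zero. The sibling correlation r = 0.5 is a placeholder.

```python
import math
import random


def ols_slope(pairs):
    """Least-squares slope of sibling-2 scores on sibling-1 scores."""
    n = len(pairs)
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    return sxy / sxx


def simulate_shared(n, r, draw_depression, rng):
    """Return (White, pseudo White) pairs; both pseudo siblings lose the same draw D."""
    white, pseudo = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        s1, s2 = z1, r * z1 + math.sqrt(1.0 - r * r) * z2
        d = draw_depression(rng)  # shared effect: one draw per family
        white.append((s1, s2))
        pseudo.append((s1 - d, s2 - d))
    return white, pseudo


# Assumed parameterizations of models (2) to (4); negative normal draws are
# clipped to 0 and count as "unaffected".
MODELS = {
    "model 2": lambda rng: 0.0 if rng.random() < 0.10 else 1.2 / 0.9,
    "model 3": lambda rng: max(0.0, rng.gauss(1.2, 0.6)),
    "model 4": lambda rng: max(0.0, rng.gauss(1.2, 0.3)),
}

rng = random.Random(18)
slopes = {}
for name, draw in MODELS.items():
    white, pseudo = simulate_shared(100_000, 0.5, draw, rng)  # r = 0.5 is a placeholder
    slopes[name] = (ols_slope(white), ols_slope(pseudo))
    print(f"{name}: White slope {slopes[name][0]:.3f}, "
          f"pseudo White slope {slopes[name][1]:.3f}")
```

A small slope gap, as in model (4), corresponds to nearly parallel lines.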



We can now compare the above theoretical results to the actual results presented by Meng Hu, which look like:


Clearly, model (1), and non-shared environmental models in general, are untenable. The Black regression line is linear. Moreover, the maximum difference found between the regression lines is not in line with this type of model. Models (2) and (3), which represent shared environmental effects, are also untenable, since the results of Hu (2013, April, 18) and Murray (1999) show that the differential regression lines do not converge or even narrow with increasing IQ. On the other hand, the results found are somewhat consistent with model (4), that is, with a shared environmental model which proposes that the standard deviation of the depressive effect is less than 0.6. Other considerations, particularly concerning measurement invariance (MI), imply that the SD of the depressive effect cannot be zero or near zero, as this is equivalent to having an x-factor. If so, this would violate MI, but MI has repeatedly been found to hold in the case of the Black-White differential. As such, any tenable environmental hypothesis must be a mostly shared environmental hypothesis which proposes that the effect depressing the Black mean is narrowly distributed in the Black population, i.e., 0 < SD < 0.6.

Now, for comparison, note that the standard deviation of the depressive effect due to shared environment within populations is about 0.45 SD (note 1). So this hypothesized narrow variability in the between-group difference (i.e., the variance in how much Blacks would be adversely impacted, on account of the hypothesized shared environmental difference, relative to Whites) is not overly inconsistent with the variability in the within-group differences in the Black population (i.e., the variance in how much Blacks are adversely impacted, on account of shared environment, relative to other Blacks). Readers familiar with this debate will spot the problem, though: the total shared environmental difference between populations needed to account for the 1.2 SD difference would be 2.7 SD (notes 2 and 3). This simply does not exist. There have been attempts to deconstruct the estimates of shared environmentality, but these are of no help in this case. If the variance due to shared environment is increased within populations, it will likely be increased between populations too, and that would produce differential regression results matching models (2) and (3).

To put this point another way: you can explain the differential regression results in shared environmental terms by positing that there is not much variance in the depressive effect. You can also explain the dearth of this variance by pointing out that there is not much within-population variance on account of shared environment. But this leads, inevitably, to the question of why there are such large differences between populations in the first place, given the impotency of shared environments. If you try to explain this by shattering heritability (or shared environmentality) estimates, you are led back to the problem of the non-convergence of the differential regression lines. The only apparent escape from this catch-22 is simply not to deal with the totality of the evidence, and this generally seems to be the strategy employed.


Meng Hu claimed:

One cannot even begin to explain why blacks should be more environmentally depressed relative to whites at higher levels of IQ

I agree with Meng Hu that this situation is curious. Nonetheless, it seems to me that the general regression to the mean results, while consistent with an additive genetic model of group differences, are also, at least when taken in isolation, consistent with some shared environmental models. Another possibility is that the differential regression slopes found could be due to some combination of shared and unshared environmental effects. It might be worthwhile to explore these models. With regard to shared environmental models, no model in which an appreciable portion (e.g., more than 2.5%) of the Black population is unaffected is tenable. Likewise, by all tenable models, more than 84% of the Black population must be depressed by at least 0.6 SD (using a conservative estimate).
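The percentages behind these bounds, and behind models (3) and (4), follow from the normal CDF. This sketch assumes a mean effect of 1.2 SD and treats "unaffected" as a draw at or below zero. At the upper bound SD = 0.6, just under 2.3% are unaffected and about 84% are depressed by at least 0.6 SD. Smaller SDs shrink the first figure and raise the second.

```python
from statistics import NormalDist


def depression_shares(mean=1.2, sd=0.6, threshold=0.6):
    """For a normally distributed depressive effect, return the share of the
    population left unaffected (draw <= 0) and the share depressed by at least
    `threshold` SD."""
    effect = NormalDist(mu=mean, sigma=sd)
    return effect.cdf(0.0), 1.0 - effect.cdf(threshold)


for sd in (0.6, 0.45, 0.3):
    unaffected, depressed = depression_shares(sd=sd)
    print(f"SD = {sd}: unaffected {unaffected:.4%}, depressed by >= 0.6 SD {depressed:.1%}")
```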

Are the remaining environmental models plausible? In my estimation, no, when taking into account the totality of the evidence. But to answer this question properly, we would have to look at specific models and explanations, or classes of them, and evaluate each in particular. In general, though, it is curious that between the time of Jensen's early work and the NLSY97 the regression lines have not begun to converge. That convergence would have happened if a non-trivial portion of the Black population (e.g., 10%) had managed to escape the cumulative effect depressing the Black mean. Eventually, if the gap is to close, subsections of the African-American population will need to escape the mysterious cognitive depressing effect, an event which will result in a convergence of the sibling regression lines at the right end of the spectrum (models 2 and 3). Insofar as there is no convergence, again, the results are consistent with an additive genetic model.


(1) It would be SQRT(SD^2 × C^2), where SD is the standard deviation of the trait and C^2 is the portion of variance due to shared environment. The C^2 found for IQ is typically around 0.2, depending on age, so with SD = 1 the result is SQRT(0.2) ≈ 0.45 SD.

(2) The difference in g-scores is ~1.2 SD. The shared environmentality is about 0.2, and so the correlation between cumulative shared environmental effect and g-scores is about 0.45 (i.e., SQRT(0.2)). The amount of cumulative shared environmental effect needed to explain a ~1.2 SD difference would then be ~1.2/0.45, or about 2.7 SD.

(3) Some have argued that there are 2.7 or so standard deviations of shared environmental effects between contemporaneous Black and White Americans, on the grounds that there are 2.7 or so standard deviations in cumulative “environmental” differences (e.g., Fryer and Levitt). But those who have done so have failed to grasp, among other things, the distinction between environmental factors and external factors. The shared environmentality of external factors (e.g., parental income, number of books read to children by parents, peer groups, home “environment”) is only about 0.5 (Vinkhuyzen et al., 2009; Plomin and Bergeman, 1991). As such, the total difference in cognition-affecting external factors needed to account for the gap would have to be 2.7 SD / SQRT(0.5), or about 3.8 SD. There would have to be almost no overlap. Also, the relevant amount of cumulative external factor difference would not be the sum of the effects but the sum of the independent effects of external factors, as determined by multiple regression. Interested readers can explore the possibility that there is such a difference using the publicly available NLSY79 Children and Young Adults survey.
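The arithmetic in notes 1 to 3 can be reproduced directly. Like the notes, this sketch assumes a linear path: a factor that explains a share v of trait variance correlates SQRT(v) with the trait, so a trait gap g requires a factor gap of g / SQRT(v). The overlap figure is an addition that is not in the notes. It quantifies "almost no overlap" as the overlapping coefficient of two equal-variance normal distributions.

```python
import math
from statistics import NormalDist


def required_difference(gap_sd: float, variance_share: float) -> float:
    """Difference in a causal factor (in its own SD units) needed to produce
    `gap_sd` when the factor explains `variance_share` of trait variance, so
    that its correlation with the trait is sqrt(variance_share)."""
    return gap_sd / math.sqrt(variance_share)


C2 = 0.2               # shared-environment share of IQ variance (notes 1 and 2)
GAP = 1.2              # Black-White difference in g, in SD (note 2)
EXTERNAL_SHARED = 0.5  # shared environmentality of external factors (note 3)

within_sd = math.sqrt(C2)                                          # note 1: about 0.45
shared_gap = required_difference(GAP, C2)                          # note 2: about 2.68
external_gap = required_difference(shared_gap, EXTERNAL_SHARED)    # note 3: about 3.79

# Overlapping coefficient of two unit-variance normals whose means differ by d:
# 2 * Phi(-d / 2). This measure is an addition, not used in the original notes.
overlap = 2 * NormalDist().cdf(-external_gap / 2)

print(f"{within_sd:.2f} {shared_gap:.2f} {external_gap:.2f} {overlap:.3f}")
```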


Brody, N. (2002). Jensen’s genetic interpretation of racial differences in intelligence: Critical evaluation. In H. Nyborg (Ed.), The Scientific Study of General Intelligence: Tribute to Arthur Jensen. Pergamon.

Flynn, J. (1980). Race, IQ and Jensen. London and Boston: Routledge & Kegan Paul.

Hu, M. (2013, April, 18). IQ Regression to the Mean: The Genetic Prediction Vindicated. Accessed at: https://humanvarieties.org/2013/04/18/iq-regression-to-the-mean-the-genetic-prediction-vindicated/

Jensen, A. (1973). Educability and Group Differences. Harper & Row.

Kaplan, J. (2001). Misuses of statistics in the study of intelligence: The case of Arthur Jensen. Chance, 14(4), 14-26.

Mackenzie, B. (1980). Fallacious use of regression effects in the IQ controversy. Australian Psychologist, 15(3), 369-384.

Murray, C. (1999). The Secular Increase in IQ and Longitudinal Changes in the Magnitude of the Black-White Difference: Evidence from the NLSY. In Behavior Genetics Association Meeting.

Nisbett, R. E. (2009). Intelligence and how to get it: Why schools and cultures count. New York, NY: Norton.

Pinker, S. (2012, August 6). Steve Pinker Responds to Ron Unz. Accessed at: http://www.amren.com/news/2012/08/steve-pinker-responds-to-ron-unz/

Rushton, J. P., & Jensen, A. R. (2010). Race and IQ: A theory-based review of the research in Richard Nisbett’s Intelligence and How to Get It. The Open Psychology Journal, 3(1), 9-35.


  1. td

    Interesting analysis. Clearly any hypothetical environmental effect must be close to uniformly distributed within the black population. However, I take exception to a few of the claims made here.

    I’m no expert on psychometrics, so correct me if I’m wrong, but I believe measurement invariance implies there is no group specific X-factor (e.g. stereotype threat) corrupting the factor structure of a given test. I don’t believe MI tells us anything about the underlying cause(s) of group differences, but rather proves the score differential reflects a difference in the latent variables the test is supposed to measure (i.e. the difference isn’t a psychometric artifact). Thus MI is entirely compatible with an environmental X-factor. (If all black children were hit in the head with a hammer every night, that would presumably be a g depressing X-factor, but IQ tests would still be MI).

    Also, the claim that the shared environmental proportion of IQ variance is about 0.2, “age depending”, is highly imprecise at best. It is a very well established result that the heritability of IQ drastically increases with age, at the sole expense of shared environmental effects (non-shared environment accounts for about 20% of variance at any age). Among the general population of an industrialized nation, h^2 goes from under 0.4 at age 7, to approximately 0.65 at age 16, to around 0.8 by full adulthood. So shared environment accounts for the greatest proportion of variance among young children, and basically nothing among mature adults.

    Statements about the heritability of IQ which are not qualified for age (explicitly or implicitly) are worthless. Unfortunately, I still see this mistake made constantly, even in the published literature.

    • johnfuerst

      I felt that Lubke et al. made a persuasive case. See: Lubke, et al. (2003). On the relationship between sources of within-and between-group differences and measurement invariance in the common factor model.

      But I could be persuaded otherwise. What did you have in mind?

      • td

        I went to download that paper, then realized I already had a copy I’ve been meaning to read. The authors discuss MI in relation to Lewontin’s famous example of the seeds planted in different soils. This is a rather odd use of factor analysis. I’ve never seen anything like it before. Factors typically represent latent variables which can only be measured indirectly via psychometric tests. g is a hypothetical latent variable measured by IQ tests. In their discussion of Lewontin’s plants, they treat physical causes as the factors underlying various measurements like plant height. Generally, that kind of analysis is carried out via multiple regression.

        The issue with your inference is that IQ test factor analysis always concerns latent variables like g. So when someone says MI holds for an IQ test between two groups, they’re simply saying that an individual’s score on the test is independent of group membership and depends only on the latent variables the test is supposed to measure (like g). That tells us nothing about the causes of any differences in the group factor means. So an environmental “X-factor” could uniformly depress g in the black population, and MI would still hold because the relationship between g, all other factors, and test scores would be the same for individuals in either group.

        MI of IQ tests between blacks and whites does imply the tests aren’t biased, which is a crucial result (note that the long established lack of predictive bias didn’t imply lack of measurement bias). It means the score differential can’t be explained away by stereotype threat or the like, which might be called psychometric X-factors. (It doesn’t imply that the difference is necessarily on g, though all the evidence I’ve seen points to that being the case.)

      • johnfuerst

        With regards to the point about the shared environmentality of IQ, I decided not to be more precise because the exact figures are hotly contested and, in my opinion, still somewhat murky. Here, for example, was Kaplan’s 2013 critical review of the matter. He — in addition to others — disputes the claim that by adulthood shared environment accounts for little of the variance in IQ. As I didn’t want my argument to hinge on an unestablished figure, one subject to revision, I adopted the well established across age estimate which also happens to be the upper bounds estimate of c^2 in adulthood.

        With regards to your comment about MI and x-factors….An x-factor is defined as a factor unique to one of two populations which causes a mean difference. You are correct that MI does not logically imply (i.e., necessitate) the absence of an x-factor acting on latent ability. MI nonetheless strongly suggests this because an x-factor — a group unique factor — would likely cause a difference between and within groups in the antecedent-latent factor- outcome covariance matrices, a difference which would violate MI — since MI requires that a between group difference can be reproduced from within group differences. For this not to occur, you would need a super x-factor — a group unique cause which leaves untouched the matrices of covariances with regards to both antecedents and to outcomes. Lubke et al. argues that this possibility is far fetched and I agree. You seem to think otherwise but are not clear as to why. You say: “So when someone says MI holds…[t]hat tells us nothing about the causes of any differences in the group factor means.” But the whole point of Lubke et al. was to articulate why MI does tell us something about the cause (of a group difference). So the question here is whether or not Lubke et al.’s basic inference is correct. The argument in context to g differences is, more or less:

        (1) MI logically implies that the relationship between the antecedents and the consequents of differences in g is the same within and between groups. If the relationship is not the same, MI will not hold.
        (2) If the cause of g differences between groups is fundamentally different from the cause of g differences within groups, then it is very likely that the relationship between the antecedents and the consequents of differences in g will not be the same within and between groups.

        I presume that you disagree with (2).

      • td

        I’ll check out that paper by Kaplan. I’m not interested in anything Nisbett has to say. He’s a dishonest charlatan (he claims the media was uncritical of The Bell Curve and H&M’s arguments contained therein; nothing else need be said). I wasn’t aware there was still debate on this topic. If you have any recent papers of note, especially a large meta analysis, I’d appreciate it if you linked them.

        I disagree that there is a “well established across age estimate”. What is well established is the rapidly decreasing importance of shared environment with age, at least into the late teenage years. If you’re analyzing data based on, say, 7 year old children, you simply can’t use c^2 = 0.2 in any calculations and get good results. See the following:

        http://www.ncbi.nlm.nih.gov/pubmed/19488046 (I can send you the PDF if necessary)

        Regarding MI, I believe we have a semantic misunderstanding. You seem to be conflating “factors” in the psychometric sense, which are mathematical constructs, with real world causal “factors” (e.g. SES in relation to health outcomes).

        “You are correct that MI does not logically imply (i.e., necessitate) the absence of an x-factor acting on latent ability. MI nonetheless strongly suggests this because an x-factor — a group unique factor — would likely cause a difference between and within groups in the antecedent-latent factor- outcome covariance matrices, a difference which would violate MI — since MI requires that a between group difference can be reproduced from within group differences.”

        What you said is true for psychometric factors, but I maintain that it has nothing to do with the causes of group differences in latent variables. A group unique psychometric factor corrupts the covariance matrices. A group unique environmental “factor” which depresses a group’s actual latent ability or abilities (resulting in an obtained factor score differential) does not, as far as I know. Nothing I read in Lubke made me think otherwise.

        “But the whole point of Lubke et al. was to articulate why MI does tell us something about the cause (of a group difference).”

        Not in the sense you mean. MI implies that the proximate causes (e.g. variation in g) of differences in psychometric test performance are the same within and between groups, not that the ultimate causes (e.g. genetic differences affecting g) are the same. At the end of the paper, they do suggest that the former has implications for the latter, and I agree. Jensen repeatedly emphasized that it matters whether the black-white difference is on g or not.

  2. johnfuerst

    Sorry, been busy. As for the c^2 issue, I meant “the well established across-age average”. The heritability of IQ is routinely said to be between 0.4 and 0.8, depending on age. This puts the shared environmentality at between 0.4 and 0.0, depending on age; the across-age average, then, is 0.2. Perhaps my phrasing is confusing. I have discussed the relationship between age, within-group heritability, and between-group heritability elsewhere, for example in section A1 here. Again, I use 0.2 because it’s a safe lower bounds estimate of the c^2 in adulthood. The c^2 might be somewhat lower for Whites but higher for Blacks. See table 4 here, for example. If you could show that the c^2 for Black adults and White adults was 0 or identical, you could effectively falsify a non-genetic hypothesis. The differences simply can’t largely be explained in terms of non-shared environments (e.g., because of differential regression, biracial performance, and the fact that the gap is substantially statistically explained by differences in parental IQ). Nor is the difference due to x-factors (e.g., because of MI, similarity of covariance matrices, etc.). Yet the nature/nurture debate continues. From this, I infer that it’s widely believed that the adult c^2 is greater than 0 and that the Black-White heritabilities are not identical. Where do I err in my reasoning? Look, if you want to edit what I wrote or expand on it, I would be happy to send an HV invite. That would be great. I just don’t care to do so myself.

    As for your comment about MI, you are incorrect. Specifically, your interpretation of Lubke et al. is incorrect. If you want, email one of the authors or ask a third party to read the paper. I am sure that either will agree with me. Reread section 4. Think about the purpose of their Lewontin example. Regardless, the question then becomes: Are the authors correct? I will have to think about this some more when I get a chance — and try to come up with a better way to explain the reasoning.


© 2024 Human Varieties
