IQ Regression to the Mean : the Genetic Prediction Vindicated

The IQ differences between blacks and whites lead to differences in sibling regression to the mean. The races regress to different means. Criticisms have been made about the hereditarian interpretation of the differential sibling regressions. I will demonstrate that this phenomenon (1) is not a statistical artifact and (2) is consistent with the hereditarian interpretation of it.

Introduction. Although regression to the mean is sometimes interpreted as a strong support for the hereditarian hypothesis with regard to the nature of the black-white IQ difference (Jensen, 1973, pp. 110-119; 1998, pp. 468-472; Rushton & Jensen, 2005, p. 263), others suggest that this phenomenon fails to narrow the race-IQ debate.

The hereditarians argued that regression occurs because parents and children share 50% of their genes; the phenomenon simply reflects the non-transmission of heritable traits (that is, those that are not shared). The degree of regression increases when the degree of kinship decreases. Environmentalists, however, believe that regression to the mean can also be understood in terms of differences in culture or environment. Racial differences regarding sibling regression to the mean could be interpreted as a between-family difference, insofar as black and white siblings with equal IQs do not necessarily have the same home environment quality. After all, environmentalists may argue that black parents will provide a poorer cognitive environment to their children, even if black and white parents were perfectly matched for IQ. But if the environmental theory of race differences is really tenable, we should expect a convergence in differential sibling regression to the mean. Any other result contradicts this theory.

Another kind of criticism (Kaplan, 2001, pp. 16-18; Neuroskeptic, 2010) focuses on the interpretation of the regression to the mean per se. It was suggested that this phenomenon is just a statistical artifact. An example may help to understand the argument. Suppose that next month the number of car accidents in the country suddenly doubles. The government responds by placing additional cameras and strengthening surveillance systems. This strategy will appear to succeed, but only because in the following month the number of accidents would have gone back to its initial level anyway. Regression to the mean. In other words, the regression is thought to be a cyclical phenomenon driven by luck and chance.

But that’s not clear at all. What kind of luck explains the fact that the children of high-IQ parents have lower IQs than their parents even though they are reared in cognitively stimulating environments, while the children of low-IQ parents who were raised in chaotic environments still have higher IQs than their parents ? The IQs regress halfway (50%) to the population mean at both sides of the IQ distribution. If we stick to the Dickens-Flynn model (2001) of feedback loops, one would expect the children of high-IQ parents to have an even higher IQ and the children of low-IQ parents an even lower IQ. But the opposite happens. This criticism, in the end, does not provide any explanation for the fact that the regression is homogeneous across the different levels of IQ. As Jensen made clear, the IQ subgroups do not depart from linearity for an IQ range going from 50 to 150.
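The "halfway" regression described above can be written as a one-line expectation. The sketch below is only illustrative: the function name, the population mean of 100 and the example scores are assumptions chosen for the illustration, and the coefficient of 0.5 is the halfway figure quoted in the text, not an estimate from the NLSY data.

```python
def expected_relative_score(score, population_mean, regression_coefficient=0.5):
    """Expected score of a relative, given one person's score.

    A coefficient of 0.5 means the expected score lies halfway between
    the observed score and the population mean.
    """
    return population_mean + regression_coefficient * (score - population_mean)


# Hypothetical scores on a scale with a population mean of 100.
for score in (70.0, 100.0, 130.0):
    expected = expected_relative_score(score, population_mean=100.0)
    print(f"observed {score:.0f} -> expected relative score {expected:.0f}")
```

Because the expectation is linear in the observed score, the same proportional regression applies at both ends of the distribution, which is the homogeneity the paragraph refers to.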

While some say that regression to the mean occurs because of some kind of (random) measurement errors, it should be noted that IQ regression to the mean analyses are usually performed by using the method of estimated true scores, that is, IQ scores corrected for measurement error, or unreliability, with the formula :

$$\hat{T} = r_{XX'}(X - M_X) + M_X$$

where $\hat{T}$ is the estimated true score, $X$ the observed score, $r_{XX'}$ the reliability coefficient of the test, and $M_X$ the mean of the group. Why this method reduces the “luck” factor has been explained in Bias in Mental Testing (1980, pp. 276-277) by Jensen himself :

The net effect of using such estimated true scores, besides increasing the accuracy of measurement, is to reduce the higher scores of persons belonging to low-scoring subgroups and boost the lower scores of persons belonging to high-scoring subgroups. Such an outcome may seem unfair from the standpoint of members of the lower-scoring subgroups, but it is merely the statistically inevitable effect of increasing the accuracy of measurement. When higher scores are preferred in the selection procedure, the “luck” factor resulting from unreliability statistically favors persons belonging to lower-scoring groups. The “luck” factor is minimized by using estimated true scores instead of obtained scores.

[…] If test reliability is quite high (i.e., above .90), however, the slight gains in accuracy and predictive validity from using estimated true scores may hardly repay the extra computational effort.

But given that the reliability of AFQT is about 0.95 (Winship & Korenman, 1999), this method will leave the results unaffected in any case.
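To make the size of this correction concrete, here is a minimal sketch of the estimated true score formula. The function name and the example scores (a group mean of 100 and observed scores of 70, 100 and 130) are assumptions made for illustration; only the formula and the reliability of 0.95 come from the text.

```python
def estimated_true_score(observed, reliability, group_mean):
    """Regress an observed score toward its group mean by the test reliability."""
    return reliability * (observed - group_mean) + group_mean


# Hypothetical observed scores, compared at two reliability levels.
for observed in (70.0, 100.0, 130.0):
    for reliability in (0.80, 0.95):
        t_hat = estimated_true_score(observed, reliability, group_mean=100.0)
        print(f"X = {observed:.0f}, reliability = {reliability:.2f} -> estimated true score = {t_hat:.1f}")
```

With a reliability of 0.95, a score 30 points above the group mean is pulled back by only 1.5 points, which is why the correction makes little practical difference here.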

Still another critique, from Mackenzie (1984, p. 1220) this time, made the case that blacks and whites will regress to the same mean if the parent-child correlations or sibling correlations are calculated from pooled samples of blacks and whites. Of course, this tells us nothing about the causes of the racial differences in sibling regressions. According to the environmental hypothesis, the racial differences in sibling regressions should tend to converge as the levels of IQ increase. If this is not the case, the environmental interpretation is untenable. This was exactly what Jensen (1973) wanted to know : whether the IQ subgroups at either end show some deviation from linearity. Or stated differently, whether the regression lines converge at higher levels of IQ.

Educability and Group Differences (Jensen 1973, p. 241 fn. 4)

But the fact that the black-white IQ difference increases with SES levels (Jensen, 1973, pp. 241-242; Herrnstein & Murray, 1994, pp. 287-288; Jensen, 1998, p. 358; Gottfredson, 2003, Table 2; Hu, Oct.20.2013, Jan.18.2013) is hardly explainable from the environmental standpoint. Thus, Jensen (1973, p. 119) believed that it could be easily explained by the BW difference with regard to sibling regression toward the mean.

Method and Data. NLSY79 and NLSY97 were used for the present analysis, because sibling data and IQ subtests were available. Factor analysis can be performed for extracting g (analysis #1) and Jensen’s method of correlated vectors can be used for testing the association between sibling correlations on ASVAB subtests and black-white gaps as well as g-loadings in those subtests (analysis #2).

If one wants to replicate the present finding using my syntax and variables for NLSY79 (here) and NLSY97 (here), note that SPSS is needed. A free NLS Investigator account is also needed in order to collect the relevant variables. Then, do a quick search by terms or keywords, as shown below :

Then, download your collection of selected variables, and copy/paste the files into a new file. Before running the syntax page, note that the handle file should look like this.

Regarding the differential sibling regression to the mean, the purpose was to replicate and extend Murray’s analysis on the NLSY79. I recoded the key variables as follows : BHW=1 for blacks, BHW=2 for hispanics, BHW=3 for whites, SIBLING=1 for full siblings, SIBLING=2 for unrelated and half siblings. Thanks to the CASESTOVARS command, it was possible to identify the NLSY full siblings. This command breaks a variable into a certain number of categories (depending on the number of values of this variable). So, when a variable ends with .1 or .2, this is the number of the identified sibling : .1 for sibling #1 and .2 for sibling #2. The numbers after the “=” sign designate the categories of my dummy variables.
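For readers without SPSS, the restructuring step can be sketched in plain Python. This is not the SPSS implementation, only an illustration of the long-to-wide idea: the `household` identifier, the function name and all data values are hypothetical, while the BHW and SIBLING codings and the .1/.2 suffixes follow the description above.

```python
from collections import defaultdict

# Hypothetical long-format data: one row per respondent.
long_rows = [
    {"household": 1, "BHW": 3, "SIBLING": 1, "IQ": 104.0},
    {"household": 1, "BHW": 3, "SIBLING": 1, "IQ": 98.0},
    {"household": 2, "BHW": 1, "SIBLING": 1, "IQ": 91.0},
    {"household": 2, "BHW": 1, "SIBLING": 2, "IQ": 95.0},
]


def cases_to_vars(rows, id_key):
    """Collapse rows sharing an ID into one wide row with .1, .2, ... suffixes."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[id_key]].append(row)

    wide = []
    for group_id, members in grouped.items():
        record = {id_key: group_id}
        for index, member in enumerate(members, start=1):
            for name, value in member.items():
                if name != id_key:
                    record[f"{name}.{index}"] = value
        wide.append(record)
    return wide


wide_rows = cases_to_vars(long_rows, "household")
for record in wide_rows:
    print(record)
```

Each wide row then holds one sibling pair, so that IQ.1 can be plotted against IQ.2 directly.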

But because I was unable to find a magical SPSS syntax, I had to delete the missing values manually. The easiest way to do this is simply to use the “Sort Ascending” option in the SPSS data editor page for the relevant column. This will list the empty cells first. So I used this option to delete missing values among siblings #1 and then among siblings #2. (“Copy Dataset” is a very useful function that duplicates the data window, in case some cases are deleted by mistake.)

Of course, some anomalies have been detected. For example, when one sibling self-identified as black or hispanic and the other sibling self-identified as white, and both responded that they are full siblings. These cases are deleted. Similarly, even when both agreed about their racial identification, sometimes the first sibling said the other one is a full sibling while this second sibling said the other one is not a full sibling. These cases, too, are deleted. Here’s an example of anomaly :

Also, there should be no missing values in either SIBLING.1 or SIBLING.2. Missing values in BHW (my race variable) are of no concern when both siblings said they are not full siblings. But if they were full siblings, empty values of BHW pose a problem : because of the way I coded BHW, empty values denote respondents who are neither black, hispanic, nor white. Those cases are deleted. To do this, use the Sort Ascending option on the SIBLING.1 and SIBLING.2 columns. Values of 1 are listed first. Then re-use this option on the BHW column. This will put at the top of the list the full siblings who have empty values in BHW (in other words, full siblings who are neither black, white, nor hispanic).
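The manual cleaning rules above can also be expressed as a single filter. The sketch below is an assumption-laden illustration rather than the procedure actually used: the function name and the example pairs are hypothetical, `None` stands in for an empty SPSS cell, and any disagreement on BHW between full siblings is treated as an anomaly (a slight generalization of the black/hispanic versus white case mentioned above).

```python
def keep_pair(pair):
    """Return True if a sibling pair survives the cleaning rules."""
    status_1, status_2 = pair.get("SIBLING.1"), pair.get("SIBLING.2")
    # No missing values allowed in either SIBLING column.
    if status_1 is None or status_2 is None:
        return False
    # Drop pairs whose members disagree about being full siblings.
    if status_1 != status_2:
        return False
    if status_1 == 1:  # full siblings: the race variable must be present and consistent
        race_1, race_2 = pair.get("BHW.1"), pair.get("BHW.2")
        if race_1 is None or race_2 is None or race_1 != race_2:
            return False
    return True


# Hypothetical pairs covering each rule.
pairs = [
    {"SIBLING.1": 1, "SIBLING.2": 1, "BHW.1": 3, "BHW.2": 3},        # kept
    {"SIBLING.1": 1, "SIBLING.2": 2, "BHW.1": 1, "BHW.2": 1},        # status disagreement
    {"SIBLING.1": 1, "SIBLING.2": 1, "BHW.1": 1, "BHW.2": 3},        # race disagreement
    {"SIBLING.1": 1, "SIBLING.2": 1, "BHW.1": None, "BHW.2": None},  # full siblings, BHW empty
    {"SIBLING.1": 2, "SIBLING.2": 2, "BHW.1": None, "BHW.2": 3},     # not full siblings: kept
    {"SIBLING.1": None, "SIBLING.2": 1, "BHW.1": 3, "BHW.2": 3},     # SIBLING missing
]
kept = [pair for pair in pairs if keep_pair(pair)]
print(f"{len(kept)} of {len(pairs)} pairs kept")
```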

Because the data points are scattered everywhere when performing an overlay scatter plot (with option ‘Exclude cases variable by variable’) in order to display the regression lines for each racial group, I also display a graph with IQ subgroups, as Murray (1999) did :

To do this with the appropriate SPSS syntax (here), I categorized the IQs of siblings #1 for each race by averaging the IQs of all same-race subjects that have an IQ between -3 SD and -2 SD below the mean of the full sample analyzed, and IQ between -2 SD and -1 SD below the mean, and so forth. Filters and comparisons of means were used for this purpose.
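The grouping into IQ bands can be sketched as follows. This is a hypothetical re-expression of the filter-and-compare-means steps, not the SPSS syntax itself: the function name, the full-sample mean of 100, the SD of 15 and the example pairs are all assumed for illustration.

```python
import math
from collections import defaultdict
from statistics import mean


def band_means(pairs, full_mean, full_sd):
    """Average sibling #1 and sibling #2 scores within 1-SD bands of sibling #1.

    pairs: iterable of (group, iq_sibling_1, iq_sibling_2).
    A band index of -2 covers scores from -2 SD (inclusive) to -1 SD (exclusive).
    """
    bands = defaultdict(list)
    for group, iq_1, iq_2 in pairs:
        band = math.floor((iq_1 - full_mean) / full_sd)
        bands[(group, band)].append((iq_1, iq_2))

    return {
        key: (mean(a for a, _ in members), mean(b for _, b in members), len(members))
        for key, members in sorted(bands.items())
    }


# Hypothetical pairs; the group codes follow the BHW coding (1 = black, 3 = white).
example_pairs = [
    (1, 78.0, 86.0),
    (1, 82.0, 90.0),
    (1, 112.0, 104.0),
    (3, 118.0, 112.0),
]
summary = band_means(example_pairs, full_mean=100.0, full_sd=15.0)
for (group, band), (mean_1, mean_2, n) in summary.items():
    print(f"group {group}, band {band:+d} SD: sib1 = {mean_1:.1f}, sib2 = {mean_2:.1f}, n = {n}")
```

Keeping the pair count next to each band mean makes it easy to see which points in the grouped graphs rest on very few sibling pairs.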

Because WordPress doesn’t allow SPSS files, you will have to send me an email if you want them. Fortunately, WordPress allows Excel files to be uploaded. If you don’t have Excel, however, Kingsoft Spreadsheets is a good alternative.

NLSY79 g factor MCV regression to the mean and sibling correlations
NLSY97 g factor MCV regression to the mean and sibling correlations

I also assembled the data from half and unrelated siblings but I haven’t reported the result here because I found it uninformative (restriction of range in cognitive abilities, small sample size, …).

Results. The first analysis compares the BW sibling regression lines in the g dimension and non-g dimension of cognitive tests. The second analysis aims to replicate Jensen’s findings using his method of correlated vectors.

Analysis 1. The (PAF) factor analysis of the NLSY (97 and 79) ASVAB subtests allows the extraction of a g-factor score and a non-g factor score, represented by the loadings in the first factor and the second factor in the factor matrix. The interest is to see whether or not the degree of regression toward the mean is changing accordingly from the g dimension to the non-g dimension of cognitive tests.

Here, I display a graph showing the sibling regression without grouping IQs and another graph with IQ subgroups. The advantage of the latter, as stated above, is to have a better look at any deviation from linearity, as Murray did. Here’s what the NLSY97 sibling regressions look like :

Differential sibling regression lines in g (NLSY97)

Differential sibling regression lines in g, by IQ groups (NLSY97)

The x axis (horizontal, from left to right) shows the IQs of sibling #1. The y (vertical) axis shows the IQs of sibling #2. As we can see, there is no convergence in the regression lines at higher levels of IQ. The BW sibling gap may appear even larger. The BW sibling difference is about 0.50 SD.

Differential sibling regression lines in the g factor (NLSY79)

Differential sibling regression lines in g, by IQ subgroups (NLSY79)

Above are the graphs showing the sibling regression lines for the NLSY79. Here again, we see no convergence in the g-factor dimension. The hispanic line falls once again between the black and white lines.

Consistent with Murray (1999) and Jensen (1973), none of the above data points representing the IQ subgroups show any deviation from linearity. Now, let’s look at the non-g factor dimension, first for the NLSY97 and then for the NLSY79 :

Differential sibling regression lines in non g, by IQ subgroups (NLSY97)

Differential sibling regression lines in non g, by IQ subgroups (NLSY79)

Regarding the R² values for IQ subgroups, we shouldn’t put much faith in them. It’s obvious that they are totally uninformative here. What is of significance here is that the racial sibling gap is trivial. The IQs of siblings #2 move just slightly (-0.5 SD to +0.5 SD) as the IQs of siblings #1 change (-2 SD to +2 SD).

If the degree of regression is a function of the g-loadedness of IQ tests, with more regression among the less heritable component of IQ tests, it is hard to believe that this phenomenon is a mere statistical artifact. The next analysis provides another test of this assumption.

Analysis 2. Now we test Jensen’s predictions. In The g Factor (pp. 471-472), he wrote :

A number of different mental tests besides IQ were also given to the pupils in the school district described above. They included sixteen age-normed measures of scholastic achievement in language and arithmetic skills, short-term memory, and a speeded paper-and-pencil psychomotor test that mainly reflects effort or motivation in the testing situation. [50] Sibling intraclass correlations were obtained on each of the sixteen tests. IQ, being the most g loaded of all the tests, had the largest sibling correlation. All sixteen of the sibling correlations, however, fell below +.50 to varying degrees; the correlations ranged from .10 to .45, averaging .30 for whites and .28 for blacks. (For comparison, the average age-adjusted sibling correlations for height and weight in this sample were .44 and .38, respectively.) Deviations of these sibling correlations from the genetic correlation of .50 are an indication that the test score variances do reflect nongenetic factors to varying degrees. Conversely, the closer the obtained sibling correlation approaches the expected genetic correlation of .50, the larger its genetic component. These data, therefore, allow two predictions, which, if borne out, would be consistent with the default hypothesis:

1. The varying magnitudes of the sibling correlations on the sixteen diverse tests in blacks and whites should be positively correlated. In fact, the correlation between the vector of sixteen black sibling correlations and the corresponding vector of sixteen white sibling correlations was r = +.71, p = .002.

2. For both blacks and whites, there should be a positive correlation between (a) the magnitudes of the sibling correlations on the sixteen tests and (b) the magnitudes of the standardized mean W-B differences (average difference = 1.03σ) on the sixteen tests. The results show that the correlation between the standardized mean W-B differences on the sixteen tests and the siblings correlations is r = +.61, p < .013 for blacks, and r = +.80, p < .001 for whites.

Note that with regard to the second prediction, a purely environmental hypothesis of the mean W-B differences would predict a negative correlation between the magnitudes of the sibling correlations and the magnitudes of the mean W-B differences. The results in fact showing a strong positive correlation contradict this purely nongenetic hypothesis.

To recall, the default hypothesis (Jensen, 1998, p. 448) posits that the genetic and the environmental factors that cause the between-groups difference exist within each group (but not necessarily in equal degrees).

First of all, let’s see what the relationship between the vector of sibling correlations and the vector of g-loadings looks like. In the NLSY97, the BW g-loadings correlate strongly with white sibling correlations (+0.80) and black sibling correlations (+0.90). The HW g-loadings also displayed a strong relationship with both white and hispanic sibling correlations (+0.80). And again, the BH g-loadings also show a strong positive correlation with hispanic and black sibling correlations (respectively, +0.90 and +0.80). In the NLSY79, BW g-loadings correlate with sibling correlations for whites at about +0.80 and for blacks around +0.35 and +0.15. The HW g-loadings correlate strongly with white sibling correlations (around +0.75) and with hispanic sibling correlations (around +0.75). What is unexpected is that BH g-loadings correlate negatively with sibling correlations for blacks (around -0.20) and for hispanics (about -0.30 and -0.50).
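The vector correlations reported here (r and rho) are ordinary Pearson and Spearman correlations computed across subtests. The following sketch shows the computation on made-up vectors; the function names and every number in it are hypothetical and are not NLSY values.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            result[order[position]] = average_rank
        start = end + 1
    return result


def spearman(x, y):
    """Spearman rho: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical vectors, one entry per subtest.
g_loadings = [0.85, 0.80, 0.75, 0.60, 0.55]
sibling_correlations = [0.45, 0.40, 0.42, 0.30, 0.25]
print(f"r   = {pearson(g_loadings, sibling_correlations):.2f}")
print(f"rho = {spearman(g_loadings, sibling_correlations):.2f}")
```

With only a handful of subtests per battery, a single subtest changing rank can move rho substantially, which is worth keeping in mind when reading the coefficients above.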

Another method (apparently suggested by Bartholomew, 2004) that might improve the reliability of estimates consists in grouping two by two the subtest g-loadings and/or sibling correlations, by order/rank of estimates. For example, if GS and AR subtests have the two highest loadings, we first average the g-loadings of GS and AR, and then average the sibling correlations of GS and AR, and we repeat the process for the two next highest loadings, and so forth. But we can also group by d gaps, by averaging the two highest d gaps, and repeating the process for the second two highest d gaps, and so on, and finally by averaging the corresponding g-loadings in the column vector. And as the above picture shows, the correlation between the magnitude of BW g-loadings and the black sibling correlations is a little bit higher (+0.43 and +0.28, if we use g grouping; or +0.55 and +0.24 if we use sib r’s grouping).
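The two-by-two grouping can be sketched as below. This is one reading of the procedure described above, under assumptions: the function name and the numbers are hypothetical, and with an odd number of subtests the last subtest is simply left on its own.

```python
def pair_average_by_rank(key_vector, other_vector):
    """Sort subtests by key_vector (descending), then average both vectors in pairs."""
    order = sorted(range(len(key_vector)), key=lambda i: key_vector[i], reverse=True)

    def average_in_pairs(vector):
        ordered = [vector[i] for i in order]
        chunks = [ordered[k:k + 2] for k in range(0, len(ordered), 2)]
        return [sum(chunk) / len(chunk) for chunk in chunks]

    return average_in_pairs(key_vector), average_in_pairs(other_vector)


# Hypothetical g-loadings and sibling correlations for four subtests.
example_g = [0.85, 0.55, 0.80, 0.60]
example_sib_r = [0.45, 0.25, 0.40, 0.30]
grouped_g, grouped_sib_r = pair_average_by_rank(example_g, example_sib_r)
print(grouped_g, grouped_sib_r)
```

Passing the sibling correlations or the d gaps as the first argument instead gives the alternative groupings mentioned in the text. Note that the grouping halves the number of data points entering the correlation.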

Generally speaking, this finding supports the view that the magnitude of sibling regressions toward the mean diminishes as the g-loadedness of the test increases, which is also consistent with Analysis #1.

But what about the (non-g) loadings of the second factor with sibling correlations ? In the NLSY97, these associations are usually negative and none of them showed a positive slope for all races. In the NLSY79, however, the white full sibling correlations were strongly and positively associated (r and rho) with non-g loadings for BW non g-loadings, but this relationship is much smaller for HW non g-loadings. Among blacks, this relationship is positive but much smaller and looks like a random dispersion of dots for BW non g-loadings, or is near zero for BH non g-loadings. Among hispanics, they were small negative or small positive.

Now, regarding Jensen’s first prediction, the NLSY97 shows a very strong positive correlation between the vector of white sibling correlations and the vector of black sibling correlations (around +0.80 and +0.90). Between hispanics and whites, the correlations were also very high (around +0.90). Between hispanics and blacks, the correlations were about +0.80 and +0.90. In the NLSY79 I found a moderate positive correlation between the vector of white sibling correlations and the vector of black sibling correlations (around +0.40). Between whites and hispanics, the correlations turned out to be about +0.40 or +0.50. Between blacks and hispanics, the correlations were around +0.30.

Finally, with regard to Jensen’s second prediction, the NLSY97 shows that the magnitude of the BW d gap is not related to the magnitude of black sibling correlations (near zero) and only modestly related to the white sibling correlations (around +0.20 or +0.15). The correlation between the HW d gap and sibling correlations is not trivial for whites (around +0.25 and +0.40) and for hispanics (around +0.40 and +0.50). Curiously, the correlation between the BH d gap and sibling correlations is small for hispanics (around +0.10 and +0.15) but negative for blacks (-0.10 or -0.20). In the NLSY79, the magnitude of the BW d gap correlates with black sibling correlations at about +0.10 and with white sibling correlations at about +0.05. The magnitude of the HW d gap is positively correlated with sibling correlations for whites (around +0.40) and for hispanics (around +0.80 and +0.90). The magnitude of the BH d gap shows a non-trivial negative relationship with sibling correlations for blacks (around -0.15 and -0.30) and for hispanics (around -0.25 and -0.50).

Using again Bartholomew’s method, the correlation between the magnitude of BW d gap with white sibling correlations becomes a little bit higher (at about +0.49 for r’s, and +0.23 for rho) while for black sibling correlations, it remains very low (+0.10 and -0.03, respectively) in the NLSY97. Regarding this, it should be noted that MCV totally failed to show a correlation between BW d gap and BW g-loadings in the NLSY97 even if there was in fact such a Spearman effect.

This method of course will not generate correlations as high as what Jensen found (about +0.60 and +0.80). But because none of these relationships were negative with regard to the black-white IQ gap, we can say that the environmental hypothesis is clearly rejected. Overall, my 2nd analysis attempting to replicate Jensen’s finding is mixed. It is not a great success, but it is not a failure either. The finding is still consistent with the hereditarian hypothesis but perhaps less than what he might have suggested.

Limitations. As explained above, regressed true scores were not used in analysis #1, but given the high reliability coefficient of AFQT (0.95), it will probably not affect the above result. Also, regarding the graphs of grouped IQs for g factor scores, the dots at both ends of the IQ distribution comprise in fact a very small sample size, with sometimes 10 or 20 sibling pairs.

Jensen’s method of correlated vectors used in analysis #2 is not without critics (Dolan, 2000, p. 46; Dolan & Hamaker, 2001, pp. 16-19; Ashton & Lee, 2005, p. 438). Dolan is confident that MGCFA, rather than MCV, allows one to demonstrate that the g model fits better than the competing models, and at the same time, he says that Jensen’s procedure provides no goodness of fit testing, with no test of B-W difference in covariance. Among other things, a hierarchical factor analysis (see Colom, 2002, for SPSS syntax) was not used as a secondary check of the existence of a general factor, and this poses a problem since Jensen (1998, pp. 96-97) has made it clear that hierarchical factor analysis could easily overcome the problem of what he calls a psychometric sampling error (that is, a situation where the extracted g is in fact a distorted g resulting from a biased representativeness of the tests in the test battery), although Ashton and Lee argued that its use does not overcome the many problems associated with a biased selection of subtests. On the other hand, Rushton (2007, p. 11) also defended the MCV. He made the case that the failure of Jensen’s MCV can be due in fact to a biased criterion (i.e., dependent variable). In Bias in Mental Testing (1980, pp. 310, 383), Jensen indeed wrote the following :

A biased criterion is one that consistently overrates (or underrates) the criterial performance of the members of a particular subpopulation. A good example is sex bias in school grades: teachers generally give slightly higher grades to girls than to boys, even when the sexes are perfectly matched on objective measures of scholastic achievement.

When the criterion itself is questionable, we must look at the various construct validity criteria of test bias. If these show no significant amount of test bias, it is likely (although not formally proved) that the criterion, not the test, is biased. In a validity study, poor criterion measurement can make a good test look bad.

However, I don’t see why this point should apply to the present analysis. But perhaps Jensen’s MCV used in conjunction with meta-analyses, along with further corrections for artifacts (sampling error, range restriction of g-loading vectors, perfect construct validity, …), could yield very promising results (te Nijenhuis, 2007, 2013; Joep Dragt, 2010).

Another significant difference between Jensen’s application of MCV and mine is that when he uses MCV to test the Spearman Hypothesis, his histogram (in The g Factor, p. 382) shows a normal frequency distribution of g-loadings (g) and standardized mean B-W differences (d) for 149 subtests from 12 different test batteries (N = 286,901). But in both the NLSY97 and NLSY79, it is clear that those distributions are not normal. As Jensen pointed out, a test of a Spearman effect using MCV should require, ideally, “large g loadings on the subtests and maximum variation among the subtests’ g loadings; also, large mean group differences on the subtests and maximum variation among the group differences”. So, this could be one of the reasons why the results from the MCV in Analysis #2 may sometimes appear contradictory.

If Jensen’s MCV requires a multitude of conditions in order to be applied correctly, it appears that I haven’t met those conditions. If so, my findings related to the 2nd analysis must be taken with a pinch of salt.

Discussion. If, for reasons mentioned above, the BW sibling regression gap cannot be fully interpreted in terms of environments, we may think of a combination of genetic and shared environmental differences. But what kind of environment, exactly ? Chuck (Dec.8.2012), on “More thoughts on differential regression to the mean studies”, argues for a shared environmental effect, and Murray (1999) for a non-shared. Jensen (1973) seems to argue against shared environmental effects. In Educability & Group Differences, pp. 118-119, Jensen expresses his thoughts :

It can be claimed that though the white and Negro children are matched for IQ 120, they actually have different environments, with the Negro child, on the average, having the less intellectually stimulating environment. Therefore, it could be argued he actually has a higher genetic potential for intelligence than the environmentally favored white child with the same IQ. But if this were the case, why should not the Negro child’s siblings also have somewhat superior genetic potential? They have the same parents, and their degree of genetic resemblance, indicated by the theoretical genetic correlation among siblings, is presumably the same for Negroes and whites.

What Jensen has in mind would possibly be the idea that the absence of a convergence in the regression lines is difficult to explain in terms of differences in shared environment. But this would be true, also, with regard to non-shared environment. One cannot even begin to explain why blacks should be more environmentally depressed relative to whites at higher levels of IQ.


  1. Chuck

    This is an incredible post. With regards to the results, what were the magnitudes of the g and t factor differences (between Black and White sibs) in both samples? Your g/t findings might undermine my conjectured shared environmental model. I’ll have to think about this though.

    • Chuck

      I mean: what were the average sib1+sib2 (Black-White) differences for both factors and also what were the average differential sibling regressions for both factors. So, for the NLSY 97, the g-factor difference was about 1 and the differential regression was about 0.5. Ok. What about the t-factor difference and t-factor differential regression?

  2. 猛虎

    The reason why the Discussion section is short, is that I don’t know what to make of your model. If differential sibling regressions were to be interpreted in terms of a combination of 1) shared E + genetic influence or 2) unshared E + genetic influence, we need first to explain the absence of convergence and worse, the widening BW sibling regression gap with IQ levels. I was unable to explain the widening of BW gap with IQ/SES in terms of environmental influences. I have nothing much to say, and that’s why the Discussion section was brief. But maybe you can succeed where I failed.

    As for the T factor, as you see, the BW differential sibling regression is trivial (the 2 last graphs in analysis #1), it amounts to almost nothing.

    • Chuck

      Your discussion was fine. I was simply asking you for more information so to evaluate my conjectured shared environmental explanation.

      By this explanation, the magnitude of the differential regression should be: BW differential regression = BW d – (BW d * R). Where R is the coefficient of regression (or the linear relation between sib1 and sib2 scores). Based on your graphs R= ~ 0.6 (for g) and ~0.3 (for t). This is about what I got earlier:

      Now, you didn’t give the BW d for g or t. This is what I was asking for. But Mr. D does for NLSY 97:
      Quote: “(Cohen’s d’s on the g scale were B-W 1.124, B-H 0.368, and H-W 0.759, while on the T scale they were B-W 0.561, B-H 0.261, and H-W 0.306.)”

      So we can plug the numbers in: BW differential regression (g) = 1.124 – (1.124 * 0.6) = 0.45; BW differential regression (t) = 0.561 – (0.561 * 0.3) = 0.39. So my prediction based on my simple shared environmental model (which was really Nathan Brody’s model) was that differential regression (g) ~ differential regression (t). This obviously doesn’t fit with the data.

      I am not sure why though.

      The question then is: Why is there no differential regression for the t factor but differential regression for the g factor. An unshared environmental explanation can not make sense of this. So if there is an environmental explanation it must be a shared one.

      Now, I agree 100% with this point: “What Jensen has in mind would possibly be the idea that the absence of a convergence in the regression lines is difficult to explain in terms of differences in shared environment.” But this is a separate issue.

      Back to the point about g and t, I don’t get the genetic explanation for the g,t difference in differential regressions. I noted elsewhere: “Now it’s a mathematical given that two groups drawn from a common population will show regression, but unless there is some differentiating factor with respect to the dimension measured, they will regress to a common mean. If two groups regress towards separate population means, a causal explanation is wanting.” Why are Blacks and Whites regressing to different means for g but not t despite both g and t differences being caused by something?

      Again, I have to think about this some more….

  3. 猛虎

    Concerning the nlsy79 BW g gap,

    Is this what you wanted ? If yes, it appears that in the NLSY79, the non-g sources do not contribute to the BW gap, while the contrary is true for the NLSY97, as you said. This might explain why in the NLSY79, the non-g sources are negatively correlated with the racial gaps when, at the same time, the non-g sources positively correlate with racial gaps in the NLSY97. Given my above findings on the differential sibling regressions, it seems that the NLSY79 fits better. Or maybe the non-g (or T) gaps in the NLSY97 are the anomaly. No idea.

    Based on the figures you proposed, the BW sibling regressions for the NLSY79 look like this :

    g = 1.45 – (1.45*0.6) = 1.45 – 0.87 = 0.58
    non-g = -0.13 – (-0.56*0.3) = -0.13 – -0.168 = -0.298

    This aside, the absence of a BW gap regarding sibling regressions in the non-g dimension should be seen as the evidence that what is causing the BW regression gap is the g sources of those cognitive (IQ) tests. But at the same time, the NLSY97 shows a positive BW gap in the non-g dimension. If we stick to Jensen’s discussion you are citing, and if we consider the near-zero BW gap in sibling regressions regarding the non-g dimension as a within-group influence, and the positive BW gap in the non-g dimension as a between-group influence, this is the only way I can make sense of the data, meaning, the absence of BW sibling gap and the significant BW gap in non-g sources.

    • Chuck

      Thanks for the link. That’s exactly what I was looking for. Again, this is an excellent analysis — but it has a significant flaw — and I point this out for constructive reasons. That flaw is that the non-g gaps are small.

      You said: “Based on the figures you proposed, the BW sibling regressions for the NLSY79 look like this :
      g = 1.45 – (1.45*0.6) = 1.45 – 0.87 = 0.58
      non-g = -0.13 – (-0.56*0.3) = -0.13 – -0.168 = -0.298”

      No, the non-g gap, for NLSY79, would be:
      non-g = -0.13 – (-0.13*0.3) = -0.13 – -0.043 = -0.17

      So you wouldn’t expect any regression — for NLSY79 — because there is practically no non-g difference to start with. For your point, I think what you want to do is to compare two gaps with two noticeably different g-loadings and look at the differential regression adjusting for the initial size of the gaps. So, for example, using NLSY97 you could compare the regression for the SI (g-loading= 0.55, d=1) with GS (g-loading = .85, d=0.8). See here:

      Generally, the problem is that while you show that the magnitude of the sibling regression varies with g-loading, it just so happens that the magnitude of the population difference does likewise. So the shared environmental explanation for differences in differential regression is just that there are initial differences in the size of the g-related gaps to start with.

      So, I conclude that your g/t findings are consistent with my conjectured shared environmental model.

  4. 猛虎

    Ok. Let’s see if you can make sense of this. In the non-genetic component of the ASVAB, you have, for the NLSY97, a very small differential sibling gap that tends to shrink at higher IQ levels. What is causing this ? The best candidate is obviously shared E. On the other hand, you see an increase in the BW sibling regression gap in the genetic component of IQ. What I am trying to ask is this : assuming your shared E + G model, what kind of environments (shared ? unshared ?) may explain the pattern of the BW sibling regressions in the genetic component of the ASVAB tests ?

    This aside, I don’t understand what you say here : “So you wouldn’t expect any regression — for NLSY79 — because there is practically no non-g difference to start with.”, because the issue is about the degree of regression, not the magnitude of the BW differential regression. As for the calculation, you are right. I don’t know why I multiplied by -.56 instead of -.13.

  5. Daniel

    First of all, why is it that the comparison is always black vs white? What about other groups like Hispanics, Native Americans, Arabs and Indians? How about studies regarding Ashkenazis vs gentile whites? Or whites vs Asians?

    Also, how come regression to different means only seems to occur in the US with regard to American blacks, but doesn’t seem to happen with blacks in European countries or black immigrants to the US? You may need to think more carefully about your theory.

  6. Donald Richardson


    How do we know it doesn’t happen with other groups? The black/white IQ gap in the USA is the aspect of racial differences in intelligence that has been studied the most.

    I believe Richwine showed that 2nd and 3rd generation Hispanics in the USA aren’t doing as well as the 1st did, which is evidence of regression.

    It will be interesting to see what happens to 2nd and 3rd generation Nigerian-Americans, since Nigerian immigrants into the US seem disproportionately intelligent and educated compared to Nigerian-Nigerians.

    One thing I’d like to know is if the graph of black/white IQ resembles the graph of qualities that are highly heritable (such as height) compared to things that are less heritable (such as weight).

  7. Frank J

    The title of this article is “IQ Regression to the Mean : the Genetic Prediction Vindicated”, and it begins “The IQ differences between blacks and whites lead to differences in sibling regression to the mean. The races regress to different means.” I’ve read/skimmed the article for about an hour, with my very limited technical knowledge, and I’m having trouble locating a plainly stated findings summary that supports the title and intro. Studies usually include such a summary. The closest seems to be…

    “This method of course will not generate correlations as high as what Jensen found (about +0.60 and +0.80). But because none of these relationships were negative with regard to the black-white IQ gap, we can say that the environmental hypothesis is clearly rejected. Overall, my 2nd analysis attempting to replicate Jensen’s finding is mixed…”

    I’m aware that ‘regression to the mean’ is a more complex phenomenon than generally understood, but ‘regression to different means’ is a basic type of evidence that race differences in IQ are (partly) genetic. I’d like to use this article as a source. Could someone please give me a summary and explain why the study supports a hereditarian interpretation? Or link me to a summary of this material?

    Sorry for not being more knowledgeable. I appreciate the work that went into this.

    • Meng Hu

      No, maybe it’s my fault. Perhaps one can have the impression that the title is a little bit misleading. In fact, my opinion was that the genetic hypothesis was not strongly supported, but probably a (little bit) more favorable to the genetic hypothesis than not. But I would like to have more data, of course.

  8. John W.

    A professor in a graduate statistics class I took stated that regression to the mean is a statistical artifact that occurs because scores closer to the mean are measured more reliably.
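
As a rough illustration of how measurement error alone can produce regression to the mean, here is a minimal simulation sketch. It uses the simpler textbook setup of equal measurement error at every score level, not the professor's claim that reliability varies with distance from the mean. All numbers (the true-score and error spreads, the cutoff of 120, the sample size) are assumptions chosen for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

N = 20000
TRUE_SD = 13.4   # assumed spread of true scores
ERROR_SD = 6.7   # assumed spread of measurement error (reliability about 0.8)

# Each person has one true score and two noisy measurements of it.
true_scores = [random.gauss(100, TRUE_SD) for _ in range(N)]
test1 = [t + random.gauss(0, ERROR_SD) for t in true_scores]
test2 = [t + random.gauss(0, ERROR_SD) for t in true_scores]

# Select the people who scored 120 or more on the first test.
selected = [i for i in range(N) if test1[i] >= 120]

mean_test1 = statistics.mean([test1[i] for i in selected])
mean_test2 = statistics.mean([test2[i] for i in selected])

print(f"selected: {len(selected)}")
print(f"mean on test 1: {mean_test1:.1f}")
print(f"mean on test 2: {mean_test2:.1f}")
```

In this setup the selected group's second-test mean falls back toward 100 while staying above it, because part of what put them over the cutoff on the first test was error that does not repeat.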

    Assuming for a moment this is true, the one thing I’ve never seen discussed is how this phenomenon would affect g loading for a norm group and its demographic sub-groups. In IRT, “g” can be conceptualized as “ability”, and its accurate measurement at any given level depends on the difficulty level of an item. A highly difficult item will therefore not reliably measure the performance of a low-performing subgroup, just as an “easy” item will not measure the performance of a high-performing subgroup. If this is the case, isn’t “g” a forced compromise that contains significant “noise” for both high-performing and low-performing subgroups due to unreliability?

    Wouldn’t it make more sense to establish a “reliable zone of measurement” as a first step, then report a (second stage of assessment) specific score that falls within this range? It seems to me that an IQ test is forcing inherent measurement reliability problems on us and that “g” is not reliable across the span of ability. Thoughts?

    John W.

  9. Bill

    I am not smart enough to understand this.


© 2024 Human Varieties
