Jensen effect on racial IQ differences and GPA controlling for SES in the NLSY79 and NLSY97

In The g Factor, Jensen (1998, pp. 384-385) notes that because races differ in SES levels, the Spearman-Jensen effect (i.e., the correlation of g-loadings with group differences) found in racial IQ differences (hispanics, denoted H; blacks, denoted B; whites, denoted W) could simply reflect this fact. One reason is that SES itself correlates with g-loadings, although he maintains that this is irrelevant to Spearman’s hypothesis (furthermore, it does not necessarily imply that IQ gain due to SES improvement is itself g-loaded; see Jensen 1997, or Metzen 2012). When the hypothesis was tested anyway, the WISC subtests’ correlations with SES were found to correlate with WISC g-loadings in both the white and black samples. Also, after matching for SES, the BW difference still correlated strongly with g-loadings. Here, I will try to replicate this result.

The (never-ending) syntax can be found here for NLSY79 and here for NLSY97. My (EXCEL) file contains the full list of calculations and results described in the following paragraphs. I do not reproduce any screenshots here since there were too many numbers everywhere. First of all, the use of sampling weights may affect the magnitude of the racial gaps and, to a lesser extent, the magnitude of the correlations. So I calculated all the gaps with and without weights, but I did not apply the weights for the subtest intercorrelations and factor analyses, since NLSinfo (apparently) does not recommend this for correlational analyses but does recommend it for tabulating the characteristics of a given population (e.g., means, totals, proportions). In any case, I take into account the effect of age, sex, and parental SES (i.e., parental income and years of education) on the ASVAB subtests. I therefore produced four sets of residualized ASVAB subtests: with age regressed out, with age and gender regressed out, with age and SES regressed out, and with age, gender, and SES regressed out.
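The "regressed out" subtests described above are ordinary least-squares residuals. The following is a minimal sketch of that step, not the actual syntax used for the analysis: the function name `residualize` and all the simulated variables are hypothetical, and the data are random numbers rather than NLSY values.

```python
import numpy as np

def residualize(y, covariates):
    """Return y with the linear effect of the covariates removed (OLS residuals)."""
    y = np.asarray(y, dtype=float)
    # Design matrix: intercept plus one column per covariate.
    X = np.column_stack([np.ones(len(y))] +
                        [np.asarray(c, dtype=float) for c in covariates])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

# Simulated stand-ins (NOT NLSY data) for one subtest and its covariates.
rng = np.random.default_rng(0)
n = 500
age = rng.integers(15, 24, size=n).astype(float)
sex = rng.integers(0, 2, size=n).astype(float)
ses = rng.normal(size=n)
subtest = 0.3 * age + 0.5 * sex + 0.8 * ses + rng.normal(size=n)

# The four residualized versions mentioned in the text.
age_out = residualize(subtest, [age])
age_sex_out = residualize(subtest, [age, sex])
age_ses_out = residualize(subtest, [age, ses])
age_sex_ses_out = residualize(subtest, [age, sex, ses])

# By construction the residuals are uncorrelated with what was regressed out.
print(np.corrcoef(age_sex_ses_out, ses)[0, 1])
```

The same function would be applied to each subtest in turn before computing gaps, intercorrelations, or factor loadings.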

I will start with some anomalies. Concerning the Jensen effect on the black-white (BW) difference, the correlation with g is about 0.30 in NLSY97 and 0.40 in NLSY79. For the latter, however, if we regress out age and gender simultaneously, the correlations fall to between about 0.00 and 0.10. Undoubtedly, MGCFA and IRT techniques are needed to investigate the question of bias with regard to gender and/or race at the subtest/item level. Yet I have no explanation for why there is such a sex bias in the BW comparison only, and especially why there is nothing like this in the NLSY97.

Corr g-loadings with BW-difference (Anomaly)

By way of demonstration, I show the column numbers here. Despite the almost perfect correlation of the BW gap (0.9582), Black g (0.9969), White g (0.9931), and BW g (0.9947) vectors with one another (i.e., “reliability”), the correlation of g with the (d) gap is about 0.30 when sex is not regressed out, but only 0.05 when it is. More troubling still is what we see when looking at the individual numbers in each column. They are nearly all the same, the only exception being the Auto/Shop Information subtest, for which the BW gap deviates by 0.2. This is exactly the kind of problem we saw earlier in the meta-analysis of the Jensen effect in the heritability and environmentality of cognitive (sub)tests. I will repeat it here: 10 subtests are far too few for a readily interpretable MCV test. This is even more problematic given the high reliability of the g-loading and group (d) difference vectors, respectively 0.86 and 0.78 (Jensen, 1998, p. 383). With reliabilities this high, correcting for vector (un)reliability and deviation from perfect construct validity is a pure waste of time. Such corrections have more effect as the observed correlation moves away from zero. Here’s the difference:


Also, the above picture shows a very narrow distribution of g-loadings (SD = 0.090). If we assume an SDg of 0.128 as the population value (te Nijenhuis, 2007, p. 288), we get 0.090/0.128 = 0.703, which finally yields 0.407/0.703 = 0.579. The correlation has nearly doubled compared with the initial correlation of just 0.300. Obviously, the impact of these artifacts is very important and must always be taken into account when possible.
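The chain of corrections can be retraced step by step. The sketch below uses the figures quoted in the text, with one loudly labeled assumption: the construct validity value of 0.90 is not stated here and was chosen because, together with the reliabilities of 0.86 and 0.78, it reproduces the intermediate value of 0.407. The range restriction step is the simple ratio used in the text, not a full Thorndike-type formula.

```python
import math

r_observed = 0.300          # observed r(g*d), from the text
rel_g = 0.86                # reliability of the g-loading vector (Jensen, 1998, p. 383)
rel_d = 0.78                # reliability of the d vector (Jensen, 1998, p. 383)
construct_validity = 0.90   # ASSUMPTION: not given in the text; reproduces 0.407
sd_g_observed = 0.090       # SD of g-loadings in this battery, from the text
sd_g_population = 0.128     # assumed population SD (te Nijenhuis, 2007, p. 288)

# Step 1: correct for unreliability of both vectors.
r_rel = r_observed / math.sqrt(rel_g * rel_d)

# Step 2: correct for deviation from perfect construct validity.
r_cv = r_rel / construct_validity

# Step 3: correct for restriction of range in g-loadings (simple ratio).
u = sd_g_observed / sd_g_population
r_final = r_cv / u

print(round(r_cv, 3))     # 0.407
print(round(u, 3))        # 0.703
print(round(r_final, 3))  # 0.579
```

Because every step divides by a number below 1, the corrections scale the observed correlation up proportionally, which is why they matter little when the observed value is near zero.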

Anyway, the reliability of the heritability vector is, in contrast, certainly much lower, so there is room to improve it (e.g., by using samples much larger than a few hundred). In the above picture, however, this is difficult because the vector correlations are very close to unity. The only way to improve MCV is to use IQ batteries with many more than 10 subtests, which are extremely rare. In any case, we should be skeptical about any sex effect on r(g*d). It was probably an anomaly.

Looking at the above numbers closely, however, we see that the Auto/Shop Information subtest has one of the smallest g-loadings but also one of the largest black-white differences. After removing it, the correlation jumps drastically, from about +0.100 to +0.500. This was true in the NLSY97 as well. This subtest (split into two variables in NLSY97) alone was a strong moderator of the magnitude of r(d*g).
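The sensitivity of MCV to a single subtest can be checked with a leave-one-out pass over the vectors. The sketch below uses made-up g-loadings and d values for ten hypothetical subtests, the last of which mimics a low-loading, large-gap subtest. None of these numbers come from the NLSY files.

```python
import numpy as np

# Hypothetical vectors (NOT the NLSY values): ten subtests, the last one
# combining the smallest g-loading with the largest group difference.
names = ["S1", "S2", "S3", "S4", "S5", "S6", "S7", "S8", "S9", "AutoShop-like"]
g_loadings = np.array([0.85, 0.82, 0.80, 0.78, 0.75, 0.72, 0.70, 0.68, 0.65, 0.60])
d_gaps = np.array([1.10, 1.05, 1.00, 1.00, 0.95, 0.90, 0.85, 0.85, 0.80, 1.30])

def mcv_r(g, d):
    """Pearson correlation between the g-loading vector and the d vector."""
    return float(np.corrcoef(g, d)[0, 1])

r_all = mcv_r(g_loadings, d_gaps)

# Recompute r(g*d) with each subtest dropped in turn.
leave_one_out = {}
for i, name in enumerate(names):
    keep = np.arange(len(names)) != i
    leave_one_out[name] = mcv_r(g_loadings[keep], d_gaps[keep])

r_without_outlier = leave_one_out["AutoShop-like"]
print(f"all subtests: {r_all:.3f}")
for name, r in leave_one_out.items():
    print(f"without {name}: {r:.3f}")
```

With only ten data points, one deviant subtest is enough to move the vector correlation from near zero to strongly positive, which is the pattern described above.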

An MGCFA test is needed to see whether or not this subtest is biased and should therefore be removed. As Dragt’s (2010) meta-analysis clearly shows, biased items/subtests can affect the magnitude of the correlations. That said, the ASVAB website does mention the following:

Myth: Some individual items on the ASVAB are biased against minorities.

The Truth: The ASVAB testing program routinely conducts statistical analyses of new test items to ensure that individual items are not biased against minorities. Items displaying evidence of bias are excluded from use on the ASVAB. In addition, sensitivity analyses are conducted on new ASVAB items to guard against including items that might be unintentionally viewed as biased against or insensitive toward a particular group. Experts who are trained to recognize item insensitivity review all new items and identify items with questionable content. Such items are either revised or excluded for use on the ASVAB.

Given this, we wouldn’t expect the ASVAB to be racially biased. Still, I have provided the necessary data (EXCEL) for running such an MGCFA test (in Amos, for instance) at the subtest level. Any evidence of intercept difference, or intercept bias, would mean that the observed racial gap cannot be entirely attributed to g, the other contributing factor being that the subtests differ in difficulty across groups (see Wicherts & Dolan, 2010, for an illustration). In MGCFA models, this would show up as a substantial decrement in model fit for the intercept (scalar) invariance model relative to the factor loading (metric) invariance model. Both must hold for measurement equivalence to be established.

Turning to the correlation of the non-g loadings (PAF2) with group differences, the pattern is hard to interpret. In the NLSY97, the black, hispanic and white PAF2 loadings show very strong correlations with racial gaps. In the NLSY79, however, the white PAF2, as well as the hispanic and black PAF2 (when generated), always show very large negative correlations with race differences.

Anyway, the correlation between g and group differences was unaffected by SES in both NLSY79 and NLSY97 for the BW gap. Interestingly, the significant correlation of the HW gap with g-loadings vanishes, and even becomes negative, in NLSY97 once SES is removed, while it remains positive and significant in the NLSY79. One curious finding is the BH gap. In the NLSY79, without controlling for SES, the black-hispanic IQ difference shows no positive relation with g; in fact, the correlations were negative. After removing the influence of SES on all the ASVAB variables, the g*d correlation becomes slightly positive or near zero. That holds, at least, when weights are not used: when the sampling weight is applied, the initially substantial negative correlation between g-loadings and the BH gap shrinks considerably in magnitude after removing SES, although it remains negative. In the NLSY97, the correlation between the BH gap and g-loadings was about 0.11 or 0.14 without controlling for SES, but increases to 0.21 or 0.24 with SES regressed out of the ASVAB variables.

I also provide data on racial gaps for both ASVAB-1981 and ASVAB-1999 for G-scores and nonG-scores, with and without controlling for parental SES. One particular feature is the BH gap, or black-hispanic gap. In both datasets, the gap increases after SES is partialled out. The same thing happens at the subtest level, where the BH gap widens for all subtests when the SES effect is removed. The likely reason for this widening is that hispanic parental education averages 1 or 2 years less than that of blacks, while family income is about the same. At the same time, while controlling for SES reduces the black-white difference very little, it reduces the hispanic-white difference drastically.

This can be compared with Jensen’s (1973, pp. 306-311) earlier analysis, in which he compares blacks, whites and mexicans on the PPVT (a caricature of a culture-loaded or biased test) and Raven (measuring essentially relation eduction, the purest form of Spearman’s g). When equating for Raven, the mexicans scored below the blacks, and the blacks below the whites, on the PPVT. When equating for PPVT score, blacks scored below whites, and hispanics scored very slightly above whites, on Raven. Jensen interpreted this finding to mean that the mexican-white IQ difference was entirely due to socio-economic and/or cultural factors, while the black-white IQ difference was due to a mix of genetic and environmental differences. The fact that hispanics were more ‘culturally’ deprived than blacks while scoring higher on cognitive tests is exactly what I was able to find in both NLSY79 and NLSY97. This is all the more interesting since g*d correlations between blacks and whites were not affected by SES, whereas for hispanics (against either blacks or whites) SES may make a difference.

We also see that the BW (d) difference in g-score was about 1.60 SD in NLSY79 and 1.20 SD in NLSY97, suggesting a substantial decline. But this decline, when studied using the method of correlated vectors, has nothing to do with subtest g-loadings. Indeed, the gap changes for all comparisons across racial groups (BW gap, HW gap, BH gap) showed substantial negative correlations with g, especially for the BW gap. Generally, BW and HW (subtest) changes had positive signs, meaning that the gap is closing but that the narrowing is not g-loaded. These negative correlations were even stronger when using Jensen’s estimates (1985, Table 5) of either white or black g-loadings of the ASVAB.
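For reference, a standardized gap of this kind can be computed with or without sampling weights. The sketch below is one plausible way to do it, not necessarily the formula used in my files: the function names are hypothetical, the weights are treated as frequency weights with no small-sample correction, and the pooled SD weights each group's variance by its sum of weights.

```python
import numpy as np

def weighted_mean_var(x, w):
    """Weighted mean and (population-style) weighted variance."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    mean = np.sum(w * x) / np.sum(w)
    var = np.sum(w * (x - mean) ** 2) / np.sum(w)
    return mean, var

def weighted_d(x1, w1, x2, w2):
    """Standardized difference (group 1 minus group 2) using a pooled SD."""
    m1, v1 = weighted_mean_var(x1, w1)
    m2, v2 = weighted_mean_var(x2, w2)
    sw1, sw2 = np.sum(w1), np.sum(w2)
    pooled_sd = np.sqrt((sw1 * v1 + sw2 * v2) / (sw1 + sw2))
    return (m1 - m2) / pooled_sd

# Toy scores (NOT NLSY data). Unit weights give the unweighted gap.
group_a = [1.0, 2.0, 3.0]
group_b = [0.0, 1.0, 2.0]
d_unweighted = weighted_d(group_a, [1, 1, 1], group_b, [1, 1, 1])

# Unequal weights change the means and SDs, and hence the gap.
d_weighted = weighted_d(group_a, [1, 1, 4], group_b, [4, 1, 1])
print(d_unweighted, d_weighted)
```

Running the same function on each subtest in both cohorts gives the d vectors whose differences are then correlated with g-loadings.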

When meta-analyzing Jensen’s collection (1985, Table 5) of data (total N=40850, total harmonic N=14643), the meta-analytic correlation was 0.829 using White-g (11 studies) and 0.786 using Black-g (10 studies), after correcting for sampling error, g-loading range restriction, g vector unreliability, BW difference vector unreliability, and deviation from perfect construct validity.
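The first step of such a meta-analysis, the sample-size-weighted mean correlation and the share of variance attributable to sampling error, can be sketched as follows. The three studies are invented for illustration and are not Jensen's (1985, Table 5) data. Whether the harmonic N reported above was computed exactly as in `harmonic_n` is an assumption, and the remaining artifact corrections are not shown.

```python
import numpy as np

def harmonic_n(n1, n2):
    """Harmonic mean of two group sizes."""
    return 2.0 / (1.0 / n1 + 1.0 / n2)

# Hypothetical studies: (r(g*d), n_white, n_black). NOT Jensen's data.
studies = [(0.60, 1200, 400), (0.75, 800, 800), (0.50, 300, 150)]

r = np.array([s[0] for s in studies])
n = np.array([harmonic_n(s[1], s[2]) for s in studies])

# Sample-size-weighted mean correlation.
r_bar = np.sum(n * r) / np.sum(n)

# Observed (weighted) variance of the correlations across studies.
var_obs = np.sum(n * (r - r_bar) ** 2) / np.sum(n)

# Variance expected from sampling error alone (Hunter-Schmidt style).
var_err = (1 - r_bar ** 2) ** 2 / (n.mean() - 1)

pct_explained = 100 * var_err / var_obs
print(n, r_bar, var_obs, var_err, pct_explained)
```

The corrections for range restriction, vector unreliability, and construct validity would then be applied to `r_bar`, as in the single-study case.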

Finally, in the NLSY97, I found a GPA variable (overall, English, foreign languages, math, social science, life sciences). The NLS investigator gives this short introduction:

Credit weighted overall GPA. This variable indicates grade point averages across all courses on a 4 point grading scale. For each course, the quality grade ( is weighted by Carnegie credits ( Quality grades were recoded as follows: 1 = 4.3, 2=4.0, 3=3.7, 4=3.3, 5=3.0, 6=2.7, 7=2.3, 8=2.0, 9=1.7, 10=1.3, 11=1.0, 12=0.7, 13=0.0, all other values recoded to missing.  Please see Appendix 11 of the Codebook Supplement for more information on the collection and coding of transcript data.
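To make the coding concrete, here is a minimal sketch of a credit-weighted GPA using the recoding table quoted above. The function name and the input layout are hypothetical, and skipping courses with missing or non-positive credits is my assumption rather than something stated in the NLS description.

```python
# Recoding of quality grade codes to grade points, as quoted above.
QUALITY_POINTS = {
    1: 4.3, 2: 4.0, 3: 3.7, 4: 3.3, 5: 3.0, 6: 2.7, 7: 2.3,
    8: 2.0, 9: 1.7, 10: 1.3, 11: 1.0, 12: 0.7, 13: 0.0,
}

def credit_weighted_gpa(courses):
    """courses: iterable of (grade_code, carnegie_credits) pairs.

    Returns the credit-weighted mean of the recoded grades, or None when
    no course has both a valid grade code and positive credits.
    """
    total_points = 0.0
    total_credits = 0.0
    for code, credits in courses:
        points = QUALITY_POINTS.get(code)  # all other codes -> missing
        if points is None or credits is None or credits <= 0:
            continue  # ASSUMPTION: such courses are simply skipped
        total_points += points * credits
        total_credits += credits
    return total_points / total_credits if total_credits > 0 else None

# One full-credit course coded 2 (4.0), one coded 5 (3.0), one invalid code.
example = [(2, 1.0), (5, 1.0), (99, 1.0)]
print(credit_weighted_gpa(example))  # 3.5
```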

I correlated each of these variables with each ASVAB subtest, yielding a column vector of ‘ASVAB subtest correlations with GPA’, which I then correlated with the vector of ASVAB subtest g-loadings for each racial group separately. These correlations were very high (especially for blacks), between about 0.70 and 0.80, with two exceptions. When I regress SES out of my ASVAB variables, the correlation between subtest g-loadings and subtest correlations with GPA decreases somewhat but generally remains between about 0.50 and 0.70, with one exception. Blacks consistently have the highest g-loading*GPA correlations, especially when using Spearman’s rho. When calculating the racial d gaps in GPA scores, I noticed that the group differences were, as expected, much smaller than in the ASVAB-1999 scores. The BW difference in family income and parents’ education was also about 0.5 SD, half the difference in ASVAB.
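The two-step procedure (a vector of subtest correlations with GPA, then its correlation with the g-loading vector) can be sketched on simulated data. Everything below is illustrative: the data are generated from a single latent factor rather than taken from NLSY97, and the first principal component of the correlation matrix is used as a stand-in for the PAF g-loadings in my files.

```python
import numpy as np

def ranks(x):
    """Ranks 1..k of a vector (no tie handling; fine for continuous values)."""
    order = np.argsort(x)
    r = np.empty(len(x))
    r[order] = np.arange(1, len(x) + 1)
    return r

def pearson(x, y):
    return float(np.corrcoef(x, y)[0, 1])

def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Simulated data only (NOT NLSY97): one latent factor drives ten
# hypothetical subtests and a GPA-like outcome.
rng = np.random.default_rng(1)
n = 20000
true_loadings = np.linspace(0.40, 0.85, 10)
g = rng.normal(size=n)
noise = rng.normal(size=(n, 10))
subtests = g[:, None] * true_loadings + noise * np.sqrt(1 - true_loadings ** 2)
gpa = 0.5 * g + rng.normal(size=n)

# Vector 1: each subtest's correlation with GPA.
r_gpa = np.array([pearson(subtests[:, j], gpa) for j in range(10)])

# Vector 2: loadings on the first principal component of the subtest
# correlation matrix (a stand-in for PAF g-loadings).
R = np.corrcoef(subtests, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)   # eigenvalues in ascending order
pc1 = eigvecs[:, -1] * np.sqrt(eigvals[-1])
if pc1.sum() < 0:                      # the eigenvector sign is arbitrary
    pc1 = -pc1

r_vec = pearson(pc1, r_gpa)
rho_vec = spearman(pc1, r_gpa)
print(r_vec, rho_vec)
```

In the real analysis this would be repeated within each racial group, once on the raw subtests and once on the SES-residualized ones.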

To summarize, SES does not act as a moderator of the correlation between the g factor and black-white differences, there is no certainty that the gap reduction in ASVAB across cohorts was g-loaded for any group comparison, and there is a strong correlation between g-loadings and GPA scores when studied in each racial group separately.

1 Comment

  1. bussorah

    Since the army test deliberately removes b/w differences at the design stage, is there any point in looking at b/w differences using that data?


© 2024 Human Varieties
