Borghans et al. (2016) analyzed four datasets with diverse measures of IQ and, shockingly, concluded that the impact of IQ on social outcomes is weak compared to that of personality measures, despite what earlier reviews and meta-analyses showed (Gottfredson, 1997; Poropat, 2009; Schmidt & Hunter, 2004). Indeed, as reviewed previously, most studies found that personality measures have weak relationships with outcomes once IQ is accounted for. Yet their work has not been subjected to critical examination, only a few uninteresting comments (Ganzach & Zisman, 2022; Golsteyn et al., 2022; Stankov, 2023) and replication failures (Zisman & Ganzach, 2022).

What went wrong is how IQ was measured: with bad measures. The first fallacy is that Borghans et al. treated fluid intelligence, especially the Raven, as a pure measure of IQ, rather than using a latent factor extracted from a broad battery of tests assessing all cognitive domains. The second fallacy is that their definitions are confused: what they treated as achievement tests had content similar to IQ tests, and those tests were in fact better cognitive measures than the IQ tests used in their analysis. The third fallacy, committed by Golsteyn et al. (2022), is the belief that a test that correlates with personality is not a pure measure of intelligence, and therefore not a valid one. They do not explain why such a correlation is problematic. Removing IQ-related components prior to regression analysis would have deleterious effects. If IQ correlates with education, then IQ might have an education component, but if we remove the education component of IQ we will later find that IQ no longer predicts education, wages, or anything else. We would have created a bad IQ measure. This is what Borghans et al. did. If personality had a causal impact on IQ, this would provide a reason to adjust for personality, but such a causal relationship has yet to be demonstrated. Longitudinal analyses could answer whether there is such a causal relationship, and whether achievement tests contain a personality component, as Borghans et al. believe. Bardach et al. (2023) evaluated the longitudinal associations between personality, intelligence and academic achievement using random intercept cross-lagged panel models. They found no evidence of a main effect of any Big Five personality trait on achievement, or of an interaction with intelligence. If personality has no longitudinal (causal) influence on achievement, this rejects the proposition by Lechner et al. (2017) that achievement tests should be adjusted for personality merely because the variables are correlated.

It is rather interesting that one of the co-authors of Borghans et al. (2016) is none other than James Heckman. This is not his first bogus attempt to discredit the importance of IQ; there were countless ones before. Earlier, Dalliard commented on the study by Borghans et al. (2011), in which they analyzed the NLSY79 and Stella Maris datasets, committing the same fallacies and reaching the same conclusion, but with even more dubious statistical procedures (Salkever, 2015).

Now let’s examine the data, variables, and analyses of their newer study. I recommend checking not just the paper, but its appendix as well. Below, one immediately notices that the relationships between their IQ measures and various outcomes are much smaller than what hundreds of earlier studies report.

Stella Maris data

IQ is measured as the principal component of only 8 Raven’s items. It is more appropriate, however, to employ factor analysis (FA) rather than principal component analysis (PCA). Cognitive items are certainly not measured without error, yet the residual variance of the items is treated as true variance in PCA but as error variance in FA. Because the theoretical model is formative in PCA and reflective in FA, the composites are viewed as a function of the items in PCA, whereas the items are viewed as a function of the latent factors in FA. Fabrigar et al. (1999, p. 276) clarified that the goal of FA is to explain the correlations among measured variables, while the goal of PCA (which does not differentiate common from unique variance) is simply to account for variance in the measured variables, concluding that when the goal of the analysis is to identify latent constructs underlying measured variables, it is more sensible to use exploratory FA than PCA. Another issue is that IQ item variables are typically dichotomous, which requires the use of polychoric (here, tetrachoric) correlations in either PCA or FA, yet Borghans et al. (2016) never explained the procedure they used. And while Cronbach’s alpha is not an optimal measure of reliability, their alpha value (0.62) is quite low.
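To make the alpha figure concrete, here is a minimal sketch of how Cronbach’s alpha is computed from dichotomous items. The data are simulated, not from Stella Maris; the item count matches their 8 Raven items but every other value is illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated stand-in for 8 dichotomous Raven-type items loading on one
# latent ability (all parameters illustrative, not taken from the paper)
n, k = 500, 8
theta = rng.normal(size=n)                         # latent ability
p = 1 / (1 + np.exp(-(theta[:, None] + rng.normal(size=(n, k)))))
items = (rng.uniform(size=(n, k)) < p).astype(float)

# Cronbach's alpha: (k/(k-1)) * (1 - sum(item variances) / var(total score))
item_vars = items.var(axis=0, ddof=1).sum()
total_var = items.sum(axis=1).var(ddof=1)
alpha = (k / (k - 1)) * (1 - item_vars / total_var)
print(round(alpha, 2))
```

With only 8 items, even strongly correlated ones, alpha is mechanically capped well below what a full-length battery would yield, which is one reason a 0.62 composite is a poor basis for strong claims about IQ.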

Achievement is measured by the Differential Aptitude Test (DAT). The irony is that the DAT is not only considered by psychologists a cognitive test rather than an achievement test (te Nijenhuis et al., 2000), it actually measures g better than the Raven because it is a broader measure of intelligence, covering fluid, crystallized and visual-perception abilities (likely with a speed residual). The ideal procedure would be to subject the entire battery to CFA with a bifactor structure. Borghans et al. (2016) instead used the observed total score.
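For readers unfamiliar with the idea of extracting g from a battery rather than using a total score, here is a rough sketch of one round of principal-axis factoring on a made-up 4-subtest correlation matrix (a full bifactor CFA would require an SEM package; the matrix values are invented for illustration):

```python
import numpy as np

# Hypothetical correlation matrix among 4 subtests (illustrative values only)
R = np.array([
    [1.00, 0.55, 0.45, 0.40],
    [0.55, 1.00, 0.50, 0.42],
    [0.45, 0.50, 1.00, 0.48],
    [0.40, 0.42, 0.48, 1.00],
])

# Principal-axis step: replace the diagonal with communality estimates
# (squared multiple correlations), then take the first eigenvector of the
# reduced matrix as the general-factor loadings.
smc = 1 - 1 / np.diag(np.linalg.inv(R))
Rr = R.copy()
np.fill_diagonal(Rr, smc)
vals, vecs = np.linalg.eigh(Rr)            # eigenvalues in ascending order
g_loadings = np.abs(vecs[:, -1] * np.sqrt(vals[-1]))
print(np.round(g_loadings, 2))
```

Unlike a unit-weighted total score, this weights each subtest by how strongly it reflects the common factor, which is the point of treating g as latent rather than observed.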

As for the results, if one takes the DAT as a legitimate IQ test, the IQ-grade correlation (0.316) is larger than the grade-personality correlation (0.257). Furthermore, the low correlation between the Raven and the DAT (0.378) is a good reason to distrust an IQ measure composed of only 8 Raven items. No wonder this bad IQ measure has a low correlation with grades (0.112). Their regression analyses treat the Raven, grit and the Big Five as predictors of achievement (DAT) and grades. They did not use the DAT as a predictor of grades, so the analysis is worthless.

British Cohort Study (BCS:70) data

IQ is measured at age 10, solely by the 28-item Matrices subtest of the British Ability Scales (BAS), a battery that contains 4 subscales: Word Similarities, Word Definitions, Recall of Digits, and Matrices. Achievement is measured by the BAS (minus the Matrices subtest) along with the Friendly Maths Test, the Shortened Edinburgh Reading Test, and the Ches Pictorial Language Comprehension Test. Interestingly, the items shown in their appendix are indeed very similar to cognitive test items. Yet all of these tests are treated as achievement measures rather than cognitive abilities. Since there are 4 BAS subscales plus 3 more tests at age 10, the right procedure would be to subject all 7 subtests to a bifactor CFA model, as in ALMamari & Traynor (2021).

As for the results, the achievement-grade correlation (0.379) is barely higher than the IQ-grade correlation (0.338), though both are lower than the personality-grade correlation (0.433). Their regression analyses are once again worthless because they use the achievement test (an IQ test in disguise) as the dependent (outcome) variable rather than as an independent (predictor) variable.

NLSY79 data

IQ is measured by a multitude of tests collected from school transcript data: the California Test of Mental Maturity, Lorge-Thorndike Intelligence Test, Henmon-Nelson Test of Mental Maturity, Kuhlmann-Anderson Intelligence Test, Stanford-Binet Intelligence Scale, and Wechsler Intelligence Scale for Children. These IQ tests have decent correlations with the AFQT, ranging from 0.72 to 0.81 (Herrnstein & Murray, 1994, Appendix 3, p. 584). Achievement is measured by the AFQT battery, even though that test is an excellent proxy for cognitive ability given its psychometric properties (Jensen, 1998, pp. 237, 391).

As for the results, the IQ-grade correlation (0.464) is lower than the achievement-grade correlation (0.610) but still greater than the personality-grade correlation (0.305). Their regression analyses show that adding personality measures (self-esteem and locus of control) slightly improves the prediction of grades once IQ is controlled for. The regression analysis may be less representative (N=823) because school transcript data have small sample sizes, whereas almost 12,000 adolescents have an AFQT score. In their appendix (Table 8.1), an interesting finding is that the model with only the AFQT as a predictor of log wage income produces an adjusted R² of 0.059, the highest value of all models. Adding personality in fact decreases the adjusted R².
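That last result is unsurprising once one remembers that adjusted R² penalizes the number of predictors: adding near-null predictors can lower it even as raw R² creeps up. A simulated sketch, where the effect size and the null "personality" predictors are assumptions of mine, not their estimates:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated data: one real predictor of log wage plus two null predictors
# standing in for personality scales (all effect sizes hypothetical)
n = 800
afqt = rng.normal(size=n)
log_wage = 0.25 * afqt + rng.normal(size=n)
nulls = rng.normal(size=(n, 2))

def adj_r2(X, y):
    """Adjusted R^2 from an OLS fit with intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    r2 = 1 - (y - X1 @ beta).var() / y.var()
    k = X1.shape[1] - 1                     # number of predictors
    return 1 - (1 - r2) * (len(y) - 1) / (len(y) - k - 1)

a_base = adj_r2(afqt, log_wage)                          # AFQT alone
a_full = adj_r2(np.column_stack([afqt, nulls]), log_wage)  # plus null predictors
print(round(a_base, 3), round(a_full, 3))
```

If the added personality scales carried real signal, the adjusted R² would rise despite the penalty; that it falls in Table 8.1 is evidence of near-zero incremental validity.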

MIDUS data

IQ is measured by the Brief Test of Adult Cognition by Telephone (BTACT), administered in under 20 minutes and composed of immediate and delayed recall, backward digit span, backward counting, category fluency, number series, and task-switching subtests. The alpha value (0.82) of the composite score as well as the loadings of all subtests (0.54-0.81) are acceptable (Tun & Lachman, 2006). In the MIDUS sample, the BTACT has good reliability and shows a high correlation between the phone and in-person forms (Lachman et al., 2014). The advantage of phone administration is that in-person visits may introduce selection bias.

Their regression analysis, which uses the BTACT and the Big Five as predictors of outcomes such as log wage and various measures of health, shows that when the Big Five variables are entered after the IQ variable, the adjusted R² increases noticeably (Tables 8.7-8.11 in the appendix). They also show that if educational attainment is further added, the predictive power of IQ declines drastically. The inclusion of education makes little sense if IQ causes education rather than the reverse. One final issue is that the MIDUS is not representative, as it samples predominantly female, well-educated participants (DiBlasio et al., 2021). The predictive validity of IQ has certainly been downwardly biased by range restriction.

Even more problems for Borghans et al. (2016)

One could object that the modest reliability of personality measures attenuates their predictive power. But as noted earlier, their IQ measures are bad measures as well. It is still true that the predictive power of either personality or achievement/IQ improves with the use of latent variable methods, as demonstrated by Lechner et al. (2017). Unfortunately, the use of observed rather than latent measures is still common practice.
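The classical correction for attenuation makes the reliability point concrete; the values below are hypothetical, not taken from any of the studies discussed:

```python
import math

def disattenuate(r_obs, rel_x, rel_y):
    """Classical correction for attenuation: estimated correlation between
    true scores, given the reliabilities of the two observed measures."""
    return r_obs / math.sqrt(rel_x * rel_y)

# Illustrative: observed r = .25, personality scale reliability .70,
# outcome reliability .90 (hypothetical values)
print(round(disattenuate(0.25, 0.70, 0.90), 3))
```

Note that the same correction would also inflate the validities of their unreliable 8-item Raven composite, so disattenuation does not rescue the personality-beats-IQ conclusion on its own.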

Zisman & Ganzach (2022) re-analyzed the NLSY79 and MIDUS data used by Borghans et al. (2016). In the NLSY79, regression results indicate that the AFQT strongly predicts educational attainment, GPA and log pay, whereas the Big Five variables have no relationship with any of these outcomes (except conscientiousness with wages). The use of the AFQT mattered: their Appendix E shows that the relationships between the AFQT and these outcomes are noticeably larger than those of the other IQ tests used by Borghans et al. (2016). In the MIDUS, regression results indicate that the BTACT has a strong relationship with educational attainment and a modest relationship with log wages, whereas the Big Five have no relationship with either outcome (except conscientiousness with wages). They failed to replicate Borghans et al. (2016) in the MIDUS but could not tell why the results are so strikingly different. Another instance of the replication crisis.

But Zisman & Ganzach (2022) also analyzed additional datasets, namely the NLSY97, Add Health, WLS, and PIAAC, which contain, respectively, the AFQT, the PPVT (a vocabulary test), the Henmon-Nelson IQ test, and adult tests of numeracy and literacy. In general, these IQ measures have strong relationships with both educational attainment and GPA but modest relationships with log pay. The Big Five variables quite often have standardized beta coefficients close to zero, typically below 0.100. In the NLSY97, conscientiousness has a weak relationship with all outcomes. In the WLS, openness has a weak relationship with all outcomes.

One issue with Zisman & Ganzach (2022) is that their standardized beta weights underestimate the predictive power of IQ because they always adjust for socio-economic status, and there is more evidence that IQ causes SES than the reverse (Scarr & Weinberg, 1978, Tables 3-5; Weinberg et al., 1992, Table 3; Augustine & Negraia, 2018; Awada & Shelleby, 2021; Marks & O’Connell, 2021; Klein & Kühhirt, 2023). Another problem is the tendency of Ganzach & Zisman (as of Borghans et al.) to focus solely on R², which is not a measure of effect size. On the other hand, their results are robust to correction for differential reliabilities (personality measures being less reliable than IQ measures).
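The SES-adjustment point can be illustrated with a small simulation: if IQ causes SES, and part of IQ's effect on wages runs through SES, then controlling for SES strips out the mediated path and shrinks IQ's standardized beta. All coefficients here are assumptions for the sketch, not estimates from any dataset:

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed causal structure: IQ -> SES, and IQ -> wage both directly and via SES
n = 5000
iq = rng.normal(size=n)
ses = 0.5 * iq + rng.normal(size=n)
wage = 0.3 * iq + 0.3 * ses + rng.normal(size=n)

def std_beta_first(X, y):
    """Standardized beta of the first predictor from an OLS fit."""
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    y = (y - y.mean()) / y.std()
    X1 = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(X1, y, rcond=None)[0][1]

b_total = std_beta_first(iq[:, None], wage)              # IQ alone: total effect
b_adj = std_beta_first(np.column_stack([iq, ses]), wage) # SES-adjusted: direct path only
print(round(b_total, 2), round(b_adj, 2))
```

Under this structure the SES-adjusted beta is an estimate of IQ's direct effect only, so reporting it as "the" effect of IQ understates IQ's total causal contribution.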

Finally, Zisman & Ganzach (2022) noted two conceptual issues. First, comparing the effect of personality with the effect of IQ is not straightforward because intelligence is a unitary construct while personality is a multi-faceted one. Second, whereas intelligence is generally relevant to success in tasks that involve dealing with cognitive complexity, personality, as conceptualized by the Big Five dimensions, is less relevant to educational and occupational success, since the effect of personality on behavior in general, and on behavior that leads to success in particular (Shaffer & Postlethwaite, 2012), is to a large extent context dependent.

Results based on averages may be misleading because the predictors’ validities depend on occupation and context. It is true that different personality indicators have varying impacts across occupations (Shaffer & Postlethwaite, 2012; Stankov, 2023), and that the predictive validity of IQ increases with job complexity (Cawley et al., 1996, Tables 7 & 8; Gottfredson, 1997; Lang et al., 2010) as well as with years of accumulated job experience (Schmidt & Hunter, 2004, p. 168).

Years ago I compiled indirect evidence that IQ, rather than parental SES, is the main causal ingredient in life outcomes. One of the best illustrations comes from Ganzach’s (2011) application of a dynamic growth model, which reveals that SES affects wages solely through its effect on entry pay, whereas IQ affects wages through the slope of wage trajectories. This matters because entry pay concerns young people, and there is ample evidence that wages peak much later in life, with most wage inequality due to age differences (Sowell, 2016, ch. 13).

Insofar as Borghans et al. (2016) are concerned with differences in intercepts (means and group means), they would be correct, as the trajectories of IQ and achievement test averages don’t necessarily track each other (Jensen, 1973, p. 90; 1998, p. 322). But since their analyses are concerned with slopes (regressions and correlations), they are incorrect, as the g factors of IQ and achievement tests are closely related (Kaufman et al., 2012).

Beyond Borghans and Heckman, there are always new attempts at de-emphasizing the importance of IQ (Duckworth et al., 2007; Richardson & Norgate, 2015). Given the strong consensus and overwhelming research on this subject, it’s no wonder that these clowns receive a proper response sooner or later (Crede et al., 2017; Zisman & Ganzach, 2021; Zimmer & Kirkegaard, 2023).


  1. ALMamari, K., & Traynor, A. (2021). The Role of General and Specific Cognitive Abilities in Predicting Performance of Three Occupations: Evidence from Bifactor Models. Journal of Intelligence, 9(3), 40.
  2. Bardach, L., Hübner, N., Nagengast, B., Trautwein, U., & von Stumm, S. (2023). Personality, intelligence, and academic achievement: Charting their developmental interplay. Journal of Personality, 91(6), 1326–1343.
  3. Borghans, L., Golsteyn, B. H. H., Heckman, J., & Humphries, J. E. (2011). Identification problems in personality psychology. Personality and Individual Differences, 51(3), 315–320.
  4. Borghans, L., Golsteyn, B. H. H., Heckman, J., & Humphries, J. E. (2016). What grades and achievement tests measure. Proceedings of the National Academy of Sciences, 113(47), 13354–13359.
  5. DiBlasio, C. A., Sima, A., Kumar, R. G., Kennedy, R. E., Retnam, R., Lachman, M. E., Novack, T. A., & Dams-O’Connor, K. (2021). Research Letter: Performance of the Brief Test of Adult Cognition by Telephone in a National Sample. Journal of Head Trauma Rehabilitation, 36(4), E233–E239.
  6. Fabrigar, L. R., Wegener, D. T., MacCallum, R. C., & Strahan, E. J. (1999). Evaluating the use of exploratory factor analysis in psychological research. Psychological Methods, 4(3), 272.
  7. Ganzach, Y., & Zisman, C. (2022). Achievement tests and the importance of intelligence and personality in predicting life outcomes. Intelligence, 94, 101679.
  8. Golsteyn, B. H. H., Heckman, J. J., & Humphries, J. E. (2022). Comment on “The claim that personality is more important than intelligence in predicting important life outcomes has been greatly exaggerated.” Intelligence, 94, 101678.
  9. Lachman, M. E., Agrigoroaei, S., Tun, P. A., & Weaver, S. L. (2014). Monitoring Cognitive Functioning: Psychometric Properties of the Brief Test of Adult Cognition by Telephone (BTACT). Assessment, 21(4), 404–417.
  10. Lang, J. W., Kersting, M., Hülsheger, U. R., & Lang, J. (2010). General mental ability, narrower cognitive abilities, and job performance: The perspective of the nested-factors model of cognitive abilities. Personnel Psychology, 63(3), 595–640.
  11. Lechner, C., Danner, D., & Rammstedt, B. (2017). How is personality related to intelligence and achievement? A replication and extension of Borghans et al. and Salkever. Personality and Individual Differences, 111, 86–91.
  12. Poropat, A. E. (2009). A meta-analysis of the five-factor model of personality and academic performance. Psychological Bulletin, 135(2), 322.
  13. Salkever, D. (2015). Interpreting the NLSY79 empirical data on “IQ” and “achievement”: A comment on Borghans et al.’s “Identification problems in personality psychology.” Personality and Individual Differences, 85, 66–68.
  14. Shaffer, J. A., & Postlethwaite, B. E. (2012). A Matter of Context: A Meta‐Analytic Investigation of the Relative Validity of Contextualized and Noncontextualized Personality Measures. Personnel Psychology, 65(3), 445–494.
  15. Schmidt, F. L., & Hunter, J. (2004). General Mental Ability in the World of Work: Occupational Attainment and Job Performance. Journal of Personality and Social Psychology, 86(1), 162–173.
  16. Stankov, L. (2023). Intelligence, Personality, and the Prediction of Life Outcomes: Borghans et al. (2016) vs. Zisman and Ganzach (2022) Debate. Journal of Intelligence, 11(5), 95.
  17. te Nijenhuis, J., Evers, A., & Mur, J. P. (2000). The validity of the Differential Aptitude Test for the assessment of immigrant children. Educational Psychology, 20, 99–115.
  18. Tun, P. A., & Lachman, M. E. (2006). Telephone assessment of cognitive function in adulthood: the Brief Test of Adult Cognition by Telephone. Age and Ageing, 35(6), 629–632.
  19. Zisman, C., & Ganzach, Y. (2021). In a Representative Sample Grit Has a Negligible Effect on Educational and Economic Success Compared to Intelligence. Social Psychological and Personality Science, 12(3), 296–303.
  20. Zisman, C., & Ganzach, Y. (2022). The claim that personality is more important than intelligence in predicting important life outcomes has been greatly exaggerated. Intelligence, 92, 101631.