IQ and psychometrics
- On the Continued Misinterpretation of Stereotype Threat as Accounting for Black-White Differences on Cognitive Tests by Tomeh & Sackett. A common misconception about stereotype threat, and a major reason for the popularity of the idea, is that in the absence of threat in the testing situation, the black-white IQ gap is eliminated. This is of course not the case; rather, the experimental activation of stereotypes has (sometimes) been found to make the black-white gap larger than it normally is. In an analysis of early writings on stereotype threat, Sackett et al. (2004) reported that this misinterpretation was found in the majority of journal articles, textbooks, and popular press articles discussing the effect. In the new article, Tomeh and Sackett find that more recent textbooks and journal articles are still about equally likely to misinterpret stereotype threat in this way as to describe it correctly. I had hoped that the large multi-lab study of the effect would have put the whole idea to bed by now, but that study has unfortunately been delayed.
- Invariance: What Does Measurement Invariance Allow us to Claim? by John Protzko. In this study people were randomized to complete either a scale aiming to measure “search for meaning in life”, or an altered nonsense version of the same scale where the words “meaning” and “purpose” had been replaced with the word “gavagai”. The respondents indicated their level of agreement or disagreement with statements such as “I am searching for meaning/gavagai in my life”. Both groups also completed an unaltered “free will” scale, and confirmatory factor models where a single factor underlay the “meaning/gavagai” items while another factor underlay the “free will” items were estimated. The two groups showed not only configural but also metric and scalar invariance for these factors. Given the usual interpretation of factorial invariance in psychometrics, this would suggest that the mean difference observed between the two groups on the “meaning/gavagai” scale reflects a mean difference on a particular latent construct. The data used were made available online, and I was able to replicate the finding of configural, metric, and scalar invariance, given the ΔCFI/RMSEA criteria (strict invariance was not supported). The paradox appears to stem from the fact that individual differences on the “meaning in life” scale mostly reflect the wording and format of the items as well as response styles rather than tapping into a specific latent attitude which may not even exist, given the vagueness of the “meaning in life” scale. I found that I could move from scalar invariance to a more constrained model where all of the “meaning/gavagai” items had the same values for loadings and intercepts without worsening the model fit. So it seems that all the items were measuring the same thing (or things), but what that is is not apparent from a surface analysis of the items.
Jordan Lasker has written a long response to Protzko, taking issue with the idea that two scales can have the same meaning without strict invariance as well as with the specific fit indices used. While I agree that strict invariance should always be pursued, Protzko’s discovery of scalar invariance using the conventional fit criteria is nevertheless interesting and requires an explanation. I think Lasker also makes a mistake in his analysis by setting the variances of the “meaning in life/gavagai” factors both to 1 even though this is not a constraint required for any level of factorial invariance. The extraneous constraint distorts his loading estimates.
- Effort impacts IQ test scores in a minor way: A multi-study investigation with healthy adult volunteers by Bates & Gignac. In three experiments (total N = 1201), adult participants first took a short spatial ability test (like this one) and were randomly assigned either to a treatment group or to a control group. Both groups then completed another version of the same test, with the treatment group participants promised a monetary reward if they improved their score by at least 10%. The effect of the incentives on test scores was small, d = 0.166, corresponding to 2.5 points on a standard IQ scale. This suggests that the effect size of d = 0.64 (or 9.6 points) reported in the meta-analysis by Duckworth et al. is strongly upwardly biased, as has been suspected. A limitation of the study is that the incentives were small, £10 at most. However, the participants were recruited through a crowdsourcing website and paid £1.20 for their participation (excluding the incentive bonuses), so it is possible that the rewards were substantial to them. Nevertheless, I would have liked to see if a genuinely large reward had a larger effect. Bates & Gignac also conducted a series of big observational studies (total N = 3007) where the correlation between test performance and a self-report measure of test motivation was 0.28. However, this correlation is ambiguous because self-reported motivation may be related to how easy or hard the respondent finds the test.
- The Coin Flip by Spotted Toad. This is an illuminating commentary on the Tennessee Pre-K study (on which I commented here) and the difficulty of causal inference in long-term experiments.
- Do Meta-Analyses Oversell the Longer-Term Effects of Programs? Part 1 & Part 2 by Bailey & Weiss. This analysis found that in a meta-analytic sample of postsecondary education RCTs seeking to improve student outcomes, trials that reported larger initial effects were more likely to have long-term follow-up data collected and published. While this could be innocuous, with more effective interventions being selected for further study, it could also simply mean that studies more biased in the positive direction by sampling error were selected. So when you see a study touting the long-term benefits of some educational intervention, keep in mind that the sample may have been followed up only because the initial results were more promising than in other samples subjected to the same or similar interventions.
- An Anatomy of the Intergenerational Correlation of Educational Attainment - Learning from the Educational Attainments of Norwegian Twins and their Children by Baier et al. Using Norwegian register data on the educational attainment of twins and their children, this study finds that the intergenerational correlation for education is entirely genetically mediated in Norway. The heritability of education was about 60% in both parents and children, while the shared environmental variance was 16% in parents and only 2% in children. This indicates that the shared environment is much less important for educational attainment in Norway than elsewhere (cf. Silventoinen et al., 2020), although this is partly a function of how assortative mating is modeled.
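
The IQ-point figures quoted in the Bates & Gignac item above are simply Cohen's d rescaled to the conventional IQ metric, which has a standard deviation of 15. A minimal sketch of that arithmetic follows; the function name `d_to_iq_points` is my own label for illustration and does not come from the paper.

```python
IQ_SD = 15.0  # standard deviation of the conventional IQ scale


def d_to_iq_points(d, sd=IQ_SD):
    """Convert a standardized mean difference (Cohen's d) to IQ points."""
    return d * sd


# Effect sizes quoted in the text above.
effects = [
    ("Bates & Gignac incentive effect", 0.166),
    ("Duckworth et al. meta-analytic estimate", 0.64),
]

for label, d in effects:
    print(f"{label}: d = {d} -> {d_to_iq_points(d):.1f} IQ points")
```

The same conversion works for any scale with a known standard deviation by passing a different `sd`.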
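
The selection worry in the Bailey & Weiss item above can be illustrated with a toy simulation. This is only a sketch under assumptions of my own: every trial has the same true effect, the initial estimates differ purely through sampling error, and only the top quarter of initial estimates get followed up. None of the numbers (true effect, standard error, follow-up fraction) come from Bailey & Weiss.

```python
import random
import statistics


def simulate_followup_selection(n_trials=2000, true_effect=0.05,
                                se=0.10, top_fraction=0.25, seed=0):
    """Return (mean initial estimate over all trials,
               mean initial estimate among the followed-up trials).

    All trials share one true effect, so any gap between the two means
    is produced by selecting on sampling error alone.
    """
    rng = random.Random(seed)
    # Initial estimate = true effect + sampling error (hypothetical values).
    initial = [rng.gauss(true_effect, se) for _ in range(n_trials)]
    # Follow up only the trials with the largest initial estimates.
    cutoff = sorted(initial)[int(n_trials * (1 - top_fraction))]
    followed = [x for x in initial if x >= cutoff]
    return statistics.mean(initial), statistics.mean(followed)


all_mean, followed_mean = simulate_followup_selection()
print(f"mean initial estimate, all trials:       {all_mean:.3f}")
print(f"mean initial estimate, followed-up only: {followed_mean:.3f}")
```

In this setup the followed-up trials look far more promising than the full set even though no intervention is actually better than any other. In real data the true effects presumably vary too, so selection would pick up a mix of genuinely better interventions and lucky draws.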