Racial Differences on Digit Span Tests

In digit span tests, respondents are asked to repeat a string of digits. There are two variants of the test: forward digit span (FDS) and backward digit span (BDS). In FDS, the digits are repeated in the order of their presentation, while in BDS they must be repeated in reverse order. The largest number of digits that a person can repeat without error is his or her forward or backward digit span.
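
The definition of a span can be illustrated with a minimal sketch, assuming each trial is recorded as the digits presented plus the digits the respondent said back. The names `is_correct` and `digit_span` are hypothetical, and published tests usually have their own administration, discontinue, and scoring rules, so this only illustrates the "longest string repeated without error" definition rather than any test's official scoring.

```python
from typing import List, Sequence, Tuple

# A trial pairs the digits presented with the digits the respondent repeated.
Trial = Tuple[Sequence[int], Sequence[int]]


def is_correct(presented: Sequence[int], response: Sequence[int], backward: bool) -> bool:
    """Correct if the response matches the presented digits exactly,
    in the same order (FDS) or in reverse order (BDS)."""
    target = list(reversed(presented)) if backward else list(presented)
    return list(response) == target


def digit_span(trials: List[Trial], backward: bool = False) -> int:
    """Simplified span: length of the longest sequence repeated without error.
    Returns 0 if no trial was passed."""
    lengths = [len(p) for p, r in trials if is_correct(p, r, backward)]
    return max(lengths, default=0)


# Hypothetical responses from one respondent on the backward task.
trials = [
    ([3, 8], [8, 3]),              # correct
    ([5, 1, 9], [9, 1, 5]),        # correct
    ([2, 7, 4, 6], [6, 4, 2, 7]),  # error: the reverse would be 6, 4, 7, 2
]
print(digit_span(trials, backward=True))  # 3
```

The same trial list scored as a forward task would give a different span, which is the only difference between the two variants in this sketch.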

It is well established that the black-white gap is substantially larger on BDS than on FDS (see references in The g Factor by Jensen, p. 405, Note 22; see also my recent analysis of the DAS-II). However, replication is always good, so I analyzed black-white differences in the CNLSY sample, which contains FDS and BDS scores for relatively large samples of black and white children. Additionally, I compared the digit span performance of Hispanic American children to that of blacks and whites.


The NLSY79, famously analyzed in The Bell Curve by Herrnstein and Murray, is a longitudinal study of more than 12,000 Americans who were 14-22 years old in 1979 when the study began. There is also a related longitudinal study called the CNLSY (or NLSY79 Children and Young Adults) which surveys the children of the female participants of the NLSY79. The researchers have estimated that about 95 percent of the children of the NLSY79 women are enrolled in the CNLSY. Most of them were born in the 1980s and 1990s. All data in the current investigation are from the CNLSY.

The CNLSY participants have taken a number of cognitive tests. Among these are FDS and BDS, which were given to children aged about 6 to 12 years. The analyses that follow present results from the digit span tests divided into three age bands: 6-8, 8-10, and 10-12 years. For brevity, I will refer to these groups as the 7-year-olds, 9-year-olds, and 11-year-olds, respectively. The tests were administered between 1986 and 2010.

The original NLSY79 consists of a nationally representative cross-sectional sample and several oversamples of, for example, blacks and Hispanics. My analysis includes only the children of women who belong to the representative cross-sectional sample, which should make the samples I use more or less representative. The participants completed the FDS and BDS tests one or more times over several years, so each child may be included in one, two, or three different age bands. The fact that not all CNLSY children were tested in each of the biennial assessment rounds adds some, probably random, error to the parameter estimates.
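
Assigning each test administration to an age band can be sketched as follows. The cut-points are an assumption: the bands are given only as 6-8, 8-10, and 10-12 years, so this hypothetical `age_band` helper treats each band as half-open (lower bound included, upper bound excluded) to avoid counting a child aged exactly 8 or 10 twice.

```python
from typing import Optional

# Assumed half-open bands [lower, upper); the exact CNLSY cut-points are not stated.
AGE_BANDS = [
    (6.0, 8.0, "7-year-olds"),
    (8.0, 10.0, "9-year-olds"),
    (10.0, 12.0, "11-year-olds"),
]


def age_band(age_years: float) -> Optional[str]:
    """Return the label of the band containing age_years, or None if outside the bands."""
    for lower, upper, label in AGE_BANDS:
        if lower <= age_years < upper:
            return label
    return None


# A child assessed in three biennial rounds (hypothetical ages) lands in three bands.
ages_at_testing = [6.9, 8.8, 10.7]
print([age_band(a) for a in ages_at_testing])
# ['7-year-olds', '9-year-olds', '11-year-olds']
```

A child who missed a round would simply contribute to fewer bands, which is the source of the unbalanced coverage described above.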

The race or ethnicity of each child was assumed to be the same as that of his or her mother. This means that mixed-ancestry individuals add some noise to the results, but this effect is minor given the low rate of interracial mating.

The CNLSY uses three race categories: black, Hispanic, and non-black non-Hispanic. This means that the sample that I refer to as ‘whites’ includes a small number of non-whites, mostly Asians and Native Americans.


The table below shows the raw mean scores, standard deviations, and sample sizes for each race/ethnicity, test, and age group:

[Table: Hispanic, black, and white digit span results by test and age group]

The next table shows the gaps (Cohen’s d’s using pooled SDs) between whites, blacks, and Hispanics. A positive gap means that the first-mentioned group in the pair outscored the second group; a negative gap means the reverse. Statistically significant differences between groups, assessed using two-tailed t-tests with no assumption of equal variances (Welch’s t-tests), are marked with asterisks.
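
The two statistics can be computed from summary statistics (mean, SD, N per group) with a minimal sketch like the one below. The helper names are hypothetical and the numbers in the example are made up, not CNLSY values. For simplicity the p-value uses a normal approximation to the t distribution, which is only reasonable when the Welch degrees of freedom are large; for an exact p-value, `scipy.stats.ttest_ind_from_stats(..., equal_var=False)` performs the same Welch test.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class GroupStats:
    mean: float
    sd: float
    n: int


def cohens_d(a: GroupStats, b: GroupStats) -> float:
    """(mean_a - mean_b) divided by the pooled SD; positive when group a outscores group b."""
    pooled_var = ((a.n - 1) * a.sd ** 2 + (b.n - 1) * b.sd ** 2) / (a.n + b.n - 2)
    return (a.mean - b.mean) / math.sqrt(pooled_var)


def welch_t(a: GroupStats, b: GroupStats) -> Tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom (unequal variances)."""
    va, vb = a.sd ** 2 / a.n, b.sd ** 2 / b.n
    t = (a.mean - b.mean) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (a.n - 1) + vb ** 2 / (b.n - 1))
    return t, df


def two_tailed_p_normal(t: float) -> float:
    """Two-tailed p-value from a normal approximation: 2 * (1 - Phi(|t|)) = erfc(|t| / sqrt(2))."""
    return math.erfc(abs(t) / math.sqrt(2))


g1 = GroupStats(mean=10.0, sd=2.0, n=100)  # made-up summary statistics
g2 = GroupStats(mean=9.0, sd=2.0, n=100)
d = cohens_d(g1, g2)
t, df = welch_t(g1, g2)
p = two_tailed_p_normal(t)
print(round(d, 3), round(t, 2), round(df, 1), p < 0.05)  # 0.5 3.54 198.0 True
```

Pooling the SDs for d while not pooling them for the t-test is deliberate: d needs a single common scale, whereas the Welch test should not assume the two groups vary equally.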

[Table: CNLSY digit span racial/ethnic gaps]

The results indicate that the black-white gap is minimal and non-significant on FDS for ages 7 and 9, while it is somewhat larger and significant at age 11. On BDS, the black-white difference is substantially larger than on FDS at all ages and is always highly significant. The white-Hispanic gaps are substantial and significant on both tests across age groups, but the BDS gap is only about 50 percent as large as the FDS gap at ages 9 and 11. As to black-Hispanic differences, Hispanics outscore blacks on BDS at each age, while on FDS blacks have a corresponding advantage; all six Hispanic-black gaps show the same pattern, although the differences are not significant for two of the comparisons.


That the black-white gap on FDS is substantially smaller than on BDS is a robust finding, confirmed in this new analysis. This poses a challenge to the argument that racial differences in exposure to the kinds of information that are needed in cognitive tests cause the black-white test score gap. The informational demands of the digit span tests are minimal, as only the knowledge of numbers from 1 to 9 is required. FDS is a simple memory test assessing the ability to store information and immediately recall it. The informational demands of BDS are the same as those of FDS, but the requirement that the digits be repeated in reverse order means that it is not simply a memory test but one that also requires mental transformation or manipulation of the information presented.

The interaction between race, FDS, and BDS might be explained in terms of Spearman’s hypothesis. The reason why the black-white gap on FDS is smaller than on BDS would then be that the latter is a better measure of g. To test this hypothesis with the present data, I factor analyzed (using principal axis factoring) the digit span tests together with the PPVT vocabulary test and the PIAT achievement tests that were also administered to the CNLSY participants. I used standard scores from six tests (FDS, BDS, PIAT-M, PIAT-RR, PIAT-RC, and PPVT) taken at age 11, which is the age with the largest sample sizes for most tests. Using the eigenvalue-greater-than-one (Kaiser) criterion, only a single factor, interpreted here as g, was extracted in each of the three groups. The g loadings are shown below.
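
The factoring step can be sketched as follows, under stated assumptions: the Kaiser criterion is applied to the unreduced correlation matrix, communalities start at squared multiple correlations, and the principal axis step is iterated until they stabilize. The software and settings actually used are not specified, so `kaiser_count` and `principal_axis` are hypothetical helpers. The data are simulated from a one-factor model with made-up loadings, not CNLSY scores.

```python
import numpy as np


def kaiser_count(corr: np.ndarray) -> int:
    """Number of factors to retain: eigenvalues of the correlation matrix greater than 1."""
    return int(np.sum(np.linalg.eigvalsh(corr) > 1.0))


def principal_axis(corr: np.ndarray, n_factors: int,
                   max_iter: int = 500, tol: float = 1e-6) -> np.ndarray:
    """Iterated principal axis factoring; returns a (tests x factors) loading matrix.
    Heywood cases (communalities above 1) are not handled in this sketch."""
    # Initial communalities: squared multiple correlations.
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(corr))
    for _ in range(max_iter):
        reduced = corr.copy()
        np.fill_diagonal(reduced, h2)  # communalities replace the 1s on the diagonal
        vals, vecs = np.linalg.eigh(reduced)
        idx = np.argsort(vals)[::-1][:n_factors]  # largest eigenvalues first
        loadings = vecs[:, idx] * np.sqrt(np.clip(vals[idx], 0.0, None))
        new_h2 = np.sum(loadings ** 2, axis=1)
        converged = np.max(np.abs(new_h2 - h2)) < tol
        h2 = new_h2
        if converged:
            break
    # Eigenvector signs are arbitrary; flip so each factor's loadings sum to a positive value.
    signs = np.sign(loadings.sum(axis=0))
    signs[signs == 0] = 1.0
    return loadings * signs


# Simulate six tests from a one-factor model with made-up loadings.
rng = np.random.default_rng(0)
true_loadings = np.array([0.4, 0.5, 0.6, 0.7, 0.7, 0.8])
n = 20_000
g = rng.standard_normal(n)
noise = rng.standard_normal((n, true_loadings.size))
scores = g[:, None] * true_loadings + noise * np.sqrt(1 - true_loadings ** 2)
corr = np.corrcoef(scores, rowvar=False)

k = kaiser_count(corr)  # a one-factor model like this one should yield 1
print(k, np.round(principal_axis(corr, k)[:, 0], 2))
```

With a large simulated sample the estimated loadings should land close to `true_loadings`. With only six tests the Kaiser criterion is a coarse rule, which is part of why a small battery limits what the loadings can show.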


[Table: CNLSY g loadings for whites]


[Table: CNLSY g loadings for blacks]


[Table: CNLSY g loadings for Hispanics]

For the white and black samples, BDS is a better measure of g than FDS, though not greatly so. Interestingly, for Hispanics FDS has a higher g loading than BDS. This factor analysis is limited by the small size of the test battery and its bias toward crystallized abilities, but one may conclude that, at least in the CNLSY sample, specific memory abilities appear to explain the patterns of racial/ethnic differences on FDS and BDS better than g does. Blacks appear to have strong short-term memory skills in relation to their rather low mean level of g, whereas short-term memory represents a specific weakness for Hispanics.

The small white-black gap on FDS compared to BDS might therefore mainly be explained not by differences in g saturation between the tests but by a black short-term memory advantage. Specifically, on FDS the black g disadvantage is offset by relatively superior memory skills, but on BDS such skills are not enough. Similarly, the weak memory skills of Hispanics hamper them on FDS, producing a gap in favor of blacks, but on BDS the somewhat higher mean g level of Hispanics pulls them ahead of blacks. One explanation for the findings on Hispanics might be that Native Americans have relatively poor short-term memory skills, and that the degree of white admixture in Hispanics is associated with higher g and, independently of g, better short-term memory. This could explain both the Hispanic underperformance on FDS relative to both blacks and whites and the relatively high g loading of FDS among Hispanics.


  1. Marshall Dermer

    What about differential motivation as a function of group membership?

  2. Penguin monkey

    Reverse digit span is used as an IQ test, but I noticed that my ability to do well on it is due to my ability to “group” numbers. Instead of memorizing 2 – 5 – 6, I can remember 256 or 49 or 25 and do better on the test. The ability to group numbers improves the more you deal with numbers, which is higher for whites/Asians than blacks/Aboriginals. Also, as the person above said, blacks may be less motivated, so maybe we can find a way of improving motivation.

    Instead of digits, how about changing the categories to fruits, colors, or objects, which are less culturally loaded, since people would have to figure out a way to form groups without having memorized them beforehand?

    • Dalliard

      It’s well known that “chunking” increases the digit span, but it works for both forward and backward span, so it can hardly explain differences between the two.

      The DAS-II test, about which I wrote here, includes several memory tests with pictorial or verbal content. Black-white gaps are similar to the backward digit span gap in all of them, indicating that the kind of stimuli used (numbers, pictures, words…) makes little difference.

      The only subtest with no black-white gap is Speed of Information Processing, in which the test taker “scans rows of figures or numbers and marks the figure with the most parts or the greatest number in each row. The score is based on speed. Accuracy does not count unless it is very poor.” This test is known to have close to a zero g loading and is sometimes regarded as a manual dexterity test rather than a cognitive test (you can increase its g loading by scoring it based on accuracy). It is also sometimes used as a test of motivation or conscientiousness, given how boring and mechanical a test it is.


© 2024 Human Varieties
