What happened with the Abecedarian study? IQ-malleability theories in danger.

In an attempt to equalize social opportunities, several large-scale intervention studies have been launched. These studies are of special interest because they sampled a large proportion of black children, and results obtained on white samples cannot be assumed to generalize to ethnic minorities. Among those projects, the Abecedarian (ABC) stands out for having generated conflicting interpretations, which need to be discussed thoroughly.

First, a brief introduction to the literature may be needed. Spitz (1986) and Herrnstein & Murray (1994, pp. 403-409) reviewed the most popular programs, e.g., Head Start, Perry Preschool (PPP), Milwaukee, and Abecedarian (ABC). They conclude for the first two that the effects quickly proved to be large but not long-lasting, even when the experimental (E) groups benefited from medical care, psychological services, parental involvement and a low child-teacher ratio (6:1 for the PPP). Overall, the Consortium preschool projects reported that the median IQ gain diminishes in the years after the end of the programs. This is in accordance with current meta-analyses (Camilli et al., 2010, Tables 5 & 6; Leak et al., 2010, Table 4 & Figure 6). For example, Camilli et al. (2010) reported effect sizes (ES) of 0.231 and 0.067 for the T-C and T-A comparisons (respectively, treatment-control and treatment-alternative treatment group comparisons), whereas the ES declines by 0.241 per follow-up period. The ES for high-quality design studies (ES=0.27) is only slightly larger than that for low-quality design studies (ES=0.23). Leak et al. also reported consistent declines during the years after the end of the program.

The Milwaukee Project has some interesting features, first because its sample comprised low-IQ black families whose mothers had IQ<75, and second because of its final outcome. The babies either received daycare from 3 months of age until school entry or received no daycare. The mothers participated actively. There were 18 controls (C) and 17 experimentals (E). Soon after the beginning of the project, IQ improved by up to 25 points, but by ages 12 and 14 the gains had faded to an advantage of only 10 points. Although 10 points seems substantial, the gain was not accompanied by any improvement in school performance relative to the controls. Locurto and Jensen concluded that the children had been coached so well on taking IQ tests that their scores no longer measured g and no longer predicted future social outcomes. Jensen (1998, pp. 338-342) gives some other details. With each subsequent year, the content of the IQ tests became less and less similar to what was taught in the stimulation center. The consequence is that the transfer-of-training effect gradually diminishes over time, and the later scores come closer to the children’s true level of g.

Currie & Thomas (1995, Tables 4 & 6) provide a comprehensive summary of the Head Start (HS) participants in the NLSY Child-Mother survey data (N~2319 whites and N~1158 blacks). The design compares a sibling who attended Head Start with a sibling who attended no preschool, in order to control for family background effects, and contrasts this outcome with that of a sibling who attended another preschool versus a sibling who attended none. The concern that parents could favor the children attending HS or another preschool seems not to be supported by the data. Using regressions that are successively unadjusted (model 1), adjusted for observable mother and child characteristics (model 2), and adjusted for unobservable mother characteristics (model 3), they estimate (see Table 4) that white children gained 5.875 percentile points when attending HS (versus none) in model 3, whereas the black gain was 0.247 percentile point in model 3. Their Table 6, which includes an age*gain interaction variable, displays a gain of 6.878 and 6.845 for whites and blacks, respectively. This suggests that the absence of a black gain in Table 4 was due to the omission of the age*gain interaction variable. Indeed, from age 5 to age 10, the program produced a final 6-point gain among whites, whereas the initial 7-point gain almost vanished among blacks, even though the black mothers in this sample have higher educational levels than the white mothers. The interaction between Head Start and age, with percentile PPVT as the dependent variable (Table 6), reveals a large coefficient for blacks (who lost 1.278 percentile points per year) but a small one for whites (who lost only 0.192 percentile point per year). To test the hypothesis that the quickly vanishing gain among blacks may be due to the quality of the home environment, they include a triple interaction between Head Start, child age and maternal AFQT, which amounts to only -0.04.
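
The fade-out implied by the Table 6 coefficients quoted above can be sketched numerically. This is only an illustration: it assumes that the reported gain applies at age 5 and then declines linearly by the reported yearly loss, which simplifies the actual regression model, and the function name `projected_gain` is made up for this example.

```python
def projected_gain(initial_gain, loss_per_year, years):
    """Linear projection of a percentile-point gain after `years` of fade-out."""
    return initial_gain - loss_per_year * years

# Coefficients quoted above from Currie & Thomas (1995, Table 6).
# Assumption: the initial gain applies at age 5 and the loss is linear.
white_gain_age10 = projected_gain(6.878, 0.192, years=5)
black_gain_age10 = projected_gain(6.845, 1.278, years=5)

print(f"whites at age 10: {white_gain_age10:.2f} percentile points")
print(f"blacks at age 10: {black_gain_age10:.2f} percentile points")
```

Under these assumptions the white gain stays close to 6 percentile points at age 10 while the black gain falls below 1 point, which is the pattern described above.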

But Currie & Thomas (1998) later argued that this was probably because blacks attended poorer schools, although, interestingly, they demonstrate that blacks and whites receive education of similar quality when they attend the same school, meaning that the “within-school segregation” hypothesis has no empirical basis. Yet Ferguson (1998, pp. 324-325) expresses some reservations regarding their interpretation of the black fade-out: “Siblings who are close in age are very likely to attend the same schools. For school quality to explain why Head Start advantages fade for black children but not for whites, one would need to show that inferior schools reduce the difference between higher and lower achieving siblings by depressing high achievement disproportionately.” (p. 325). Furthermore, as McKey et al. (1985, Figures 3-15, 3-17, 3-21) summarized, the Head Start gains vanish after the end of the program. Even if blacks had attended better schools, they would probably have ended up like the white children, with no lasting IQ gains. Another piece of evidence for a non-g effect is supplied by the as yet unpublished meta-analysis of Jongeneel-Grimen et al. (2007), which shows a negative relationship between the subtest IQ gains induced by Head Start programs and the subtest g-loadings (studies=8, total N=602, rho=-0.72, %VE=71).

Barnett (1995, Table 1) provides a summary table with the main outcomes of the existing studies. Most of them display sizeable IQ gains, but the follow-ups did not extend very far. The Early Training Project, which enrolled 4- and 5-year-old black children, extends to age 17 (follow-up N=36 for the E group and N=16 for the C group), but the IQ difference was 2.3 points, and the initial IQs and their trajectories suggest non-lasting gains (Spitz, 1986, pp. 95-97). From Besharov et al. (2011) we see that the experimentals already outscored the controls by 3 points at pretest (89 vs 86 at age ~4), and the groups ended up with scores of 88 vs 85 at age 10 and 79 vs 76 at age 17. They also report some suspicious numbers. The high school graduation rates among females, at age 21, were 85% and 50% for the E and C groups, but among males the rates were virtually identical. This calls into question the effectiveness of the program. Next is the Perry Preschool Program (PPP), which targeted 123 African American children born in poverty and at high risk of failing in school, with IQs between 70 and 85. At the follow-up at age 14, both the E group (initial and follow-up N=58) and the C group (initial and follow-up N=65) scored 81 points. The PPP shares the same features as all the other programs, i.e., large, quick gains followed by a dramatic fall (Spitz, 1986, pp. 101-108). Besharov et al. (2011), interestingly, explain that the project staff attempted to introduce Piagetian principles into the curriculum and that the PPP was among the “model” programs, those that were the most intensive over time and the best funded. The total absence of IQ gain in the PPP can be contrasted with the (almost) cumulative achievement score gains of the E group over the C group from age 7 to age 19, as shown in Ferguson (1998, p. 321). Some have claimed that this curious pattern could be due to attrition, if the subjects dropped out of the study non-randomly, but attrition was in fact extremely low.

The IHDP, or Infant Health and Development Program, is another of these “model” programs. The sample is composed of 52% blacks, 9% hispanics and 39% whites and others. The total number of children is 453. The study targets low birth weight (LBW) infants weighing less than 2500 grams, with Mean=1797 g and SD=458 g. Two thirds of the sample weighed <2000 grams (the lighter LBW), while one third weighed between 2000 and 2500 grams (the heavier LBW). The global picture (Besharov et al., 2011) is that, by age 3, IHDP produced a 10-point IQ gain for the experimentals. By age 8, in the full sample, both groups have IQ=91. At age 18, the controls have IQ=91 and the experimentals IQ=90. Among the heavier LBW, the experimentals have a 2-point advantage. This suggests that the heavier LBWs benefit more from the treatment. Specifically, Baumeister & Bacharach (1996) report that “The difference between the intervention and follow-up children in the “heavier” (2001-2500 g) stratum was 13.2 IQ points, ES=.83, N=347, on the Stanford-Binet; for the “lighter” (< 2000 g) the difference was only 6.6 points, ES=.41, N=561. Mean IQs for babies 1500 g and below were 85.3 and 80.2 for the intervention (N=82) and follow-up groups (N=150), respectively”. However, the authors also use invalid arguments. For example, in predicting child IQ (the dependent variable) they enter mother IQ, intervention and birth weight as independent (predictor) variables. The R² was 0.31 for this three-variable model but falls to 0.07 when mother IQ is removed. The problem is that R² is a defective approximation of effect size (Sackett et al., 2008, p. 216; Jensen, 1980, pp. 306-307).

Even if the IHDP provides more evidence that day care participation does not improve IQ, Baumeister & Bacharach (1996) speculate that LBW infants suffer from deficiencies that could be better dealt with through biological means. They cite a study showing that women with bacterial vaginosis diagnosed during the second trimester of pregnancy were 40 percent more likely to give birth to a premature infant (preterm birth is a potential cause of LBW). The problem is that even bacterial vaginosis seems to be influenced by genetic factors (Ryckman et al., 2009; Bream et al., 2013). Although LBW correlates with younger maternal age (Jensen, 1998, pp. 504-505), there is further evidence that genetic factors are involved in LBW or preterm birth (Anum et al., 2009). Besides, medical intervention does not seem to produce miracles (Spitz, 1986, pp. 155-172).

A criticism sometimes made against these programs is that they weren’t intensive and expensive enough (except the Milwaukee). The argument is dubious because the absence of long-lasting IQ gains never prevented the treated children from improving substantially in their academic performance. In general, they end up with fewer behavioral problems and higher SES levels. This means either that education is not g-loaded or that IQ has a larger genetic component, which is why it is less amenable to stimulation. In both cases, the interpretation would be that the factors that improve economic success will not necessarily improve IQ to the same extent. More importantly, Leak et al. (2010, Table 4) meta-analyzed 117 studies using regression models that include indicators of study quality (with independent variables such as starting age of treatment, length of treatment, time of measurement, measurement method, passive vs active control group, attrition, etc.). They conclude that the effect sizes don’t show a clear-cut relationship with study length (weighted β=-0.16; unweighted β=0.22) when studies lasting between half a year and 1 year are compared with studies lasting longer than 2 years. As for the belief that an earlier age at entrance into the program induces a larger ES, Leak et al. report only a modest coefficient (β=0.09) when comparing studies with children older than 4 years against those with children younger than 3 years.

Despite this global picture, many analysts (e.g., Nisbett, 2009, pp. 126-129) have claimed that the Abecedarian proves that intensive training can definitely influence IQ. The Abecedarian (ABC) has indeed been a bone of contention among researchers. Some claim it does not raise IQ (Herrnstein & Murray, 1994; Spitz, 1992, 1993; Brody, 2008) whereas the majority see it as very successful, resulting in an IQ gain of 4.65 points at age 12 and 4.40 points at age 21 (Jensen, 1998; Ramey et al., 1984, 2000; Ramey, 1992; Burchinal et al., 1997; Campbell & Burchinal, 2008).

Ramey (1992) provided a summary of this project. It sampled almost exclusively African American children from poor families at high risk (familially speaking, not socioculturally speaking). This might explain why the black mothers have IQs (mean=84) quite comparable to that of the black population in the US. The infants and families were provided with all the amenities they needed, e.g., free nutritional supplements in the form of unlimited iron-fortified formula, pediatric surveillance, and family support social work services. Only the experimental infants attended an educational day-care center. Within the child development center, the child/teacher ratio was low (never exceeding 6:1). Both groups received preschool intervention (birth to age 5). After this, the experimentals received a special educational program between ages 5 and 8, whereas the controls did not. Ramey (1992, pp. 246-247) describes it as follows:

The schoolage intervention consisted of two main emphases. The first involved having master teachers work with the child’s classroom teacher to maximize individualization and developmental appropriateness of educational experiences and to provide consistent support and encouragement to facilitate positive developmental continuity in the child’s introduction to elementary school. Second, these master teachers visited parents at home on a bimonthly basis and discussed with parents specific ways they could help their children on a daily basis to succeed in school. Based on an analysis of upcoming classroom topics and events, the master teachers devised, in real time and over the years, literally thousands of specific activities and guides that parents could use in a gamelike manner to give their children extra practice and support in activities that were directly related to classroom performance. These specific activities were discussed with parents on a regular basis with an emphasis on the psychological principles that undergirded the specific activities. This program was premised on the assumption that parents, most of whom themselves had not been successful in formal schooling, would likely profit from specific and enjoyable ways to facilitate their children’s progress in school.

The design of the study is best illustrated as follows:

Burchinal 1997 (Figure 2)

Jensen (1981) was probably among the first to comment on this ambitious study. But he reports unexpected numbers. The mother-child (aged 3 years) correlation was about 0.43 in the control (C) group and -0.05 in the experimental (E) group. The r=0.43 falls within the range of the mother-child correlations most commonly reported in the literature as a whole, even when the children had no postnatal contact with the biological parents. Jensen put several possibilities to the test. Using Spearman’s rho yields the same null correlation, which means that outliers should be ruled out as the explanation. The reliability of the Stanford-Binet IQ (administered at this age) was about r=0.58 and rho=0.64 between 2 and 3 years of age, so unreliability must also be ruled out as a plausible explanation. The standard deviations of both mothers’ IQ and children’s IQ were roughly similar in the C and E groups, so range restriction is not a plausible candidate either. So what happened? The question is important because such an outcome undermines the construct validity of the Stanford-Binet in the E group. Given this, Jensen speculates that “A failure to demonstrate a significant parent-child correlation, especially when there is considerable uniformity of environmental treatment for all the children, raises the question of whether the IQ test is actually a measure of the general intelligence factor. The E group’s acquisition of certain overlearned skills which transfer to easy Binet test items may depend on what I term Level I, or rote-learning ability, that is largely independent of Level II ability, or g.” (p. 39). It seems here, as in the Milwaukee, that the IQs of the treatment and control groups don’t have the same meaning.

The central point of Jensen’s idea seems to be that the distortion of the mother-child correlation reflects a random and non-uniform impact of the program, despite its being uniformly administered, which distorts the rank-ordering of child IQ scores and consequently lowers the mother-child resemblance. If true, the score gain may not necessarily be attributable to the program. Obviously, environments can’t be expected to exert similar effects on persons who select and shape their own environment (i.e., rGE effects), but in the case of children aged only 3 years, this explanation appears unlikely. One may wonder whether the treatment exerts more effect on children of lower-IQ mothers. That seems to be the case, given Ramey et al. (1984, Table 5). When child IQ is predicted by treatment group (i.e., the treatment effect), maternal IQ and home environment, and then by adding the treatment_group*maternal_IQ and treatment_group*home_environment interactions, the regression weights for the maternal IQ interaction were negative, at -0.12, -0.44, -0.57, and -0.43 for child IQ at 12, 24, 36, and 48 months, respectively. These negative weights must be interpreted to mean that as mother IQ increases, the treatment effect (on child IQ) decreases. Ramey et al. dismiss the sizeable impact of mother IQ as a moderator because the R² increases by only 0.01 when the interaction terms are added to the model (initially) without interactions. Unfortunately, R² is not an effective index of the impact of the model variables. For example, in my forthcoming post on DIF analysis using logistic regression in the GSS Wordsum, the Exp(B) or B coefficients for the group and group*score variables can be sizeable while the R² increment is negligible, even though the ICC graphs show sizeable DIFs. I found that even an R² increment of 0.01 is meaningful.
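
The point that a small R² increment can hide a sizeable interaction can be illustrated with a back-of-the-envelope calculation. The sketch below is not taken from Ramey et al.; it assumes a binary treatment assigned independently of maternal IQ, standardized maternal IQ and child IQ, and a linear model with a treatment*maternal_IQ term. Under those assumptions the interaction adds c² · p(1 − p) to R², where c is the difference in the maternal-IQ slope between the two groups and p is the proportion treated. The function name is hypothetical.

```python
import math

def slope_gap_from_r2_increment(delta_r2, p_treated=0.5):
    """Slope difference (in SD units) implied by an R-squared increment.

    Assumes a binary treatment independent of the moderator, with the
    moderator and the outcome standardized (variance 1), so that
    delta_r2 = c**2 * p * (1 - p).
    """
    return math.sqrt(delta_r2 / (p_treated * (1.0 - p_treated)))

gap = slope_gap_from_r2_increment(0.01)
print(f"slope difference implied by an R-squared increment of 0.01: {gap:.2f}")
```

With equal group sizes, an increment of 0.01 corresponds to a slope difference of about 0.2 SD of child IQ per SD of maternal IQ, for instance a hypothetical mother-child slope of 0.45 in one group against 0.25 in the other.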

Why this outcome is of particular interest is best illustrated by Jensen (1969), who wrote in this respect: “To be sure, genetic factors become more important at the extremes. Some minimal level of ability is required for learning most skills. But while you can teach almost anyone to play chess, or the piano, or to conduct an orchestra, or to write prose, you cannot teach everyone to be a Capablanca, a Paderewski, a Toscanini, or a Bernard Shaw. In a society that values and rewards individual talent and merit, genetic factors inevitably take on considerable importance.” (p. 76). Given the relationship between teachability and low ability levels, Ramey’s study suggests that the educational stimulus succeeded only in improving basic skills. If teachability decreases with age because of the higher complexity involved at later educational stages, this once again implies that future fade-out is to be expected. But of course, if the treatment had supplied nutrition to (unhealthy) low-IQ children, the conclusion could have been quite different (however, see Spitz, 1986, pp. 155-172; Metzen, 2012, ch. 5).

As expected, Ramey & Haskins (1981) replied to Jensen’s (1981) comment. However, they did not say much about Jensen’s most critical point. Their interpretation is simply different: “the intervention program not only affected mean levels of IQ performance, but also disrupted the frequently reported mother-child IQ correlation of .5.” (p. 42). But they do show the mother-child correlations at 2 years of age, being 0.31 and 0.41 for the E and C groups, and at 4 years, being 0.14 and 0.44 for the E and C groups. It seems that, after age 2, the mother-child correlation starts to drop substantially in the experimental group. The authors dismiss the claim that the E group correlations are inconsistent with the genetic models because, they say, these correlations are not significantly different from either 0.5 or 0.0. But significance tests depend on sample size, as power increases with N. The hypothesis of absence/presence of a non-zero difference is irrelevant if the magnitude of the non-zero difference is small.
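
The dependence of such a conclusion on sample size can be sketched with the Fisher z-transform, a standard approximation for the confidence interval of a Pearson correlation. The sample sizes below are hypothetical and chosen only to show the effect of N; they are not the group sizes analyzed by Ramey & Haskins, and `fisher_ci` is a name introduced for this example.

```python
import math

def fisher_ci(r, n, z_crit=1.96):
    """Approximate 95% confidence interval for a Pearson correlation."""
    z = math.atanh(r)              # Fisher z-transform of r
    se = 1.0 / math.sqrt(n - 3)    # standard error of z
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

# r = 0.14 is the E-group mother-child correlation at age 4 quoted above.
# The sample sizes are hypothetical.
for n in (25, 50, 200):
    low, high = fisher_ci(0.14, n)
    print(f"n={n:3d}: 95% CI = [{low:.2f}, {high:.2f}]")
```

The interval narrows as n grows, so whether r=0.14 is "not significantly different" from both 0.0 and 0.5 depends largely on how many children are in the group.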

Regardless, Ramey & Haskins (1981) report some other findings. They don’t believe that the IQ gains at these early ages were due to practice effects, because the infants who received more administrations of the Bayley did not perform better than those who received fewer, and the same was true for the Binet IQ (at 3 years of age). Furthermore, they note that there was probably little overlap between the items on the Binet through age 3 and the curriculum activities to which the infants were exposed. However, Spitz (1992) noted that infants tested with their mothers participating have higher scores on the Bayley MDI (by as much as 10 to 15 points). If so, the large effect size (~18 points) present from 18 months onward can be mostly attributed to teaching-to-the-test effects, known to be devoid of g.

Early Learning and School Readiness - Can Early Intervention Make a Difference (Ramey & Ramey 2004 Figure 5)

Later, it seems that Jensen (1998, pp. 343-344) changed his mind about the ABC study, insofar as he was open to the possibility of g gains, because the treatment group’s Piagetian IQ, a very culture-free test, was superior (by the equivalent of 5 IQ points) to that of the controls and because their educational achievement was also superior. Regarding the former, there remains the possibility that the Piagetian gain was simply due to familiarity effects owing to repeated cognitive activities and learned materials that resemble the Piagetian tasks, especially if Spitz’s (1991) suspicion is verified: “Finally, it is just about impossible for intervention programs not to “teach to the test,” or to the experimental learning and performance tasks, because there are limited kinds of materials and mental and physical challenges that are testable at early ages” (p. 332). Indeed, as Campbell & Ramey (1990) recognize, Piagetian tests can be trained, like all other IQ tests. And we must also remember that this IQ advantage was already present at 6 months of age. As for the latter, Jensen gives no proof that the educational gain is due to g-related rather than non-g factors.

Burchinal et al. (1997) use hierarchical regressions to model growth curves for the experimental groups and the control group, combining the samples of Project CARE and the ABC. Adding mother’s IQ (in model 2) does not change the interaction terms treatment*age, treatment*age^2, and treatment*age^3 (linear, quadratic, and cubic functions, respectively) on child IQ at 4 years of age. The conclusion that treatment induces stronger IQ stability over time (between ages 1 and 7) is well supported. The child care + home-school (CC+HS; N=38) and child care only (CC; N=24) groups demonstrate greater stability than the control group. On the other hand, the home visit + home-school resource teacher (HV+HS; N=24) and home-school resource teacher only (HS; N=20) groups demonstrate no better stability, and these interventions did not help them to improve their scores over time relative to the control group.

One may suspect that interventions harm mother-child interaction or attachment, but Ramey et al. (2000) report that studies generally do not show this pattern. With regard to the ABC, the mothers whose children had been enrolled in the treatment group were more likely to have post-high-school educational attainment and to be employed. This may reinforce the idea that mothers in the treatment and control groups differed from the outset (e.g., mothers in the treatment group score 1.3 IQ points higher than mothers in the control group). But it may also be that enrollment in day care itself motivated these mothers to pursue higher educational levels, through their involvement in the educational process.

Spitz (1992, 1993) reveals that 4 cohorts were recruited over a 5-year period, and that the IQ loss from 12 to 60 months of age among cohorts 1 & 2 combined was larger for the controls (-14.98) than for the experimentals (-7.40). However, among cohorts 3 & 4 combined, the IQ loss for the controls (-8.55) was smaller than the loss that occurred among the experimentals (-12.23). Such an outcome seems to imply that the apparent positive impact observed in the first 2 cohorts can be due to pure chance (e.g., sampling error). In the 3-month period from 3 to 6 months of age, the experimentals gained 11.73 points whereas the controls gained 6.3 points, leaving a difference of 5.43 points. At first glance, this IQ gain would be attributed to day care. But because an additional 4 years of daily care produced very little effect (1.62 points) whereas the earliest months of care produced a 5.43-point IQ difference at 6 months of age, the most likely explanation is that the initial IQ difference was due to differences that existed before the program even started. Spitz’s reasoning is that the intervention couldn’t have suddenly stopped having an effect after only 6 months of age. Why a sudden jump rather than a progressive gain? Even Ramey et al. (1984) may not believe that the gain that occurred between 3 and 6 months could be entirely due to treatment effects, as they write, “McCall (1981) argued that “mental development is highly canalized during the first 18-24 months of life but thereafter becomes less canalized” (p. 5). As a consequence, McCall (1981) contended, the two aspects of mental development would have certain characteristics. Because of the early canalization, developmental functions are less malleable during early infancy than somewhat later.
… This finding, when considered in light of McCall’s (1981) argument that early mental development is highly canalized, suggests that intervention programs for initially healthy children might be more beneficial during early childhood than during early infancy.” (p. 1922). To illustrate, McCall (1981) stated “Developmental researchers might observe a correlation between early parental language and later child intelligence and be prone to infer that talking to a 6-month-old infant improves the child’s intelligence at 3 years of age. However, language stimulation at 6 months may actually have no effect on the child, but it is the same parent who talks to the 6-month-old who also talks to the 3-year-old – at which time such language stimulation actually does have a causal effect.” (p. 8).
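
The cohort comparison reported by Spitz can be checked with simple arithmetic. The snippet below only restates the figures quoted above; the variable names are chosen for this illustration.

```python
# IQ change from 12 to 60 months, as quoted above from Spitz (1992, 1993).
changes = {
    "cohorts 1 & 2": {"control": -14.98, "experimental": -7.40},
    "cohorts 3 & 4": {"control": -8.55, "experimental": -12.23},
}

# Relative advantage of the experimentals: their change minus the controls' change.
advantage = {
    name: round(c["experimental"] - c["control"], 2)
    for name, c in changes.items()
}
for name, value in advantage.items():
    print(f"{name}: experimental minus control = {value:+.2f}")

# Difference in gains between 3 and 6 months of age.
early_gap = round(11.73 - 6.3, 2)
print(f"difference at 6 months: {early_gap:.2f}")
```

The sign of the advantage flips between the two pairs of cohorts (+7.58 versus -3.68), which is the inconsistency Spitz points to.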

The absence of large effects attributable to day care can be better understood by examining the IHDP, a study inspired by the ABC and intended to replicate it. Baumeister & Bacharach (1996, see Table 2 & Figure 2) show that, between 12 and 36 months of age, there is no clear relationship between day care and cognitive improvement. The children attending more don’t systematically benefit more than children attending less. Moreover, in a multiple regression predicting child IQ, the inclusion of parent meetings, home visitations and day care visits as predictors produces Beta coefficients of 0.19, 0.01, and 0.04, respectively. There is no clear evidence that attending a day care center improves child IQ (note: the large effect of meetings seems to be anomalous, as they argue, and must be mediated by higher-IQ mothers, since they attended these meetings more frequently). In any case, Herrnstein & Murray (1994) take another approach and argue that at such an early age, more precisely prior to 2 years of age (Spitz, 1992, p. 232), IQ is not reliable enough to be predictively meaningful. This is illustrated by the fact that the IQ difference vanished at 9 months but rebounded to a 6-point difference at 12 months. In other words, even when the scores are subject to large measurement errors (see Hunter & Schmidt, 2004, pp. 344-346), the effect size tends to be positive. Although the average difference over the 3- to 12-month period is 3 points, this is considerably underestimated due to large measurement errors. Starting from 18 months, the difference seems to stabilize.

Besharov et al. (2011) questioned the random assignment of the study. Attrition could have inflated the treatment effect:

Most articles published by the UNC team describe the sample as having 111 children, 57 randomly assigned to the preschool program group and 54 to preschool control group. In fact, 122 children were originally assigned to the research sample, but 11 dropped out before participation began. [30] If such attrition were random, this would not have been a serious problem, but this does not appear to have been the case. Of the eight mothers that refused to participate, seven were in the program group and one was in the control group. According to the UNC team, “The higher rate of rejection by families offered the preschool treatment was generally related to mothers wanting to care for infants in the home.” [31] Simply dropping these cases undermined the integrity of random assignment. [32] As Robert St.Pierre, former vice president and principal associate at Abt Associates Inc., notes:

If the cases had been kept in the data collection then the researchers could have done the analysis with and without them, providing empirical evidence as to the importance of these cases. At the very least, it would have been helpful to know how the cases that were dropped compare to the cases that were retained. I assume that there are important differences, as indicated by the mothers’ interest in caring for their children at home. [33]

[…] Further, as noted earlier, most reports indicate that 111 children were randomly assigned, with 57 assigned to the program group and 54 assigned to the control group. Adding back the dropped cases (discussed earlier) resulted in eight more cases in the program group (or sixty-five children) and three more in the control group (or fifty-seven children). The two siblings in the program group were automatically added to this group, suggesting that sixty-three families were assigned to the program group and fifty-seven to the control group. Yet, because children were paired and then randomly assigned, there should have been sixty families in each group; the fact that there were not suggests something else may have been amiss that was not reported.
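
The head count in the passage above can be reconstructed step by step. The snippet below only restates the quoted figures; the dictionary and variable names are chosen for this illustration.

```python
# Figures quoted above from Besharov et al. (2011).
analyzed = {"program": 57, "control": 54}   # the 111 children usually reported
dropped = {"program": 8, "control": 3}      # the 11 cases dropped before participation
siblings_in_program = 2                     # counted as children, not as extra families

children = {g: analyzed[g] + dropped[g] for g in analyzed}
families = dict(children)
families["program"] -= siblings_in_program

total_families = sum(families.values())
print("children:", children, "total:", sum(children.values()))
print("families:", families, "total:", total_families)
print("expected per group under paired assignment:", total_families // 2)
```

The tally gives 122 children and 120 families, split 63 versus 57 rather than the 60 versus 60 that paired random assignment would produce.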

Ramey & Campbell replied to Besharov in their commentary: “Selection bias would result if parents who refused treatment differed from those who remained in the program in ways that interacted with treatment to affect children’s outcomes. If this were the case, significant differences between parents who refused to participate and those who accepted their assignment would be found, and treatment would significantly interact with the relevant parental characteristic, thus causing a differential effect of treatment on child outcomes. We have not found systematic differences between refusing parents and other parents. [1] Neither have we found differences by family characteristic interactions. [2] There is no indication that initial bias artificially inflated the treatment effect.” They refer to their earlier studies, such as Martin et al. (1990) and Campbell & Ramey (1994, 1995). After reading them, I was unable to find the relevant answer. All of their analyses were done on the 111 children, and no analysis was performed with the inclusion of the 8 families who refused their assignment.

Campbell & Burchinal (2008) performed another analysis of interest. The sample comprised about 50 children in the treatment group and about 50 in the control group, for a total N of about 100. To index their intellectual development, they were given the Stanford-Binet (at ages 2, 3 and 4 years) and the Wechsler scales (from age 5 onward), as well as the McCarthy Verbal development measure at 30, 42, and 54 months of age. The statistical analyses involve regressions with interaction terms, with the models described below.

The hierarchical longitudinal analyses considered only preschool treatment and participant age in the first model; model 2 added child gender, the home environment, maternal IQ, maternal attitudes, and the presence of a father-figure in the home; model 3 added the first mediator considered, child task orientation; and model 4 added the second, child verbal development.

In Campbell’s regression, the main effect of mother’s IQ falls from 0.34 in model 3 to 0.10 in model 4, where children’s McCarthy verbal development is added as a control. Thus, holding verbal development constant, mother’s IQ explains a little less than 30% of the treatment effect.

Campbell & Burchinal (2008, Table 4.5)

Also, the interactions HOME*age and HOME*age^2, like maternal_IQ*age and maternal_IQ*age^2, are close to zero. That means the cumulative effect of HOME over time is null, i.e., the effect neither increases nor decreases with age. There is a gender*age interaction, however, with an initial female advantage. This can be interpreted to mean that the gender gap closes with age, to the disadvantage of females. Parental attitude and being married both have negative Beta coefficients, meaning that their effect on children’s IQ tends to decline with children’s age.
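To make the reading of the interaction terms concrete, here is a minimal sketch of how a predictor's effect varies with age in a model with linear and quadratic age interactions. The function name and all coefficient values are hypothetical and are not taken from Table 4.5. The sketch also assumes age enters the model uncentered, which may not match the original specification.

```python
def effect_at_age(age, b_main, b_by_age, b_by_age_sq):
    """Effect of a one-unit increase in a predictor (e.g. HOME) on predicted IQ
    at a given age, in a model with predictor*age and predictor*age^2 terms."""
    return b_main + b_by_age * age + b_by_age_sq * age ** 2

ages = (2, 6.5, 12, 21)

# Hypothetical coefficients, chosen only for illustration.
flat = [effect_at_age(a, 0.30, 0.0, 0.0) for a in ages]      # interactions at zero
fading = [effect_at_age(a, 0.30, -0.01, 0.0) for a in ages]  # negative interaction

print("zero interactions:   ", flat)
print("negative interaction:", [round(e, 2) for e in fading])
```

When both interaction coefficients are zero, the effect is the same at every age. A negative interaction coefficient makes the effect shrink as the child grows older.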

Brody (2008, p. 74) points out an interesting pattern in their data. As shown in their Table 4.6, in model 1, which includes only treatment and age, the d gap amounts to 0.96 at age 2, 0.38 at age 6.5, 0.31 at age 12, and 0.19 at age 21 (2.85 IQ points). This pattern (similar in all models except model 4) suggests longitudinal fade-out. Model 4, which partials out verbal development, shrinks the effect size further. At age 2, the decline of the gap from 1.07 to 0.76 after partialing out McCarthy verbal development signifies that most of the IQ advantage is in nonverbal IQ. At age 6.5, the initial d gap of 0.40 falls to 0.10, still in favor of the treatment group. This indicates that the remaining effect size of 0.10 must be attributed to nonverbal IQ, with verbal IQ explaining most of the enduring IQ gains. But at ages 12 and 21, the effect size becomes negative, and only in model 4, as we can see.
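The conversion from d to IQ points used above (0.19 corresponding to 2.85 points) is simply d multiplied by the standard deviation of the IQ scale. The short sketch below applies it to the model 1 values quoted from Table 4.6, assuming an SD of 15, which is the value implied by the 2.85 figure.

```python
SD_IQ = 15  # assumed standard deviation of the IQ scale

# Model 1 effect sizes by age, as quoted above from Table 4.6.
model1_d = {2: 0.96, 6.5: 0.38, 12: 0.31, 21: 0.19}

# Convert each standardized gap into IQ points.
iq_points = {age: d * SD_IQ for age, d in model1_d.items()}

for age, points in iq_points.items():
    print(f"age {age}: d = {model1_d[age]:.2f}, about {points:.2f} IQ points")
```

On this scale the model 1 gap shrinks from roughly 14 points at age 2 to under 3 points at age 21.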

Campbell & Burchinal (2008, Table 4.6)

Campbell & Burchinal (2008, Figure 4.2)

That suggests, counter-intuitively, that the treatment group has gained verbal IQ over the control group but at the expense of a (declining) nonverbal IQ; i.e., the treatment group ended up with greater verbal IQ but lower nonverbal IQ than the control group. If this is the case, no one can conclude that the ABC enhances IQ. For that conclusion to hold, the Beta weight would have to remain a positive “d” effect size at all ages, reflecting an advantage in nonverbal IQ. Brody’s comment reads as follows:

The analysis of effect sizes of the intervention on intelligence reported in Table 4.6 indicates that the effects of the intervention decline with age. At age 21, the effect size of the intervention based on the model 1 analysis controlling for pre-school treatment group and age is .19. The data reported in Table 6 for Model 4 control for measures of verbal ability obtained at 30, 42 and 54 months. Professor Campbell notes that an aggregate verbal ability score based on these three assessments mediates the effects of the experimental treatment. The effects of the intervention become negative after controlling for pre-school verbal ability. Note that the negative effect size at age 21 for Model 4 in Table 4.6 (−.38) is actually twice the effect size for the experimental intervention reported for Model 1. These results may be explained by assuming that the gains in early intellectual development had two components – one is a gain in the hypothetical construct that is putatively assessed by tests of ability. The second component is a gain in test score for subjects in the intervention group that is relatively independent of the underlying disposition that is assessed by the test. Since the control groups scores are not artificially inflated by the latter component, the control groups scores are closer to the hypothetical “true” score of the underlying disposition. If the scores of the experimental group are artificially inflated they will be less predictive of performance over time and will predict higher adult scores on intelligence tests than the same score obtained from a subject in the control group. The declining effect sizes for the intervention observed for Model 1 over time are mirrored by the increasingly negative effect sizes reported for Model 4 that controls for early verbal ability. 
These results imply that the inflated test scores associated with changes in scores on tests of intelligence for the intervention group that are independent of changes in the hypothetical true score value of the intelligence disposition fade over time. A follow-up for the experimental sample at age 30 based on the Model 1 analysis might well indicate that the effect size for the intervention had declined to zero. These data indicate that the experimental intervention may very well have had a minimal enduring influence on intelligence construed as a latent disposition.

Another interpretation, offered by Brody, is that earlier and later IQs don’t share a common factor, which may suggest that the gain was not measurement invariant: at earlier ages, verbal development predicts (and accounts for) some part of the IQ gain, but not at later ages, where the IQ gain is reversed. g itself has probably not been stimulated. Even so, Campbell reports that the ABC greatly enhanced educational attainment and achievement scores overall, consistent with all other interventions of this kind. That suggests the benefits of such measures work through pathways that are independent of IQ. This is not surprising since IQ and SES are not perfectly correlated. But the most important question is whether the SES gain can be passed on to the children. Some data reported by the Economic Mobility Project (EMP, 2007, Figure 5) indicate that well-off black families can’t protect their children from regressing towards lower SES levels than white children, which may be due to IQ regression to the mean, although the samples were small for the highest-SES blacks. The question of familial transmission is crucial here. If high-IQ black families fail to pass on their advantages to their children, this large-scale enterprise will be for naught.

If, overall, past experiments have failed to raise IQ, the common defense is that not enough money, resources and investment were deployed. The relevant question, then, is: how much would be enough?

The general finding of null long-term effects of educational interventions can be compared with the failure of adoption studies to show a sustained impact on IQ, as summarized by Spitz (1986, pp. 59-84). For example, the Schmidt (1946) study was fraudulent or, according to Spitz, “a figment of her imagination” (p. 80). And the Kirk (1958) study was inconclusive. There was also a well-known study, Skeels & Dye (1939), in which a pool of retarded orphans was selected and placed into an institution while the rest stayed in the orphanage. The gain was tremendous, but the study is weakened by its failure to randomize: the experimental group had lower IQs, which artificially inflates effect sizes through regression to the mean. Less than perfect reliability causes lower-IQ children to improve their scores at retesting whereas higher-IQ children perform relatively worse at retesting, which is especially true when the subjects are young (e.g., 2-3 years). The largest IQ rise and IQ drop seem to occur on the first retesting. The Skodak & Skeels (1949) adoption study was also widely cited, but Flynn (1993) raises a huge problem. The IQs were affected by test obsolescence (due to Flynn effects), and rather than the 20-point increase reported by Skodak & Skeels, Flynn estimates an IQ gap of about 10.8 or 12.8 points between adopted child and biological mother. Furthermore, as noted by Spitz, selective placement was present, because foster home quality correlated with the IQ of the biological mothers. Regression effects also occurred in this study. Spitz (1986, p. 71), Jensen (1973, p. 241), and Locurto (1990, pp. 278-280) noted indeed that the healthier babies were more likely to be adopted. Interestingly, Spitz (1986, p. 74) affirmed that the IQs of adoptees, as they mature, become increasingly correlated with the IQ or the education of their biological mothers, whereas the correlation with the education of the foster mother remains close to zero. Another cited study is that of Schiff (1978, 1982), but it also has serious problems, noted by Locurto (1990, pp. 283-287), because the reports appear highly fraudulent. Concerning the Capron & Duyme (1989, 1996) study, the gain was not related to g-loadings. In general, there is no good evidence that adoption raises IQ substantially (Jensen, 1998, pp. 476-477). Such an outcome is not surprising given the apparently weak familial transmission among adoptive families (e.g., Scarr & Weinberg, 1978; Plomin et al., 1997).
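The regression-to-the-mean argument above can be illustrated with a small simulation. This is a sketch on synthetic scores only: the reliability of 0.8, the cutoffs of 80 and 120, and all names are hypothetical choices of mine, not values from any of the cited studies. No treatment is applied between the two testings.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is reproducible

MEAN, SD = 100.0, 15.0
RELIABILITY = 0.8                       # hypothetical test-retest reliability
CUTOFF_LOW, CUTOFF_HIGH = 80.0, 120.0   # hypothetical selection thresholds
N = 20_000

# Split observed-score variance into a stable part and a measurement-error part.
true_sd = SD * RELIABILITY ** 0.5
error_sd = SD * (1.0 - RELIABILITY) ** 0.5

true_scores = [random.gauss(MEAN, true_sd) for _ in range(N)]
test1 = [t + random.gauss(0.0, error_sd) for t in true_scores]
test2 = [t + random.gauss(0.0, error_sd) for t in true_scores]  # no intervention

def mean_change(selected):
    """Average retest-minus-first-test change for the selected children."""
    return statistics.mean([second - first for first, second in selected])

pairs = list(zip(test1, test2))
gain_low = mean_change([p for p in pairs if p[0] < CUTOFF_LOW])
gain_high = mean_change([p for p in pairs if p[0] > CUTOFF_HIGH])

print(f"selected low on first test:  mean change {gain_low:+.1f}")
print(f"selected high on first test: mean change {gain_high:+.1f}")
```

Children selected for low first scores gain on average at retest and children selected for high first scores lose, even though nothing was done to either group. An experimental group chosen for its low IQ would therefore show an apparent gain that an unselected comparison group would not.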

Of course, Jensen (1981) stated: “My hunch is that the nongenetic variance in IQ is the result of such a myriad of microenvironmental events as to make it extremely difficult, if not impossible, to bring more than a small fraction of these influences under experimental control.” (p. 33). If this is true, we can suspect that those experimental studies won’t be able to set in motion all of the possible environmental influences, far from it. But providing education, medical care, and familial environment is almost all that can be achieved through public interventions. Little can be added to this that has not already been attempted.

Needless to say, this result is a serious problem for what is currently the most popular environmental theory, the so-called gene-environment correlation, or G-E correlation (Dickens & Flynn, 2001; van der Maas et al., 2006; Kan, 2011), which Jensen had in any case already dealt with (1976, 1998, pp. 453, 522). What these theories have in common is the assumption that we can build g from the outside, by stimulating children and providing specific resources beneficial to their intellectual growth. For these theories to be correct, attempts to raise g must at least produce durable g gains. That they do not suggests that g is a causal entity, not an emergent one.


Barnett W. Steven (1995). Long-Term Effects of Early Childhood Programs on Cognitive and School Outcomes.
Baumeister Alfred A., & Bacharach Verne R. (1996). A Critical Analysis of the Infant Health and Development Program.
Besharov Douglas J., Germanis Peter, Higney Caeli A., and Call Douglas M. (2011). Assessing the Evaluations of Early Childhood Education Programs. All chapters, 2-27, at Welfare Reform Academy.
Besharov Douglas J., Germanis Peter, Higney Caeli A., and Call Douglas M. (2011). Chapter 2. The Abecedarian Project.
Besharov Douglas J., Germanis Peter, Higney Caeli A., and Call Douglas M. (2011). Chapter 9. Early Training Project.
Besharov Douglas J., Germanis Peter, Higney Caeli A., and Call Douglas M. (2011). Chapter 16. The High/Scope Perry Preschool Project.
Besharov Douglas J., Germanis Peter, Higney Caeli A., and Call Douglas M. (2011). Chapter 18. Infant Health and Development Program.
Brody Nathan (2008). Does Education Influence Intelligence?, Chapter 5, in Extending Intelligence: Enhancement and New Constructs (2008) by Kyllonen Patrick C., Roberts Richard D., Stankov Lazar.
Burchinal Margaret R., Campbell Frances A., Bryant Donna M., Wasik Barbara H., Ramey Craig T. (1997). Early Intervention and Mediating Processes in Cognitive Performance of Children of Low-Income African American Families.
Camilli Gregory, Vargas Sadako, Ryan Sharon, Barnett W. Steven (2010). Meta-Analysis of the Effects of Early Education Interventions on Cognitive and Social Development.
Campbell Frances A., Burchinal Margaret R. (2008). Early Childhood Interventions: The Abecedarian Project, Chapter 4, in Extending Intelligence: Enhancement and New Constructs (2008) by Kyllonen Patrick C., Roberts Richard D., Stankov Lazar.
Campbell Frances A., & Ramey Craig T. (1990). The Relationship Between Piagetian Cognitive Development, Mental Test Performance, and Academic Achievement in High-Risk Students With and Without Early Educational Experience.
Currie Janet, & Thomas Duncan (1995). Does Head Start Make a Difference?
Currie Janet, & Thomas Duncan (1998). School quality and the longer-term effects of Head Start.
Ferguson Ronald F. (1998). Can Schools Narrow the Black-White Test Score Gap?, Chapter 9, in The Black-White Test Score Gap (1998) by Jencks Christopher & Phillips Meredith.
Flynn James R. (1993). Skodak and Skeels: The Inflated Mother-Child IQ Gap.
Jensen Arthur R. (1969). How Much Can We Boost IQ and Scholastic Achievement.
Jensen Arthur R. (1976). The problem of genotype-environment correlation in the estimation of heritability from monozygotic and dizygotic twins.
Jensen Arthur R. (1981). Raising the IQ: The Ramey and Haskins study.
Leak et al. (2010). Is Timing Everything? How Early Childhood Education Program Impacts Vary by Starting Age, Program Duration and Time Since the End of the Program.
Locurto Charles (1990). The Malleability of IQ as Judged From Adoption Studies.
McCall Robert B. (1981). Nature-Nurture and the Two Realms of Development: A Proposed Integration with Respect to Mental Development.
McKey et al. (1985). The Impact of Head Start on Children, Families and Communities. Final Report of the Head Start Evaluation, Synthesis and Utilization Project.
McKey et al. (1985). The Impact of Head Start on Children, Families and Communities. Final Report of the Head Start Evaluation, Synthesis and Utilization Project. Executive Summary.
Ramey Craig T. (1992). High-risk children and IQ: Altering intergenerational patterns.
Ramey Craig T., & Haskins Ron (1981). Early education, intellectual development, and school performance – A reply to Arthur Jensen and J. McVicker Hunt.
Ramey Craig T., Yeates Keith Owen, Short Elizabeth J. (1984). The Plasticity of Intellectual Development: Insights from Preventive Intervention.
Ramey Craig T., Campbell Frances A., Burchinal Margaret, Skinner Martie L., Gardner David M., Ramey Sharon L. (2000). Persistent Effects of Early Childhood Education on High-Risk Children and Their Mothers.
Spitz Herman H. (1986). The Raising of Intelligence: A Selected History of Attempts To Raise Retarded Intelligence.
Spitz Herman H. (1991). Commentary on Locurto’s “Beyond IQ in Preschool Programs?”.
Spitz Herman H. (1992). Does the Carolina Abecedarian Early Intervention Project Prevent Sociocultural Mental Retardation?
Spitz Herman H. (1993). When prophecy fails: On Ramey’s response to Spitz’s critique of the Abecedarian project.

*The public data of the Abecedarian and CARE project are available here. Alternatively, I uploaded the data on EXCEL, here. Note that the data is, of course, incomplete.*


  1. Anthony

    one would need to show that inferior schools reduce the difference between higher and lower achieving siblings by depressing high achievement disproportionately

    This hypothesis is not absurd on its face, though I would not be surprised to find well-controlled research confirming it or its opposite, as I can conceive of various causal mechanisms for such a difference.

  2. statsquatch668 (@statsquatch668)

    Nice summary. Another interesting paper from a statistical perspective is by Anderson:

    “Multiple Inference and Gender Differences in the
    Effects of Early Intervention: A Reevaluation of the
    Abecedarian, Perry Preschool, and Early Training


    His main thesis is that if you correct for multiple comparisons in a reasonable way, by controlling either FWER or FDR, there are not many gains at all unless you look for a gender effect. In this case girls may have some improvement on some measures.
