87 thoughts on “Is Psychometric g a Myth?”

  1. Dalliard said
    Shalizi’s thesis is that the positive manifold is an artifact of test construction and that full-scale scores from different IQ batteries correlate only because they are designed to do that. It follows from this argument that if a test maker decided to disregard the g factor and construct a battery for assessing several independent abilities, the result would be a test with many zero or negative correlations among its subtests.

    Forgive me if I’m missing something here, but wouldn’t Spearman’s original work on the g factor already refute this? Presumably the early intelligence tests weren’t made with the positive manifold in mind, as it had yet to be discovered, yet Spearman was able to deduce a general factor of intelligence from these tests anyway.

    • It’s possible that the early intelligence tests did not tap into all cognitive abilities, or that Spearman and other “g men” included in their studies a limited variety of tests, thus guaranteeing the appearance of the positive manifold. However, as I showed above, researchers who have specifically attempted to create tests of uncorrelated abilities have failed, ending up with tests that are not substantially less g-saturated than those made with g in mind.

  2. “continuous progress is being made in understanding g in terms of neurobiology (e.g., Lee et al. 2012, Penke et al. 2012, Kievit et al. 2012) and molecular genetics”

    I think, as Steve Hsu pointed out, anyone who understands factor analysis realises that you can have correlations and a single largest factor even if there are no underlying causal reasons (i.e., it is just an accident). Nonetheless, these models may still be useful.

    Prior to the availability of molecular studies the heritability of type II diabetes was estimated at 0.25 using all those methods. Now molecular studies have identified at least 9 loci involved in the disease. There are other examples in relation to height.

  3. I thought you might have mentioned Gardner a little more. He never actually turned his theory into something testable, so three researchers tested his intelligences themselves and found both intercorrelations among them and correlations with g.

    If anyone’s interested, the exchange went like this:
    Visser et al. (2006). Beyond g: Putting multiple intelligences theory to the test

    Gardner (2006). On failing to grasp the core of MI theory: A response to Visser et al.

    Visser et al. (2006). g and the measurement of Multiple Intelligences: A response to Gardner

    • Yeah, Gardner’s is another one of those failed non-g theories. I’ve read the Visser et al. articles, but Gardner’s theory is really a non-starter because many of his supposedly uncorrelated intelligences are well-known to be correlated, and he does not even try to refute this empirically. Privately, Gardner also admits that relatively high general intelligence is needed for his multiple intelligences to be really operative. Jensen noted this in The g Factor, p. 128:

      As exemplars of each of these “intelligences” Gardner mentions the following famous persons: T. S. Eliot (linguistic), Einstein (logical-mathematical), Picasso (spatial), Stravinsky (musical), Martha Graham (bodily-kinesthetic), Sigmund Freud (intrapersonal), and Mahatma Gandhi (interpersonal). In an interesting book Gardner gives biographical analyses of each of these famous creative geniuses to illustrate his theory of multiple “intelligences” and of the psychological and developmental aspects of socially recognized creativity. When I personally asked Gardner for his estimate of the lowest IQ one could possibly have and be included in a list of names such as this, he said, “About 120.” This would of course exclude 90 percent of the general population, and it testifies to the threshold nature of g. That is, a fairly high level of g is a necessary but not sufficient condition for achievement of socially significant creativity.

      • Gardner admitted to me in an email exchange that the existence of multiple intelligences made the existence of racial inequality in intelligence more likely. If only one number is relevant, then it’s not that improbable in the abstract that all races could average the same number, just as men and women are pretty similar in overall IQ. But, if seven or eight forms of intelligence are highly important, the odds that all races are the same on all seven or eight is highly unlikely. Gardner agreed.

  4. Here’s an example Shalizi uses that’s worth thinking about because it actually unravels his argument:

    “One of the examples in my data-mining class is to take a ten-dimensional data set about the attributes of different models of cars, and boil it down to two factors which, together, describe 83 percent of the variance across automobiles. [6] The leading factor, the automotive equivalent of g, is positively correlated with everything (price, engine size, passengers, length, wheelbase, weight, width, horsepower, turning radius) except gas mileage. It basically says whether the car is bigger or smaller than average. The second factor, which I picked to be uncorrelated with the first, is most positively correlated with price and horsepower, and negatively with the number of passengers — the sports-car/mini-van axis.

    “In this case, the analysis makes up some variables which aren’t too implausible-sounding, given our background knowledge. Mathematically, however, the first factor is just a weighted sum of the traits, with big positive weights on most variables and a negative weight on gas mileage. That we can make verbal sense of it is, to use a technical term, pure gravy. Really it’s all just about redescribing the data.”

    Actually, I find his factor analysis quite useful. If he simply entered “price” as a negative number, he’d notice that his first factor was essentially Affordability v. Luxury, in which various desirable traits (horsepower, size, etc.) are traded off against price and MPG.
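
    For concreteness, here is a minimal sketch of the kind of two-component decomposition Shalizi describes, assuming X is simply a numeric table of car attributes (the column list and the function name are mine, and no particular data set is implied); entering price with its sign flipped is then a one-line change to the input.

    ```python
    # Minimal sketch (illustrative only): loadings of the first two principal
    # components of a car-attribute table. X is assumed to be an
    # (n_cars, n_attributes) array with columns such as price, engine size,
    # passengers, length, wheelbase, weight, width, horsepower, turning radius, MPG.
    import numpy as np

    def first_two_components(X):
        Z = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize each attribute
        R = np.corrcoef(Z, rowvar=False)           # correlation matrix of the attributes
        eigvals, eigvecs = np.linalg.eigh(R)       # eigenvalues in ascending order
        order = np.argsort(eigvals)[::-1]
        loadings = eigvecs[:, order[:2]]           # loadings of the two leading components
        explained = eigvals[order[:2]] / eigvals.sum()
        return loadings, explained
    ```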

    What’s really interesting and non-trivial about the g-factor theory is that cognitive traits aren’t being traded off the way affordability and luxury are traded off among cars. People who are above average at reading are, typically, also above average on math. That is not something that you would necessarily guess ahead of time. (Presumably, the tradeoff costs for higher g involve things like more difficult births, greater nutritional requirements, poorer balance, more discrete mating, longer immature periods, more investment required in offspring, and so forth.)

  5. I think the essence of Shalizi’s mistake is conveniently summed up in his first sentence:

    “Attention Conservation Notice: About 11,000 words on the triviality of finding that positively correlated variables are all correlated with a linear combination of each other, and why this becomes no more profound when the variables are scores on intelligence tests.”

    This reminds me of the old joke about the starving economist on the desert island who finds a can of beans: “Assume we have a can opener …”

    Shalizi just assumes that all cognitive traits are positively correlated, and then goes on from there with his argument. But the fact that virtually all cognitive traits are positively correlated is astonishing.

    Most things in this world involve tradeoffs. Think about automotive engineering. More of one thing (e.g., luxury) means less of another thing (e.g., money left over in your bank account).

    Look at Shalizi’s example of ten traits regarding automobiles. In terms of desirability, some are positively correlated, some are negatively correlated:

    Positive or neutral: passengers, length, wheelbase, weight, width, horsepower, engine size

    Negative: price, turning radius, average fuel cost per 15,000 miles (i.e., MPG restated)

    The fact that, on average, there aren’t tradeoffs between cognitive traits is highly nontrivial.

    • Intellectuals may be prone to being skeptical of g because most people they associate with are high on g, which makes specific abilities more salient. For example, in his heritability book, Neven Sesardic gives the following, remarkably wrong-headed quote from the British philosopher Gilbert Ryle:

      “Only occasionally is there even a weak inference from a person’s possession of a high degree of one species of intelligence to his possession of a high degree of another.”

      (It’s from a 1974 article called Intelligence and the Logic of the Nature–Nurture Issue.)

  6. Okay, but am I overlooking something in saying that the root problem with Shalizi’s argument (in which he makes up numbers that are all positively related to each other and shows that you often see a high general factor even with random numbers) is that this “positive manifold,” in which practically all cognitive tasks are positively correlated, is itself pretty remarkable, since we don’t see the kind of trade-offs that we expect in engineering problems?

  7. Dalliard, you write very well!

    Even though, as Steve Sailer says, it is striking that there are no obvious tradeoffs between the needs of different tasks, we are still left with another question about possible tradeoffs: Why so much variability in g? Is there a Darwinian downside to having too much little g? Is the dumb brute greater in reproductive fitness for some reason? If so, what reason? One can imagine lots of scenarios–is there any way to test them, I wonder?

    I guess if it turns out that little g variability reflects mutation load, then there is no need to postulate a tradeoff?

    • Thanks. I don’t have a good answer with regard to variability. Mutation load would make the most sense, but it may not be the whole story. It’s easy to come up with hypothetical scenarios, as you say. Of course, this is a problem with heritable quantitative traits in general. What I like about heritability analysis is that you don’t really need to worry about the ultimate causes of genetic variation.

  8. May I suggest adding a table of contents at the top with internal links to the sections numbered with roman numerals?

  9. Another branch of psychometrics, personality testing, tends to use a five factor model. To what extent can we say those factors are simply what is found “in the data” vs created by psychometricians?

    Also, the g factor is first referred to as accounting for 59 percent of common factor variance, and later said to account for less than half the variance in an IQ battery. Is that because of the contribution of non-common factors to the variance?

    • I’d say that the Big Five are much less real than g. There’s a good recent paper that compares the Big Five and their facets (sub-traits). They found that most Big Five traits are not “genetically crisp” because genetic effects on the facets are often independent of the genetic effects on the corresponding Big Five traits. Moreover, if you use a Big Five trait to predict something, you will probably forgo substantial validity if you don’t analyze data at the facet level, whereas with IQ tests little is gained by going beyond the full-scale score in most cases.

      In factor analysis, the total variance is due to common factor variance, test-specific variance, and error variance. (There are no non-common factors, because factors are by definition common to at least two subtests.) g usually accounts for less than 50 percent of the total variance but more than 50 percent of the common factor variance.

      • So, we can approximate that the g glass is about half full and half empty simultaneously.

        I think human beings have problems thinking about things where the glass is both half full and half empty. Yet, we seem to be most interested in arguing about situations that are roughly 50-50.

  10. Pingback: Facets of the Big Five personality factors | Entitled to an Opinion

  11. A guess as to part of what’s happening here:

    It stands to reason that multiple areas of the brain are recruited as part of cognition. This makes the “sampling model” intuitively appealing, while making g intuitively difficult to understand as a causal mechanism. However, the question of how cognition works and the question of what underlies individual differences in cognition are two quite separate questions.

    The model which seems to fit the data presented here is that g ultimately reflects a collection of features of neuronal cell physiology as well as the physiology of higher-level parts of neuroanatomy that vary between individuals. Genetic effects on cell physiology and brain development tend to have brain-wide impacts, which get reflected in g. In contrast, one might imagine that various non-genetic effects would have more localized impacts on the brain and thus more variegated effects on variation in cognitive abilities. This causes the heritability of g to come close to 100%, while the heritability of composite IQ scores can be much less.

  12. (1) Very nice essay. I know I should reread it, and Shalizi. Shalizi’s essay is better than you make out. This isn’t because it says useful things about IQ, I think, but because it says useful things about factor analysis. Where he goes wrong seems to be in thinking that the deficiencies of factor analysis destroy the concept of g.

    (2) Can you write something on the Big Five? I know psychologists like it better than Myers-Briggs, but the main reason seems to be that they like factor analysis. I can see that they may have found the 5 most important factors, and maybe there is a big dropoff in going to six, but I wonder if they can really label the 5 meaningfully (what does Neuroticism really mean?). The nice thing about Myers-Briggs is that people see the results and say, “Oh, yes, I see that from my experience,” just as with IQ people say, “Of course, some people are smarter than others, it’s just common sense that there exists something we call intelligence.”

    (3) Is the multiple-intelligences theory, and in particular Shalizi’s example of 100s of independent abilities, really just saying that we call somebody smart if they are high in their sum of high abilities rather than just being high in one ability? Is there a real difference between the two things? (that’s a serious question)

    • A problem with the Big Five is indeed that it relies so heavily on (exploratory) factor analysis (whereas g theory is based on a wide range of evidence aside from factor analysis). See also the article I linked to above in reply to teageegeepea.

      The problem with Myers-Briggs is that it lacks predictive validity, i.e., it does not seem to tell anything important about people.

      Is the multiple-intelligences theory, and in particular Shalizi’s example of 100s of independent abilities, really just saying that we call somebody smart if they are high in their sum of high abilities rather than just being high in one ability? Is there a real difference between the two things? (that’s a serious question)

      Yes, that’s the basic idea. Shalizi’s argument is that it’s arbitrary to use this sum of abilities, while my argument is that this supposed sum of abilities looks suspiciously like one single ability or capacity which represents the most important, and often the only important, dimension of cognitive differences.

  13. Pingback: linkfest – 04/07/13 | hbd* chick

  14. I believe the above critique doesn’t hit the mark, at least as regards ‘errors’ 1 and 3 (the reference to work on confirmatory factor analysis is much more direct).

    Part 1

    Shalizi: this hypothesis is not falsifiable, and here is a simulation experiment that demonstrates that fact.
    Dalliard: here are lots of studies showing the hypothesis is true.

    Part 3

    @Dalliard: I believe you are mistaking the simulation experiment, and its role as a ‘null hypothesis’ in the overall framework of Shalizi’s article, for something else you know all about. He is not advocating the sampling model (and in fact is using random numbers) in his simulation experiment. This section is entirely a strawman argument.

    There is very little evidence in the above blog to show that Dalliard has understood and engaged Shalizi’s argument.

    A more reasonable conclusion would be that the article, written in 2007, is now dated. Whether it was valid in 2007 depends a lot on your assessment of Jensen’s 1998 work — both those topics would make for very constructive further explanation, I think.

    • Your summary of Part 1 is inaccurate.

      Shalizi claims that intelligence tests are made to positively correlate with each other.

      Dalliard counters by arguing that if a test maker decided to ignore g, it would still pop up in any test he made because the positive correlations are not constructs of tests, but an empirical reality. He then cites evidence that supports his argument.

      Shalizi’s simulation, therefore, is nothing more than a GIGO model showing that randomly generated positive correlations also produce a general factor similar to what is found in IQ tests. This is true, but uninteresting; it still doesn’t explain how the uniformity of positive correlations in tests exists in the first place.

      • You’re right as far as “Part 1” is concerned, but just to be clear, the abilities in Shalizi’s toy model are genuinely uncorrelated. Correlations between tests emerge because all of them call on some of the same abilities, and g corresponds to average individual differences across those shared abilities.

      • @Dalliard 7:03 AM,

        “You’re right as far as “Part 1″ is concerned, but just to be clear, the abilities in Shalizi’s toy model are genuinely uncorrelated.”

        Shalizi is pretty clear that the seemingly random variables in his simulation are supposed to be positively correlated – i.e., they’re not really random at all. That simulation would not work at showing a g factor if those random factors were genuinely uncorrelated.

        Shalizi writes:

        “If I take any group of variables which are positively correlated, there will, as a matter of algebraic necessity, be a single dominant general factor, which describes more of the variance than any other, and all of them will be “positively loaded” on this factor, i.e., positively correlated with it. Similarly, if you do hierarchical factor analysis, you will always be able to find a single higher-order factor which loads positively onto the lower-order factors and, through them, the actual observables. What psychologists sometimes call the “positive manifold” condition is enough, in and of itself, to guarantee that there will appear to be a general factor. Since intelligence tests are made to correlate with each other, it follows trivially that there must appear to be a general factor of intelligence. This is true whether or not there really is a single variable which explains test scores or not.”

        Everything in Shalizi’s argument in the above quote hinges on his assumption that IQ tests, and the various subtests within them, are meant to be positively correlated with each other. His simulation works from that assumption, which as you point out is an incorrect assumption.

        Shalizi later writes: “If I take an arbitrary set of positive correlations, provided there are not too many variables and the individual correlations are not too weak, then the apparent general factor will, typically, seem to describe a large chunk of the variance in the individual scores.”

        So Shalizi starts off by assuming a g factor in his simulation and then wonders why psychologists are so impressed with finding a g factor in their tests.

        The answer is, of course, that there is no earthly reason why psychologists should have necessarily found a g factor in their tests. The abilities measured in them – unlike Shalizi’s simulation – could have very well been uncorrelated or even negatively correlated.

      • In Shalizi’s model, the abilities are based on random numbers and are therefore (approximately) uncorrelated, while the tests are positively correlated. Each test taps into many abilities, and correlations between tests are due to overlap between the abilities that the tests call on. If each test in Shalizi’s model called on just one ability or on non-overlapping samples of abilities, then the tests would also be uncorrelated.

      • @Dalliard 1:59 PM,

        “In Shalizi’s model, the abilities are based on random numbers and are therefore (approximately) uncorrelated, while the tests are positively correlated. Each test taps into many abilities, and correlations between tests are due to overlap between the abilities that the tests call on. If each test in Shalizi’s model called on just one ability or on non-overlapping samples of abilities, then the tests would also be uncorrelated.”

        Thanks for the clarification.

        So is Shalizi’s error in not realizing that a person’s g is fairly consistent when measured and compared across several IQ tests? That’s how I read this passage you wrote to Macrobius:

        “Various kinds of evidence have been proffered in support of the notion that the same g is measured by all diverse IQ batteries, but the best evidence comes from confirmatory factor analyses showing that g factors are statistically invariant across batteries. This, of course, directly contradicts the predictions of g critics like Thurstone, Horn, and Schonemann.”

        I assume this evidence contradicts the critics because random numbers – similar to those in Shalizi’s simulation – would not produce a consistent g across several batteries. Is that correct?

      • The random numbers aren’t that important, they’re just a way to introduce individual differences to the model. Shalizi’s mistake is to think that the fact that correlations between tests can be generated by a model without a unitary general factor has any serious implications for the reality of g. Any sampling model must be capable of explaining the known facts about g, including its invariance across batteries, which means that sampling, if real, is just about explaining the operation of a unidimensional g at a lower level of analysis.
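
        To make that concrete, here is a minimal sketch of a Thomson/Shalizi-style sampling simulation. It is only a sketch: the number of people, abilities, and tests, and the sampling scheme, are all invented for illustration, and this is not Shalizi’s actual code.

        ```python
        # Toy sampling model: abilities are independent random numbers; each test sums a
        # random subset of them, so tests correlate only because their subsets overlap.
        import numpy as np

        rng = np.random.default_rng(0)
        n_people, n_abilities, n_tests, per_test = 2000, 500, 12, 100

        abilities = rng.standard_normal((n_people, n_abilities))   # uncorrelated abilities
        tests = np.column_stack([
            abilities[:, rng.choice(n_abilities, per_test, replace=False)].sum(axis=1)
            for _ in range(n_tests)
        ])

        R = np.corrcoef(tests, rowvar=False)
        off_diag = R[~np.eye(n_tests, dtype=bool)]
        print("mean inter-test correlation:", off_diag.mean())     # positive despite independent abilities
        print("first PC share of variance:", np.linalg.eigvalsh(R)[-1] / n_tests)
        ```

        If each test instead drew a single ability, or non-overlapping sets of abilities, the inter-test correlations would vanish, which is the point made above.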

    • Shalizi: this hypothesis is not falsifiable, and here is a simulation experiment that demonstrates that fact.
      Dalliard: here are lots of studies showing the hypothesis is true.

      As Pincher Martin pointed out, the simulation experiment is not related to Shalizi’s first error. The first error is the assertion that there are cognitive tests that are uncorrelated or negatively correlated with tests included in traditional IQ batteries. There is no evidence that this is the case, and there is abundant evidence to the contrary (perhaps the closest is face recognition ability, which is relatively independent, but even it has a small g loading in the studies I’ve seen). It’s conceivable that there are “black swan” tests of abilities that do not fit the pattern of positive correlations, but even then it’s clear that a very wide range of cognitive abilities, including all that our educational institutions regard as important, are positively correlated.

      I believe you are mistaking the simulation experiment, and its role as a ‘null hypothesis’ in the overall framework of Shalizi’s article, for something else you know all about. He is not advocating the sampling model (and in fact is using random numbers) in his simulation experiment. This section is entirely a strawman argument.

      Nope. The simulation represents an extreme version of sampling, and Shalizi doesn’t claim that it’s a realistic model, but he nevertheless thinks that g is most likely explained by the recruitment of many different neural elements for the same intellectual task, with some of these elements overlapping across different tasks.

      This is how he puts it: “[T]here are lots of mental modules, which are highly specialized in their information-processing, and that almost any meaningful task calls on many of them, their pattern of interaction shifting from task to task.” My counter-point is that even if sampling is true, it does not invalidate g. Any model of intelligence must account for the empirical facts about g, which in the case of sampling means that there must be a hierarchy of intelligence-related neural elements, some of them central and others much less important, with g corresponding to the former.

      There is very little evidence in the above blog to show that Dalliard has understood and engaged Shalizi’s argument.

      I understand Shalizi’s argument, whereas most people who regard it as a cogent refutation of general intelligence do not. I also engage his argument at length.

      A more reasonable conclusion would be that the article, written in 2007, is now dated. Whether it was valid in 2007 depends a lot on your assessment of Jensen’s 1998 work — both those topics would make for very constructive further explanation, I think.

      My post could as well have been written in 2007. I cited some more recent studies, but they are not central to my argument. All the relevant evidence was available to Shalizi in 2007, but he didn’t know about it or decided to ignore it.

  15. I thank both @Dalliard and @Pincher Martin for their incisive replies that helped me understand what is being claimed, esp. as regards to ‘error 1’.

    I do indeed see evidence that Shalizi believes as you say. I will argue, however, that this does not harm his argument in the way claimed. Before I do that, though, allow me to comment on what I take to be his point in the post. Unfortunately, most of his substantive points are actually in the footnotes. I take Shalizi to be largely recapitulating the paper of Schoenemann he references in n.2 (‘Factorial Definitions of Intelligence: Dubious Legacy of Data Analysis’). My evidence is that there is hardly anything he says that is not in that paper, in greater detail, and the tone and aim of the polemic are quite similar. In fact, I would describe the post as a pedagogical exposition of Schoenemann’s views — with *one* extension.

    Allow me to explain: Schoenemann is quite clear that he regards Spearman’s g and related factor analysis, and Jensen’s definition of g in terms of PCA, as changing the definition of g on the fly. In this context, he recapitulates the history of Spearman’s g and Thurstone’s views, giving all the critiques that Shalizi raises, then shifts gears and gives his opinion of Jensen.

    One of the problems with Shalizi’s post is that he minimizes the transition from Factor Analysis to PCA, which Schoenemann treats as models with very different properties and hence critiques separately. Nonetheless, it is clear that Shalizi is influenced by Schoenemann, and, besides, his primary exposition is trying to recapitulate Thomson’s construction (pp. 334-6, op. cit.).

    Specifically, with regard to ‘error 1’ I think what Shalizi is trying to do is precisely replicate this passage of Thomson, only for Jensen’s PCA based g:

    ‘Hierarchical order [i.e. ideal rank one] will arise among correlation coefficients unless we take pains to suppress it. It does not point to the presence of a general factor, nor can it be made the touchstone for any particular form of hypothesis, for it occurs even if we make only the negative assumption that *we do not know* how the correlations are caused, if we assume only that the connexions are random’

    This passage is immediately followed by a mathematical analogy along the lines of Shalizi’s squares and primes, though different. The honest thing to do is to ask Shalizi at this point if he had this passage in mind, when he devised his toy simulation, though I don’t doubt the answer myself.

    One reason to pay attention to this background, from a polemic standpoint, is that even if you ‘take down Shalizi’ and neutralize his post, you leave yourself open to a very simple rejoinder: that Shalizi was just a flawed version of Schoenemann, and what about that? However, I don’t think we’ve yet reached the point where we can say Shalizi is flawed, and I will explain that next.

    Let’s start with the basic facts of how Frequentist inference works: you have an unrestricted model (H1, estimated from your data), a restricted model (the null hypothesis, usually estimated under whatever assumptions the restrictions afford you), and a ‘metric’ — say Wald distance, Likelihood Ratio, or Lagrange Multiplier. For definiteness, since it is most appropriate in this context, let’s take the Likelihood Ratio. Next, one notes that the likelihood of the *restricted* model is never greater than that of the unrestricted one, so that the LR is bounded 0 < LR <= 1. That is, you *must* put the restricted model in the numerator — if you do not, then you don’t get a compelling inference. The restricted model is always *trivially* less likely than the unrestricted, so you are refuted if the trivially less likely model ends up being more likely than H1 given your data. This is what gives Frequentist inference its force — the fact that it can be trivially falsified, in case it happens to be uninformative.

    So here's the puzzler: Shalizi is a statistician, and yet he's chosen to randomize the *unrestricted model*. Did he just make a howling blunder? If so, Dalliard is straining at a speck of an error here, when he should be putting a beam through Shalizi's eye (and maybe Thomson's as well).

    Of course, I believe Shalizi has done no such thing, being a competent statistician — beyond perhaps failing to make the form of his inference here explicit. If my hunch that he is following Thomson’s logic exactly turns out to be correct, then what he must be doing is some sort of dominance argument, by constructing a likelihood that is *greater* than that of the ‘positive manifold’ restricted model. It would be really productive if someone involved in this spat were to spell out the form of inference — if any — that Shalizi and Thomson are trying to use! Because it certainly doesn’t follow the *normal* template of Frequentist reasoning that people may be assuming. It’s bass ackwards.

    Now, does 'error 1' have any force? I don't think so, even if Shalizi holds the proposition and is wrong about it. Nothing hinges, in the form of argument — assuming again it is not just sheer blunder, which I doubt — on the question of whether the restricted model is enforced by empirics or by design. Frequentist inference is about correlation, and just doesn't give a damn about that sort of thing. I may be in error here, and would be happy to have my error explained to me.

    For @Dalliard, a simple question: do you believe Thomson's argument was persuasive against the Two Factor Model? And secondly, if you believe it was, do you believe a similar argument could succeed in principle against Jensen's PCA version of g? If not, why not?

    • I do indeed see evidence that Shalizi believes as you say. I will argue, however, that does not harm his argument in the way claimed.

      What I termed Shalizi’s first error is simply the claim that if a sample of people takes a bunch of intelligence tests, the results of those tests will NOT be uniformly positively correlated if you include tests that are different from those used in traditional batteries like the Wechsler. I showed that all the evidence we have indicates that this claim is false.

      It appears that you confuse the question whether tests correlate with the question why they correlate. But these are separate questions. Whether tests correlate because there’s some unitary general factor or because all tests call on the same abilities, the correlations are there.

      Thomson’s model is about why the particular pattern of correlations exists. He showed that it would arise even if there are only uncorrelated abilities provided that they are shared between tests to some extent. He didn’t claim that his model falsified Spearman’s, only that Spearman’s explanation wasn’t the only possible one. Of course, it later became apparent that both Spearman’s two-factor model and Thomson’s original model are false, because they cannot account for group factors.

      The modern g theory posits that there’s a hierarchy of abilities, with g at the apex. As Shalizi points out, such multiple-factor models are unfortunately not as readily falsifiable as Spearman’s two-factor model was. Various kinds of evidence have been proffered in support of the notion that the same g is measured by all diverse IQ batteries, but the best evidence comes from confirmatory factor analyses showing that g factors are statistically invariant across batteries. This, of course, directly contradicts the predictions of g critics like Thurstone, Horn, and Schonemann.

      When you add to this g’s intimate associations with genetic variables, practical outcomes, practice effects, etc., as explained in my post, it becomes clear that it’s difficult to explain human cognitive differences without reference to something very much like general intelligence. This is the case regardless of how unitary or not the neurophysiology of intelligence is. If you want to argue that there’s no general intelligence, you must show how all these facts fit into an alternative model, something Shalizi doesn’t do.

      (A note on Jensen and PCA: He actually regarded the Schmid-Leiman procedure as the best method to extract g, although he also showed that the choice of method makes little difference.)

  16. Thanks for your further clarification. I should mention, before leaving off the topic, that Shalizi has covered the same material in his lecture notes for a course, some two years after the post referenced (see lectures 10-13).


    I don’t think this will change any of the discussion, but it has more formalism and clarity.

    I look forward to you addressing the heritability part of the article, if you get a chance.

    • Thanks, I’ll take a look at those notes.

      I don’t feel like delving into Shalizi’s claims about heritability at the moment, but perhaps I will later. In general, GCTA has been a methodological weapon of mass destruction with regard to arguments seeking to minimize the role that genes play in causing intelligence differences (although I don’t think those arguments were very strong to begin with).

  17. There is an alternative hypothesis to the g hypothesis: Multiple general factors.

    It’s possible that simple mental tasks (of the kind used in all psychometric tests) can be performed by a number of different substitutable mental systems.

    For example, suppose the performances of subject i on tests m and n are given by:
    P_mi = a + b_m * X_i + c_m * Y_i + e_mi
    P_ni = a + b_n * X_i + c_n * Y_i + e_ni

    Here, X and Y are two different cognitive abilities. b and c are positive constants. Assume X_i and Y_i are uncorrelated, and assume e, the error term, is uncorrelated across tests and across individuals.

    In this case, assessing the covariance of performances across two tests m and n, we will have:
    Cov(P_mi, P_ni) = b_m * b_n * Var(X_i) + c_m * c_n * Var(Y_i) > 0

    So even though the two cognitive abilities are uncorrelated (i.e. there is no true “g”), all tests are positively correlated (the “positive manifold” holds), and thus a “g”-type factor can be extracted for any set of tests.

    To make this example concrete, suppose that there are two statistically independent mental abilities, spatial modeling and symbolic modeling (I just made those up). And suppose that any simple information-processing task can be solved using spatial modeling, or solved using symbolic modeling, or solved using some combination of the two. That would result in a positive correlation between all simple information-processing tasks, without any dependence between the two mental abilities.

    Of course, the functional form I chose has the two abilities be *perfect* substitutes, but that is not necessary for the result to hold.
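
    As a sanity check, here is a minimal simulation of the setup above; the loadings, noise level, and number of tests are invented for illustration.

    ```python
    # Two uncorrelated abilities, X and Y; every test loads positively on both.
    import numpy as np

    rng = np.random.default_rng(1)
    n_people, n_tests = 5000, 8
    X = rng.standard_normal(n_people)                     # ability 1
    Y = rng.standard_normal(n_people)                     # ability 2, uncorrelated with X
    b = rng.uniform(0.3, 1.0, n_tests)                    # positive loadings on X
    c = rng.uniform(0.3, 1.0, n_tests)                    # positive loadings on Y
    E = 0.5 * rng.standard_normal((n_people, n_tests))    # test-specific error

    P = X[:, None] * b + Y[:, None] * c + E               # performance of each person on each test

    R = np.corrcoef(P, rowvar=False)
    print("all inter-test correlations positive:", (R[~np.eye(n_tests, dtype=bool)] > 0).all())
    print("first eigenvalue share of total variance:", np.linalg.eigvalsh(R)[-1] / n_tests)
    ```

    Every off-diagonal correlation comes out positive and a dominant first factor appears even though X and Y are independent, which is just the covariance expression above in numerical form.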

    This has long been my intuitive working hypothesis about mental ability. I have noticed that I tend to solve most math and physics problems symbolically (by writing down equations), while some of my peers seem to solve them all graphically (by drawing pictures). That led me to believe that some people are visual thinkers, while others are symbolic thinkers. That intuition was reinforced by my own high performance on the “mathematical” and “linguistic” parts of IQ tests (which I used to take online for fun), and my average performance on the “visual” or “spatial” parts of the tests.

    Of course, my intuition could easily be wrong. But the math above seems to show that g-like factors can emerge when there are in fact many general intelligence factors present.

    • Your idea seems to be a version of the sampling model discussed in my post. My point is that even if one general intelligence becomes many at some level, at the behavioral level it is unitary. I discussed research showing that while people have relative cognitive strengths and weaknesses, these contribute little to the prediction of educational and job outcomes net of the general level of ability. Moreover, different test batteries appear to tap into one and the same g, and there are many other indications of the generality and unidimensionality of g. While the positive manifold may be explained by reference to different kinds of mechanisms, all of them must be able to account for the empirical facts about g, which go far beyond the mere existence of positive correlations between tests. Those who posit sampling models almost never consider how their models fit together with what we know about g beyond the positive correlations, or whether their models really falsify g or just describe it at another level of analysis.

      You may think that there’s a big difference between “visual thinkers” and “symbolic thinkers” or whatever, but research does not support this learning styles paradigm. In their review Pashler et al. concluded:

      Our review of the literature disclosed ample evidence that children and adults will, if asked, express preferences about how they prefer information to be presented to them. There is also plentiful evidence arguing that people differ in the degree to which they have some fairly specific aptitudes for different kinds of thinking and for processing different types of information. However, we found virtually no evidence for the interaction pattern mentioned above [i.e., positive interactions between similar instructional and self-reported learning styles, or “meshing”], which was judged to be a precondition for validating the educational applications of learning styles. Although the literature on learning styles is enormous, very few studies have even used an experimental methodology capable of testing the validity of learning styles applied to education. Moreover, of those that did use an appropriate method, several found results that flatly contradict the popular meshing hypothesis.

      This indicates that learning is a highly general capacity. A caveat here is, as discussed in my post, that at high levels of IQ, specific abilities are more independent. However, you shouldn’t use observations based on exceptional, high-ability individuals to make general conclusions.

      I corrected the covariance equation.

      • This is a digression, but an interesting one. I read Pashler’s abstract, and what it says is not “evidence shows learning styles don’t matter” but “the research on learning styles is done too badly to show whether learning styles matter”.

        This is a different question than whether there are multiple abilities. The Pashler question is whether you can sort out people using personality tests (or suchlike) and then use, e.g. lectures for some and books for others to teach them things better than if you used lectures for all or books for all. The question is still just as relevant if every student’s g intelligence is identical. As I understand it, people suggest that as a hypothesis but nobody’s done good experiments on it. Is that right?

        Actually, one can ask a similar question based on sorting by g. Common sense says that if you have to teach, for example, Bayes’s Rule to a heterogeneous group, you should have the low-g ones memorize it and the high-g ones learn how to rederive it. Is that experimentally confirmed?

    • “I have noticed that I tend to solve most math and physics problems symbolically (by writing down equations), while some of my peers seem to solve them all graphically (by drawing pictures).”

      Dear Professor Smith:

      Your peers at Carnegie-Mellon are all above average in intelligence. Moreover, they typically tend to be above average on all forms of intelligence.

      What’s really interesting about the g factor is that people who are above average in spatial ability are not, on average, below average in symbolic ability, and vice versa. Your fellow professors who are geniuses at spatial reasoning don’t confine their reading to, say, the comments section on YouTube videos.

      In general, those who are above average on one trait tend to be above average on another. It’s not like, say, with cars where acceleration and gas mileage tend to be inversely correlated. The positive manifold of cognitive skills is a strange and important fact of nature that Dr. Shalizi tried to assume away in classic “Assume we have a can opener” style.

  18. Noah, my English is horrific, but anyway, your idea is not new. More or less the same thing has been stated elsewhere, in a recent and over-cited study.
    The referenced paper can be found here :
    Fractionating Human Intelligence (Hampshire et al. 2012)

    Apart from the small sample size (n=16), it also fails to understand the nature of g. Here’s a passage from Jensen’s 1998 book, The g Factor (here), pages 130-132, about the unity of g and the concept of modular abilities, which is what you are referring to.

    The g factor, which is needed theoretically to account for the positive correlations between all tests, is necessarily unitary only within the domain of factor analysis. But the brain mechanisms or processes responsible for the fact that individual differences in a variety of abilities are positively correlated, giving rise to g, need not be unitary. … Some modules may be reflected in the primary factors; but there are other modules that do not show up as factors, such as the ability to acquire language, quick recognition memory for human faces, and three-dimensional space perception, because individual differences among normal persons are too slight for these virtually universal abilities to emerge as factors, or sources of variance. This makes them no less real or important. Modules are distinct, innate brain structures that have developed in the course of human evolution. They are especially characterized by the various ways that information or knowledge is represented by the neural activity of the brain. The main modules thus are linguistic (verbal/auditory/lexical/semantic), visuospatial, object recognition, numerical-mathematical, musical, and kinesthetic. …

    In contrast, there are persons whose tested general level of ability is within the normal range, yet who, because of a localized brain lesion, show a severe deficiency in some particular ability, such as face recognition, receptive or expressive language dysfunctions (aphasia), or inability to form long-term memories of events. Again, modularity is evidenced by the fact that these functional deficiencies are quite isolated from the person’s total repertoire of abilities. Even in persons with a normally intact brain, a module’s efficiency can be narrowly enhanced through extensive experience and practice in the particular domain served by the module.

    Elsewhere, he notes (pages 259-261):

    But at some level of analysis of the processes correlated with g it will certainly be found that more than a single process is responsible for g, whether these processes are at the level of the processes measured by elementary cognitive tasks, or at the level of neurophysiological processes, or even at the molecular level of neural activity. If successful performance on every complex mental test involves, let us say, two distinct, uncorrelated processes, A and B (which are distinguishable and measurable at some less complex level than that of the said tests) in addition to any other processes that are specific to each test or common only to certain groups of tests, then in a factor analysis all tests containing A and B will be loaded on a general factor. At this level of analysis, this general factor will forever appear unitary, although it is actually the result of two separate processes, A and B. … However, the fact that g has all the characteristics of a polygenic trait (with a substantial component of nongenetic variance) and is correlated with a number of complexly determined aspects of brain anatomy and physiology, as indicated in Chapter 6, makes it highly probable that g, though unitary at a psychometric level of analysis, is not unitary at a biological level.

    By the way,

    • Exceedingly interesting. That does make me think that the claim for unitary g must be analogous to what, in Economics, is called GARP — that is, the existence of a unitary utility function that can rationalise the test data, in the sense of Discrete Choice theory — a sort of Generalised Axiom of Revealed Intelligence.

      Has there been any work along those lines? Maybe it’s time to get Hal Varian involved. I can see that asking what the economic value is, to the individual, of a question on an IQ test that the theory’s heterogeneous variability allows them to ‘punch above their weight’ on might be an excellent way to find distributional evidence in the data for the theory. Such situations are not only extremely rare — they are Generalised Extremely *Valuable* to the individual in question, given the economic value of the test!

    • Fleshing out my thought a bit — it seems to me that ‘IQ’ is just voting theory turned on its side, so to speak. Suppose we have an island with 2000 people and 7 policy alternatives. We form a matrix with 7 rows and 2000 columns. Each column contains the preferences (a rank from 1 to 7) of each ‘voter’. Along the right hand side, we have a social welfare function that computes the social (global island) utility of each alternative. Under suitable assumptions, there is a single voter whose preferences must mirror the social welfare function. Choosing him for dictator is superior to any voting scheme, so far as directly selecting the highest social welfare for the island is concerned.

      Now, instead of making the people the columns, make them the rows — that is, we have 7 people and give them a battery of IQ tests consisting of 2000 questions total. Along the right hand column, write down their true IQ score — that is, the ‘global intelligence’ of each of the 7 test takers is in fact known, but we wish to rank the individuals without knowing it, using some computation based on their answers or perhaps additional information.

      For each question on each test, we can under suitable assumptions rank the value of that question, ordinally, for each of the 7 individuals — and assuredly each question has economic value to them, since the higher their ‘bundle of scores’ subject to their ‘intelligence constraint’, the better adapted to life they are, which is a sort of utility. Therefore we have columns that are permutations of the numbers 1-7.

      Again, under suitable assumptions, there is a single question on a single test the value of answering which exactly mimics the IQ ranking. Call it the Dictator Question. That is, IQ would seem to be a mirror of ‘ordinal social welfare’, when we think of people as ‘adaptive policy alternatives’ in an evolutionary situation, and the situations they face — modelled in tests — as the ‘individuals’. The Dictator Question plays the role of a ‘representative agent’ I guess — in the sense that a test replicating it many times would correctly model the expectations of administering a battery of tests to a population.

  19. And, my ‘final, final thought’ is that Shalizi — to return to the topic of the OP — has written a review of the Flynn Effect, which may at least have the advantage of clarifying what he thought, before he swore off ‘IQ debates’ entirely and, it would seem, irrevocably:


    ‘As data reduction, factor analysis is harmless, but there has always been a temptation to “reify” the factors, to suppose that factor analysis discovers the hidden causal structure which generates the observations. This is a temptation which many psychologists, especially IQ-testers, have failed to resist, even eagerly embraced. Flynn protests the “conceptual imperialism” of g. He correctly insists that factor analysis (and related techniques, like item response theory) at most finds patterns of correlation, and these arise from a complicated mixture of our current social arrangements and priorities and actual functional or causal relationships between mental abilities. Factor analysis is helpless to separate these components, and gives no reason to expect that “factor loadings” will persist. Indeed, the pattern of Flynn-effect gains on different types of IQ test is basically unrelated to the results of factor analysis.

    ‘But really the whole enterprise rests on circularities. It’s mathematically necessary that any group of positively-correlated variables has a “positively loaded” general factor. (This follows from the Perron-Frobenius theorem of linear algebra.) A sub-test is “highly g loaded” if and only if it is comparatively strongly correlated with all the other tests; or, to adapt a slogan, positive correlation does not imply common causation. (Saying “Jack solved all the Raven’s problems because he had high scores on many other tests which are positively correlated with scores on Raven’s” is even more defective as an attempted explanation than attributing sleep to a dormitive power.) Since IQ test questions are selected to be positively correlated, the appearance of g in factor analyses just means that none of the calculations was botched. The only part of the enterprise which isn’t either a mathematical tautology or true by construction are the facts that (1) it is possible to assemble large batteries of positively-correlated questions, and (2) the test scores correlate with non-test variables, though more weakly than one is often led to believe. Flynn does not make this argument, and some of his remarks suggest he still attributes too much inferential power to factor analysis, though he correctly says that it has contributed little to our understanding of the brain or cognition.

    ‘After a century of IQ testing, there is still no theory which says which questions belongs on an intelligence test, just correlational analyses and tradition. This is no help in deciding whether IQ tests do measure intelligence, and so whether the Flynn effect means we are becoming smarter. If we accept Flynn’s idea that intelligence is how well and how quickly we learn, an IQ test is an odd way to measure it. None of the tests, for instance, set standardized learning tasks and measure the performance achieved within a fixed time. At best they gauge the success of past learning, which could indirectly measure how well and how quickly people learn if we presume that the test-takers had similar opportunities to learn the material they’re being tested on. Even then it would be confounded with things like executive function and current and past motivation. For instance, in 1998 Lovaglia et al. (American Journal of Sociology 104: 195–228) did an experiment where they took groups of college students and spent fifteen minutes creating a situation in which either the right- or left- handed students could expect to be better-rewarded for their efforts and abilities; the favored hand was randomly varied by the experimenters. This consistently made students in the favored group score about 7 IQ points higher on Raven’s Matrices than those in the disfavored group. That is, a quarter of an hour of motivational priming can be worth a decade or more of the Flynn effect.’

    Also: It would seem very few people in this latest spate of blogging have read the paper on causal vs. correlation effects [Glymour’s paper on the Bell Curve — http://www.hss.cmu.edu/philosophy/glymour/glymour1998.pdf (PDF) ] that he twice mentions in the blog. Skip to section 8 and read the political conclusions (anti-family, for starters), if you want your conservative blood to boil.

    It’s a critique of _The Bell Curve_ in terms of causal diagrams and factor graphs, and rather devastating I think. If you don’t know what a Factor Graph is, Chris Bishop’s book, ch. 8, will explain: research.microsoft.com/~cmbishop/prml/Bishop-PRML-sample.pdf [free online sample]

    • Macrobius.

      Cosma Shalizi is dishonest in his critiques. You should compare what he says — or more often implies — that g men and hereditarians say with what they actually say. Below is a list of books discussing intelligence and heritability. Nearly all of the positions in the books are granted by g men and hereditarians and nearly all claims are mutually consistent.

      Making Sense of Heritability (2005) by Neven Sesardic; Measuring Intelligence: Facts and Fallacies (2004) by David Bartholomew; The g Factor: The Science of Mental Ability (Human Evolution, Behavior, and Intelligence) (1998) by Arthur Jensen

      I have put these books online so you can google around and find them. As for your point above:

      “For each question on each test, we can under suitable assumptions rank the value of that question, ordinal-ly, for each of the 7 individuals — and assuredly each question has economic value to them, since the higher their ‘bundle of scores’ subject to their ‘intelligence constraint’ the better adapted to life they are, which is a sort of utility. Therefore we have columns that are permutations of the numbers 1-7”

      I don’t find this to be an illuminating analogy. With regards to IQ testing, you can rank individuals by g scores because g represents a common property between subtest scores — g, unlike IQ subtest scores, is uni-dimensional.

      “Also: It would seem very few people in this latest spate of blogging have read the paper on causal vs. correlation effects [Glymour’s paper on the Bell Curve”

      More likely is that we are familiar with the larger body of research:


      But you would have to make a specific point. What issue, specifically, do you think is being overlooked?

      “Skip to section 8 and read the political conclusions (anti-family, for starters), if you want your conservative blood to boil….It’s a critique of _The Bell Curve_ in terms of causal diagrams and factor graphs, and rather devastating I think”

      First, I don’t think anyone here is a self-identifying conservative. And second, the basic conclusions of the Bell Curve have been demonstrated repeatedly. It seems that you’re not familiar with the larger body of research. Read up on the background literature to get a sense of the issue:

      Bock, Gregory; Goode, Jamie; Webb, Kate, eds. (2000). The Nature of Intelligence.
      Ones, D. S., & Viswesvaran, C. (2002). Introduction to the special issue: Role of general mental ability in industrial, work, and organizational psychology.

      • ‘I don’t find this to be an illuminating analogy.’ It’s not really an analogy. It’s intended to show that logically unitary g is equivalent to the IIA assumption per Ken Arrow’s Impossibility Theorem — or if you like, equivalent to doing logistic regression on 1s and 0s if your test is scored in a binary fashion (which is a convenience not a necessity).

        That is, it is intended to demonstrate something, not make an analogy or model to anything not already in the concept.

      • To make the construction a bit more clear, the matrix is just the answer sheets turned in by the 7 participants with 2000 questions — scored 0 for incorrect and 1 for correct answer. However, it is perfectly reasonable (if expensive) to have a panel of judges give ordinal scores to all participants on each question, and to use somewhat more complex questions that are more discriminating of total ability. In that case, the ordinal ranks in the columns are just the test scores.

        If you think about how algorithm evaluation is done for search engines, you will see the point — in that case, the ‘questions’ are query terms and the answers are indeed scored by judges, for each of the algorithms tested, on a numerical scale that has more than ‘correct’ and ‘incorrect’ as choices — usually via a ‘Mechanical Turk’ sort of arrangement, with the human judges solving the HITs (‘human intelligence tasks’) of scoring the machines. I’ve simply swapped the roles of the machines being tested and the humans here.

        I’m not sure that the commitment, in search-engine land and NLP land, to ‘binary relevance’ of the answers is such an essential element of intelligence test design. (A small sketch of the column-ranking construction follows below.)
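
        Purely as an illustration of that construction — not anyone’s actual scoring pipeline — here is a minimal sketch in which hypothetical judge grades stand in for answer sheets and scipy’s rankdata produces the per-question ordinal ranks:

```python
# A toy version of the setup described above: 7 participants, 2000 questions,
# graded judge scores, converted to per-question ordinal ranks of the people.
import numpy as np
from scipy.stats import rankdata

rng = np.random.default_rng(0)
n_people, n_questions = 7, 2000

# Hypothetical graded scores (0-10); with binary scoring this would be a 0/1 matrix.
scores = rng.integers(0, 11, size=(n_people, n_questions))

# Rank the 7 participants within each question (1 = lowest, 7 = highest).
# Ties are averaged, so a column is a permutation of 1..7 only when the judges
# never give two people the same grade on that question.
ranks = np.apply_along_axis(rankdata, 0, scores)

print(ranks[:, :3])        # first three columns of ordinal ranks
print(ranks.sum(axis=1))   # a crude total-rank ordering of the 7 people
```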

      • “Along the right hand column, write down their true IQ score — i.e., the ‘global intelligence’ of each of the 7 test takers is in fact known, but we wish to rank the individuals without knowing it, using some computation based on their answers or perhaps additional information.”

        I don’t know what you mean by “true IQ scores”. You can derive g-scores with factor analysis, and you can then create a regression formula to predict g-scores, e.g., g = .34a + .21b + .34c, where a, b, and c are subtest scores. If this is what you are saying — ok. But this isn’t always how full-scale IQ scores are calculated. As a result, your statement that “IQ is just …” is confusing — to me, at least.

        Also, the main discussion here concerns the computation of, and meaningfulness of, “true IQ scores” — or, in your comparison, “true voter preference scores” — from which to build the regression equation. You could try to calculate “true scores” in a number of different ways (e.g., by averaging subtest scores). The real debate here is whether you can create unidimensional scores by which you can rank people on something called “intelligence” — or “voter preference” (e.g., for textbook-defined neoliberal policies). The g-men’s answer is ‘yes’, because all subtests share a common factor, and score differences on this factor can be compared.

        Using your example, this would be similar to a situation in which voter preference scores correlated positively (e.g., people who supported neoliberal policy 1 tended to support neoliberal policy 2), allowing one to extract a p-factor and rank people in terms of neoliberalism. (Obviously, if voter preferences did not correlate, you could still rank people by averaging their scores and ranking the averages, or by selecting a prototypical question and ranking individuals on that, but such a ranking involves more arbitrariness.) A small numerical sketch of this kind of extraction is below.
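
        To make the extraction concrete, here is a minimal sketch on simulated data (the four subtests and their loadings are invented), using sklearn’s FactorAnalysis as a rough stand-in for the psychometric procedures; the only point is that correlated indicators yield a single common factor on which people can be ranked, plus a regression formula approximating it:

```python
# A minimal sketch, assuming simulated data; FactorAnalysis is used here as a
# simplified stand-in for proper psychometric factor extraction.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(1)
n = 5000

# One latent factor ("g") plus subtest-specific noise; the loadings are arbitrary.
g = rng.normal(size=n)
loadings = np.array([0.8, 0.7, 0.6, 0.5])
subtests = g[:, None] * loadings + rng.normal(size=(n, 4)) * np.sqrt(1 - loadings**2)

# Extract a single common factor and compute factor scores
# (the sign of the factor is arbitrary, as usual).
fa = FactorAnalysis(n_components=1)
g_scores = fa.fit_transform(subtests).ravel()

# A regression formula of the form g = w1*a + w2*b + ..., as in the comment above.
X = (subtests - subtests.mean(axis=0)) / subtests.std(axis=0)
weights, *_ = np.linalg.lstsq(X, g_scores, rcond=None)
print("loadings:", fa.components_.ravel().round(2))
print("regression weights:", weights.round(2))
```

        Fed uncorrelated items instead — the “voter preference” case in which policies don’t hang together — the same code would return near-zero loadings, which is the contrast drawn above.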

    • Dr. Shalizi is a little too impressed with his own IQ. His acquaintance with the field of psychometrics is mediocre at best, and thus he makes amateur mistakes motivated by his ignorance, animus, ideology, and arrogance.

    • I just reread Glymour’s article after many years, and was again struck by the fact that he first presents a scorched earth critique of social science methods, and then goes on to argue that America’s social problems can be solved by pouring tons of money into schools, social programs, etc. Glymour is not at all bothered by the fact that the policies he advocates cannot be supported even by those methods he disdains, like regression. Is he being satirical or does he just completely lack self-awareness?

      Glymour’s critique is unimpressive if you know the wider research which Herrnstein and Murray draw on. His alternative causal models are often highly implausible. For example, Glymour writes that shared family influences like mother’s character, attention to small children, the presence of two parents, a scholarly tradition, a strong parental positive attitude towards learning, and where parents went to school may influence both IQ and economic/social outcomes, invalidating Murray and Herrnstein’s causal model. However, we know from behavior genetic research that, firstly, all behaviors are heritable, often to a high degree, and, secondly, that environmental influences on IQ and other behavioral variables are generally not shared between adult siblings. See this classic article, for example. Murray also showed later, using the same data from the Bell Curve, how IQ predicts outcome differences between siblings. And what John said above.

      Both Glymour and Shalizi think that they can invalidate entire research traditions by using clever conceptual arguments, without knowing anything about empirical findings in the field. You will be impressed by their arguments if you know nothing about the research they’re attacking, but if you do, their erudite exercises in straw man slaying are rather tiresome.

      • I think you are on to something there, but you need to tune it. Pearl, Glymour, and by implication Shalizi not only overturn unitary g — they overturn 90% of the conclusions of social science. Is such a ‘scorched earth’ critique necessary? I would say yes — science advances one funeral at a time. But in the meantime, it is well within the purview of the victims of time’s ever-rolling stream to say ‘I’m not dead yet’ — but then they should say what they don’t like about Glymour, Spirtes, Pearl, etc. — it *has* been done.

        James M. Robins and Larry Wasserman’s ‘Rejoinder to Glymour and Spirtes’, say.

  20. One (like Columbo) last point: Glymour, Spirtes, and Scheines say this, which is why I asked you whether you believed Thomson’s critique was devastating against Spearman or not:

    ‘Spearman’s inference to common causes from vanishing tetrad differences was challenged by Godfrey Thomson in a series of papers between 1916 and 1935. In our terms, Thomson’s models all violated linear faithfulness.’ (p. 200, section 6.13)

    That is, even in terms of Pearl’s SCM and Glymour’s critique, the Thomson model violates Glymour’s ‘faithfulness’ premise, as he says in his own book. I can’t help but think that this logical nit is worth following up.
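
    For what it’s worth, the point is easy to see numerically. Below is a toy version of Thomson’s sampling (‘bonds’) model — a sketch with made-up parameters, not Thomson’s actual specification — in which many independent elementary components exist and each test draws on a random subset of them. The test correlations all come out positive and the first eigenvalue dominates, even though no unitary g exists in the data-generating process; that is exactly the kind of structure that trips up inferences which rely on faithfulness.

```python
# A toy version of Thomson's sampling ("bonds") model: each test draws on a
# random subset of many independent elementary components, with no unitary g.
import numpy as np

rng = np.random.default_rng(2)
n_people, n_bonds, n_tests = 2000, 500, 8

bonds = bonds_matrix = rng.normal(size=(n_people, n_bonds))      # independent "bonds"
masks = (rng.random((n_tests, n_bonds)) < 0.3).astype(float)     # each test samples ~30% of the bonds
scores = bonds @ masks.T                                         # test score = sum of sampled bonds

corr = np.corrcoef(scores, rowvar=False)
eigvals = np.linalg.eigvalsh(corr)[::-1]
print("all off-diagonal correlations positive:",
      (corr[np.triu_indices(n_tests, 1)] > 0).all())
print("eigenvalues of the correlation matrix:", eigvals.round(2))
```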

  21. Pingback: What is AGI? | Machine Intelligence Research Institute

  22. The Woodcock-Johnson III is probably the best indicator of g, although the best available culture-free means of estimating g is the RPM. In contrast to what you state here, even the older WJ-R is a better estimator of g than the WAIS.

    • As indicators of g, all widely used multiple-ability batteries seem to be very similar. That is, their underlying g factors are very highly correlated with each other and the g loadings of their global/full-scale scores are similar. The RPM is not terribly “culture-free”, and is probably a poorer indicator of g than tests with more diverse content.

      • On second thought, I agree. The RPM seems to rely heavily on a working memory component, which is enough to disqualify it as a ‘culture-free’ measure of intelligence (perhaps, in a limited sense, ‘context-free’, but certainly not ‘culture-free’). The Quantitative factor (Equation Balancing and Applied Problems) is interesting, as it seems to tap into some sub-component of g that goes beyond what fluid intelligence measures — from my investigations, I believe some efficiency in searching LTM (an associative heuristic). Those types of math problems require selecting from a broad set in LTM and (time-)ordering the elements to form a meaningful relationship — and probably doing it very fast, so that it doesn’t consume limited working memory capacity. This is in contrast with RPM problems, where the set has already been selected and arranged, but you have to discover the ‘operator’ that relates the elements.

        Although some may be quick to attribute Gq to crystallized (learned) ability, an analysis of the chart above argues against the idea that Gq is significantly crystallized-loaded. First, it’s pretty obvious that the Equation Balancing and Applied Problems subtests do not involve ‘complex math’, but rather elementary operations — basic 8th-grade-type math that almost any subject can do. More importantly, the Analysis-Synthesis test correlates almost as well with Applied Problems as it does with Concept Formation, and according to the chart Applied Problems actually correlates better with pretty much all the other non-crystallized measures (the Woodcock-Johnson Cog-g scale) than Concept Formation does. So whatever small crystallized component Gq may be measuring, it is more than offset by its ability to tap into the g factor in ways that even the mighty fluid intelligence cannot. Long ago, the philosopher Immanuel Kant wrote about the syntheticity of mathematics. I think it’s time they listened to what he had to say.

    • But with that being said, why does the WJ IV standard battery measure working memory twice (with Story Recall and Numbers Reversed), measure fluid intelligence twice (with Number Series and Concept Formation), and move visual-auditory processing to the extended battery, in light of its relatively high g-loading, its ability to tap Glr, and its greater relevance with respect to newer dual-processing models? Also, there is no inclusion of anything math-related, like Applied Problems (which merely consists of very basic calculations), which by more recent accounts correlates significantly better with the general factor and taps reasoning in a way that can’t be assessed by fluid intelligence tasks (Number Series is a measure of linear reasoning, which is not the same as measuring non-linear reasoning with LTM associative heuristics). What a shame — the new battery is clearly biased towards fluid processing.

  23. What’s interesting (as mentioned in this post) is that a few studies have indeed revealed that quantitative tests (including Applied Problems and Quantitative Concepts) correlate highly with g, even more so than fluid intelligence tasks, which have long been considered the best proxy for g. On second thought, this shouldn’t be too surprising, as Gq problems are, firstly, more complex — requiring the handling of more variables in working memory (and thus underlying features of g) — and, secondly, a better reflection of ‘abstraction’. Gq performance relies heavily on associating information with elements outside the problem. That particular step of relating information via LTM, based on subtle (implicit) similarities, may be a long-ignored hallmark of intelligence. Perhaps related to Gq performance is the fairly high correlation of associative memory tasks with general intelligence. In fact, the WJ-III suggests that the Glr cluster can be used as a proxy for general intelligence.

    • Yes, the math factor is separate from the fluid factor, and more g-loaded, too. This is confirmed by CogAT designer David Lohman (although the CogAT has validity issues of its own) and by WJ-III radex analyses. The WJ is a fine measure of intelligence based on the dominant CHC paradigm — as you mention, it is one of the only tests to measure long-term memory functions, which comprise a highly g-loaded Glr cluster. Contrary to an abundance of claims, there is no test that taps into g directly. Every IQ subtest is a measure of some degree of g plus error. The WJ-III measures 9 broad factors in order to get a solid estimate of g. Tests like the Raven’s are not as good at measuring g (though the Raven’s is valid as a proxy), simply because they are estimates based on a smaller number of sub-factors — the Raven’s, for example, is restricted to a spatial and a fluid factor (and perhaps short-term memory). And the lack of validity of certain tests, such as the CogAT, suggests they may be largely measures of random skills, which may even intercorrelate to a large extent (because their subtests share specific sub-factors) but are nonetheless not good reflections of g.

  24. Pingback: g | Technology as Nature

  25. Very nice blog post indeed. My question is somewhat related to it. Do you know of any work that correlates “personality traits” with the “g factor”? I know there is some correlation with openness, but is there any elaborate thinking along this line?

    • Judge et al. meta-analyzed correlations between g and the Big Five personality traits, with the following results:

      Conscientiousness -0.04
      Agreeableness 0.00
      Extraversion +0.02
      Emotional stability +0.09
      Openness +0.22

      Openness has the most substantial association, partly because many openness items assess self-estimated intelligence. The slightly negative correlation between g and conscientiousness has often been found, but it may be an artifact of sampling bias.

      In general, there appear to be all sorts of personalities across the entire IQ range, at least in terms of the Big Five, suggesting that the etiologies of intelligence and personality are mostly distinct. Multivariate behavior genetic analyses could shed more light on this, but I’m not aware of any.

  26. Pingback: In a variable world, are averages just epiphenomena? | Dynamic Ecology

    • I skimmed his/her comments, but I don’t see any coherent argument that would be worth addressing. Most of the relevant issues have been discussed at length in this blog.

      • Prior to my main response, could you clarify in layman’s terms “KJK”’s counter-commentary to the article here, which occurs in the comments section? He gave the last comment: http://humanvarieties.org/2014/05/15/research-on-genetic-g-and-differential-heritabilities/

        There are two arguments of relevance:
        1) “Swank”’s attack on twin studies, beginning under this sub-thread: https://robertlindsay.wordpress.com/2015/08/20/education-improves-your-brain-on-a-physical-level/#comment-237080

        2) Swank’s argument here: https://robertlindsay.wordpress.com/2015/08/21/new-comment-thread-for-education-improves-your-brain-on-a-physical-level/#comment-237437
        “explained in statistical parlance is not the same as caused by.
        “However, since high heritability is simply a correlation between traits and genes, it does not describe the causes of heritability which in humans can be either genetic or environmental.”

        “Heritability does not indicate the degree to which a trait is genetic, it measures the proportion of the phenotypic variance that is the result of genetic factors.”

        “Most importantly we should recognize that heritability does not mean genetically determined.”

        And if you’d bother to read that article, you’d realize where the central dispute lies:
        “Part of what is at stake between followers of Lewontin and Sesardic is whether or not VGxE and other components of variance are negligible or significant.”

        So do human experiments replicate conditions sufficiently like those of crops or animals in animal breeding, warranting similar inferences?

        “In experimental organisms, there is no problem in separating environmental from genetic similarities. The offspring of a cow producing milk at a high rate and the offspring of a cow producing milk at a low rate can be raised together in the same environment to see whether, despite the environmental similarity, each resembles its own parent. In natural populations, and especially in humans, this is difficult to do. Because of the nature of human societies, members of the same family not only share genes, but also have similar environments”

        “Because it is difficult to randomize the environments, even in cases of adoption, evidence of heritability for human personality and behavior traits remains equivocal despite the very large number of studies that exist.”

        The answer is a clear “no.” And only strong priors would lead one to answer “yes.””

      • The other criticism I was wondering if you could reply to concerns genetic associations with intelligence likely being false positives because researchers cannot replicate any gene findings.

        If you could respond to this and what I recently wrote in the comments, either by linking to relevant articles or providing relevant commentary in a reply, it would be very much appreciated.

      • Kan’s argument seems to be mostly about the shortcomings of the method of correlated vectors and the fact that non-g models can explain Jensen effects, too. Viewed in isolation, as a methodological critique, I don’t really disagree with his argument. But when you consider the totality of evidence and the power of different theories to explain the facts at hand, I view the g theory as clearly superior. While it’s possible to explain MCV findings in non-g terms, g theory explains the various findings very straightforwardly, something which cannot be said of competing theories. Moreover, SEM studies are also consistent with g theory, as I discussed here, so the critique of MCV is ultimately moot.

        “Swank”’s attack on twin studies, beginning under this sub-thread:

        On the validity of the classical twin method, see these two recent papers: [1] and [2].

        “explained in statistical parlance is not the same as caused by.
        “However, since high heritability is simply a correlation between traits and genes, it does not describe the causes of heritability which in humans can be either genetic or environmental.”

        Heritability is the proportion of phenotypic differences between individuals that is caused by genetic differences between them (in the standard variance-component notation it is a ratio of variances; see the sketch at the end of this comment). If heritability is estimated accurately, then it does have this straightforward causal interpretation. By definition, the causes of heritability cannot be environmental.

        To argue that theoretical constructs like “additive genetic variance” cannot contain causal information because they (supposedly) aren’t concrete enough is to engage in arbitrary theoretical “legislation” of what is allowed in science. There is no one “correct” level of analysis of genetic causation. Certainly, genetic variance components must ultimately be reducible to molecular genetic mechanisms — as is being done in GWAS research — but the current lack of a more reductive account does not in any way invalidate behavioral/quantitative genetics.

        “Part of what is at stake between followers of Lewontin and Sesardic is whether or not VGxE and other components of variance are negligible or significant.”

        Yes, and theory and data point to those components being small in humans, just as in animals and plants. For example, GCTA research, in which the genetic and phenotypic similarity of unrelated individuals is compared, shows that most of IQ heritability can be explained by the additive effects of common genetic variants (a toy numerical illustration is sketched at the end of this comment). This indicates that interaction terms cannot be large, because unrelated individuals don’t share environments except for genetic reasons.

        The mainly additive etiology of individual differences in complex traits is also theoretically and mathematically well-nigh necessary in natural populations. This is because when there are lots of genetic and environmental influences, each with small effects on population variance — which is definitely the case with phenotypes like IQ — interactions cannot contribute much to population variance. This is true even when there are strong interactions at the individual level. That is, interactions at the level of gene action in an individual generally contribute to additive rather than nonadditive variance at the population level. This logic is explained in the context of epistasis in this paper. Large interaction components are generally found only in model organisms whose allele frequency spectra have been artificially reduced.
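
        For readers who want the notation spelled out, the textbook variance decomposition behind these points can be sketched as follows (a simplification that ignores assortative mating and treats the covariance and interaction terms as optional add-ons):

```latex
% Textbook-style sketch of the variance decomposition referred to above.
V_P = V_A + V_D + V_I + V_E \;\bigl(+\; V_{G \times E} + 2\,\mathrm{Cov}(G,E)\bigr),
\qquad
h^2_{\text{narrow}} = \frac{V_A}{V_P},
\qquad
H^2_{\text{broad}} = \frac{V_A + V_D + V_I}{V_P}.
```

        In this notation, the claim in the preceding paragraphs is that for traits like IQ in natural populations the non-additive and interaction terms (V_D, V_I, V_GxE) are small relative to V_A — which is what GCTA-type estimates from unrelated individuals speak to.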
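
        And to make the GCTA point concrete, here is a toy illustration — simulated data, and simple Haseman–Elston regression rather than the REML procedure GCTA actually uses — of how phenotypic similarity among nominally unrelated individuals can be regressed on SNP-based genetic similarity to recover heritability:

```python
# A toy illustration of the GCTA/Haseman-Elston idea: among nominally unrelated
# individuals, regress phenotypic similarity on SNP-based genetic similarity.
import numpy as np

rng = np.random.default_rng(3)
n, m, h2 = 2000, 500, 0.5

# Simulate genotypes (0/1/2) at m SNPs and standardize them.
freqs = rng.uniform(0.1, 0.5, size=m)
geno = rng.binomial(2, freqs, size=(n, m)).astype(float)
X = (geno - 2 * freqs) / np.sqrt(2 * freqs * (1 - freqs))

# Phenotype: small additive effects at every SNP plus environmental noise.
beta = rng.normal(0, np.sqrt(h2 / m), size=m)
y = X @ beta + rng.normal(0, np.sqrt(1 - h2), size=n)
y = (y - y.mean()) / y.std()

# Genetic relationship matrix and Haseman-Elston regression over all pairs:
# the slope of phenotype cross-products on relatedness estimates SNP heritability.
A = X @ X.T / m
iu = np.triu_indices(n, 1)
slope = np.polyfit(A[iu], np.outer(y, y)[iu], 1)[0]
print("true h2:", h2, "  HE-regression estimate:", round(slope, 2))
```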

      • As to the Flynn effect, different eras and cultural contexts favor the development of different skills. If you want to call them intelligence, then the Flynn effect has increased intelligence, sure. However, if you want to compare the intelligence of different groups, you must have invariant indicators of intelligence, which are uncommon between generations but common between contemporaneous groups.

      • Is your research able to account for this?: https://web.archive.org/web/20030402143807/http://tedbarlow.blogspot.com/2003_01_19_tedbarlow_archive.html#87942098

        “Conley thought: Why not look at wealth, at family assets, as opposed to merely earnings? Would the differences in academic achievement still be there? He almost dismissed his own insight. A million people, he thought, must have studied this before.
        As it turned out, nobody had. And when Conley did, he discovered that if you looked at the assets of black families and white families, or in other words their wealth, their children performed equally in school. (my emphasis)” — the view being that income and SES are poor proxies for wealth when comparing races.

        There is some commentary on this here: http://isteve.blogspot.com/2014/03/2008-sat-scores-by-race-by-income.html, and here: http://www.gnxp.com/MT2/archives/000414.html, aside from this brief comment: http://isteve.blogspot.com/2014/03/2008-sat-scores-by-race-by-income.html?showComment=1394988670369#c2216043138260832088, I don’t see anything that attempts to deal with the issue at hand.

      • You have been very helpful so far, but in addition to information pertaining to the above made comment, I wonder if you could shed insight on the following:
        On a separate forum, regarding the Flynn effect article, I mentioned this: https://groups.google.com/forum/#!topic/brain-training/4AJJJjo9jRU
        and received this reply:
        “The Flynn Effect is not a training or test-taking effect. The fact that we are scoring so much better on some tests proves that we are actually getting smarter in those areas. One of those is raw processing speed. Others are verbal analytical thinking and visuospatial analytical thinking. YES they are “hollow gains” WTR “g.” That is, they are not on g. However, we are definitely smarter in all of the things that those tests are measuring.”

      • I probably shouldn’t use the word “finally” for this statement, as I may have more questions at a later time, but what would you say to the following:

        There are many accounts from early explorers concerning extremely brutal living conditions of blacks that were completely indigenous to them. the philosopher G.W.F. Hegel summarized all of this in his The Philosophy of History (Colonial Press, 1900) – pp. 93-99: “The peculiarly African character is difficult to comprehend, for the very reason that in reference to it we must quite give up the principle which accompanies all our ideas— the category of Universality…. Another characteristic fact in reference to the Negro is slavery… Bad as this may be, their lot in their own land is even worse, since a slavery there quite as absolute exists; for it is the essential principle of slavery, that man has not yet attained to a consciousness of his own freedom, and consequently sinks down to a mere Thing— an object of no value. Among the Negro moral sentiments are quite weak, or more strictly speaking, non-existent. Parents sell their children and conversely children their parents, as either has the opportunity… the polygamy of the Negroes has frequently for its object the having of many children, to be sold, every one of them, into slavery… From these various traits it is manifest that want of self control distinguishes the character of the Negroes. This condition is capable of no development or culture, and as we see them at this day, such have they always been…. At this point we leave Africa, not to mention it again. For it is no historical part of the World; it has no movement or development to exhibit.”

        The response would be that “It is a FACT that hunter-gatherers were taller. It is a FACT that they had more leisure time. It is a FACT that they were better nourished. It is a FACT that they lived longer.”

        I wonder if you know anything that might shed light on this discrepancy.

      • 1) Wealth and race differences

        To statistically adjust for wealth differences between races is not a form of causal analysis. Because wealth differences between whites and blacks are vast, the net worth of the average white family is similar to that of black families that are exceptional, in the right tail of the black wealth distribution. From a hereditarian perspective, it’s not meaningful to compare ordinary white families, with average genetic propensities for their race, to exceptional black ones, with exceptional genetic propensities for their race.

        I am not familiar with Conley’s research on this issue, but in her analysis of math test score differences in the CNLSY sample, Amy Orr found that the race coefficient remained significant after parental wealth was included as a predictor, indicating that wealth did not statistically explain all of the test score differences between blacks and whites.

        2) Flynn effect

        If you want to think of skills at solving cognitive test items as intelligence, then the Flynn effect has increased people’s intelligence. However, from my perspective the purpose of cognitive tests is not to test your vocabulary size, or skills at arithmetic problems or Raven’s matrices, or whatever other kinds of items an IQ test may have. Rather, the purpose of tests is to measure latent abilities, such as general intelligence, of which test scores are just unreliable, epiphenomenal, culture-bound indicators. The extent, if any, to which those latent abilities have improved across generations is unclear, but it’s certainly much less than the increases in observed test scores.

        3) Neural plasticity

        Neural plasticity is a buzzword that doesn’t mean much of anything. The fact that the brain is malleable is not news. We have always known that humans are able to learn things, which changes your brain. Reading this comment changes your brain. Everything changes it. None of this means that we have, say, the means of making a stupid person smart. Instead, individual differences in intelligence remain highly stable across the life span, mainly due to the life-long persistence of genetic effects.

        4) Hegel on Africa

        I don’t know what your question is.

      • You have been very helpful so far. The issue seems to be that the picture of life among indigenous Africans painted by Hegel (and I am accumulating corroborating sources) is in conflict with apparent facts about hunter-gatherers — that they were taller, had more leisure time, were better nourished, and lived longer.

        Two sources pertaining to hunter gatherers are as follows:
        “The hunter-gatherers’ diet was more varied and balanced than what agriculture later allowed. Average height went down from 5’10” (178 cm) for men and 5’6″ (168 cm) for women to 5’5″ (165 cm) and 5’1″ (155 cm), respectively, and it took until the twentieth century for average human height to come back to the pre-Neolithic Revolution levels.[48] Agriculturalists had more anaemias and vitamin deficiencies, more spinal deformations and more dental pathologies.”


        As I said, I am investigating this issue further, but I wonder, for the sake of edification, if you have any preliminary information.

        Thank you for your help.

      • The shift from hunting and gathering to agriculture led to a lower standard of living for the average man, yes, but most of black Africa had adopted agriculture long before significant contacts with Europe.

      • “most of black Africa had adopted agriculture long before significant contacts with Europe.”

        just for record keeping, do you have any useful sources on this?

      • It’s not a topic I know much about, but John Reader’s “Africa: A Biography of the Continent” has some good discussion of the peculiarities of agriculture in Africa.

      • Ben, I didn’t publish your recent comments because they consist just of long quotations from miscellaneous sources. Please stay on topic and be concise.

      • So I hope what I say here will make things more clear:
        I am interested in working to help re-create the Eugenics movement and to popularize memes that would allow for widespread cultural acceptance and promotion of that. I believe that if this can happen, then with a Eugenics oriented culture, we can have an ideological basis to overcome current demographic and cultural crises. I believe that this goal should be shared by sites like this and should be the ultimate political objective of this site. Everything I have posted has been with this in mind (and of the yet to be approved items, only the last 3 I posted I feel need to be put up here and are of relevance to this).

        If there are any moral aversions to this, then the following resource should help to dispel them all and to show that anti-Eugenics claims are based on spurious and fraudulent arguments: http://eugenics.net/

      • Ben, this is the comments section for my article on the g factor. If you want to promote your views on eugenics, start a blog of your own.
