ECHIDNE OF THE SNAKES: The Greater Male Variability Hypothesis

Thursday, January 13, 2011

This is the fifth post in my series about the study of sex differences. The previous one discussed the mental rotation test because it's the one most often brought up as an "explanation" for why women are scarcer in science and engineering than men.

This post addresses a more recent argument for the scarcity of women on top everywhere, not just in mathematics, science and engineering! Yes, it's a powerful argument and so neat, because it applies even if women score as well as men on some cognitive test. It even applies if women score higher, on average, than men, because it's based on the extreme tails of the distribution of scores.

Here's a graph of two distributions, taken from Eliot's book Pink Brain, Blue Brain.

One (very informal) way to interpret the picture is to imagine the two mountain shapes as describing the outlines of piles of men and women: The higher a point on a mountain is, the greater the number of either men or women that are described by the score under that point on the horizontal axis. Thus, both men and women score the same on this test, on average, because the two mountain peaks coincide.

But note that the male mountain has thicker tails (the areas to the far right and far left in the picture). This means that more men than women score both high and low in whatever test this picture represents.

Another way of saying the same is that men exhibit greater variability. Their scores are scattered further around the mean value than the scores of women.
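To see how a difference in spread plays out in the tails, here is a minimal sketch in Python. It assumes that both groups follow a bell-shaped (normal) curve with the same average and that one group's standard deviation is 10 percent larger. Both the normal shape and the 10 percent figure are my assumptions for illustration, not numbers from any study.

```python
from statistics import NormalDist

# Hypothetical distributions: identical means, one with a 10% larger spread.
narrow = NormalDist(mu=0.0, sigma=1.0)
wide = NormalDist(mu=0.0, sigma=1.1)


def upper_tail_ratio(cutoff):
    """Share of the wide group above the cutoff divided by the narrow group's share."""
    return (1 - wide.cdf(cutoff)) / (1 - narrow.cdf(cutoff))


# Cutoffs are measured in standard deviations of the narrow group.
for cutoff in (0, 1, 2, 3, 4):
    print(f"above {cutoff} SD: {upper_tail_ratio(cutoff):.2f} wide-group members per narrow-group member")
```

Under these assumptions the ratio is exactly one at the shared mean and grows the further out the cutoff moves. Because the two curves are symmetric around the same mean, the same ratios hold in the lower tail.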

Now to the arguments of the essentialists. These go as follows:

It is well known that male animals show greater variability than female animals on all sorts of characteristics. Therefore, male variability in human test results is based on similar reasons and probably something innate. Unfortunately, and with great sadness, we must report that women and men cannot be equal on the very top, because more men score in the upper tails of various test distributions, and it is from those upper tails that people at the top come.

Now, that is my summary of the relevant opinions, made clearer by the condensing. But the basic argument, pretty much, is that the generally equal average scores of men and women don't really matter if men are more likely to be found in the upper tails of various distributions.

Besides, that they are also found in the lower tails of those distributions demonstrates how fair all this is to women: They may not end up on top of various careers but neither are they likely to end up as criminals!

Don't believe those last two sentences? Cordelia Fine in Delusions of Gender (p. 179) quotes Lawrence Summers:

It does appear that on many, many different human attributes -- height, weight, propensity for criminality, overall IQ, mathematical ability, scientific ability...there is a difference in the standard deviation and variability [statistical measures of the spread of a population] of a male and a female population. And that is true with respect to attributes that are and are not plausibly culturally determined. If one supposes, as I think is reasonable, that if one is talking about physicists at a top-twenty-five research university...small differences in the standard deviation will translate into very large differences in the available pool.

So beautiful! Though I do wonder with Fine how something like "propensity to NON-criminality" might express itself, I wonder even more about some questions which Summers seems to regard as answered:

Are various tests the same as human attributes?

Men, on average, might take a riskier approach to test-taking, so why couldn't that be the reason for the fatter tails of the male distributions? Suppose that men and women have the same average knowledge on some test but that men guess more often than women. Depending on how guessing is penalized, one possible outcome is exactly the one of fatter tails for the male distributions.
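The guessing argument can be sketched as a small simulation. Everything in it is hypothetical: the test length, the five answer choices, the quarter-point penalty for a wrong answer (chosen so that a blind guess gains nothing on average, in the spirit of old-style formula scoring), and the guessing rates of the two groups. It is an illustration of the logic, not evidence about real test-takers.

```python
import random
import statistics

N_QUESTIONS = 60                 # hypothetical test length
N_CHOICES = 5                    # hypothetical number of answer choices
PENALTY = 1 / (N_CHOICES - 1)    # wrong guess costs 0.25, so a blind guess averages zero


def simulate_scores(n_people, guess_prob, seed):
    """Scores for people with identical knowledge levels who guess at different rates."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_people):
        # Every test-taker has the same 50% chance of knowing each question.
        known = sum(rng.random() < 0.5 for _ in range(N_QUESTIONS))
        score = float(known)
        for _ in range(N_QUESTIONS - known):
            if rng.random() < guess_prob:           # decide whether to guess at all
                if rng.random() < 1 / N_CHOICES:    # lucky guess
                    score += 1.0
                else:                               # unlucky guess is penalized
                    score -= PENALTY
        scores.append(score)
    return scores


cautious = simulate_scores(5000, guess_prob=0.1, seed=1)
bold = simulate_scores(5000, guess_prob=0.9, seed=2)

for name, scores in (("cautious", cautious), ("bold", bold)):
    print(name, "mean:", round(statistics.mean(scores), 2),
          "SD:", round(statistics.pstdev(scores), 2))
```

In this setup guessing adds luck but no systematic gain, so the two groups end up with roughly the same average while the frequent guessers show the larger spread. A different penalty rule would shift the averages too, which is why the outcome depends on how guessing is scored.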

Finally, I wonder if anyone has actually studied whether people on the top of their fields actually scored in the extreme upper tails of some appropriate test earlier in their lives.

An important aspect of the greater male variability hypothesis as an innate explanation of sex differences needs further analysis: the greater male variability would have to be constant. If it varies, say, across countries or over time, then it cannot measure innate variability differences alone (if at all).

Here new research poses problems for the essentialists. From Cordelia Fine's Delusions of Gender (pp. 180-1):

More recently, several very large-scale studies have collected data that offer tests of the Greater Male Variability hypothesis by investigating whether males are inevitably more variable in math performance, and always outnumber females at the high end of ability. The answer, in children at least, is no. In a Science study of over 7 million United States schoolchildren, Janet Hyde and her team found that across grade levels and states, boys were moderately more variable than girls. Yet when they looked at the data from Minnesota state assessments of eleventh graders to see how many boys and girls scored above the 95th and 99th percentile (that is, scored better than 95%, or 99%, of their peers) an interesting pattern emerged. Among white children there were, respectively, about one-and-a-half and two boys for every girl. But among Asian-American kids, the patterns were different. At the 95th percentile boys' advantage was less, and at the 99th percentile there were more girls than boys. Start to look in other countries and you find further evidence that sex differences in variability are, well, variable. Luigi Guiso's cross-cultural Science study also found that, like the gender gap in mean scores, the ratio of males to females at the high end of performance is something that changes from country to country. While in the majority of the forty countries studied there were indeed more boys than girls in the 95th and 99th percentiles, in four countries the ratios were equal or even reversed. (These were Indonesia, the UK, Iceland and Thailand.) Two other large cross-cultural studies of math scores in teenagers have also found that although males are usually more variable, and outnumber girls at the top 5 percent of ability, this is not invariably so: in some countries females are equally or more variable, or are as likely as boys to make it into the 95th percentile.

All this matters for the Greater Male Variability Hypothesis to be taken as an innate one. If such tests truly measured nothing but an innate characteristic then we should find the difference in variability between male and female test-takers identical across different countries.

And over time. Probably the most famous of all studies of greater male variability is the early 1980s study by Camilla Benbow and Julian Stanley. It was based on giving the mathematics SAT test to seventh and eighth graders and then analyzing the top performers in that test.

The results were dramatic (Eliot, Pink Brain, Blue Brain, p. 212):

Benbow and Stanley found that within this talented pool, many more boys than girls scored at the highest level on the math SAT exam: a four-to-one ratio for scores above six hundred and a thirteen-to-one ratio for scores above seven hundred. But they made the bigger splash by speculating the high ratio was a consequence not of math education but of "endogenous" or innate, sex differences in mathematical talent. Newsweek seized upon their conclusion with the headline "Do Males Have a Math Gene?" while Time magazine declared, "a new study says that males may be naturally abler [in mathematics] than females."

Such fun. My next post on the stereotype threat explains why headlines like those can actually decrease girls' ability to do well in math tests! But note how those popularizations moved from upper-tail findings to all men and all women just like that!

What came next? In fact, the Benbow-Stanley study has been repeated since the early 1980s. The 2005 repetition found that there were 2.8 boys for each girl in the group which scored over seven hundred. Remember that the numbers were 13 boys to one girl in the early 1980s.

Innate sex differences have not changed in those twenty-odd intervening years. Instead, the smaller ratio of super-talented boys-to-girls must be caused by something environmental or cultural, and there is nothing to suggest that the most recent ratio is the lowest possible one.

It seems pretty clear to me that the male upper-tail advantage cannot be regarded as an innate explanation, given the above findings. Whatever may drive the observed gender differences has at least a sizable chunk of environmental causes.

Note, also, that those who advocate the essentialist form of the Greater Male Variability hypothesis rarely discuss what the fatter lower tails in various score distributions might mean for men. It's as if men (as a class) should be content with belonging to the group with the fatter upper tail, even if they themselves happen to fall into the fatter lower tail. Likewise, it's unclear what the practical consequences of scoring in the lower tail might be for men, as compared to women. The debate has focused almost completely on the upper-tail differences.

The main point of this and my previous post on mental rotation is that the phenomena we are speaking about in these two cases clearly can be changed or clearly do change. They are neither stable nor impossible to change, yet that is how the essentialists treat them.

My next post will be on the stereotype threat: the reason why presenting differences in mental rotation tests or greater male variability in tests as impossible to change (when they are changing) can be hazardous to girls and women. Indeed, it turns out that making arguments about gender differences or ethnic differences in tests can actually create those differences.