Friday, December 12, 2008

Sex and Statistics

My previous post about Daniel Kruger's study, on how the number of sexual partners people have had or wish to have correlates with how much they spend, led me to actually read the original article.

I'm still a little bit flabbergasted that it passed peer review. Who are the peers who thought this was an acceptably presented empirical analysis? The very reason for peer review is to weed out obvious mistakes of various types. But this did not happen with Kruger's paper.

To see why all this matters, remember that Kruger's main point is to seek current evidence which would bolster his evolutionary psychology argument that men who have more resources attract more sexual partners. It is not enough to find that people who have more resources attract more sexual partners, because the evo psycho theory is that what's appealing in men is resources and what's appealing in women is body. So we need to be shown that resources (or whatever weird proxy is used for them here, financial consumption values) are correlated differently with sex for men and women.

Given this, it would seem extremely important that the article showed us the results for the female data set. But it does not! We are just told that the results didn't show any relationship between financial spending by women and the number of sexual partners they had. So take it on faith?

Well, not quite. There's an odd bit about all this in the article. I quote:
Male and female samples were combined to provide a direct test of the predicted moderation by sex. Gender (1=female, 2=male) and financial consumption were multiplied to create an interaction term predicting each log transformed SOI variable.

Interaction terms are common in econometrics, too, and often one of the terms to be multiplied is a qualitative one, such as gender. The way one transforms qualitative binary variables into numerical ones is by using 1 and 0 as the values. It doesn't matter which sex we assign the value 1, because what the interaction term is measuring is the differential effect some other variable (the one we multiply the gender variable with) has on the variable we want to explain, by gender.

Now why would Kruger use 2 for men, instead of zero? Perhaps the statement is a typographical error? Perhaps men were assigned the value zero? Let's look at his findings about the interaction term assuming that we have a typo here:

The findings (from Table 2) give the coefficient for the interaction term as 0.01 for past sex and 0.015 for future sex. These would then be the extra effects each additional unit of the financial consumption measure has on the number of past and future sexual partners FOR THE GROUP we assigned the value 1. That would be women, if the initial discussion contained a typo. So women would actually be the group that has more sex the more they spend, not men.

My conclusion is that he really used 1 and 2 as the values for women and men. This doesn't make any sense at all to me. Perhaps someone can explain why he did it?

More generally, it's not possible to read the paper and to find out what the equations are that he actually estimated. The theoretical discussion at the beginning suggests that he has in mind something like this:

Number of sexual partners = Constant + b*education + c*marital status + d*age + e*financial consumption

where b, c, d and e (and the constant term) are the coefficients that the analysis will estimate when we plug in the data on education, marital status, age and financial consumption on the right-hand side and the three measures of the number of sexual partners on the left-hand side (for three analyses). Stars stand for multiplication.

These equations would be estimated separately for men and women, not because statistical tests show that this should be done, but because Kruger's basic theory holds that it should be done!

But it's not at all clear whether he indeed estimated this equation or something which only contained the terms that stood out in the zero-order correlations table. If only those terms were included, then the results he talks about in the press release don't actually control for marital status, say.

I now want to return to the interaction term discussion. Perhaps someone pointed out that it's not a great idea to argue that men and women are different in this behavior and then not to show any of the results that would let us judge the argument? Perhaps that someone suggested that it might be a good idea to pool the data (use all the data for both sexes in the three equations) and to add an interaction term for gender and financial consumption to test if there actually is a differential effect by gender? The equation would look something like this (assuming that no basic gender term was also included):

Number of sexual partners = Constant + b*education + c*marital status + d*age + e*financial consumption + f*(financial consumption*gender)

Here the gender term would equal 1 for one sex and 0 for the other sex. Suppose that we assign the value 1 for 'male' and the value 0 for 'female'. Now plug in those values to see the form the equation takes: for women the interaction term drops out, so the coefficient for financial consumption is e, while for men the coefficient for financial consumption is e+f. Estimation would give us the value for f.
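To make the plug-in step concrete, here is a minimal sketch in Python of a pooled regression with a 0/1 gender term. Everything in it is an assumption for illustration: the data are simulated, the variable names (`male`, `spending`, `partners`) and the "true" coefficients are made up, and education and marital status are left out to keep it short. None of it comes from Kruger's paper or data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Simulated, made-up data: nothing here is taken from Kruger's study.
male = rng.integers(0, 2, size=n)        # gender term: 1 = male, 0 = female
spending = rng.normal(0.0, 1.0, size=n)  # stand-in for financial consumption
age = rng.normal(0.0, 1.0, size=n)       # stand-in control variable

# Hypothetical "true" coefficients, chosen only for this illustration.
e_true, f_true = 0.10, 0.30
partners = (
    1.0
    + 0.2 * age
    + e_true * spending
    + f_true * spending * male           # the interaction term
    + rng.normal(0.0, 0.5, size=n)       # noise
)

# Pooled equation: constant, age, spending, spending * gender.
X = np.column_stack([np.ones(n), age, spending, spending * male])
coef, *_ = np.linalg.lstsq(X, partners, rcond=None)
const_hat, d_hat, e_hat, f_hat = coef

slope_women = e_hat           # gender = 0, so the interaction term drops out
slope_men = e_hat + f_hat     # gender = 1, so the slope is e + f

print(f"estimated e (slope for women): {slope_women:.3f}")
print(f"estimated f (extra slope for men): {f_hat:.3f}")
print(f"estimated e + f (slope for men): {slope_men:.3f}")
```

With this coding the estimated f is read directly as the difference between the two groups' slopes, which is why the choice of which sex gets the 1 only flips the sign of f.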

There are all sorts of reasons why adding just one interaction term this way might not be good statistics (for example, it assumes no gender interaction in the other terms on the right-hand side). But I have never seen 1 and 2 used in these applications. It makes no sense.
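For what it's worth, here is the arithmetic if the gender term in the pooled equation above really takes the values 1 and 2. This is only a sketch of what such a coding would do to the coefficients, assuming the equation has the form I guessed at; it is not a claim about what Kruger actually estimated. The symbol g stands for the gender term.

```latex
\begin{align*}
\text{women } (g = 1):&\quad
  \text{slope on financial consumption} = e + f \cdot 1 = e + f \\
\text{men } (g = 2):&\quad
  \text{slope on financial consumption} = e + f \cdot 2 = e + 2f \\
\text{difference (men minus women)}:&\quad
  (e + 2f) - (e + f) = f
\end{align*}
```

So under that coding f would still be the difference between the slopes for men and women, but e on its own would no longer be the slope for either sex. Without the estimated equations in front of us, there is no way to tell which reading of the table is intended.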

What's the conclusion then? The article doesn't give us the evidence which it supposedly has unearthed. This means that we can't judge the evidence.