Wednesday, August 12, 2015

Why Statistics Is Sexy. Or The Need to Distinguish Between Large and Small Numbers. A Re-Posting

(Originally posted here.)

I've always liked statistics as a science  but never thought it hawt and sexy.  Now I wish we could make statistics more sexy (bare more skin?) in order to save more of us from falling into those hidden wolf traps of the net.  They don't have sharpened sticks, those traps (holes in the ground, covered by branches), but they do hurt our understanding in somewhat similar ways.

An example of the wolf trap:  Someone writes on, say, racism or sexism in recent events and then gets attacked by trolls.  Suppose that in one scenario there are five very active trolls hammering at the poor writer, in an alternative scenario there are five thousand such trolls.

The two scenarios are not the same:  They don't tell us the same story about the likely number of people "out there" believing whatever those trolls believe.  That's why it's very wrong to argue that the presence of five Twitter trolls in one's mentions means that the troll-opinion is extremely common in the real world.  Yet in the last week I've seen several people take that view of events:  The mere existence of any nasty trolls (and nasty they are) means that those trolls have sizable backing in the world of opinions, ideas and values.

So that is about proportions or percentages.  There will always be people with extreme, nasty values, and there will always be some who troll.  To unearth a troll comment and then to write about it as if it represents a sizable number of people in the real world is lazy and just wrong.  Even utopia would have a few trolls, hankering for life in hell.

It matters whether 0.1% or 60% of Americans believe that broccoli should be banned.  Those who don't get that difference are going to create "the-sky-is-falling" stories, and such stories are not ultimately helpful.

Add to all that the problem of self-selection:  Those who comment on any particular incendiary topic are much more likely to hold the extreme opposite of the view the writer took in the piece (broccoli haters, whether 0.1% or 60%, are much more likely than anyone else to show up in the comments section of your Broccoli Is King article).
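To put rough numbers on that self-selection, here is a minimal sketch in Python.  Everything in it is made up for illustration: the function name, the 0.1% prevalence of broccoli haters among readers, and the guess that haters comment 30% of the time while everyone else comments 0.01% of the time.  It only computes the expected share of comments that come from haters under those assumptions.

```python
def hater_share_in_comments(prevalence, p_comment_hater, p_comment_other):
    """Expected fraction of commenters who are broccoli haters.

    prevalence      -- share of all readers who hate broccoli (assumed)
    p_comment_hater -- chance that a hater bothers to comment (assumed)
    p_comment_other -- chance that anyone else bothers to comment (assumed)
    """
    haters = prevalence * p_comment_hater
    others = (1.0 - prevalence) * p_comment_other
    return haters / (haters + others)


# Hypothetical numbers: haters are rare among readers but very motivated.
share = hater_share_in_comments(0.001, 0.30, 0.0001)
print(f"Share of comments written by broccoli haters: {share:.0%}")

# If everyone were equally likely to comment, the comments section
# would simply mirror the readership.
even = hater_share_in_comments(0.001, 0.01, 0.01)
print(f"Same readers, equal commenting rates: {even:.1%}")
```

With these invented rates, about three quarters of the comments come from a group that makes up one reader in a thousand.  The comments section tells you who was angry enough to type, not how common the anger is.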

That's why the comments sections, especially if not moderated, are dominated by angry voices and often opinions better suited to critters who just crawled out of the primeval slime*.  You know, the way any article about gender inequality that focuses on women gets comments from angry meninists.

People who agree with the writer tend not to waste time scribbling that down under the article, and people who aren't that bothered either way tend not to spend time in the comments, either.  The Twitter discussions work on somewhat similar principles, though the fact that people have followers makes those discussions less hostile to our imagined writer.  But those who hated what you wrote are the ones with real energy to look up your handle and then enter the "discussion."

The two problems I've described above are a) ignoring the actual prevalence of various beliefs and b) ignoring self-selection on the net.  That double-ignorance can have bad consequences:  We may be misled into believing that a molehill is a mountain, we may initiate much larger angry fights with an imaginary enemy (windmills?) and we may misunderstand the scope of the problem altogether.

A similar problem is born when someone writes an article starting from a planned plot.  Suppose that the plot is how much people hate broccoli.  The intrepid journalist will then go out and interview people.  What if the vast majority of those interviewed aren't bothered about broccoli at all?  That finding will not have a prominent place in the planned story.  Instead, even if it takes a very long time, the journalist will find a few people who reallyreally hate that green tree-pretender among the vegetables, and it is the opinions of those few people that we all will then read.

The next stage (and believe me I've seen this stage recently, though not about broccoli hating) is for people to talk about the vast camp of broccoli haters and mention the opinions of the interviewed few as representative of what that vast camp thinks.
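The plot-first journalist can be sketched as a tiny simulation.  This is a toy under stated assumptions, not a description of how any real reporter works: the function name, the prevalence values, the seed and the target of three quotes are all hypothetical, and each interview is treated as a random draw from the population.

```python
import random


def interviews_until_quotes(prevalence, quotes_needed, rng):
    """Interview random people until enough broccoli haters turn up.

    Returns how many interviews it took.  Everyone who is not a hater
    is interviewed too, but never makes it into the story.
    """
    interviews = 0
    found = 0
    while found < quotes_needed:
        interviews += 1
        if rng.random() < prevalence:  # this person hates broccoli
            found += 1
    return interviews


rng = random.Random(2015)  # fixed seed so the run is repeatable
for prevalence in (0.001, 0.60):
    n = interviews_until_quotes(prevalence, 3, rng)
    print(f"prevalence {prevalence:.1%}: {n} interviews, 3 hater quotes in print")
```

Whether haters are 0.1% or 60% of the population, the published story contains the same three angry quotes.  The only thing that changes is the number of interviews left on the cutting-room floor, and that number is exactly what the reader never gets to see.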

This doesn't mean that anecdotes cannot reflect majority views or the views of an important numerical minority.  But strictly speaking an anecdote, if true, tells us only that one particular person held a particular opinion.  It doesn't tell us how common that opinion is.  For that we need the collection and analysis of statistical data about the whole relevant population (all vegetable eaters in the case of broccoli).

So all this is what stopped me from writing on various interesting topics yesterday.  Aren't you glad I shared?
-----
*With all due apologies to critters from the primeval slime, who are probably charming and empathic.