Thursday, October 05, 2006

Statistics Primer. Part 2: Probability



Statistics is not the same as probability theory, but statistics leans heavily on it, so a small detour into the wonderful world of probabilities is necessary here. Let's start by grabbing the concept of probability by its horns: What is this thing?

It's a way of quantifying the likelihood of some event. Call the event that Echidne wakes up all cheerful tomorrow event A (just to call it something short and sweet). Denote the probability of this event by the shorthand p(A) (this is read "p of A"). How could we quantify this wonderful likelihood?

We can do it by defining an impossible event and a sure thing. Let's fix these two extreme values as follows:
p(A)=0 if A is an impossibility
and
p(A)=1 if A is certain to happen.

Given these two fixed values, all other probability values fall in the range from zero to one.

This is fun. If I tell you that p(A) = 0.14 for the event of me waking up chirpy as a bird, you can now tell that I don't think it's very likely to happen. But there is an even funner aspect of this, for we can always define a second event, notA, the event in which Echidne will not wake up cheerful. If p(A) = 0.14, then the probability of notA, or the complement of A, will be...what?

It makes sense that it would be 1 - p(A) = 0.86, because something is going to happen, and if it's not a cheerful Echidne, then it must be a grumpy or neutral Echidne that rises from that divine bed tomorrow morning, assuming that she does rise. As long as none of my mental states is counted in both A and notA, and every possible state falls into one or the other, this will work.
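If you like seeing such things as code, here is a tiny Python sketch of the complement rule (the 0.14 is just the made-up number from above, not anything measured):

```python
# The complement rule: p(notA) = 1 - p(A).
p_a = 0.14              # made-up probability of a cheerful Echidne
p_not_a = 1 - p_a

print(p_not_a)          # 0.86
assert 0 <= p_a <= 1    # probabilities always live between zero and one
```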

So what was funner about this? The fact that uncertainty actually increases as we move from probabilities close to zero towards the middle values, and that it also increases as we move from probabilities close to one towards the middle values. So we have the most uncertainty when the probability hovers around 0.50. When we are closer to the endpoints of zero or one, either the event or its complement is almost sure to happen, so we have less uncertainty. Plus the fact that probability theory gives us all sorts of formulas for finding the probabilities of combined events and so on. I'm not going that way, but you may wish to if you find this entertaining.

You might say that all this is well and good, but what is the anchor that settles this whole probability thing? Where do we get those values you made up here? There are three possible answers to this question. The first is the classical definition of probability, and it is best explained by thinking about games of chance in which the rules are simple and in which we can find elementary events which are clearly equally likely to occur. For instance, think of the following game: You toss two fair dice at the same time. What is the probability that the dots on the top sides of the two dice add up to seven?

The solution consists of counting the number of events (here an event is one way the two dice can fall) in which the dots add up to seven, and then counting the total number of events, whatever the dots might add up to. The probability of the event we are interested in (the dot sum is 7) is the ratio of the two counts. And the way to find these counts is by...counting!

First, I can count the total number of events by noting that the first die can take any value from one to six, and so can the second die. This means that for each of the first die's six values the second die can take any one of six values, so the total number of events is 6 times 6, or 36. Second, I can look at all these 36 events and add the dots on the two dice for each of them. When I do that, I find that in exactly six cases, (1,6), (2,5), (3,4), (4,3), (5,2) and (6,1), the sum comes to 7. (Note that in these pairs the first die's value comes first and the second die's value second, and that (1,6) and (6,1) are two separate events.) Third, we form the probability ratio. Here it is 6/36, or 1/6, or about 0.167.
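For those who enjoy letting a machine do the counting, here is a minimal Python sketch (purely an illustration, not part of the argument) that enumerates all 36 outcomes and picks out the sevens:

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes: (first die, second die).
outcomes = list(product(range(1, 7), repeat=2))

# The outcomes whose dots sum to seven.
sevens = [pair for pair in outcomes if sum(pair) == 7]

p_seven = Fraction(len(sevens), len(outcomes))
print(sevens)                    # [(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)]
print(p_seven, float(p_seven))   # 1/6, about 0.167
```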

The second way of anchoring the probability concept is more important in actual statistical studies, and that is to link probabilities of future events to what happened in the past. This makes sense as long as whatever affects these events hasn't changed in the meantime. This definition is called the objective definition of probability (to distinguish it from the third definition still to come) and also the long-run relative frequency definition of probability. The latter name hints at the way the probabilities are derived: by using long enough records of actual events and by assuming that the events will recur at the same frequencies in the future. The word "relative" is added because we standardize the probability measure to the scale from zero to one.

An example of this approach would be taking a coin that is known to be unfair (so that heads and tails are not equally likely when tossing it) and finding out what the probability of heads is by tossing the coin again and again and writing down whether heads or tails turned up on each toss. Suppose you toss this coin a million times and find that heads came up 400,000 times. Then the probability of heads for this coin would be 400,000/1,000,000 = 0.4.
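Here is a similar little Python sketch of that long-run relative frequency idea. The coin's true bias of 0.4 is assumed from the example above; in real life it would of course be the unknown thing we are trying to estimate:

```python
import random

TRUE_P_HEADS = 0.4   # the coin's true chance of heads (unknown in practice)
tosses = 1_000_000

# Each toss comes up heads with probability TRUE_P_HEADS.
heads = sum(random.random() < TRUE_P_HEADS for _ in range(tosses))

# The relative frequency of heads; it settles near 0.4 as tosses grow.
print(heads / tosses)
```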

Sadly, the easiest teaching examples on probability tend to be stuff like that. But the same principle applies to studies about voting activity or opinions in general.

The third definition of probability is the subjective one, also called the Bayesian definition. This differs from the other definitions in that a Bayesian statistician could ask a question such as this: What is the likelihood that Echidne is grinding her teeth right now? A strict objectivist would not ask such a question, because either I am grinding my teeth now or I am not; it's just that others can't observe which it is. The subjective definition of probability has to do with our beliefs about events and is not limited to predicting the frequencies of future events. It can handle the way learning new facts changes our beliefs, and other interesting questions like that.
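To make that belief-updating idea concrete, here is a toy Python sketch of Bayes' rule. Every number in it is made up purely for illustration:

```python
# Suppose we start with a 0.5 belief that Echidne is grinding her
# teeth (event G), and then we hear a muffled grinding-like noise
# (evidence E). All three numbers below are invented for the example.
prior = 0.5             # p(G): belief before hearing anything
p_e_given_g = 0.8       # p(E | G): chance of the noise if she is grinding
p_e_given_not_g = 0.2   # p(E | notG): chance of the noise if she is not

# Bayes' rule: p(G | E) = p(E | G) * p(G) / p(E)
p_e = p_e_given_g * prior + p_e_given_not_g * (1 - prior)
posterior = p_e_given_g * prior / p_e

print(posterior)        # 0.8: the evidence raised the belief from 0.5
```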

The probability concepts most used in statistical studies are the long-run relative frequency view (which is used in the studies themselves) and the subjective view (which is used in the way we interpret the margins of error and similar concepts). My next post will say a little more about probability distributions, which we need for understanding sampling distributions.

What I hope you got from this post is the rough feeling conveyed by something like "candidate X has a probability of 0.7 of winning next month's elections", and the urge to ask what the basis for such a prediction is. It might be totally subjective, or perhaps there was a poll in which 70% of those surveyed stated that they were going to vote for X. This relative frequency (70% is 0.7 in relative frequency terms) is then used as the probability of X getting elected, which naturally assumes that people will act according to their stated intentions and that the survey was representative of actual voters.


Part 1
Part 2
Part 3
Part 4
Part 5
Part 6