There should be no need to emphasize
the importance of statistics in our society. The President is
now in trouble for having pressed for and accepted certain legally
dubious monetary contributions; that money was mostly spent paying
for pre-election polls—finding out *what* the public wanted
to hear—in other words, for statistics. Statistics doesn't
just get presidents in trouble, however; it is as essential for
everyday policy or business decisions as it is for natural science.
Whenever there are too many people, too many particles, too many
things or events, so that it would be nearly impossible to find
out the state or behavior of each individual or the outcome of
each event, we have recourse to statistics.

We will concentrate on one problem here, and although it may seem to you to be a very special one, it contains all the basic elements that go into all practical decisions based on statistics. This problem was mentioned in my previous lecture: to decide whether a coin is fair or not. Suppose we flip the coin 1,000 times and we get 485 heads and 515 tails: does the coin seem fair? Now suppose we got instead 400 heads and 600 tails: what then? Needless to say, it is extremely unlikely that out of 1,000 flips we would get exactly 500 heads and 500 tails, even if the coin were fair; on the other hand, getting too many heads, say 988, would be very strong evidence that the coin is not fair: where do we draw the line? What would be a number of heads above which we would conclude that the coin is not fair? This is a typical statistical problem. The coin turning up heads or tails is the result of too many and too tiny physical events for us to be able to answer the question as to the fairness of the coin in any other, non-statistical, way.
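You can get a feel for the problem with a quick simulation. The sketch below is my own illustration, not part of the lecture (the function name and the seed are arbitrary choices): it tosses a simulated fair coin 1,000 times and counts the heads. Run it with different seeds and you will almost never see exactly 500.

```python
import random

def toss_heads(n, seed=1):
    """Count heads in n simulated tosses of a fair coin."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return sum(rng.random() < 0.5 for _ in range(n))

heads = toss_heads(1000)
print(heads)  # some number near 500, but almost never exactly 500
```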

Statistics operates as follows: we *assume*
that the coin is fair (this is only an assumption!), in which
case we have that the __expected__ number of heads is half
the number of tosses. Now we toss the coin N times (before N
was 1,000 but of course it doesn't have to be). Most of the time,
especially if N is high, we will NOT actually get N/2 heads exactly;
we will get a number of heads, N(H), which will be different from
N/2. The important thing is to keep track of the difference;
if it is "too high" we will conclude that the coin is
not fair, and if it is not "too high" we will conclude—what?
Well, merely that there is not enough experimental evidence to
brand the coin as unfair (which is not the same as declaring it
fair!). The main problem is to clarify what we mean by "too
high."

To do that, we will find out what should
be expected about the difference N(H) - N/2. How high should
we expect it to be, __assuming__ the coin is fair? Note that
this difference can be positive or negative, so when I say "how
high" I mean either high toward the positive side or toward
the negative side. Let us consider the following graph (which
is technically known as a "random walk"): on the horizontal
axis we will label the tosses by number, from 0 to N, and we start
drawing a broken line, starting at 0: each time the coin comes
up heads we draw a line going up one unit, each time the coin
comes up tails we draw a line going down one unit. We get a broken
line which after N tosses of the coin will end up at a certain
distance from the horizontal axis; we call this distance D(N)
and we have: D(N) = N(H) - N(T), the number of heads minus the
number of tails. Of course, D(N) can be positive or negative,
according to whether we get more heads or more tails, so it is
more convenient to look at the square of D(N), which will never
be negative. Now since the first flip of the coin must result
in our line either going up one unit or down one unit, we have
D(1) = +1 or -1, hence D^{2}(1) = 1. On the other hand, when we
look at the distance from the horizontal after n+1 flips, it must
be equal to the distance achieved after n flips plus or minus
one unit, so: either D(n+1) = D(n) + 1 or D(n+1) = D(n) - 1.
Squaring all sides we get: D^{2}(n+1) equals either
[D(n)+1]^{2 } or
[D(n)-1]^{2}, each alternative having probability 1/2. Expanding
both squares we get that D^{2}(n+1) is either D^{2}(n) + 2D(n) + 1
or D^{2}(n) - 2D(n) + 1. Since each of these has probability 1/2,
to find what we expect we must do as in our previous lecture,
multiply each of those two quantities by 1/2 and add them, and
so we get that the expected value of D^{2}(n+1) is equal to the expected
value of D^{2}(n) + 1. This means that as we increase the number
of tosses of the coin by one unit, the expected value of the square
of the distance also goes up by one unit. Since for one toss
D(1) was equal to 1, we see that finally the expected value of
D^{2}(N) is N.
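The result that the expected value of D^{2}(N) equals N can be checked empirically. This sketch (my own illustration; the trial count and seed are arbitrary) simulates many random walks of N steps and averages the squared final distance, which should come out close to N:

```python
import random

def mean_squared_distance(n_tosses, n_trials, seed=0):
    """Average D^2 over many simulated fair-coin random walks of n_tosses steps."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_trials):
        # one random walk: +1 for heads, -1 for tails
        d = sum(1 if rng.random() < 0.5 else -1 for _ in range(n_tosses))
        total += d * d
    return total / n_trials

# For N = 100 tosses, the average of D^2 should be close to 100.
print(mean_squared_distance(100, 20_000))
```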

This very important result can be expressed
a little differently by taking square roots; we may say that the
expected size of the distance itself, D(N), is about plus or minus
the square root of N. Remember, at the beginning we were really looking not
for the expected value of D(N) = N(H) - N(T), number of heads
minus number of tails, but rather for the expected value of N(H)
- N/2, the difference between our actual number of heads and half
the number of tosses. But there's an obvious relation between
these two. Since N(H) + N(T) = N (the number of heads plus the
number of tails equals the number of tosses), doing a little simple
algebra we get that N(H) - N/2 = D(N)/2. So finally the expected
value of N(H) - N/2 turns out to be plus or minus (1/2)square root(N).
This number has a technical name: it's called __the standard
error__ of our coin-tossing experiment. So the standard error
is half the square root of the total number of tosses.

Going back to our question (is the coin fair?), we may answer roughly thus: suppose we toss the coin 1,000 times and we get a certain number N(H) of heads; we look at N(H) - N/2, in other words, how far the number of heads is from 500. We expect it to be about plus or minus 16 away from 500, because the square root of 1,000 is about 32, which divided by two is 16. This means that if the coin is fair we should expect the actual number of heads to be within 16 units of 500, that is, between 484 and 516. You see, now we know what we mean by a number of heads (or tails) that's "not too high". We can say something much more precise by using the bell curve which I explained last time. It turns out that, if the coin were fair, the probability of the number of heads being between 484 and 516 is the same as the area under the "standard" bell curve between one unit to the left of center and one unit to the right, and this area is about 0.68, or 68% of the total area. So the probability of a fair coin landing heads between 484 and 516 times out of 1,000 is 0.68. In the same way, by using the bell curve, we can tell the probability of staying within two SEs from the average (500). Since the SE was 16, two SEs is 32, and what we're asking for is the probability of the number of heads being between 468 and 532: it turns out to be about 95% or 0.95. Similarly, if we look at three SEs, that is, 48, the probability of the number of heads being between 452 and 548 is about 99% or 0.99. The probability of the number of heads being, say, five standard errors away from the average 500 is extremely low, like winning the lottery.
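Those bell-curve areas can be computed directly from the error function, the standard way of expressing the area under the standard bell curve within k units of its center. This is a small check of my own, not part of the lecture:

```python
from math import erf, sqrt

def prob_within(k):
    """Area under the standard bell curve within k units of its center."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(k, round(prob_within(k), 4))
# k=1 gives about 0.68 and k=2 about 0.95; k=3 is closer to 0.997,
# which the lecture rounds to 99%.
```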

So let us look back at the numbers I gave you before. The first question was: suppose we get 485 heads and 515 tails: does the coin seem fair? Well, 485 heads is within the range 484 to 516, so there's nothing exceptional about our results. For all we know, the coin may be fair. But the second question was: suppose we got 400 heads and 600 tails: what then? Now 400 heads is 100 units away from the average 500, and since each SE is 16, our number of heads is about 6 standard errors away from 500. The probability of getting a result 6 SEs away from the average (assuming the coin was fair) is smaller than the chance of winning the lottery; it is fantastically low. And so the statistician reasons as follows: under the assumption that the coin was fair, we got a fantastically unlikely event, almost a miracle; rather than believe in miracles, I prefer to reject the original assumption that the coin was fair, so I pronounce it not fair (it is more likely to land tails than heads). Statisticians call this second case "statistically significant," meaning it allows them to reject the original hypothesis and say the coin is not fair. The first case, when we got 485 heads, would not be statistically significant, for it allows them to conclude nothing.

There are subtle questions here. In the two numerical examples I just gave you, the situation was clear-cut, but suppose now the number of heads is borderline. Suppose we get 468 heads. We reason as follows: 468 is 32 units away from 500, now 32 units means two SEs (one SE being 16), so let's look at the probability of our coin deviating from the expected 500 heads by 32 units or more: it turns out (from the bell curve) to be about 5%, or 0.05. An event with probability 0.05 can be expected to happen 5 times out of 100. You can't quite call it a miracle, but is that probability small? It all depends on the real-life situation. A statistician cannot tell you whether 5% is small or not, all he can tell you is: "Assuming your coin was fair, the probability of getting the kind of result we got (we got 468 heads) is about 5%." It's up to you then to decide whether or not 5% is small enough to warrant the conclusion that the coin is not fair. To consider 5% as small, or to say no, we'll consider small anything below 1%, or below 0.1%, or what have you, is always a policy decision, not a statistician's.
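The borderline reasoning above fits in a tiny formula. This sketch (my own helper, using the same √N/2 standard error and the bell-curve, i.e. normal, approximation) computes the probability of a fair coin deviating from N/2 by at least the observed amount:

```python
from math import erf, sqrt

def chance_of_deviation(heads, n):
    """Probability, for a fair coin, of deviating from n/2 by at least
    as much as observed (bell-curve approximation, standard error sqrt(n)/2)."""
    se = sqrt(n) / 2
    z = abs(heads - n / 2) / se       # deviation measured in standard errors
    return 1 - erf(z / sqrt(2))       # area in both tails beyond z

print(chance_of_deviation(468, 1000))  # about 0.04, close to the 5% quoted above
```

The small difference from the lecture's 5% comes from using the exact standard error (about 15.8) instead of the rounded 16.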

Let's go back to our standard error.
Remember that it was the __expected__ difference between the
number of heads in N tosses of a fair coin and N/2 (how many heads
one expects). This expected difference turned out to be (1/2) square root(N).
So if we toss a fair coin 100 times we may expect the number
of heads to differ from 50 by about plus or minus 5; if we toss
a fair coin 10,000 times we may expect the number of heads to
differ from 5,000 by about plus or minus 50, and so on. Notice
that the more times we toss the coin, the higher will the expected
difference be. So it is NOT true that the more we toss the coin,
the closer we can expect the number of heads to be to half the number
of tosses. The truth is contained in the LAW OF LARGE NUMBERS,
which says that the RELATIVE difference, that is, the expected
difference between the number of heads and half the number of
tosses DIVIDED by the total number of tosses, [N(H) - N/2]/N,
will become closer and closer to 0 as we increase N. This fundamental
result, without which all statistics would be impossible, is due
to the fact that the numerator, the expected difference N(H) -
N/2, is the standard error, so it is 1/2 times the square root
of N, and that square root saves the day! For we have (1/2)square root(N)/N
= (1/2)1/square root(N), and 1/square root(N) gets closer and closer to 0 as
N gets large.
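The shrinking relative difference is easy to tabulate; this sketch just evaluates the formulas above (the function name is mine): the standard error √N/2 grows with N, while its ratio to N shrinks like 1/(2√N).

```python
from math import sqrt

def standard_error(n):
    """Expected difference N(H) - N/2 for a fair coin: half the square root of n."""
    return sqrt(n) / 2

for n in (100, 10_000, 1_000_000):
    print(n, standard_error(n), standard_error(n) / n)
# The absolute difference grows (5, 50, 500) while the relative difference
# shrinks (0.05, 0.005, 0.0005), exactly as the law of large numbers says.
```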

Let's see how this works with polls.
Suppose you are working for Teresita Balboa, running for the
Senate from New York, and you want to poll the voters to predict
the outcome; but Ms Balboa is running on a shoestring, so you
decide to take a sample of just 100 voters in the state, and ask
them how they are going to vote. Several comments here: first,
polling people is expensive, and that's why no one ever polls
*all the voters*; the larger the sample, the more expensive
the procedure will be. Secondly, the way one chooses the sample
is crucial; the most famous cases where polls went wrong (Roosevelt
vs Landon, Truman vs Dewey) were based on faulty samples. What
is a faulty sample, or in technical terms, a *biased* sample?
If we take a sample of 100 voters in New York, it may be biased
because it contains a disproportionate number of Republicans or
Democrats. If we select our sample by looking at a directory of
a teachers' union, for example, we may be confident that we'll
choose more Democrats than if we were to select our sample by
looking at the membership roster of the most expensive golf club;
indeed, we would get too many Democrats compared to the whole
state, which will make our predictions unreliable. How to choose
samples is a difficult art or science, an important part of statistics,
but we have no time to dwell on it.

Going back to your work for Ms Balboa,
you already got your unbiased sample of 100 voters, you have interviewed
each of them and found out that 52 of them intend to vote for
your boss, and 48 will vote for the other candidate. So what
do you do? You go back to Ms Balboa and tell her that the poll
predicts she'll win by 52% to 48%? Not so fast. You didn't ask
all the voters, just 100 of them, so there's an element of chance
in your result; in other words, even if your sample is unbiased,
you may have gotten more (or less) supporters of Balboa in your
sample than in the whole population. This is exactly what happens
when we flip a coin say 100 times: even if the coin is perfectly
fair, we may get 46, 47, 48,..., 51, 52, etc. heads rather than
exactly 50, and that's technically called "chance variation."
So you must tell Ms Balboa that the poll says she'll get 52%
of the vote, __give or take__ something — and here's the important
point: give or take how much? Answer: the standard error of your
sampling experiment. To get it, you must proceed exactly as we
did with the coin-tossing experiment: here we are dealing with
voters, not coins, and instead of "heads" we count how
many voters prefer Ms Balboa, but this doesn't make any difference.
Using the square root formula, we get a standard error of (1/2)square root(N)
for the number of voters who prefer Balboa, and since N=100, we
get SE = 5. If we want to speak of percentages, 5 out of a hundred
is just 5%. So you go and tell your boss: "This poll gives
you 52% of the vote, give or take 5%." (In technical parlance,
this is called "a confidence interval").

Quite naturally, Ms Balboa is not entirely satisfied with this result. Sure, 52% of the vote is gratifying, but since it is "give or take 5%," this could mean something possibly as low as 47%, in which case Balboa would lose the election! A "give or take" of 5% won't do, she tells you. She wants a more precise estimate. How precise? (you ask). Well, something between 1.5% and 2% would be just fine (she says). In other words, she wants you to get a "give or take" about three times smaller than 5%, or in still other words, she wants you to be three times more precise in your estimate. How can you do that? There's only one way: get your standard error to be three times smaller. The standard error for the percentage was (1/2)square root(N) divided by N and multiplied by 100, that is, 50/square root(N). To make this three times smaller you must make square root(N) three times bigger (since it appears in the denominator), and to make square root(N) three times bigger you must make N nine times bigger. The upshot is, you tell your boss that if that's what she wants, an estimate three times more precise, you'll have to poll 900 voters instead of just 100, and that will cost her 9 times as much as the previous poll.
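The arithmetic of that trade-off can be packaged up. This sketch (the helper names are my own, not standard polling terminology) turns a sample size into a give-or-take percentage and back, using the 50/√N formula from the text:

```python
from math import sqrt

def margin_pct(n):
    """One-standard-error give-or-take, in percentage points, for a sample of n voters."""
    return 50 / sqrt(n)   # (1/2)sqrt(n), divided by n, times 100

def sample_size_for(margin):
    """Sample size whose give-or-take equals the target margin (in percentage points)."""
    return round((50 / margin) ** 2)

print(margin_pct(100))          # 5.0, as in the poll of 100 voters
print(sample_size_for(5 / 3))   # 900: a three-times-smaller margin needs nine times the sample
```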

So you see how expensive these things may get to be, thanks to that innocent-looking square root. To end our story, confronted with the numbers, Ms Balboa and you decide that the best and most productive way to spend your energies and talents is fund-raising — $1,000-per-plate dinners, hobnobbing with fat cats and union leaders, courting the few people who count. Otherwise, how could you afford the big expense of finding out what the rest of the voters want?

Chapter 6 of Feynman's book.