There should be no need to emphasize
the importance of statistics in our society. The President is
now in trouble for having pressed for and accepted certain legally
dubious monetary contributions; that money was mostly spent paying
for pre-election polls—finding out *what* the public wanted
to hear—in other words, for statistics. Statistics doesn't
just get presidents in trouble, however; it is as essential for
everyday policy or business decisions as it is for natural science.
Whenever there are too many people, too many particles, too many
things or events, so that it would be nearly impossible to find
out the state or behavior of each individual or the outcome of
each event, we have recourse to statistics.

We will concentrate on one problem here, and although it may seem to you to be a very special one, it contains all the basic elements that go into all practical decisions based on statistics. This problem was mentioned in my previous lecture: to decide whether a coin is fair or not. Suppose we flip the coin 1,000 times and we get 485 heads and 515 tails: does the coin seem fair? Now suppose we got instead 400 heads and 600 tails: what then? Needless to say, it is extremely unlikely that out of 1,000 flips we would get exactly 500 heads and 500 tails, even if the coin were fair; on the other hand, getting too many heads, say 988, would be very strong evidence that the coin is not fair: where do we draw the line? What would be a number of heads above which we would conclude that the coin is not fair? This is a typical statistical problem. The coin turning up heads or tails is the result of too many and too tiny physical events for us to be able to answer the question as to the fairness of the coin in any other, non-statistical, way.
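You can get a feel for the problem with a quick simulation. The sketch below is my own illustration, not part of the lecture (the function name and the seed are arbitrary choices): it tosses a simulated fair coin 1,000 times and counts the heads. Run it with different seeds and you will almost never see exactly 500.

```python
import random

def toss_heads(n, seed=1):
    """Count heads in n simulated tosses of a fair coin."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return sum(rng.random() < 0.5 for _ in range(n))

heads = toss_heads(1000)
print(heads)  # some number near 500, but almost never exactly 500
```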

Statistics operates as follows: we *assume*
that the coin is fair (this is only an assumption!), in which
case we have that the __expected__ number of heads is half
the number of tosses. Now we toss the coin N times (before N
was 1,000 but of course it doesn't have to be). Most of the time,
especially if N is high, we will NOT actually get N/2 heads exactly;
we will get a number of heads, N(H), which will be different from
N/2. The important thing is to keep track of the difference;
if it is "too high" we will conclude that the coin is
not fair, and if it is not "too high" we will conclude—what?
Well, merely that there is not enough experimental evidence to
brand the coin as unfair (which is not the same as declaring it
fair!). The main problem is to clarify what we mean by "too
high."

To do that, we will find out what should
be expected about the difference N(H) - N/2. How high should
we expect it to be, __assuming__ the coin is fair? Note that
this difference can be positive or negative, so when I say "how
high" I mean either high toward the positive side or toward
the negative side. Let us consider the following graph (which
is technically known as a "random walk"): on the horizontal
axis we will label the tosses by number, from 0 to N, and we start
drawing a broken line, starting at 0: each time the coin comes
up heads we draw a line going up one unit, each time the coin
comes up tails we draw a line going down one unit. We get a broken
line which after N tosses of the coin will end up at a certain
distance from the horizontal axis; we call this distance D(N)
and we have: D(N) = N(H) - N(T), the number of heads minus the
number of tails. Of course, D(N) can be positive or negative,
according to whether we get more heads or more tails, so it is
more convenient to look at the square of D(N), which will never
be negative. Now since the first flip of the coin must result
in our line either going up one unit or down one unit, we have
D(1) = +1 or -1, hence D^{2}(1) = 1. On the other hand, when we
look at the distance from the horizontal after n+1 flips, it must
be equal to the distance achieved after n flips plus or minus
one unit, so: either D(n+1) = D(n) + 1 or D(n+1) = D(n) - 1.
Squaring all sides we get: D^{2}(n+1) equals either
[D(n)+1]^{2 } or
[D(n)-1]^{2}, each alternative having probability 1/2. Expanding
both squares we get that D^{2}(n+1) is either D^{2}(n) + 2D(n) + 1
or D^{2}(n) - 2D(n) + 1. Since each of these has probability 1/2,
to find what we expect we must do as in our previous lecture,
multiply each of those two quantities by 1/2 and add them, and
so we get that the expected value of D^{2}(n+1) is equal to the expected
value of D^{2}(n) + 1. This means that as we increase the number
of tosses of the coin by one unit, the expected value of the square
of the distance also goes up by one unit. Since for one toss
D(1) was equal to 1, we see that finally the expected value of
D^{2}(N) is N.
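The result that the expected value of D^{2}(N) equals N can be checked empirically. This sketch (my own illustration; the trial count and seed are arbitrary) simulates many random walks of N steps and averages the squared final distance, which should come out close to N:

```python
import random

def mean_squared_distance(n_tosses, n_trials, seed=0):
    """Average D^2 over many simulated fair-coin random walks of n_tosses steps."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_trials):
        # one random walk: +1 for heads, -1 for tails
        d = sum(1 if rng.random() < 0.5 else -1 for _ in range(n_tosses))
        total += d * d
    return total / n_trials

# For N = 100 tosses, the average of D^2 should be close to 100.
print(mean_squared_distance(100, 20_000))
```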

This very important result can be expressed
a little differently by taking square roots; we may say that the
expected size of the distance itself, D(N), is about plus or minus
the square root of N. Remember, at the beginning we were really looking not
for the expected value of D(N) = N(H) - N(T), number of heads
minus number of tails, but rather for the expected value of N(H)
- N/2, the difference between our actual number of heads and half
the number of tosses. But there's an obvious relation between
these two. Since N(H) + N(T) = N (the number of heads plus the
number of tails equals the number of tosses), doing a little simple
algebra we get that N(H) - N/2 = D(N)/2. So finally the expected
value of N(H) - N/2 turns out to be plus or minus (1/2)square root(N).
This number has a technical name: it's called __the standard
error__ of our coin-tossing experiment. So the standard error
is half the square root of the total number of tosses.

Going back to our question (is the coin fair?), we may answer roughly thus: suppose we toss the coin 1,000 times and we get a certain number N(H) of heads; we look at N(H) - N/2, in other words, how far the number of heads is from 500. We expect it to be about plus or minus 16 away from 500, because the square root of 1,000 is about 32, which divided by two is 16. This means that if the coin is fair we should expect the actual number of heads to be within 16 units of 500, that is, between 484 and 516. You see, now we know what we mean by a number of heads (or tails) that's "not too high". We can say something much more precise by using the bell curve which I explained last time. It turns out that, if the coin were fair, the probability of the number of heads being between 484 and 516 is the same as the area under the "standard" bell curve between one unit to the left of center and one unit to the right, and this area is about 0.68, or 68% of the total area. So the probability of a fair coin landing heads between 484 and 516 times out of 1,000 is 0.68. In the same way, by using the bell curve, we can tell the probability of staying within two SEs from the average (500). Since the SE was 16, two SEs is 32, and what we're asking for is the probability of the number of heads being between 468 and 532: it turns out to be about 95% or 0.95. Similarly, if we look at three SEs, that is, 48, the probability of the number of heads being between 452 and 548 is about 99% or 0.99. The probability of the number of heads being, say, five standard errors away from the average 500 is extremely low, like winning the lottery.
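Those bell-curve areas can be computed directly from the error function, the standard way of expressing the area under the standard bell curve within k units of its center. This is a small check of my own, not part of the lecture:

```python
from math import erf, sqrt

def prob_within(k):
    """Area under the standard bell curve within k units of its center."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(k, round(prob_within(k), 4))
# k=1 gives about 0.68 and k=2 about 0.95; k=3 is closer to 0.997,
# which the lecture rounds to 99%.
```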

So let us look back at the numbers I gave you before. The first question was: suppose we get 485 heads and 515 tails: does the coin seem fair? Well, 485 heads is within the range 484 to 516, so there's nothing exceptional about our results. For all we know, the coin may be fair. But the second question was: suppose we got 400 heads and 600 tails: what then? Now 400 heads is 100 units away from the average 500, and since each SE is 16, our number of heads is about 6 standard errors away from 500. The probability of getting a result 6 SEs away from the average (assuming the coin was fair) is smaller than the chance of winning the lottery; it is fantastically low. And so the statistician reasons as follows: under the assumption that the coin was fair, we got a fantastically unlikely event, almost a miracle; rather than believe in miracles, I prefer to reject the original assumption that the coin was fair, so I pronounce it not fair (it is more likely to land tails than heads). Statisticians call this second case "statistically significant," meaning it allows them to reject the original hypothesis and say the coin is not fair. The first case, when we got 485 heads, would not be statistically significant, for it allows them to conclude nothing.

There are subtle questions here. In the two numerical examples I just gave you, the situation was clear-cut, but suppose now the number of heads is borderline. Suppose we get 468 heads. We reason as follows: 468 is 32 units away from 500, now 32 units means two SEs (one SE being 16), so let's look at the probability of our coin deviating from the expected 500 heads by 32 units or more: it turns out (from the bell curve) to be about 5%, or 0.05. An event with probability 0.05 can be expected to happen 5 times out of 100. You can't quite call it a miracle, but is that probability small? It all depends on the real-life situation. A statistician cannot tell you whether 5% is small or not, all he can tell you is: "Assuming your coin was fair, the probability of getting the kind of result we got (we got 468 heads) is about 5%." It's up to you then to decide whether or not 5% is small enough to warrant the conclusion that the coin is not fair. To consider 5% as small, or to say no, we'll consider small anything below 1%, or below 0.1%, or what have you, is always a policy decision, not a statistician's.
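The borderline reasoning above fits in a tiny formula. This sketch (my own helper, using the same √N/2 standard error and the bell-curve, i.e. normal, approximation) computes the probability of a fair coin deviating from N/2 by at least the observed amount:

```python
from math import erf, sqrt

def chance_of_deviation(heads, n):
    """Probability, for a fair coin, of deviating from n/2 by at least
    as much as observed (bell-curve approximation, standard error sqrt(n)/2)."""
    se = sqrt(n) / 2
    z = abs(heads - n / 2) / se       # deviation measured in standard errors
    return 1 - erf(z / sqrt(2))       # area in both tails beyond z

print(chance_of_deviation(468, 1000))  # about 0.04, close to the 5% quoted above
```

The small difference from the lecture's 5% comes from using the exact standard error (about 15.8) instead of the rounded 16.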

Let's go back to our standard error.
Remember that it was the __expected__ difference between the
number of heads in N tosses of a fair coin and N/2 (how many heads
one expects). This expected difference turned out to be (1/2) square root(N).
So if we toss a fair coin 100 times we may expect the number
of heads to differ from 50 by about plus or minus 5; if we toss
a fair coin 10,000 times we may expect the number of heads to
differ from 5,000 by about plus or minus 50, and so on. Notice
that the more times we toss the coin, the higher will the expected
difference be. So it is NOT true that the more we toss the coin,
the closer we can expect the number of heads to be to half the number
of tosses. The truth is contained in the LAW OF LARGE NUMBERS,
which says that the RELATIVE difference, that is, the expected
difference between the number of heads and half the number of
tosses DIVIDED by the total number of tosses, [N(H) - N/2]/N,
will become closer and closer to 0 as we increase N. This fundamental
result, without which all statistics would be impossible, is due
to the fact that the numerator, the expected difference N(H) -
N/2, is the standard error, so it is 1/2 times the square root
of N, and that square root saves the day! For we have (1/2)square root(N)/N
= (1/2)1/square root(N), and 1/square root(N) gets closer and closer to 0 as
N gets large.
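The shrinking relative difference is easy to tabulate; this sketch just evaluates the formulas above (the function name is mine): the standard error √N/2 grows with N, while its ratio to N shrinks like 1/(2√N).

```python
from math import sqrt

def standard_error(n):
    """Expected difference N(H) - N/2 for a fair coin: half the square root of n."""
    return sqrt(n) / 2

for n in (100, 10_000, 1_000_000):
    print(n, standard_error(n), standard_error(n) / n)
# The absolute difference grows (5, 50, 500) while the relative difference
# shrinks (0.05, 0.005, 0.0005), exactly as the law of large numbers says.
```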

Let's see how this works with polls.
Suppose you are working for Teresita Balboa, running for the
Senate from New York, and you want to poll the voters to predict
the outcome; but Ms Balboa is running on a shoestring, so you
decide to take a sample of just 100 voters in the state, and ask
them how they are going to vote. Several comments here: first,
polling people is expensive, and that's why no one ever polls
*all the voters*; the larger the sample, the more expensive
the procedure will be. Secondly, the way one chooses the sample
is crucial; the most famous cases where polls went wrong (Roosevelt
vs Landon, Truman vs Dewey) were based on faulty samples. What
is a faulty sample, or in technical terms, a *biased* sample?
If we take a sample of 100 voters in New York, it may be biased
because it contains a disproportionate number of Republicans or
Democrats. If we select our sample by looking at a directory of
a teachers' union, for example, we may be confident that we'll
choose more Democrats than if we were to select our sample by
looking at the membership roster of the most expensive golf club;
indeed, we would get too many Democrats compared to the whole
state, which will make our predictions unreliable. How to choose
samples is a difficult art or science, an important part of statistics,
but we have no time to dwell on it.

Going back to your work for Ms Balboa,
you already got your unbiased sample of 100 voters, you have interviewed
each of them and found out that 52 of them intend to vote for
your boss, and 48 will vote for the other candidate. So what
do you do? You go back to Ms Balboa and tell her that the poll
predicts she'll win by 52% to 48%? Not so fast. You didn't ask
all the voters, just 100 of them, so there's an element of chance
in your result; in other words, even if your sample is unbiased,
you may have gotten more (or less) supporters of Balboa in your
sample than in the whole population. This is exactly what happens
when we flip a coin say 100 times: even if the coin is perfectly
fair, we may get 46, 47, 48,..., 51, 52, etc. heads rather than
exactly 50, and that's technically called "chance variation."
So you must tell Ms Balboa that the poll says she'll get 52%
of the vote, __give or take__ something — and here's the important
point: give or take how much? Answer: the standard error of your
sampling experiment. To get it, you must proceed exactly as we
did with the coin-tossing experiment: here we are dealing with
voters, not coins, and instead of "heads" we count how
many voters prefer Ms Balboa, but this doesn't make any difference.
Using the square root formula, we get a standard error of (1/2)square root(N)
for the number of voters who prefer Balboa, and since N=100, we
get SE = 5. If we want to speak of percentages, 5 out of a hundred
is just 5%. So you go and tell your boss: "This poll gives
you 52% of the vote, give or take 5%." (In technical parlance,
this is called "a confidence interval").

Quite naturally, Ms Balboa is not entirely satisfied with this result. Sure, 52% of the vote is gratifying, but since it is "give or take 5%," this could mean something possibly as low as 47%, in which case Balboa would lose the election! A "give or take" of 5% won't do, she tells you. She wants a more precise estimate. How precise? (you ask). Well, something between 1.5% and 2% would be just fine (she says). In other words, she wants you to get a "give or take" about three times smaller than 5%, or in still other words, she wants you to be three times more precise in your estimate. How can you do that? There's only one way: get your standard error to be three times smaller. The standard error for the percentage was (1/2)square root(N) divided by N and multiplied by 100, that is, 50/square root(N). To make this three times smaller you must make square root(N) three times bigger (since it appears in the denominator), and to make square root(N) three times bigger you must make N nine times bigger. The upshot is, you tell your boss that if that's what she wants, an estimate three times more precise, you'll have to poll 900 voters instead of just 100, and that will cost her 9 times as much as the previous poll.
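The arithmetic of that trade-off can be packaged up. This sketch (the helper names are my own, not standard polling terminology) turns a sample size into a give-or-take percentage and back, using the 50/√N formula from the text:

```python
from math import sqrt

def margin_pct(n):
    """One-standard-error give-or-take, in percentage points, for a sample of n voters."""
    return 50 / sqrt(n)   # (1/2)sqrt(n), divided by n, times 100

def sample_size_for(margin):
    """Sample size whose give-or-take equals the target margin (in percentage points)."""
    return round((50 / margin) ** 2)

print(margin_pct(100))          # 5.0, as in the poll of 100 voters
print(sample_size_for(5 / 3))   # 900: a three-times-smaller margin needs nine times the sample
```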

So you see how expensive these things may get to be, thanks to that innocent-looking square root. To end our story, confronted with the numbers, Ms Balboa and you decide that the best and most productive way to spend your energies and talents is fund-raising — $1,000-per-plate dinners, hobnobbing with fat cats and union leaders, courting the few people who count. Otherwise, how could you afford the big expense of finding out what the rest of the voters want?

Chapter 6 of Feynman's book.