Probability, a lecture by Ricardo Nirenberg. The University at Albany, Project Renaissance, Spring 1997.

The concept of probability is a very old one. Some late Greek philosophers ventured the notion that all truth is only probability, greater or smaller as the case may be. But the concept of probability was quantified, that is, subjected to numbers and their operations, only in the 17th century, with the work of two Frenchmen, Blaise Pascal and Pierre de Fermat. I've already mentioned these two names in previous lectures. The original motivation for the development of what came to be called the calculus of probability was a gambling problem. A French gentleman who liked to gamble asked Pascal about the following situation: two gamblers are playing a certain game of chance, and their object is to gain a certain number of points. The likelihood of winning a point was different for each gambler, each had put a certain amount of money in the pot at the start, and the question was: suppose the game is interrupted at some moment when player A has x points and player B has y points; how should the money in the pot be divided?

This problem is too hard for us at this point, so let's deal with a much simpler game: let us toss a coin. If it comes up tails you win, if it comes up heads I do. Suppose we each bet a dollar each time. The first thing we ought to do is to check the coin physically: is it perfectly homogeneous, its center of gravity right in the middle? Then we should check the method for tossing it: does it favor any particular side? As you can see, this always leaves some doubt as to the precision of our measurements and conclusions. We would like to say, "Heads and tails are equally likely, i.e. the coin is fair," but physical checking of the coin can only reach the conclusion, "I don't see any physical reason why one side would turn up more often than the other." Of course, we could check this experimentally, tossing the coin a billion, a trillion times and counting how many heads and tails we get. This, however, would be quite inconclusive. Suppose we toss the coin a million times and we get 500,543 heads and 499,457 tails: is the coin fair or not? So, assuming we didn't find any physical reason why the coin would not be fair, we invoke the Principle of Sufficient Reason, the same one Anaximander invoked at the beginning of abstract thought (remember?), and we say, "Since we see no reason for heads turning up more often than tails or vice versa, we conclude that they are equally likely." And so we assign to each, heads and tails, the number 1/2.

Why 1/2? This is just a convention. In probability theory, one proceeds as follows: if we have an experiment such as tossing a coin once, we define something called the sample space as the set of all possible outcomes. In this case, we only have two sample points: heads (H) and tails (T). So our sample space contains just two elements, H and T. We agree that the probability number we assign to the whole sample space is 1. We also agree that if we have two subsets of the sample space which do not overlap, that is, two sets of outcomes which cannot possibly occur simultaneously, the probability of the two subsets put together (the union) is equal to the sum of the probabilities of the two separately. In our case, the probability of T plus the probability of H must be 1, but since we agreed that they have the same probability, we get probability of H = P(H) = probability of T = P(T) = 1/2.

Now imagine a slightly harder experiment: we'll toss the coin twice. What's the new sample space? It contains four points: TT, TH, HT and HH. Again, each of these is equally likely, so the probability of any of them must be 1/4. If we toss the coin three times, we'll get 8 sample points, each with probability 1/8. And so on. Let us now consider somewhat more complex events. An event is simply a collection of sample points, and to get the probability of an event we simply add up the probabilities of its sample points. For example, if we toss a fair coin twice, what's the probability of getting at least one head? This means: either one head or two heads. Let's look at the sample points which belong to this event: HH, HT, TH. Of course, TT doesn't belong to it, because if we get two tails we get no heads. So, adding up, the probability of getting at least one head is 3/4. Another: suppose we toss a fair coin six times; what's the probability of getting at least one head? Here it is easier to proceed indirectly. We argue as follows: getting at least one head is the contrary of getting no heads at all; now, the probability of the sample point TTTTTT (tails six times) is (1/2)^6 = 1/64. So, the probability we want (the probability of all the other sample points) must be 1 minus 1/64, that is, 63/64.

When we come to more complex events, though, we need Pascal's help. For example, what's the probability of getting exactly 5 heads in 8 tosses of a fair coin? This is harder because we must count how many sample points (that is, sequences of 8 Hs and Ts) contain exactly 5 Hs. Of course, one such sample point is HHHHHTTT, but there are many other ways of getting 5 Hs! How many? It's the same as asking: in how many ways can we select 5 spots out of 8 in which to place an H (in the other 3 we'll place a T)? Pascal had the answer: (8x7x6)/(3x2x1) = 56. Instead of giving you the general formula, let me show you Pascal's triangle:

 
1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
1 5 10 10 5 1
1 6 15 20 15 6 1
1 7 21 35 35 21 7 1
1 8 28 56 70 56 28 8 1

...and so on.

Observe that each number is the sum of the two numbers directly above it. If we want to know in how many ways we can select 5 items out of 8, we go to the 8th row (starting from 0), which happens to be the last one I wrote. Then, again starting from position 0 (the first number on the left), we proceed to position 5 and get 56. Pascal proved many wonderful things about this triangle, and in the process introduced some very fundamental concepts of mathematics. This was a few weeks before he had his great religious illumination (November 1654) and decided that math was not worth the effort. For the purpose of counting, Pascal's triangle is not very useful if we are dealing with large numbers; if we want, for example, to count in how many ways we can select 876 items out of 12,768, we would have to spend time writing 12,768 rows! Fortunately, in the 18th and 19th centuries easier methods were devised which give approximate answers (for example, with the bell curve, about which more later). But going back to our original question, what's the probability of getting 5 heads in 8 tosses of a fair coin? Now the solution is simple: 56 times the probability of a sample point, (1/2)^8 = 1/256, in other words, 56/256 = 7/32.
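For those of you who like to check such counts on a computer, here is a minimal sketch in Python. It builds the triangle by the "sum of the two numbers above" rule, reads off row 8, position 5, and confirms the count by listing all 256 sequences of 8 tosses. The function names (pascal_rows, count_by_enumeration) are my own choices for illustration, not anything standard.

```python
from fractions import Fraction
from itertools import product


def pascal_rows(last_row):
    """Return rows 0..last_row of Pascal's triangle as lists of integers."""
    rows = [[1]]
    for _ in range(last_row):
        prev = rows[-1]
        # Each inner entry is the sum of the two entries directly above it.
        inner = [prev[i] + prev[i + 1] for i in range(len(prev) - 1)]
        rows.append([1] + inner + [1])
    return rows


def count_by_enumeration(tosses, heads):
    """Count the sequences of Hs and Ts of the given length with exactly `heads` Hs."""
    return sum(1 for seq in product("HT", repeat=tosses) if seq.count("H") == heads)


rows = pascal_rows(8)
ways = rows[8][5]  # row 8, position 5, both counted from 0

# Probability of exactly 5 heads in 8 tosses of a fair coin.
prob_five_heads = Fraction(ways, 2 ** 8)

# Probability of at least one head in 6 tosses, via the complement (no heads at all).
prob_some_head_in_six = 1 - Fraction(1, 2) ** 6

print(rows[8])
print(ways, count_by_enumeration(8, 5))
print(prob_five_heads)
print(prob_some_head_in_six)
```

Exact fractions are used instead of decimals so that the results can be compared directly with the hand computation.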

In the same manner we can answer all kinds of questions about other games, throwing dice, playing poker, roulette, etc. I only give the example of coin-tossing because it is simpler. Now remember that we were betting money, one dollar each time: if it comes up heads I win, tails you win. If we toss the coin say a hundred times, we may ask: how much can I expect to win? This has a precise mathematical answer, called the expected gain, and it is computed as follows: since the probability of my winning a dollar is 1/2 and the probability of my losing a dollar is 1/2, my expected gain on each toss will be 1/2 times one dollar plus 1/2 times minus one dollar. The result, of course, is zero, and it stays zero however many times we toss. In other words, this is "a fair game." But if you play casino roulette, or lottery, you may be sure that your expected gain will be negative. Another example: suppose we throw a fair die 100 times, and each time if it comes up a 3 I win a dollar, otherwise I lose a dollar. What's my expected gain? Since the probability of getting a 3 is 1/6 and the probability of any other number turning up is 5/6, my expected gain will be 1/6 times a dollar plus 5/6 times minus a dollar, everything times 100. The result is about minus $67: I should expect to lose about 67 dollars.

Pascal applied this to religious apologetics. He argued that either God exists or He doesn't. We may assume that the probability of His existing is extremely small, say one in a trillion. But what do we gain if we believe in Him and it turns out that He does exist? Eternal bliss (infinite gain!). And if we don't believe in Him and He does exist, we gain Hell (infinite loss!). If we bet that He doesn't exist, we may gain something in this life perhaps, but it's certainly not infinite. So when we compute the expected gain, we conclude that it's better for us to believe in God, for even if His existence is extremely unlikely, our gain will still be infinite. This is the famous "Pascal's bet," more often called Pascal's wager. It has not convinced many people.
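The expected gain is nothing more than "probability times payoff, added up over all outcomes," and that is easy to sketch in Python. The helper expected_gain below is a hypothetical name of mine; the two games are the coin and the die described above.

```python
from fractions import Fraction


def expected_gain(outcomes):
    """Add up probability * payoff over a list of (probability, payoff) pairs."""
    total_probability = sum(p for p, _ in outcomes)
    if total_probability != 1:
        raise ValueError("probabilities must add up to 1")
    return sum(p * payoff for p, payoff in outcomes)


# Coin game: win a dollar with probability 1/2, lose a dollar with probability 1/2.
coin_game = [(Fraction(1, 2), 1), (Fraction(1, 2), -1)]

# Die game: win a dollar on a 3 (probability 1/6), lose a dollar otherwise (5/6).
die_game = [(Fraction(1, 6), 1), (Fraction(5, 6), -1)]

coin_per_toss = expected_gain(coin_game)
die_per_throw = expected_gain(die_game)
die_over_100 = 100 * die_per_throw  # expected gain over 100 throws

print(coin_per_toss)
print(die_per_throw)
print(float(die_over_100))
```

The die game loses two thirds of a dollar per throw on average, which over 100 throws comes to a loss of about 67 dollars.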

You may be asking yourselves why this lecture on probability comes right after the previous lecture on physics. One reason is, probability is essential in physics. One way of understanding this is by looking back at our situation with gas in a container, where a trillion trillion molecules are moving around, hitting the walls and each other. If we ask for example how many molecules have velocities, or energies, within a certain range, we will not be able to give a precise numerical answer (there are too many molecules!), but we are able, given physical data like temperature, to give a probabilistic answer, in other words, to say how likely it is that a molecule will have an energy within that range: this is good enough for most practical purposes. Physicists discovered that asking about molecules of gas and their energies (or momenta) is very much like a game of placing balls in boxes: you can think of the molecules as balls and of the different energy ranges as boxes. So the question comes up: in how many different ways can we place n balls into m boxes? We assume the balls and the boxes are distinguishable, for example by each having a numerical label attached to it. The answer to this question is not hard. Take the first ball: we have m choices, because we have m boxes where we may place it. Now, for each choice for the first ball, we have again m choices concerning the second ball, for again we can place it in any of the m boxes. So the total number of choices for both ball #1 and ball #2 turns out to be m times m = m^2. We continue like this with ball #3, #4, etc. For the first 3 balls we have m^3 choices, for the first 4 balls we have m^4 choices, and, in general, for all n balls we have a total of m^n choices (m multiplied by itself n times). If you happen to have, say, 10 balls and 10 boxes, the number of ways you can place the balls in the boxes is quite large: ten to the power ten, or ten billion.
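Here is a small sketch in Python that lists every placement of labelled balls into labelled boxes for a few small cases and compares the count with m multiplied by itself n times. The function name placements is just an illustrative choice; for 10 balls and 10 boxes we only compute the number, since listing ten billion placements would take far too long.

```python
from itertools import product


def placements(n_balls, m_boxes):
    """List every assignment of labelled balls to labelled boxes.

    Each placement is a tuple whose i-th entry is the box chosen for ball i.
    """
    return list(product(range(m_boxes), repeat=n_balls))


three_balls_two_boxes = placements(3, 2)
two_balls_five_boxes = placements(2, 5)

print(len(three_balls_two_boxes), 2 ** 3)  # m = 2 boxes, n = 3 balls
print(len(two_balls_five_boxes), 5 ** 2)   # m = 5 boxes, n = 2 balls
print(10 ** 10)                            # 10 balls, 10 boxes: count only
```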

Using this result as the basis for computing probabilities, the computations turn out to fit the experimental results rather well. But if instead of molecules of gas we are dealing with other kinds of particles (e.g. photons), then the probabilistic results based on that way of counting distributions of balls in boxes don't agree with experiment. It turns out that for certain particles, the right way of counting such distributions is to consider the balls indistinguishable from each other. Just because it's a very beautiful piece of reasoning, I will tell you how to count the ways in which n indistinguishable balls can be placed into m different boxes. Clearly, the number must be smaller than the one we got before, because if we have, say, two balls and two boxes, now there is only one way of placing one ball in each box (the balls are indistinguishable!), whereas before there were two ways of doing that. Still, if you try to figure out the number of ways, the situation looks quite complicated (but try it, for trying hard is the one and only way to learn how to think!).

The beautiful trick is to reverse the procedure: instead of placing balls into boxes, we are going to place boxes between balls, which, as you'll see, will be much easier. By "placing boxes between balls" I mean the following: imagine the balls all lying in a row; as for the boxes, there's no need to picture the whole box; it's enough to imagine the side walls, and it is those side walls we will place in between balls. Once we have placed those side walls in some way, we will have balls in boxes again. So the question now becomes: in how many ways can we place side walls in between balls? We have m contiguous boxes, so we must have m+1 sides. Of those, two are fixed, at both ends of the row of balls; we can place the remaining m-1 sides as we please, each going in between two contiguous balls. In how many ways can we do that? There are m-1 sides and there are n-1 spaces between contiguous balls, so the whole thing amounts to choosing m-1 items from n-1 given ones. And to find out in how many ways we can do that, we only have to go to Pascal's triangle, to row number n-1, then go along that row to the number in place m-1, and that will be our answer. Actually, this is the number of configurations where no box remains empty (if we want to include the possibility of empty boxes, we must do another simple trick, but I don't want to do any algebra here). For example, if we have six indistinguishable balls and four boxes, the number of different configurations with no empty boxes is 10 (row 5, place 3 of the triangle). If the balls were distinguishable and boxes could be empty, we would have many more: 4^6 = 4,096. If the balls are distinguishable and no box can be empty, we would have 1,560. As you can see, counting is not such a simple task as they taught us in kindergarten!
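If you would rather not take these numbers on faith, the following Python sketch checks the case of six balls and four boxes by brute force: it lists every labelled placement, and treats two placements as the same configuration of indistinguishable balls when they put the same number of balls in each box. The function names are mine, chosen for illustration. The formula used for the case where boxes may stay empty is the standard result of the "other simple trick," which I did not derive here, so take it as an assumption the enumeration then confirms.

```python
from itertools import product
from math import comb


def indistinct_no_empty(n_balls, m_boxes):
    """Walls-between-balls count: choose m-1 of the n-1 gaps between balls."""
    return comb(n_balls - 1, m_boxes - 1)


def indistinct_empty_allowed(n_balls, m_boxes):
    """Standard count when boxes may stay empty (assumed, not derived in the lecture)."""
    return comb(n_balls + m_boxes - 1, m_boxes - 1)


def brute_force(n_balls, m_boxes):
    """Enumerate labelled placements and tally all four cases directly."""
    labelled = 0
    labelled_no_empty = 0
    occupancies = set()           # how many balls per box: identical-ball configurations
    occupancies_no_empty = set()
    for placement in product(range(m_boxes), repeat=n_balls):
        counts = tuple(placement.count(box) for box in range(m_boxes))
        labelled += 1
        occupancies.add(counts)
        if min(counts) > 0:
            labelled_no_empty += 1
            occupancies_no_empty.add(counts)
    return labelled, labelled_no_empty, len(occupancies), len(occupancies_no_empty)


formula_no_empty = indistinct_no_empty(6, 4)
formula_empty_allowed = indistinct_empty_allowed(6, 4)
checked = brute_force(6, 4)

print(formula_no_empty, formula_empty_allowed)
print(checked)
```

The four numbers returned by brute_force are, in order: labelled balls with empty boxes allowed, labelled balls with no empty box, identical balls with empty boxes allowed, and identical balls with no empty box.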

We now move to a related but different subject, the bell curve, which is fundamental in probability and in statistics (the latter we'll cover in the next lecture). I'll try to explain without much math why the bell curve is so important. Let us imagine a square wooden board with nails sticking out at regular intervals, in regular rows and columns. At the bottom of the board there's a ledge. Now I hold the board at some angle with the floor (say 45 degrees), and one by one I place tiny balls at the center point of the upper edge and let them fall. In its path, each ball may encounter a nail, in which case it will be deflected either toward the right or toward the left, and the probability of these two deflections is about the same, 1/2. Suppose we let many tiny balls fall that way, so that they accumulate on the ledge: what will be the shape of this accumulation? Answer: a bell curve. There will be more balls around the middle because we are placing the balls to start with at the center of the upper edge, and deflections tend to cancel each other, left ones with right ones. There are fewer balls at both ends of the ledge, for the balls at the right end, for example, are those that suffered many more right deflections than left ones, something which is unlikely although perfectly possible.
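The board with nails is easy to imitate on a computer. Below is a rough simulation in Python, under simplifying assumptions of my own: every ball meets exactly one nail in each of 12 rows, each deflection is to the right with probability 1/2, and a ball's final slot on the ledge is simply its number of right deflections. The function name drop_balls, the 5,000 balls and the fixed seed are all arbitrary choices for illustration.

```python
import random
from collections import Counter


def drop_balls(n_balls, n_rows, p_right=0.5, seed=0):
    """Simulate balls falling past n_rows nails; count balls by number of right deflections."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    landing = Counter()
    for _ball in range(n_balls):
        rights = sum(1 for _nail in range(n_rows) if rng.random() < p_right)
        landing[rights] += 1
    return landing


landing = drop_balls(n_balls=5000, n_rows=12)

# Crude picture of the pile on the ledge: one '#' per 25 balls in each slot.
for slot in range(13):
    print(f"{slot:2d} {'#' * (landing[slot] // 25)}")
```

Printed sideways like this, the pile is thick around slot 6 and thins out quickly toward slots 0 and 12.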

There is, of course, a math formula for the shape of this curve, but to write it down requires more math than I can teach you in one lecture (the motto of a teacher ought to be: "The truth and nothing but the truth, but not ALL the truth!"). But we can get this curve in a different way, by using Pascal's triangle. Look at some row in this triangle, say row number 8: 1 8 28 56 70 56 28 8 1, and draw a bar graph of it. That means, draw an axis with the numbers 0, 1, 2, 3, ..., 8 labelled on it, and on top of those numbers draw a rectangle or bar with corresponding heights 1, 8, 28, 56, etc. This bar graph looks a lot like the bell curve. Now, the bar graph is not a continuous curve, it looks more like a staircase going up then down, but if we were to draw the bar graph for higher and higher rows of Pascal's triangle, it would look more and more like the continuous bell curve. By the time you graph row number 200, say, you wouldn't be able to distinguish it from the bell curve with your unaided eye. This may look like a mystery, but it isn't. After all, we know that the number located in position m in row number n of Pascal's triangle is the number of ways in which we can choose m items out of n given items, and this corresponds exactly to the situation with the falling balls: each tiny ball will be deflected to the right by m nails out of n nails, and to the left by the other n-m nails, so the determination of where the ball will end up on the ledge is equivalent to the choosing of how many nails will deflect it to the right (the remaining nails will of course deflect it to the left). If there are more ways of choosing say 98 items out of 200 than there are of choosing just 7 items out of 200, this means that it's more likely for a ball to be deflected 98 times to the right and 102 to the left than for a ball to be deflected 7 times to the right and 193 to the left. That's why the graph of Pascal's triangle looks like the bell curve.
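For the curious, here is a sketch in Python of how one might measure the claim that higher rows look more and more like the bell curve. It divides row n by 2^n, so that the bar heights are probabilities, and compares them with the normal density whose mean is n/2 and whose standard deviation is the square root of n, divided by 2. That formula is the one I declined to write down; it is brought in here as an assumption, and the function names are mine.

```python
from math import comb, exp, pi, sqrt


def row_probabilities(n):
    """Row n of Pascal's triangle divided by 2**n: chance of k right deflections out of n."""
    return [comb(n, k) / 2 ** n for k in range(n + 1)]


def bell_curve(k, n):
    """Normal density with mean n/2 and standard deviation sqrt(n)/2 (assumed matching curve)."""
    mean = n / 2
    sd = sqrt(n) / 2
    return exp(-((k - mean) ** 2) / (2 * sd ** 2)) / (sd * sqrt(2 * pi))


def largest_gap(n):
    """Largest absolute difference between the bar heights and the curve."""
    probs = row_probabilities(n)
    return max(abs(probs[k] - bell_curve(k, n)) for k in range(n + 1))


for n in (8, 50, 200):
    print(n, largest_gap(n))
```

The largest gap between bars and curve shrinks as the row number grows, which is the numerical counterpart of not being able to tell them apart by eye.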

Going back to our board, balls and nails, the important thing to note here is that the final location of each ball on the ledge is the final result of many events, the ball hitting many nails. Each time the ball hits any nail, it has the same probability of being deflected right or left (we took this probability to be 1/2, but it doesn't have to be, it could be say 1/3 to the right and 2/3 to the left, for example). Further, what happens at one nail doesn't influence what happens at another nail: a ball may be deflected right by one nail, but when it later hits another nail, the probabilities of right and left are unchanged: this nail doesn't "remember" what happened at a previous nail. The technical word is: those nail-events are independent of each other. Whenever we have a situation like this, we will end up with a bell curve, and this is one of the most important results of Probability Theory. It has huge practical importance, for there are many phenomena in the natural sciences which follow this pattern, where the final result is the accumulation of many independent events deciding with constant probability between right and left, 0 and 1, etc. The height of human beings, the weight of beans of a given species, the size of cauliflowers, and so on. The total chance error in a lab measurement is computed on the basis of the bell curve. Grades in exams with many students and many questions are given according to the bell curve. If one measures human intelligence by the IQ, it also follows the bell curve, but whether or not the IQ is a good definition of human intelligence is something well beyond the scope of Probability Theory.

Required Reading:

Chapter 6 of Feynman's book.

