
Basic Probability
As we progress through our study of statistics, we will move from simply describing the data we get from a randomly selected sample (through summary statistics or graphical displays) to comparing those results to an established model for the population in order to evaluate the reasonableness of that model. In so doing, if the sample data differs significantly from what we expect, we will need to ask "How likely are we to observe such sample data if the model is really true?" To quantify the degree of likelihood, we turn to probability. There is more than one way to define "probability" and, perhaps surprisingly, statisticians differ (to this day) about the best approach.
Dice
Many of the examples we consider will involve games (and gambling). There are two reasons for this: such games typically involve very strict rules, which make mathematical computations much easier; and the study of probability as a branch of mathematics first came into being when noblemen hired mathematicians to help give them an edge in their games of chance.
You are likely familiar with a standard six-sided die, a small cube with 1, 2, 3, 4, 5, or 6 dots on each side. We call a die "fair" if each side has an equal likelihood of turning up. In this situation, rolling a six-sided die has 6 outcomes, each of which is equally likely, so we can define the probability of an event (such as rolling a 3 or rolling an odd number) as the ratio of favorable outcomes to possible outcomes: the probability of rolling a 3 is `1/6` and the probability of rolling an odd number is `3/6 = 1/2` (because three of the possible outcomes are odd numbers).
We often write P(roll a 3) = `1/6` to stand for "the probability that you roll a 3 is equal to 1/6" where P(E) means the probability that event E occurs. (This is like function notation in algebra: the parentheses do not mean multiplying P times E in this situation.) We might also write P(3) instead of P(roll a 3) as long as the context is clear (that we're rolling a single six-sided die and looking to get a 3).
In practice, even if a die is absolutely fair (and few dice truly are), we might roll the die 12 times (say) and not get any threes. Or we might get 4 threes instead of the 2 threes we'd expect to get. But if we rolled the die a million times, or a billion times, we would expect the percentage of rolls resulting in threes to eventually settle in on `1/6`, or about 16.67%, as in this graph:
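The counting definition above can be sketched in a few lines of Python. This is only an illustration, assuming a fair die so that all six faces are equally likely; the helper name `probability` is made up for this example.

```python
from fractions import Fraction

def probability(event, outcomes):
    """Classical probability: favorable outcomes divided by possible outcomes.

    Only valid when every outcome in `outcomes` is equally likely.
    """
    favorable = [o for o in outcomes if event(o)]
    return Fraction(len(favorable), len(outcomes))

die = [1, 2, 3, 4, 5, 6]  # the six faces of a fair die (assumed fair)

p_three = probability(lambda face: face == 3, die)
p_odd = probability(lambda face: face % 2 == 1, die)

print(p_three)  # 1/6
print(p_odd)    # 1/2
```

Using `Fraction` keeps the answers exact, so `3/6` is reduced to `1/2` rather than shown as a rounded decimal.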
which shows the "running percentage" of threes after 1, 2, 3, 4,...10, 11, 12,...100, 101, 102,... etc. rolls. Notice that the horizontal scale is logarithmic (it jumps by powers of 10) and that the percentage doesn't settle down to be very close to 0.1667 until well after 1,000 rolls.
This is a demonstration of the Law of Large Numbers, which says that the observed percentage of favorable outcomes in a finite number of trials (n) will approach the theoretical probability of that outcome as n gets very large. The only problem is that this law doesn't define what "very large" means: it could be 1,000 or 1,000,000 or 1,000,000,000 or even more.
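One way to watch the Law of Large Numbers at work is to simulate the rolls. The sketch below is a simulation with a pseudo-random number generator, not real dice, and the function name and seed value are arbitrary choices made for this example. It tracks the running fraction of threes after each roll.

```python
import random

def running_fraction_of_threes(n_rolls, seed=0):
    """Simulate n_rolls of a fair die; return the running fraction of threes."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    threes = 0
    fractions = []
    for roll_number in range(1, n_rolls + 1):
        if rng.randint(1, 6) == 3:
            threes += 1
        fractions.append(threes / roll_number)
    return fractions

fractions = running_fraction_of_threes(100_000)

# Report the running fraction at powers of 10, like a logarithmic axis.
for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"after {n:>7,} rolls: {fractions[n - 1]:.4f}")
```

The early values can be far from 0.1667, and a different seed gives a different path, but the later values should cluster more and more tightly around `1/6`.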
Coins
A game might also involve flipping a coin, which has two possible outcomes (heads and tails). If we assume these outcomes to be equally likely (they actually aren't), we have P(H) = 1/2 and P(T) = 1/2.
What if we flip a coin and roll a six-sided die? What's the probability of flipping heads and rolling a three? This event (coin = heads, die = three) involves two separate outcomes, and we know the probability of each: P(coin = heads) = 1/2 and P(die = three) = 1/6. How can we work with these probabilities for the constituent outcomes to compute the probability for the overall event? Let's visualize all of the possible (equally likely) outcomes of flipping the coin and rolling the die:
    1  2  3  4  5  6
H   X  X  X  X  X  X
T   X  X  X  X  X  X
Of the 12 possible outcomes, half of them (those in the H row above) involve the coin yielding heads. Of those, 1/6 involve the die rolling a three. So overall we have 1/6 of 1/2 of the total outcomes, or: P(coin = heads and die = three) = `1/2 times 1/6 = 1/12`, which agrees with our picture above, where 1 of the 12 possible outcomes is favorable.
We can restate this as:
P(coin = heads and die = three) = P(coin = heads)×P(die = three)
or more generally as:
P(E and F) = P(E)×P(F)
which we call the multiplication rule. But, as we shall see in a moment, we have to be careful in the application of this rule.
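As a check on the coin-and-die example, the 12 outcomes can be listed explicitly and compared with the multiplication rule. This is a small sketch that assumes a fair coin and a fair die, so every (coin, die) pair is equally likely.

```python
from fractions import Fraction
from itertools import product

coin = ["H", "T"]
die = [1, 2, 3, 4, 5, 6]

# All 12 equally likely (coin, die) pairs, matching the grid of outcomes.
outcomes = list(product(coin, die))

# Count the pairs where the coin is heads AND the die shows a three.
favorable = [(c, d) for c, d in outcomes if c == "H" and d == 3]
p_by_counting = Fraction(len(favorable), len(outcomes))

# Multiplication rule: P(coin = heads) times P(die = three).
p_by_rule = Fraction(1, 2) * Fraction(1, 6)

print(p_by_counting, p_by_rule)  # 1/12 1/12
```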
Cards
Another common application of probability relates to a standard deck of 52 cards, which contains four suits (clubs, diamonds, hearts, spades) and 13 ranks in each suit (2, 3, 4, 5, 6, 7, 8, 9, 10, Jack, Queen, King, Ace):
If we shuffle the deck (to randomize it), and draw one card, P(heart) = `13/52 = 1/4`. If we replace that card, shuffle, and draw another, P(Ace) = `4/52 = 1/13`.
What if we want to draw two cards and want to know the probability of getting two hearts?
If, after drawing the first card, we replace it in the deck, shuffle, and redraw, then this probability is:
P(two hearts) = P(first = heart and second = heart) = `1/4 times 1/4 = 1/16`
But if we don't replace the first card, then the probability of the second card being a heart is no longer 1/4, so we can't use the multiplication rule stated above: P(second = heart) depends on whether or not the first card was a heart (it's 12/51 if the first card is a heart, and 13/51 if the first card is not a heart, and neither of these numbers is equal to 1/4).
So in order to use the multiplication rule, the two events must be independent: the occurrence of one must not change the probability of the other. Sometimes it's very difficult to verify that two events are truly independent, so we must assume independence (if such an assumption is reasonable).
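To see how much replacement matters, we can count ordered pairs of cards directly. The sketch below builds a 52-card deck and compares drawing with replacement (where the two draws are independent) to drawing without replacement (where they are not). The function name `p_two_hearts` is invented for this illustration, and a well-shuffled deck is assumed so that all ordered pairs are equally likely.

```python
from fractions import Fraction
from itertools import permutations, product

suits = ["clubs", "diamonds", "hearts", "spades"]
ranks = ["2", "3", "4", "5", "6", "7", "8", "9", "10",
         "Jack", "Queen", "King", "Ace"]
deck = [(rank, suit) for suit in suits for rank in ranks]

def p_two_hearts(pairs):
    """Fraction of equally likely (first, second) pairs that are both hearts."""
    pairs = list(pairs)
    hits = sum(1 for first, second in pairs
               if first[1] == "hearts" and second[1] == "hearts")
    return Fraction(hits, len(pairs))

# With replacement: the same card may appear twice (52 * 52 ordered pairs).
with_replacement = p_two_hearts(product(deck, repeat=2))

# Without replacement: the two cards must differ (52 * 51 ordered pairs).
without_replacement = p_two_hearts(permutations(deck, 2))

print(with_replacement)     # 1/16
print(without_replacement)  # 1/17
```

The second answer equals `13/52 times 12/51`: the second factor is the probability that the second card is a heart given that the first one was, which is why the simple multiplication rule with `1/4 times 1/4` no longer applies.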
A or B
We can compute P(die = three or die = four) by noting that 2 of the 6 outcomes for a six-sided die involve rolling a three or a four, so P(die = three or die = four) = `2/6 = 1/3`. We can restate this as:
P(die = three or die = four) = P(die = three) + P(die = four)
or more generally as:
P(E or F) = P(E) + P(F)
We call this the addition rule, but we need to be careful about applying it as well. If instead we want to draw a single card from a shuffled deck, we might want to compute P(heart or Ace). A mindless application of the addition rule would yield:
P(heart or Ace) = P(heart) + P(Ace) = `13/52 + 4/52 = 17/52`
but a closer inspection of the cards reveals that while 13 outcomes involve a heart and 4 outcomes involve an Ace, one of those is a shared outcome: the Ace of hearts. In the computation above, we've double-counted this card. There are 13 hearts, plus 3 additional Aces that we haven't accounted for among the hearts, so:
P(heart or Ace) = `16/52 ne 17/52`
In order to apply the addition rule, we must check that the events are disjoint (or "mutually exclusive"): that is, the events share no outcomes.
We can modify our addition rule so that it applies in all situations by fixing the double-counting problem:
P(E or F) = P(E) + P(F) − P(both)
In our card example this yields:
P(heart or Ace) = P(heart) + P(Ace) − P(Ace of hearts) = `13/52 + 4/52 - 1/52 = 16/52`
Of course, many times it's simpler to count intuitively (like we did when we originally found the answer of 16/52) rather than using this more complicated formula.
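Both approaches to P(heart or Ace) can be checked by treating the events as sets of cards. This is an illustrative sketch, assuming a single card drawn from a well-shuffled standard deck; the variable names are chosen for this example.

```python
from fractions import Fraction

suits = ["clubs", "diamonds", "hearts", "spades"]
ranks = ["2", "3", "4", "5", "6", "7", "8", "9", "10",
         "Jack", "Queen", "King", "Ace"]
deck = [(rank, suit) for suit in suits for rank in ranks]

hearts = {card for card in deck if card[1] == "hearts"}
aces = {card for card in deck if card[0] == "Ace"}

# Direct count: the set union holds each card once, so nothing is double-counted.
p_direct = Fraction(len(hearts | aces), len(deck))

# General addition rule: P(E) + P(F) - P(both).
p_rule = (Fraction(len(hearts), len(deck))
          + Fraction(len(aces), len(deck))
          - Fraction(len(hearts & aces), len(deck)))

print(p_direct, p_rule)  # 4/13 4/13, which is 16/52 in lowest terms
```

The intersection `hearts & aces` contains only the Ace of hearts, which is exactly the card the naive sum counts twice.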
Warning: In mathematics and statistics, "or" is not exclusive. When we ask someone "would you like cake or pie" we usually mean "would you like a slice of cake or would you like a slice of pie" but we're not giving them the option of answering "both." In probability, asking P(male or Democrat) means "the probability that someone is male or a Democrat" and includes the possibility that the person is a male Democrat.
More probability rules
Because a probability is defined as a proportion (the fraction of possible outcomes that are favorable), we must have 0≤P(E)≤1 for any event E. An impossible event has probability 0 and a certain event has probability 1.
If we know that P(die = three) = 1/6, then we can say that P(not rolling a three) = 5/6. In general, P(not E) = 1−P(E).
Finally, if we can list all of the possible outcomes in a situation, and those outcomes are disjoint, then the sum of the probabilities of the individual outcomes must be 1. For example:
P(die = 1) + P(die = 2) + P(die = 3) + P(die = 4) + P(die = 5) + P(die = 6) = `1/6 + 1/6 + 1/6 + 1/6 + 1/6 + 1/6 = 1`
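These last rules are easy to confirm numerically. The sketch below stores the probability of each face of a fair die (fairness is assumed) and checks the range, complement, and total-probability rules.

```python
from fractions import Fraction

# Probability of each face of a fair six-sided die.
p = {face: Fraction(1, 6) for face in range(1, 7)}

# Every probability lies between 0 and 1.
in_range = all(0 <= prob <= 1 for prob in p.values())

# Complement rule: P(not E) = 1 - P(E).
p_not_three = 1 - p[3]

# The disjoint outcomes cover every possibility, so they sum to 1.
total = sum(p.values())

print(in_range)     # True
print(p_not_three)  # 5/6
print(total)        # 1
```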
Exercises
1. Ohio voters. ORC International conducted a phone survey of 1,020 Ohio registered voters during October 5–8, 2012, on behalf of CNN. Among those surveyed, 48% said they planned to vote for Barack Obama in the upcoming presidential election, with 45% for Mitt Romney, 3% for Libertarian Party candidate Gary Johnson and 1% for Green Party candidate Jill Stein, with the remainder saying they were undecided. For the purposes of this problem, assume these percentages apply to all Ohio voters.
a) Compute the probability that a randomly selected voter:
i) does not plan to vote for Obama.
ii) is undecided.
iii) plans to vote for Obama or Romney.
b) Compute the probability that among two randomly selected voters:
i) both plan to vote for Romney.
ii) neither is undecided.
iii) at least one plans to vote for Obama.
c) Compute the probability that among 10 randomly selected voters, none are undecided.
d) If you want to interview a Romney supporter and randomly call Ohio voters, compute the probability that the first Romney supporter you reach is the third person you call.
2. Web browsers. Wikipedia tracks information about its visitors (as do most Web sites), including the browser used during each visit. Out of more than 15 billion requests for Web pages by non-mobile browsers during September 2012, 25.36% used Google Chrome, 23.61% used Microsoft Internet Explorer, 19.28% used Firefox, 4.34% used Safari, 1.97% used Opera, and the rest used some other browser.
a) If we randomly select one of these page requests, what's the probability that it came from:
i) Chrome or Firefox?
ii) a browser other than Internet Explorer?
iii) a browser not included among the top five (Chrome, IE, Firefox, Safari, Opera)?
b) If we randomly select two of these page requests, what's the probability that:
i) both came from Chrome?
ii) neither came from Chrome?
iii) at least one of the two came from Chrome?
iv) exactly one of the two came from Chrome?
c) If we randomly select 10 of these page requests, what's the probability that none of them came from Internet Explorer?
d) If we randomly select 10 of these page requests, what's the probability that at least one of them came from Firefox?