
Ch. 15 Resources
Chapter 15: Probability Rules!
As in the previous chapter, the computations in Chapter 15 are fairly basic; you've already been doing very similar calculations since Chapter 3. Again, be sure to read the chapter carefully, and read the problems carefully. Make pictures (e.g. Venn diagrams or tree diagrams) whenever you can, create tables listing the probabilities mentioned in a problem (if one isn't already given to you), reserve plenty of time, and be sure to work as many problems as possible.
Since there are no specialized calculator or Data Desk commands, we'll again work through a few brief examples here.
Web browsers (again)
Summary statistics for browser usage during December 2006 at the Web site http://www.w3schools.com/browsers/browsers_stats.asp were used in an example in the Chapter 14 Resources:
browser | relative frequency |
IE7 | 10.7% |
IE6 | 45.3% |
IE5 | 3.4% |
Firefox | 30.3% |
other | 10.3% |
The Web page cited above also includes summary statistics for the operating systems used by its visitors:
OS | relative frequency |
Windows XP | 74.9% |
Windows 2000 | 8.0% |
Linux | 3.3% |
Mac | 3.5% |
other | 10.3% |
Let's try to answer a few relevant questions.
What's the probability that a randomly selected visit to this Web site used Firefox or XP?
If we (mindlessly) apply the Addition Rule, we would get:
`P(mbox{Firefox or XP}) = P(mbox{Firefox}) + P(mbox{XP}) = 30.3% + 74.9% = 105.2%`
but this is obviously wrong: we end up with a probability greater than 100%, and we know that `0 leq P(E) leq 1` must hold for any event E. What went wrong here? Many (in fact, most) Firefox visits were with the XP OS, so when we added the percentage of Firefox visits to the percentage of XP visits, we counted the visits that used both Firefox and XP twice. To fix this, we need to know the probability of selecting a visit that involved both Firefox and XP. With that information we could then compute:
`P(mbox{Firefox or XP}) = P(mbox{Firefox}) + P(mbox{XP}) - P(mbox{both Firefox and XP})`
and get the answer to the question at hand. Unfortunately, the information we need is not included in the data from the Web site mentioned above, so we're unable to answer the question.
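As a quick check, here is a minimal Python sketch (not part of the course materials) that uses only the two marginal percentages from the tables above. It shows that the naive sum exceeds 1. It also shows that, without the joint probability, the General Addition Rule can only pin P(Firefox or XP) down to a range. The bounds used here are standard consequences of the rule and of `0 leq P(E) leq 1`, not something stated in the chapter. The overlap is at least P(Firefox) + P(XP) - 1 and at most the smaller of the two marginals.

```python
# Marginal probabilities read from the two summary tables (December 2006).
p_firefox = 0.303
p_xp = 0.749

# Mindless Addition Rule: double-counts visits that used both Firefox and XP.
naive_sum = p_firefox + p_xp
print(f"naive P(Firefox or XP) = {naive_sum:.3f}")  # greater than 1, impossible

# Without P(Firefox and XP) we cannot get a single answer, but we can bound it:
#   the union can't exceed 1, so the overlap is at least P(Firefox) + P(XP) - 1;
#   the overlap can't be bigger than either event, so it is at most the smaller one.
both_low = max(0.0, p_firefox + p_xp - 1)
both_high = min(p_firefox, p_xp)

# General Addition Rule: P(A or B) = P(A) + P(B) - P(A and B).
# A bigger overlap gives a smaller union, and vice versa.
union_high = p_firefox + p_xp - both_low
union_low = p_firefox + p_xp - both_high

print(f"P(Firefox and XP) lies in [{both_low:.3f}, {both_high:.3f}]")
print(f"P(Firefox or XP) lies in [{union_low:.3f}, {union_high:.3f}]")
```

The true answer is somewhere in that interval (between 74.9% and 100%); only the missing joint percentage would settle it.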
What's the probability that a randomly selected visit involved both IE6 and a Mac OS?
If we (mindlessly) apply the Multiplication Rule here we would get:
`P(mbox{IE6 and Mac}) = P(mbox{IE6}) times P(mbox{Mac}) = (0.453)(0.035) approx 0.016 = 1.6%`
At first glance, this might seem to make sense: if 45.3% of visits involve IE6 and 3.5% of visits involve a Mac OS, then we might think that 3.5% of the 45.3% that involve IE6 therefore involve both. The problem is that we don't know whether browser and OS are independent. Therefore we should be using the General Multiplication Rule here:
`P(mbox{IE6 and Mac}) = P(mbox{IE6}) times P(mbox{Mac|IE6})`
While the conditional probability `P(mbox{Mac|IE6})` is something we can't find or compute from the summary statistics alone, it happens to be a fact that there is no IE6 for the Mac: the last version of Internet Explorer that Microsoft released for the Mac OS was IE 5.2. Thus:
`P(mbox{Mac|IE6}) = 0`
since the probability that someone is using a Mac, given that their browser is IE6, is 0. Hence:
`P(mbox{IE6 and Mac}) = P(mbox{IE6}) times P(mbox{Mac|IE6}) = (0.453)(0) = 0`
Furthermore, since:
`P(mbox{Mac|IE6}) = 0 ne 0.035 = P(mbox{Mac})`
we can conclude that browser and OS are not independent.
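The contrast between the two rules can be sketched in a few lines of Python. This is an illustration, not course software. The value `P(Mac|IE6) = 0` is hard-coded from the outside knowledge described above, since it cannot be derived from the summary tables.

```python
import math

p_ie6 = 0.453
p_mac = 0.035

# Simple Multiplication Rule: only valid if browser and OS are independent.
naive_joint = p_ie6 * p_mac

# General Multiplication Rule: P(IE6 and Mac) = P(IE6) * P(Mac | IE6).
# P(Mac | IE6) = 0 is outside knowledge (no IE6 for the Mac), not from the tables.
p_mac_given_ie6 = 0.0
joint = p_ie6 * p_mac_given_ie6

# Independence would require P(Mac | IE6) to equal P(Mac).
independent = math.isclose(p_mac_given_ie6, p_mac)

print(f"Multiplication Rule (assumes independence): {naive_joint:.3f}")  # 0.016
print(f"General Multiplication Rule:                {joint:.3f}")        # 0.000
print(f"Independent? {independent}")                                      # False
```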
Now let's look at some problems where we do have enough information to answer some relevant probability questions.
Coke vs. Pepsi
Consider the data gathered from a survey of 41 Statistics students that was discussed in the Chapter 3 Resources:
beverage | female | male |
Coke | 7 | 9 |
Pepsi | 10 | 4 |
neither | 8 | 3 |
Let's revisit questions similar to those we asked in Chapter 3, but now phrased in terms of probability.
If we randomly select one student, what's the probability that he or she prefers Coke?
`P(mbox{Coke}) = (16)/(41) approx 39%`
If we randomly select one student, what's the probability that the student is female?
`P(mbox{female} ) = (25)/(41) approx 61%`
If we randomly select one student, what's the probability that the student is female or prefers Coke?
There are two ways to do this. The first is to simply count up all of the people who satisfy one (or both) of the criteria and compute the probability directly:
`P(mbox{female or Coke} ) = (7+10+8+9)/(41) = (34)/(41) approx 83%`
Another way is to use the General Addition Rule:
`P(mbox{female or Coke} ) = P(mbox{female}) + P(mbox{Coke}) - P(mbox{both female and Coke}) = (25)/(41) + (16)/(41) - (7)/(41) = (34)/(41) approx 83%`
In most cases (including this one), I prefer the first method.
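Both methods can be checked with exact fractions. The following is a minimal Python sketch, not part of the course materials; the dictionary layout keyed by (beverage, gender) is just an illustrative choice.

```python
from fractions import Fraction

# Counts from the survey of 41 Statistics students, keyed by (beverage, gender).
counts = {
    ("Coke", "female"): 7, ("Coke", "male"): 9,
    ("Pepsi", "female"): 10, ("Pepsi", "male"): 4,
    ("neither", "female"): 8, ("neither", "male"): 3,
}
total = sum(counts.values())  # 41

# Method 1: count each student who is female OR prefers Coke, each cell once.
either = sum(n for (bev, sex), n in counts.items() if sex == "female" or bev == "Coke")
p_direct = Fraction(either, total)

# Method 2: General Addition Rule, subtracting the overlap that was counted twice.
p_female = Fraction(sum(n for (_, sex), n in counts.items() if sex == "female"), total)
p_coke = Fraction(sum(n for (bev, _), n in counts.items() if bev == "Coke"), total)
p_both = Fraction(counts[("Coke", "female")], total)
p_rule = p_female + p_coke - p_both

print(p_direct, p_rule, f"{float(p_rule):.0%}")  # 34/41 34/41 83%
```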
If we randomly select one student, what's the probability that she is female and prefers Coke?
The most direct way is simply to notice that 7 of the 41 students fit both criteria:
`P(mbox{both female and Coke}) = (7)/(41) approx 17%`
The (needlessly) complicated way would be to use the General Multiplication Rule:
`P(mbox{both female and Coke}) = P(mbox{female}) times P(mbox{Coke|female}) = (25)/(41) times (7)/(25) = (7)/(41) approx 17%`
If we randomly select one student, what's the probability that she prefers Coke, given that she is female?
We actually computed this as part of the previous solution:
`P(mbox{Coke|female}) = (7)/(25) approx 28%`
If we randomly select one student, what's the probability that the student is female, given that he or she prefers Coke?
Notice that this is not the same as the previous question:
`P(mbox{female|Coke}) = (7)/(16) approx 44%`
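Conditioning amounts to restricting attention to one row or one column of the table. The Python sketch below is an illustration only, using a nested dictionary of the survey counts. It computes both conditional probabilities and confirms the General Multiplication Rule identity used earlier.

```python
from fractions import Fraction

# Survey counts: counts[beverage][gender]
counts = {
    "Coke":    {"female": 7,  "male": 9},
    "Pepsi":   {"female": 10, "male": 4},
    "neither": {"female": 8,  "male": 3},
}
total = sum(sum(row.values()) for row in counts.values())    # 41
n_female = sum(row["female"] for row in counts.values())     # 25
n_coke = sum(counts["Coke"].values())                          # 16
n_both = counts["Coke"]["female"]                              # 7

# P(Coke | female): look only at the female column (25 students).
p_coke_given_female = Fraction(n_both, n_female)
# P(female | Coke): look only at the Coke row (16 students).
p_female_given_coke = Fraction(n_both, n_coke)

# General Multiplication Rule: P(female) * P(Coke | female) = P(female and Coke).
assert Fraction(n_female, total) * p_coke_given_female == Fraction(n_both, total)

print(p_coke_given_female, p_female_given_coke)  # 7/25 7/16
```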
Are the events "female" and "prefers Coke" disjoint?
No, since
`P(mbox{both female and Coke}) = (7)/(41) approx 17% ne 0`
Are gender and beverage preference independent?
To answer this we compare:
`P(mbox{female|Coke}) = (7)/(16) approx 44% ne 61% approx (25)/(41) = P(mbox{female})`
If gender and beverage preference were independent, we would expect `P(mbox{female|Coke}) = P(mbox{female})` to hold; it doesn't, so we conclude that gender and beverage preference are not independent. We could also have compared `P(mbox{male|Coke})` with `P(mbox{male})`, or `P(mbox{Pepsi|female})` with `P(mbox{Pepsi})`, etc., to make this assessment.
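A full check compares every cell at once. The minimal Python sketch below is not course software. It tests whether P(gender | beverage) = P(gender) for each cell, which is what independence of the two variables requires. Exact `Fraction` equality is deliberate here; with real data, probabilities that are close but not equal are the hard case, as the note that follows points out.

```python
from fractions import Fraction

# Survey counts: counts[beverage][gender]
counts = {
    "Coke":    {"female": 7,  "male": 9},
    "Pepsi":   {"female": 10, "male": 4},
    "neither": {"female": 8,  "male": 3},
}
total = sum(sum(row.values()) for row in counts.values())
col_totals = {g: sum(row[g] for row in counts.values()) for g in ("female", "male")}

# Compare the conditional P(gender | beverage) with the marginal P(gender) in every cell.
for bev, row in counts.items():
    row_total = sum(row.values())
    for gender, n in row.items():
        conditional = Fraction(n, row_total)
        marginal = Fraction(col_totals[gender], total)
        print(f"P({gender}|{bev}) = {float(conditional):.0%}  vs  P({gender}) = {float(marginal):.0%}")

# Independent only if every conditional equals the corresponding marginal.
independent = all(
    Fraction(n, sum(row.values())) == Fraction(col_totals[g], total)
    for row in counts.values()
    for g, n in row.items()
)
print("independent?", independent)  # False
```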
Note that we answered this last question once before, back in the Chapter 3 Resources, but there we relied on comparing pie charts. Here we have a way to answer the question with computations rather than by "eyeballing," but we might still run into trouble if the two probabilities we are comparing are close but not exactly equal. In such cases we might decide that the evidence is inconclusive, at least until we develop more sophisticated techniques for answering these questions (in Chapter 26).
Coke vs. Pepsi rematch
Suppose we had been given relative frequencies (or percentages) instead of counts for the Coke vs. Pepsi data:
beverage | female | male | total |
Coke | 17% | 22% | 39% |
Pepsi | 24% | 10% | 34% |
neither | 20% | 7% | 27% |
total | 61% | 39% | 100% |
Note that these are all table percentages.
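Each table percentage is a cell count divided by the grand total of 41, rounded to a whole percent. The Python sketch below is an illustration, not part of the course materials. It rebuilds the relative-frequency table from the counts.

```python
# Survey counts: counts[beverage][gender]
counts = {
    "Coke":    {"female": 7,  "male": 9},
    "Pepsi":   {"female": 10, "male": 4},
    "neither": {"female": 8,  "male": 3},
}
total = sum(sum(row.values()) for row in counts.values())  # 41

def pct(n):
    """Table percentage: count over grand total, rounded to a whole percent."""
    return round(100 * n / total)

table_pct = {}
for bev, row in counts.items():
    for gender, n in row.items():
        table_pct[(bev, gender)] = pct(n)
    table_pct[(bev, "total")] = pct(sum(row.values()))
for gender in ("female", "male"):
    table_pct[("total", gender)] = pct(sum(row[gender] for row in counts.values()))
table_pct[("total", "total")] = pct(total)

for bev in ("Coke", "Pepsi", "neither", "total"):
    print(bev, *(f"{table_pct[(bev, g)]}%" for g in ("female", "male", "total")), sep=" | ")
```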
If we randomly select one student, what's the probability that he or she prefers Coke?
Here we can just read from the table that `P(mbox{Coke}) = 39%`.
If we randomly select one student, what's the probability that the student is female?
Again, we can read from the table that `P(mbox{female}) = 61%`.
If we randomly select one student, what's the probability that the student is female or prefers Coke?
Again, there are two ways to do this. The first is to simply add the probabilities for all of the cells in the female column and the Coke row, being careful not to count any cell twice:
`P(mbox{female or Coke}) = 17% + 24% + 20% + 22% = 83%`
Another way is to use the General Addition Rule:
`P(mbox{female or Coke}) = P(mbox{female}) + P(mbox{Coke}) - P(mbox{both female and Coke}) = 61% + 39% - 17% = (34)/(41) approx 83%`
Note that we were able to read all of the necessary probabilities from the table.
If we randomly select one student, what's the probability that she is female and prefers Coke?
The most direct way is simply to read this from the table:
`P(mbox{both female and Coke}) = 17%`
If we randomly select one student, what's the probability that she prefers Coke, given that she is female?
`P(mbox{Coke|female}) = (P(mbox{both Coke and female}))/(P(mbox{female})) = (0.17)/(0.61) approx 28%`
We can also answer the questions about disjointness and independence using the relative frequencies rather than counts.
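Here is a minimal Python sketch of those checks using only the rounded table percentages. It is not course software. The tolerance of 0.01 used for the independence comparison is an arbitrary assumption to absorb rounding to whole percents. Because the inputs are rounded, results can differ slightly from the exact count-based values (for example, 7/25 = 28% and 7/16 ≈ 43.75%).

```python
import math

# Rounded table percentages from the rematch table, written as proportions.
p_female_and_coke = 0.17
p_female = 0.61
p_coke = 0.39

# Disjoint events would have P(A and B) = 0.
disjoint = p_female_and_coke == 0

# Definition of conditional probability: P(A | B) = P(A and B) / P(B).
p_coke_given_female = p_female_and_coke / p_female
p_female_given_coke = p_female_and_coke / p_coke

# Independence would require P(female | Coke) = P(female); the inputs are rounded,
# so allow a small tolerance (0.01 is an arbitrary, assumed choice).
independent = math.isclose(p_female_given_coke, p_female, abs_tol=0.01)

print(f"disjoint? {disjoint}")
print(f"P(Coke|female) = {p_coke_given_female:.1%}")
print(f"P(female|Coke) = {p_female_given_coke:.1%} vs P(female) = {p_female:.0%}")
print(f"independent? {independent}")
```

The gap between roughly 44% and 61% is far larger than any rounding error, so the conclusion matches the one reached with counts.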
Homework
Work the following exercises in Chapter 15: 1, 5–11 odd, 17, 21, 25, 29, 41 and 45.
Errata
On page 398, the third line of the Reversing the Conditioning section should read `P(mbox{accident|binge})` (not `P(mbox{accident binge})`).
On page 400, the last sentence in the Show should read "having TB and a positive test" (not "given").
The answer in the back of the book (and in the SSM) for part a of Exercise 29 is wrong; it should be 95.6%.
Part a of Exercise 34 should read "may not be independent of" (rather than "may depend upon").
ActivStats
Work the activities on pages 15-1 through 15-3 in the ActivStats lesson book, as time permits.
Additional Resources
- What Is Probability?
  - Episode 15 from Against All Odds features a discussion of some of these same topics, although some of the terminology used may be different.
- Carnegie Mellon: Introduction to Statistics
  - Modules 6 and 7 of Carnegie Mellon's open source Introduction to Statistics course cover many of the same ideas.
- Sofia: Elementary Statistics
  - Lessons 3.1, 3.2 and 3.3 of the Sofia Open Content Initiative's Elementary Statistics course include a discussion of probability terminology and rules.
- Montana Outlook Poll
  - Data set cited in Exercise 31.