
Ch. 20 Resources
Chapter 20: Testing Hypotheses About Proportions
In the previous chapter we learned the essentials of creating a confidence interval. Here we learn the essentials of performing a hypothesis test. The same basic procedure will be employed throughout the rest of the course.
Exit Polls
Exit polling from the 2004 general election found that among 2,178 voters in Washington state, 55% were female. The 2000 census found that 50.2% of Washington residents are female. It seems that females make up a bigger proportion of the voting population than of the general population (in other words, they are more likely to vote than males). But is this really true, or did the exit poll just happen, by chance, to include an unusually large share of females?
We wish to test the claim that the proportion of females among Washington voters is greater than 50.2% (in other words, `p > 0.502`). This is hard to test, however, unless we have a fixed value of `p` with which to create a model. So we state a working hypothesis that `p = 0.502` (in other words, we assume that the proportion of the electorate who is female is the same as the proportion of the general population who is female) and try to show that this working hypothesis is not true. In the language of the formal hypothesis test:
H0: `p = 0.502` (this is the null hypothesis)
HA: `p > 0.502` (this is the alternative hypothesis)
Here `p` represents the proportion of all Washington voters who are female. We don't know the true value of `p`, but for the time being we will operate under the assumption that `p = 0.502`.
Does a Binomial model apply here? Can we use a Normal model to approximate that Binomial model? We need to check assumptions and conditions.
Independence Assumption: Since a professional polling organization selected this sample, it is plausible that these 2,178 voters are independent of one another, although we really don't know.
Random Sampling Condition: Since a professional polling organization selected this sample, we hope that randomness was employed in some way, although we really don't know.
10% Condition: Certainly the 2,178 voters in the sample comprise less than 10% of all Washington voters in the 2004 general election.
Success/Failure Condition: In Chapter 19 we checked that `n hat p ge 10` and `n hat q ge 10`, since we didn't know `p` or `q`. Now, however, our null hypothesis supplies a hypothesized population proportion, which we denote `p_0` (here `p_0 = 0.502`, so `q_0 = 1 - p_0 = 0.498`). So, as in Chapter 18, we can check that:
`n p_0 = 2178(0.502) approx 1093 ge 10`
`n q_0 = 2178(0.498) approx 1085 ge 10`
so the success/failure condition is satisfied.
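The same check can be sketched in a few lines of Python. This is only an illustration of the arithmetic above; the variable names are our own choices, not anything from the textbook or the calculator.

```python
# Success/Failure Condition under the null hypothesis (illustrative sketch).
n = 2178      # sample size from the exit poll
p0 = 0.502    # hypothesized proportion of females (from H0)
q0 = 1 - p0   # hypothesized proportion of males

expected_successes = n * p0   # expected number of females if H0 is true
expected_failures = n * q0    # expected number of males if H0 is true

# Both expected counts must be at least 10.
condition_met = expected_successes >= 10 and expected_failures >= 10
print(round(expected_successes), round(expected_failures), condition_met)
```

Both expected counts are far above 10, which agrees with the hand computation.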
Which Normal model do we use? We are hypothesizing that `p = 0.502`, so `E( hat p ) = p_0 = 0.502` and:
`SD( hat p ) = sqrt((p_0 q_0)/(n)) = sqrt(((0.502)(0.498))/(2178)) approx 0.011`
so we will use the model N(0.502,0.011), shown here:
Notice that, like our computations in Chapter 18, we use `SD( hat p ) = sqrt((pq)/(n))` (here with the hypothesized values `p_0` and `q_0`); this is different from the confidence intervals in Chapter 19, where we had to use the standard error `SE( hat p ) = sqrt((hat p hat q)/(n))` since we didn't know the value of `p` when constructing a confidence interval.
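To make the distinction concrete, here is a minimal Python sketch that computes both quantities for this example. The names `sd_null` and `se_sample` are our own labels, and `p_hat = 0.55` is the sample proportion reported by the exit poll.

```python
import math

n = 2178
p0 = 0.502     # hypothesized proportion, used for the hypothesis test
p_hat = 0.55   # observed sample proportion, used for a confidence interval

# Standard deviation of p-hat under the null hypothesis (this chapter).
sd_null = math.sqrt(p0 * (1 - p0) / n)

# Standard error based on the sample itself (Chapter 19 style).
se_sample = math.sqrt(p_hat * (1 - p_hat) / n)

print(round(sd_null, 3), round(se_sample, 3))
```

In this example the two values happen to agree to three decimal places, but they are computed from different proportions and will differ more noticeably in other problems.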
So, given that `p = 0.502`, how likely is it that we would get a sample proportion of `hat p = 0.55` (or greater)? We see that `hat p = 0.55` is way off the right side of our Normal model diagram, so the probability should be close to 0, but to verify this we compute normalcdf(0.55,1E99,0.502,0.011) ≈ 0.000006, which means that getting such a sample proportion is very unlikely. We call this probability the P-value for this test. (Note that we use an uppercase P for the P-value and a lowercase p for the population proportion.)
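If you want to check the normalcdf computation without a calculator, the upper-tail area can be sketched in Python using only the standard library. The helper name `normal_upper_tail` is our own; it is a stand-in for normalcdf with an upper bound of 1E99, not a calculator function.

```python
import math

def normal_upper_tail(x, mean, sd):
    """Return P(X >= x) for X ~ N(mean, sd), using the complementary error function."""
    z = (x - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2))

# Same inputs as normalcdf(0.55, 1E99, 0.502, 0.011), with the SD rounded to 0.011.
p_value = normal_upper_tail(0.55, 0.502, 0.011)
print(f"{p_value:.6f}")
```

Because this uses the rounded standard deviation 0.011, the result should agree with the value of about 0.000006 found above.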
There are two possibilities here: either something very unlikely has occurred or our working hypothesis that `p = 0.502` was not true to begin with. Keeping in mind that there is a small chance that we are making the wrong decision, we opt for the latter explanation. If in fact `p ne 0.502` then we should reject H0. That leaves us with HA as the only reasonable alternative.
We now state our conclusion: There is very strong evidence (P = 0.000006) to support the claim that the proportion of Washington voters who are female is greater than the percentage of females in the general population.
Notice that we don't say that we are absolutely, 100% sure that our claim is true. Notice also that we do include the P-value in our conclusion.
P-values from the calculator
The TI-84 offers a shortcut to find the P-value: press STAT, move the cursor right to TESTS, then down to 1-PropZTest and press ENTER. For p0, use 0.502; for x, use 1198 (0.55×2178, rounded to the nearest integer to get the approximate number of successes; in some problems x may be given to us directly); for n, use 2178; and for prop, use >p0 (the form of the inequality in the alternative hypothesis):
Now move the cursor to Calculate or Draw and press ENTER. If you choose Calculate you should see the P-value displayed, in addition to some other information:
Note that the P-value here is P = `3.6567 times 10^(-6) approx 0.000004`, which is very close to the P-value of 0.000006 we computed above (but not exactly equal due to rounding at various stages of the computation).
If you choose Draw you should see the P-value displayed along with the z-score of our sample proportion and a picture of the Normal model with the appropriate tail(s) shaded in:
This is a good way to check your answer for the P-value, but note that the TI-84 doesn't check assumptions and conditions nor does it perform any of the other steps in a formal hypothesis test for you. For now I will expect you to show your work as outlined above, although you are welcome to check your answer for the P-value using 1-PropZTest. The TI-84's P-value may vary slightly from the one we computed (due to rounding at various stages of the computations) but note that it's still very small and we reach the same conclusion.
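For those curious about what 1-PropZTest is doing, here is a rough Python sketch of a one-sided, one-proportion z-test. The function name `one_prop_z_test` is hypothetical (it is not a library or calculator function), and the sketch assumes the alternative hypothesis is `p > p_0`. Like the calculator, it does not check any assumptions or conditions.

```python
import math

def one_prop_z_test(x, n, p0):
    """Upper-tail one-proportion z-test: returns (z, P-value) for HA: p > p0."""
    p_hat = x / n                          # sample proportion
    sd = math.sqrt(p0 * (1 - p0) / n)      # SD of p-hat under H0 (unrounded)
    z = (p_hat - p0) / sd                  # z-score of the sample proportion
    p_value = 0.5 * math.erfc(z / math.sqrt(2))  # area to the right of z
    return z, p_value

# Same inputs as the calculator example: x = 1198, n = 2178, p0 = 0.502.
z, p_value = one_prop_z_test(1198, 2178, 0.502)
print(f"z = {z:.2f}, P = {p_value:.2e}")
```

Since nothing is rounded along the way, the P-value from this sketch should land close to the calculator's value rather than the one we computed by hand.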
Homework
Work the following exercises in Chapter 20: 1, 3, 9, 17, 19, 25, 33 and 35.
Errata
On page 509, just above the first displayed equation, the second part of the Success/Failure Condition is missing: it should also be noted that we expect `0.80 times 400 = 320` ingots to not crack.
On page 513, the "One-Proportion z-Test" box states that the conditions are the same as for the associated confidence interval, although this is not quite true: for the hypothesis test we check that `np_0 ge 10` and `nq_0 ge 10`, but with the confidence interval we checked that `n hat p ge 10` and `n hat q ge 10`.
On page 514, the next-to-last line in the "Mechanics" paragraph should read "occurs if the null" (rather than "occur").
For the two-sided test at the top of page 516, it should be noted that `P = 2 times 0.067 = 0.134`.
In the right column of the Show on page 522, the sentence above the Normal model picture should read "4.78 standard deviations above the mean" (not "3.68").
In the right column of the Tell on page 522, the last sentence should read "is greater than 50%" (rather than "is not 50%").
ActivStats
Work through the lessons on pages 20-1 and 20-2 in the ActivStats lesson book, as time permits.