
Ch. 19 Resources
Chapter 19: Confidence Intervals for Proportions
This chapter involves creating confidence intervals for proportions. We will use similar procedures to create other types of confidence intervals throughout the rest of the course, so paying attention to details here will pay off later.
2004 presidential election
In the Chapter 18 Resources we looked at the results of the 2004 presidential election in Washington. In that situation we knew the population proportion, `p`, and we imagined what would happen if we drew random samples from that population and computed the proportion of people in each sample who voted for Bush. We denote such a sample proportion `hat p` and note that this proportion varies from sample to sample. In fact, if we consider all possible samples of size `n`, we expect the possible values of `hat p` to follow approximately a Normal model with mean `E(hat p) = p` and standard deviation `SD(hat p) = sqrt((pq)/(n))`; this Normal model approximates the underlying Binomial model well as long as the Success/Failure Condition (`np ge 10` and `nq ge 10`) is satisfied.
In the case of the 2004 Bush voters in Washington state, for samples of size 100 we expected the model to be N(0.4564, 0.0498).
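If you'd like to double-check these numbers on a computer, here is a minimal Python sketch (Python is just our choice for illustration) that computes the mean and SD of the sampling model from `p = 0.4564` and `n = 100` and checks the Success/Failure Condition. The variable names are illustrative.

```python
from math import sqrt

# Known population values for the Washington example
p = 0.4564   # proportion of all Washington voters who voted for Bush
q = 1 - p    # proportion who did not
n = 100      # sample size

# Mean and standard deviation of the sampling model for p-hat
mean_p_hat = p
sd_p_hat = sqrt(p * q / n)

# Success/Failure Condition: np >= 10 and nq >= 10
success_failure_ok = n * p >= 10 and n * q >= 10

print(f"N({mean_p_hat:.4f}, {sd_p_hat:.4f})")  # N(0.4564, 0.0498)
print(success_failure_ok)                       # True
```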
We note (from the 68-95-99.7 Rule) that about 95% of all sample proportions should be within 2 SDs of the mean; in this case we would expect about 95% of all samples of size 100 to yield sample proportions between 0.357 and 0.556.
We can be more precise here, however, since the 68-95-99.7 Rule is just a rule of thumb. If we want to find the cutoff values for the "middle 95%" of the sample proportions, we note that this excludes the "most extreme 5%," in other words the 2.5% in the lower tail and the 2.5% in the upper tail. The cutoff values for these tails are given by invNorm(0.025) ≈ -1.96 and invNorm(0.975) ≈ 1.96, so we can now say that we expect 95% of the sample proportions to be within 1.96 SDs of the mean.
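The calculator's invNorm has a close counterpart in Python's standard library: `statistics.NormalDist().inv_cdf` (Python 3.8 or later) returns the z-score with a given area to its left. A minimal sketch of the two cutoffs:

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard Normal model: mean 0, SD 1

# inv_cdf plays the role of the TI-84's invNorm: it returns the z-score
# with the given area to its left
lower_cut = std_normal.inv_cdf(0.025)
upper_cut = std_normal.inv_cdf(0.975)

print(round(lower_cut, 2), round(upper_cut, 2))  # -1.96 1.96
```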
In the case of our Washington state Bush voters, we would expect 95% of all samples of size 100 to yield sample proportions between `0.4564-1.96 times 0.0498 approx 0.359` and `0.4564+1.96 times 0.0498 approx 0.554`.
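Equivalently, we can ask for the middle-95% cutoffs of N(0.4564, 0.0498) directly, without standardizing first. This sketch again assumes Python's `statistics.NormalDist`; it uses the unrounded critical value (about 1.959964) rather than 1.96, but the results agree to three decimal places.

```python
from statistics import NormalDist

# Sampling model for p-hat in the Washington example: N(0.4564, 0.0498)
model = NormalDist(mu=0.4564, sigma=0.0498)

# Cutoffs leaving 2.5% in each tail
low = model.inv_cdf(0.025)
high = model.inv_cdf(0.975)

print(f"{low:.3f} to {high:.3f}")  # 0.359 to 0.554
```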
Now we consider a situation where we do not know `p`, but only a single `hat p`.
Exit Polls
Exit polling from the 2004 general election showed that among 1242 female voters in Washington state, 708 voted for John Kerry. We know (more or less) what percentage of all Washington voters voted for Kerry (53%), but ballots are not counted according to gender, so if we want to know something about the proportion of female voters who voted for Kerry we need to rely on exit polls. In this case we have
`n = 1242`, `hat p_1 = (708)/(1242) approx 0.57` and `hat q_1 approx 0.43`
I use the subscript 1 here since this `hat p_1` is the one sample proportion we do know out of many different possible values of `hat p`.
We'd like to know `p` but since we don't, we want to say as much as we can about what `p` might be.
Does a Binomial model apply here? Can we use a Normal model to approximate that Binomial model? We need to check some conditions.
Plausible Independence Condition: A professional exit polling organization sampled the 1242 female voters, so we have reason to believe that they are representative of all female voters in Washington. It is plausible that these 1242 women are independent of one another, although we really don't know for sure.
Randomization Condition: We would hope that a professional exit polling organization would employ randomization in their sampling procedure, although we don't know for sure.
10% Condition: Certainly the 1242 women in our sample are less than 10% of all female Washington voters in the 2004 general election.
Success/Failure Condition: We need to check that `np ge 10` and `nq ge 10` but here our problem is that we don't know `p` and `q`. We do, however, expect that `hat p` is reasonably close to `p` and thus `hat q` should be reasonably close to `q`, so we can instead check that:
`n hat p_1 = (1242)(708/1242) = 708 ge 10` and `n hat q_1 = 1242 - 708 = 534 ge 10`
so this condition is satisfied.
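A small Python sketch of the same check (variable names are illustrative). Note that `n hat p_1` and `n hat q_1` are simply the observed numbers of successes and failures, so multiplying by the unrounded proportions recovers 708 and 534 exactly, up to floating-point error.

```python
n = 1242          # female voters in the exit poll
successes = 708   # number of them who voted for Kerry

p_hat_1 = successes / n
q_hat_1 = 1 - p_hat_1

# n * p_hat_1 and n * q_hat_1 are the observed numbers of successes
# and failures; round() absorbs floating-point error in the products
n_p_hat = round(n * p_hat_1)
n_q_hat = round(n * q_hat_1)

print(round(p_hat_1, 2), round(q_hat_1, 2))   # 0.57 0.43
print(n_p_hat, n_q_hat)                       # 708 534
print(n_p_hat >= 10 and n_q_hat >= 10)        # True
```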
As in our previous example, we expect about 95% of all samples to yield sample proportions `hat p` that are within 1.96 SDs of `p`. The problem here is that we don't know `SD(hat p)` since we don't know `p` or `q`. We expect, however, that `hat p_1` (the one sample proportion we do know) is reasonably close to `p` and likewise that `hat q_1` is reasonably close to `q`, so in place of the standard deviation `SD(hat p)` we can use a reasonable estimate, which we call the standard error:
`SE(hat p) = sqrt((hat p_1 hat q_1)/(n)) = sqrt(((0.57)(0.43))/(1242)) approx 0.01405`
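The standard error is the SD formula with the sample proportions plugged in for `p` and `q`. A minimal Python sketch, using the unrounded `hat p_1 = 708/1242`:

```python
from math import sqrt

n = 1242
p_hat_1 = 708 / n
q_hat_1 = 1 - p_hat_1

# Standard error: plug the sample proportions into the SD formula
se = sqrt(p_hat_1 * q_hat_1 / n)

print(f"{se:.5f}")  # 0.01405
```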
We can now compute the margin of error:
`ME = 1.96 times SE(hat p) approx 1.96 times 0.01405 approx 0.028`
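Rounding matters here: if we rounded SE to 0.014 before multiplying, we would get `1.96 times 0.014 approx 0.027` instead of 0.028. This Python sketch (an illustration, not a course requirement) keeps the standard error unrounded until the end and prints both versions for comparison.

```python
from math import sqrt

n = 1242
p_hat_1 = 708 / n
se = sqrt(p_hat_1 * (1 - p_hat_1) / n)   # unrounded standard error

z_star = 1.96          # critical value for 95% confidence
me = z_star * se       # margin of error

print(f"{me:.3f}")               # 0.028
print(f"{z_star * 0.014:.3f}")   # 0.027 (SE rounded too early)
```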
Here we use 1.96 because we want to trap 95% of all possible sample proportions; we call 95% the confidence level and 1.96 the critical value for this confidence level, which in general we denote by z*. If in future problems we wish to use a different confidence level C, we would need to recompute z*; it is invNorm((1 + C)/2), the cutoff that leaves (1 - C)/2 in each tail. Typical confidence levels are 90%, 95% and 99%.
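For the typical confidence levels, z* can be found the same way as before. A sketch in Python, where `critical_value` is a hypothetical helper name and `statistics.NormalDist().inv_cdf` stands in for invNorm:

```python
from statistics import NormalDist

def critical_value(confidence_level):
    """z* such that the middle `confidence_level` of N(0, 1) lies between -z* and z*."""
    upper_tail = (1 - confidence_level) / 2   # area left in each tail
    return NormalDist().inv_cdf(1 - upper_tail)

for c in (0.90, 0.95, 0.99):
    print(c, round(critical_value(c), 3))
```

This should print approximately 1.645, 1.96 and 2.576 for the 90%, 95% and 99% levels: a higher confidence level requires a larger z* and therefore a wider interval.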
We still don't know what `p` is, but we now know that about 95% of all sample proportions from random samples of size 1242 should be within 0.028 of `p`. In particular, we are 95% confident that the one sample proportion we do know (`hat p_1 = 0.57`) is within 0.028 of `p`. In general this means that we are 95% confident that:
`hat p_1 - ME < p < hat p_1 + ME`
and in particular in this case we are 95% confident that
`0.57 - 0.028 < p < 0.57 + 0.028`
or:
`0.542 < p < 0.598`
We call (0.542, 0.598) a 95% confidence interval for the true proportion of female Washington voters who voted for John Kerry in 2004.
We can conclude that we are 95% confident that the true proportion of female Washington voters who voted for John Kerry in 2004 is between 54% and 60%.
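Putting the steps together, here is a minimal Python sketch of the whole interval computation. Because it keeps `hat p_1 = 708/1242` unrounded, it reports the endpoints to four decimal places, about (0.5425, 0.5976), which agrees with (0.542, 0.598) up to rounding.

```python
from math import sqrt

n = 1242
p_hat_1 = 708 / n
me = 1.96 * sqrt(p_hat_1 * (1 - p_hat_1) / n)   # z* times SE

lower, upper = p_hat_1 - me, p_hat_1 + me
print(f"({lower:.4f}, {upper:.4f})")  # (0.5425, 0.5976)
```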
Checking Your Work
The TI-84 offers a shortcut to find a confidence interval: press STAT, move the cursor right to TESTS, then down to 1-PropZInt... and press ENTER. For x, use 708 (in general, the number of successes), for n use 1242 (in general, the sample size) and for C-Level use 0.95, then move the cursor to Calculate and press ENTER. You should see the confidence interval displayed; it should agree with our result (0.542, 0.598) up to rounding, since the calculator works with the unrounded `hat p_1 = 708/1242`.
This is a good way to check your answers, but note that the TI-84 doesn't give you the margin of error directly, nor does it check assumptions and conditions. Unless otherwise specified, on an exam I expect you to be able to show your work, using normalcdf or invNorm to work with the Normal model as necessary; you may use 1-PropZInt to check your answer, but the calculator output from this feature is not sufficient to receive full credit.
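For checking work away from the calculator, here is a sketch of a hypothetical Python helper, `one_prop_z_int`, that mimics 1-PropZInt's inputs (x, n, C-Level). Unlike the calculator, it also reports the margin of error and checks the Success/Failure Condition; it cannot check the Randomization, Independence or 10% Conditions, which still require judgment about how the data were collected.

```python
from math import sqrt
from statistics import NormalDist

def one_prop_z_int(x, n, c_level=0.95):
    """Hypothetical helper mimicking the TI-84's 1-PropZInt.

    x: number of successes, n: sample size, c_level: confidence level.
    Returns (p_hat, margin_of_error, (lower, upper)).
    """
    # Success/Failure Condition, checked with the observed counts
    if x < 10 or n - x < 10:
        raise ValueError("Success/Failure Condition not met")
    p_hat = x / n
    q_hat = 1 - p_hat
    z_star = NormalDist().inv_cdf((1 + c_level) / 2)   # critical value
    me = z_star * sqrt(p_hat * q_hat / n)              # z* times SE
    return p_hat, me, (p_hat - me, p_hat + me)

p_hat, me, (lower, upper) = one_prop_z_int(708, 1242, 0.95)
print(f"p_hat = {p_hat:.4f}, ME = {me:.4f}, interval = ({lower:.4f}, {upper:.4f})")
```

With the exit-poll data this prints p_hat = 0.5700, ME = 0.0275 and the interval (0.5425, 0.5976); the helper uses the unrounded z* rather than 1.96, which changes nothing at this precision.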
Homework
Work the following exercises in Chapter 19: 3-11 odd, 17, 21, 23, 29, 31, 35 and 37.
Errata
The W's margin note on page 486 should read "infection status" for the What (not "percent infected").
On page 489, the second line under the section heading should read "95% of random samples" (not "95% of samples").
On page 503, the last line of comments in the TI-83/84 Plus instructions should read "multiply `n hat p` and round the result" (not "`np`").
ActivStats
Work through the lessons on pages 19-1 through 19-4 in the ActivStats lesson book, as time permits.