Confidence Interval Odds and Ends

We will use the essential concepts behind one-proportion confidence intervals throughout the rest of the course.

Confidence levels
So far we have only constructed confidence intervals with a 95% confidence level. In general, the margin of error formula for a one-proportion confidence interval is:

`ME = z text(*) times sqrt((hat(p) hat(q))/n)`

where `z text(*)` is the z-score associated with the confidence level of the interval we're constructing. For a 95% confidence interval, we expect 95% of all sample proportions to fall within 1.96 SDs of the true proportion. We found `z text(*) = 1.96` by noting that the central 95% of a Normal model cuts off the bottom 2.5% and the top 2.5%.

We used invNorm(0.025) = -1.96 or invNorm(0.975) = 1.96 to find `z text(*) = 1.96`.

If we need a higher confidence level, say 99%, we note that the central 99% cuts off the bottom 0.5% and the top 0.5%, so we can compute invNorm(0.005) = -2.576 or invNorm(0.995) = 2.576 to find that `z text(*) = 2.576` for a 99% confidence level.

The most common confidence levels are 90%, 95%, and 99%.
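Without a TI calculator, the same lookup can be sketched in Python with the standard library's `statistics.NormalDist`, whose `inv_cdf` method returns Normal quantiles just as invNorm does. The helper name `z_star` is our own, not part of any library, and the sketch assumes a two-sided interval (equal area cut off in each tail).

```python
from statistics import NormalDist


def z_star(confidence_level: float) -> float:
    """Critical value z* for a two-sided confidence interval.

    The central `confidence_level` of a standard Normal model leaves
    (1 - confidence_level) / 2 in each tail, so z* is the quantile at
    1 - (1 - confidence_level) / 2 (the same value invNorm would give).
    """
    if not 0 < confidence_level < 1:
        raise ValueError("confidence_level must be strictly between 0 and 1")
    tail_area = (1 - confidence_level) / 2
    return NormalDist().inv_cdf(1 - tail_area)


for level in (0.95, 0.99):
    print(f"{level:.0%}: z* = {z_star(level):.3f}")
# 95%: z* = 1.960
# 99%: z* = 2.576
```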

Sample size
Sometimes, before we randomly sample from the population, we want to ensure that we will end up with a margin of error no bigger than some specified value.

The public relations firm Strategies 360 randomly sampled 500 likely voters in the state of Washington during the period October 17–20, 2012, asking them whether they thought "that things are heading in the right direction or are things off on the wrong track" in the state. Of those surveyed, 41% thought things were heading in the right direction. The margin of error for a 95% confidence interval for this survey is:

`ME = 1.96 times sqrt((0.41 times 0.59)/500) approx 0.043`
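The arithmetic above can be sketched in Python. The function `margin_of_error` is a hypothetical helper, not from any library; it evaluates `ME = z text(*) times sqrt((hat(p) hat(q))/n)` and, for illustration, also prints the interval `hat(p) +- ME`, assuming the usual one-proportion interval form.

```python
from math import sqrt


def margin_of_error(p_hat: float, n: int, z: float = 1.96) -> float:
    """One-proportion margin of error: z * sqrt(p_hat * q_hat / n).

    The default z = 1.96 corresponds to a 95% confidence level.
    """
    q_hat = 1 - p_hat
    return z * sqrt(p_hat * q_hat / n)


# Strategies 360 Washington survey: n = 500, p_hat = 0.41, 95% confidence.
p_hat = 0.41
me = margin_of_error(p_hat, 500)
print(f"ME = {me:.3f}")                                    # ME = 0.043
print(f"95% CI: ({p_hat - me:.3f}, {p_hat + me:.3f})")     # 95% CI: (0.367, 0.453)
```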

If we wanted a more precise 95% confidence interval, with a 3% margin of error, we would need to survey more people. But how many more? To find out, we solve the margin of error formula for `n`:

`ME = z text(*) times sqrt((hat(p) hat(q))/n)`
`=> (ME)/(z text(*)) = sqrt((hat(p) hat(q))/n)`
`=> ((ME)/(z text(*)))^2 = (hat(p) hat(q))/n`
`=> ((z text(*))/(ME))^2 = n/(hat(p) hat(q))`
`=> n = ((z text(*))/(ME))^2 hat(p) hat(q)`

We can use this last formula to estimate the required sample size. We know that `z text(*) = 1.96` (because we're using a 95% confidence level) and that `ME = 0.03` (because we want a 3% margin of error). We don't know what `hat p` will be for the new sample, because we haven't carried out the new survey yet; however, based on the previous survey we expect `hat(p) approx 0.41`, so we'll use that value here:

`n = (1.96/0.03)^2 (0.41)(0.59) approx 1032.54`

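so we round up and sample 1,033 people. We always round up rather than down, because rounding down would give a margin of error slightly larger than 3% (assuming the new `hat p` is close to 0.41).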
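A minimal sketch of the sample-size formula `n = ((z text(*))/(ME))^2 hat(p) hat(q)` with the rounding-up step built in. The name `required_sample_size` is hypothetical, and `p_hat` must be supplied as a planning guess, since the new survey hasn't been carried out yet.

```python
from math import ceil


def required_sample_size(target_me: float, z: float, p_hat: float) -> int:
    """Smallest whole n with z * sqrt(p_hat * q_hat / n) <= target_me.

    Uses n = (z / ME)^2 * p_hat * q_hat, then rounds UP, because a
    fractional person can't be surveyed and rounding down would push
    the margin of error slightly above the target.
    """
    q_hat = 1 - p_hat
    n = (z / target_me) ** 2 * p_hat * q_hat
    return ceil(n)


# Washington example: 95% confidence (z* = 1.96), 3% ME, planning guess p_hat = 0.41.
print(required_sample_size(0.03, 1.96, 0.41))  # 1033
```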

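What if we didn't have information from a recent survey to estimate `hat p`? In that situation, we'd plan for the largest sample size that any value of `hat p` could require, so that the margin of error is guaranteed to be no bigger than our target. Because `n` is proportional to `hat(p) hat(q)`, this happens when `hat(p) hat(q)` is as large as possible, which occurs at `hat(p) = hat(q) = 0.5` (you can show this is true using a bit of algebra).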
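The bit of algebra mentioned above is completing the square. Writing `hat(q) = 1 - hat(p)`, the product can never exceed 1/4, and it equals 1/4 only when `hat(p) = 0.5`; substituting that bound into the sample-size formula gives the worst-case (largest) `n`:

```latex
\hat{p}\,\hat{q} = \hat{p}(1-\hat{p}) = \hat{p} - \hat{p}^{2}
  = \frac{1}{4} - \left(\hat{p} - \frac{1}{2}\right)^{2} \le \frac{1}{4},
\qquad\text{so}\qquad
n = \left(\frac{z^{*}}{ME}\right)^{2}\hat{p}\,\hat{q}
  \le \frac{1}{4}\left(\frac{z^{*}}{ME}\right)^{2}.
```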

Conditions are important!
Before we compute a margin of error, we need to check several conditions:

Two outcomes: If we don't have exactly two outcomes (yes/no, success/failure), we can't construct a one-proportion confidence interval; we need to use a more sophisticated mathematical technique.

Independent trials: Usually the trials are not truly independent, but if we have gathered the data via a simple random sample (SRS), then it's reasonably safe to assume the trials are independent. If the sample was not selected randomly, we cannot proceed with the confidence interval. (In some instances it may not be clear whether the sample was random; in those situations we can state clearly that we're assuming the sample is random and proceed, so that if we later learn the sample was not randomly selected, we know to discard the results of our confidence interval.)

Constant probability of success: Usually the probability of success is not constant, but as long as we're selecting a relatively small sample from a relatively large population, the probability of success will be nearly constant and we can proceed. (Usually our sample represents far less than 1% of the population, so this is not a problem; some books use 10% as a threshold, but 1% is safer.)

Normal approximation: We need to check that `n hat(p) ge 10` and `n hat(q) ge 10`. If one or more of these inequalities does not hold, we need to use more sophisticated methods to construct a confidence interval. 
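Only the last two conditions can be checked from the numbers alone; whether there are exactly two outcomes and whether the sample was random depend on how the data were collected. Below is a minimal sketch of the numeric checks, using a hypothetical helper `check_numeric_conditions` and the 1% threshold suggested above (leave `population_size` as `None` when the population size is unknown).

```python
from typing import List, Optional


def check_numeric_conditions(
    n: int, p_hat: float, population_size: Optional[int] = None
) -> List[str]:
    """Return the numeric conditions that FAIL (an empty list means none failed).

    Two outcomes and random sampling are NOT checked here; they must be
    judged from how the data were gathered.
    """
    problems = []
    q_hat = 1 - p_hat
    # Normal approximation: need n*p_hat >= 10 and n*q_hat >= 10.
    if n * p_hat < 10:
        problems.append(f"n*p_hat = {n * p_hat:.1f} < 10")
    if n * q_hat < 10:
        problems.append(f"n*q_hat = {n * q_hat:.1f} < 10")
    # Nearly constant probability of success: sample under 1% of population.
    if population_size is not None and n > 0.01 * population_size:
        problems.append("sample is more than 1% of the population")
    return problems


print(check_numeric_conditions(500, 0.41))  # []
print(check_numeric_conditions(50, 0.1))    # ['n*p_hat = 5.0 < 10']
```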

Exercises

1. Compute the value of `z text(*)` for a confidence level of:

a) 90%

b) 98%

2. Compute the sample size for a 95% confidence interval assuming you want the margin of error to be no more than 3% and you have no prior information about the topic being investigated.

3. Fairleigh Dickinson University’s PublicMind project conducted a telephone survey of New Jersey residents between April 30 and May 6, 2012, finding that among 797 registered voters, 56% approved of the way NJ Gov. Chris Christie was handling his job.

a) Construct a 95% confidence interval for the proportion of all NJ registered voters who approved of Christie during May 2012.

b) If you were to conduct a new survey, how large a sample would you need to construct a 95% confidence interval with a margin of error no bigger than 2.5%?

c) For the original survey, if you constructed a 90% confidence interval, would it be wider or narrower than the 95% interval you constructed in part a?

In the remaining exercises, construct a confidence interval as indicated, or explain why you should not construct the interval.

4. The Web site for MSNBC's The Ed Show asked visitors to respond to the question, "Was President Obama’s re-election a victory for the middle class?" Of the 4,686 people who participated, 97.31% responded "yes," while the rest responded "no." Construct a 95% confidence interval for the proportion of all Americans who think Obama’s re-election was a victory for the middle class.

5. On November 11, 2012, The Herald reported that 73% of 1,305 students who graduated high school from the Edmonds School District during 2010 had gone on to attend college. Construct a 95% confidence interval for the proportion of all 2010 high school graduates from the Edmonds School District who went on to attend college.