Comparing Two Proportions

We've compared the means from two groups, so we now compare proportions from two groups.

Exit Polls
Exit polling from the 2004 general election found that among 2,178 voters in Washington state, 55% were female. Exit polls in California found that among 2,390 California voters, 51% were female. Is this difference significant? In other words, is there evidence that a greater percentage of Washington voters are female than California voters?

We can try to answer this question in two ways: with a confidence interval or with a hypothesis test. Let's try a confidence interval first.

Confidence Interval
A confidence level of 95% seems reasonable; as we did with a confidence interval for one proportion, we need to check conditions. Let's start with the Washington sample:

Independent trials: While we don't know how the Washington voters were selected, it is reasonable to assume that a professional polling organization would have used some sort of random selection process and that the voters are independent of one another. Furthermore, the 2,178 voters certainly account for a small fraction of all Washington voters. It is reasonable to assume independent trials.

Normality: `n_W hat p_W = 2178(0.55) approx 1198 ge 10` and `n_W hat q_W = 2178(0.45) approx 980 ge 10`, so this condition is satisfied.

Now we must check the same conditions for the California voters:

Independent trials: It is reasonable to assume that a professional polling organization would have used some sort of random selection process and that the voters are independent of one another. Furthermore, the 2,390 voters certainly account for a small fraction of all California voters.

Normality: `n_C hat p_C = 2390(0.51) approx 1219 ge 10` and `n_C hat q_C = 2390(0.49) approx 1171 ge 10`.

Finally, we must check that the two sample groups are independent of each other:

Independent Groups: We have no plausible reason to expect that the Washington and California voters are in any way related, so this assumption seems reasonable.
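
The normality checks above are simple arithmetic, so they are easy to script. The following is a minimal sketch in Python; the function names and the dictionary of samples are our own illustrative choices, and the threshold of 10 is the one used in the checks above. The independence conditions still have to be judged by a person.

```python
def success_failure_counts(n, p_hat):
    """Return the expected successes and failures: n * p_hat and n * q_hat."""
    return n * p_hat, n * (1 - p_hat)


def normality_ok(n, p_hat, threshold=10):
    """True when both expected counts are at least the threshold."""
    successes, failures = success_failure_counts(n, p_hat)
    return successes >= threshold and failures >= threshold


# Sample sizes and sample proportions from the exit polls
samples = {"Washington": (2178, 0.55), "California": (2390, 0.51)}

for state, (n, p_hat) in samples.items():
    successes, failures = success_failure_counts(n, p_hat)
    print(f"{state}: n*p_hat ~ {successes:.0f}, n*q_hat ~ {failures:.0f}, "
          f"condition met: {normality_ok(n, p_hat)}")
```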

We wish to estimate the difference between the proportion of voters who are female in each state: `p_W - p_C`. We know the difference between the sample proportions from these two samples:

`hat p_W - hat p_C = 0.55 - 0.51 = 0.04`

Because the two samples are independent, the variance of the difference of the sample proportions, `Var(hat p_W - hat p_C)`, is the sum of the variances of the sample proportions:

`Var(hat p_W - hat p_C) = Var(hat p_W) + Var(hat p_C) = (p_W q_W)/(n_W) + (p_C q_C)/(n_C)`

so:

`SD(hat p_W - hat p_C) = sqrt((p_W q_W)/(n_W)+(p_C q_C)/(n_C))`

The only problem is that we don't know `p_W` or `p_C`. So we estimate the standard deviation by:

`SE(hat p_W - hat p_C) = sqrt((hat p_W hat q_W)/(n_W) + (hat p_C hat q_C)/(n_C))`

where SE stands for "standard error." In our present example:

`SE(hat p_W - hat p_C) = sqrt(((0.55)(0.45))/(2178) + ((0.51)(0.49))/(2390)) approx 0.0148`
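
If the "variances add" rule feels suspicious, a small simulation can serve as a rough check. The sketch below is only an illustration: it assumes, purely for the sake of the experiment, that the true proportions equal the sample proportions (0.55 and 0.51), and the function names, number of repetitions, and seed are arbitrary choices of ours. The simulated standard deviation should land close to the value from the formula, though not exactly on it.

```python
import math
import random


def simulate_difference_sd(n1, p1, n2, p2, reps=400, seed=0):
    """Estimate SD(p1_hat - p2_hat) by drawing many pairs of independent samples."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        x1 = sum(rng.random() < p1 for _ in range(n1))  # successes in sample 1
        x2 = sum(rng.random() < p2 for _ in range(n2))  # successes in sample 2
        diffs.append(x1 / n1 - x2 / n2)
    mean = sum(diffs) / reps
    # Sample standard deviation of the simulated differences
    return math.sqrt(sum((d - mean) ** 2 for d in diffs) / (reps - 1))


def formula_sd(n1, p1, n2, p2):
    """SD of the difference of two independent sample proportions."""
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)


# ASSUMPTION: the true proportions are taken to be 0.55 and 0.51
simulated = simulate_difference_sd(2178, 0.55, 2390, 0.51)
expected = formula_sd(2178, 0.55, 2390, 0.51)
print(f"simulated SD ~ {simulated:.4f}, formula gives {expected:.4f}")
```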

For a 95% confidence level, z* = 1.96, so the margin of error is ME = (1.96)(0.0148) ≈ 0.029. Since `hat p_W - hat p_C = 0.04`, our confidence interval is given by:

`0.011 < p_W - p_C < 0.069`

What does this tell us? We are 95% confident that the proportion of Washington voters who are female is between 1.1 and 6.9 percentage points greater than the proportion of California voters who are female. We can also conclude that, because 0 is not in this confidence interval, it appears that `p_W - p_C ne 0`, which is equivalent to saying `p_W ne p_C`.

The computations here are not that complicated, but they can be time-consuming. For this reason, we typically use a calculator or computer to compute the confidence interval limits for a two-proportion confidence interval (and to compute P-values for a two-proportion hypothesis test, which we will look at soon). Do keep in mind that the calculator cannot check conditions, nor can it properly interpret the meaning of a confidence interval.
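
For readers working on a computer rather than a calculator, here is a minimal sketch of the same interval in Python using only the standard library. The function name `two_proportion_interval` is our own, and the counts 1198 and 1219 are the approximate numbers of female voters computed earlier from the reported percentages. `NormalDist` (Python 3.8 or later) supplies the critical value z*.

```python
import math
from statistics import NormalDist


def two_proportion_interval(x1, n1, x2, n2, confidence=0.95):
    """Confidence interval for p1 - p2 using the unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    # Critical value z* that leaves (1 - confidence)/2 in each tail
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * se
    diff = p1 - p2
    return diff - margin, diff + margin


low, high = two_proportion_interval(1198, 2178, 1219, 2390)
print(f"{low:.3f} < p_W - p_C < {high:.3f}")
```

As with the calculator, the code only does the arithmetic; checking conditions and interpreting the interval are still up to us.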

To find the confidence interval using the TI-84, press STAT, move the cursor right to TESTS, then down to 2-PropZInt... and press ENTER. For x1 use 1198, for n1 use 2178, for x2 use 1219, for n2 use 2390, for C-Level use 0.95:

2-PropZInt input for TI-84

Now move the cursor to Calculate and press ENTER. Check that you get the same results we did above when working "by hand":

output of 2-PropZInt on the TI-84

Hypothesis Tests
Now let's see what a hypothesis test will tell us. We wish to test the claim that the proportion of females among Washington voters is different from the proportion of females among California voters (in other words, `p_W ne p_C`):

H₀: `p_W = p_C`

H_A: `p_W ne p_C`

The conditions that we need to check are the same as for the confidence interval. Ordinarily for a hypothesis test, we would use the value of `p` in the null hypothesis for the success/failure condition, but in this case there is no hypothesized value. (We are supposing that `p_W = p_C`, but we aren't making any claim about what each of these proportions might be.) So as before we check:

`n_W hat p_W = 2178(0.55) approx 1198 ge 10`
`n_W hat q_W = 2178(0.45) approx 980 ge 10`
`n_C hat p_C = 2390(0.51) approx 1219 ge 10`
`n_C hat q_C = 2390(0.49) approx 1171 ge 10`

There is one difference from the confidence interval, however: since our null hypothesis states that `p_W = p_C`, we are operating under the assumption that the Washington and California proportions are the same. If these populations are in fact the same, then we can pool them together: out of a total of 2178 + 2390 = 4568 voters, 1198 + 1219 = 2417 are female, so:

`hat p_text(pool) = (2417)/(4568) approx 0.529`

which means that 52.9% of the voters in the combined sample are female. Thus our best estimate of `SD(hat p_W - hat p_C)` is:

`SE(hat p_W - hat p_C) = sqrt((hat p_text(pool) hat q_text(pool))/(n_W) + (hat p_text(pool) hat q_text(pool))/(n_C)) = sqrt(((0.529)(0.471))/(2178) + ((0.529)(0.471))/(2390)) approx 0.0148`

In this example, pooling didn't really change the standard error, but in many cases it does. We can now use a Normal model for the differences in the sample proportions with mean 0 (since our hypothesis is that `p_W = p_C => p_W-p_C = 0`) and standard deviation 0.0148: N(0,0.0148).
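
To see how little pooling matters for these particular samples, we can compute both standard errors side by side. This is a sketch in Python with variable names of our own choosing; the pooled formula is written in the factored form `p q (1/n_W + 1/n_C)`, which is algebraically the same as the pooled expression above.

```python
import math

# Approximate counts of female voters and sample sizes from the exit polls
x_w, n_w = 1198, 2178
x_c, n_c = 1219, 2390

# Pooled proportion: all female voters over all voters in both samples
p_pool = (x_w + x_c) / (n_w + n_c)
se_pooled = math.sqrt(p_pool * (1 - p_pool) * (1 / n_w + 1 / n_c))

# Unpooled standard error, as used for the confidence interval
p_w, p_c = x_w / n_w, x_c / n_c
se_unpooled = math.sqrt(p_w * (1 - p_w) / n_w + p_c * (1 - p_c) / n_c)

print(f"pooled proportion ~ {p_pool:.3f}")
print(f"pooled SE ~ {se_pooled:.4f}, unpooled SE ~ {se_unpooled:.4f}")
```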

What is the probability that we get two sample proportions that differ by at least 0.04? In other words, if `p_W = p_C`, we want to compute the probability that

`hat p_W - hat p_C > 0.04`

or that

`hat p_W - hat p_C < -0.04`

The first probability is given by normalcdf(0.04,1E99,0,0.0148) ≈ 0.0034 and the second probability is also about 0.0034, so the P-value is about 0.007. Because this P-value is so small, we reject H₀. We conclude that:

There is evidence (P ≈ 0.007) to support the claim that the proportion of Washington voters who are female is different from the proportion of California voters who are female.

We could also have used a one-sided hypothesis test to test the claim that `p_W > p_C`.
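
The whole test, two-sided or one-sided, can be sketched in a few lines of Python. The function `two_proportion_z_test` and its `alternative` argument are our own illustrative names, not part of any library; the Normal model comes from `statistics.NormalDist` in the standard library (Python 3.8 or later).

```python
import math
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2, alternative="two-sided"):
    """Return (z, P-value) for H0: p1 = p2, using the pooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    upper_tail = 1 - NormalDist().cdf(z)  # P(Z > z)
    if alternative == "greater":          # H_A: p1 > p2
        p_value = upper_tail
    elif alternative == "less":           # H_A: p1 < p2
        p_value = 1 - upper_tail
    elif alternative == "two-sided":      # H_A: p1 != p2
        p_value = 2 * min(upper_tail, 1 - upper_tail)
    else:
        raise ValueError("alternative must be 'two-sided', 'greater', or 'less'")
    return z, p_value


z, p_two_sided = two_proportion_z_test(1198, 2178, 1219, 2390)
_, p_one_sided = two_proportion_z_test(1198, 2178, 1219, 2390, alternative="greater")
print(f"z ~ {z:.2f}, two-sided P ~ {p_two_sided:.4f}, one-sided P ~ {p_one_sided:.4f}")
```

Notice that the one-sided P-value is half of the two-sided one, since the Normal model is symmetric.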

The TI-84 offers a shortcut to find the P-value: press STAT, move the cursor right to TESTS, then down to 2-PropZTest and press ENTER. For x1 use 1198, for n1 use 2178, for x2 use 1219, for n2 use 2390, and for p1: use ≠p2 (the form of inequality in the alternative hypothesis):

input for 2-PropZTest on the TI-84

Now move the cursor to Calculate or Draw and press ENTER. You should see the P-value displayed, in addition to some other information:

output from 2-PropZTest on the TI-84

Normal model from the TI-84

The calculator automatically takes care of the pooling in the hypothesis test computations.

Exercises

1. Carpal tunnel syndrome. On September 11, 2002, The New York Times published a brief report ("Study Finds Surgery Works Best For Carpal Tunnel Syndrome") that previewed an article in the Journal of the American Medical Association ("Splinting vs Surgery in the Treatment of Carpal Tunnel Syndrome") about a study that suggested "[s]urgery for carpal tunnel syndrome produces better long-term results in most patients than the more common treatment of putting a splint on the wrist." According to the Times, "researchers in the Netherlands [conducted a study in which] 176 patients had surgery or wore wrist splints for at least six weeks; they were then evaluated periodically. After three months, significant improvement was seen in 80 percent of surgery patients, compared with 54 percent of splint patients. At 18 months, the success rate remained significantly higher for the surgery group."

a) Using only information quoted above from the New York Times article, conduct a hypothesis test to evaluate whether surgery appears to be more effective than wrist splints in the treatment of carpal tunnel syndrome after three months.

b) Follow the link to the full JAMA article and read the abstract on the first page. List any information in the New York Times article that was not accurate.

c) Redo the hypothesis test using information acquired from the JAMA article.

d) Using the JAMA article, construct a 95% confidence interval for the difference in the proportion of surgery patients who see improvement after three months and the proportion of wrist splint patients who see improvement after three months.

e) Based on the results of the hypothesis test and on the confidence interval, should surgery be the first choice of a carpal tunnel sufferer? Explain.
