Ch. 24 Resources

Chapter 24: Comparing Means

In Chapter 24 we move from confidence intervals and hypothesis tests about the mean of a single population to confidence intervals and hypothesis tests about the means of two populations.

Most of the steps here will be familiar: we now know how to deal with inference for a mean from a single population and we've previously encountered two-sample methods in the context of the two-proportion confidence interval and hypothesis tests in Chapter 22. As in that chapter, we will generally use the TI-84 to perform the mechanics of the confidence interval computations and hypothesis tests, since the formulas are a bit more time-consuming.

Paper airplanes

For a science fair project, two second graders were assigned to build several paper airplanes and compare how far planes made with heavier paper flew when compared to planes made with lighter paper. The students built two planes (one with heavier paper, one with lighter paper) for each of three designs (Thunderbolt, PL-1 and Firefly). Each plane was flown five times and the distances recorded. The students' data is shown in the following table:

model distance (feet) paper
Thunderbolt 17.0 heavy
Thunderbolt 16.0 heavy
Thunderbolt 14.5 heavy
Thunderbolt 12.0 heavy
Thunderbolt 12.7 heavy
PL-1 15.3 heavy
PL-1 11.8 heavy
PL-1 13.5 heavy
PL-1 18.7 heavy
PL-1 15.2 heavy
Firefly 8.5 heavy
Firefly 8.0 heavy
Firefly 8.5 heavy
Firefly 7.3 heavy
Firefly 6.8 heavy
Thunderbolt 48.8 light
Thunderbolt 30.7 light
Thunderbolt 27.6 light
Thunderbolt 16.3 light
Thunderbolt 24.4 light
PL-1 29.3 light
PL-1 18.5 light
PL-1 22.5 light
PL-1 25.9 light
PL-1 16.8 light
Firefly 12.0 light
Firefly 13.8 light
Firefly 17.6 light
Firefly 9.3 light
Firefly 9.4 light

The students hypothesize that the planes made with lighter paper fly farther, on average, than those made with heavier paper. Thus, the hypotheses are:

H0: µL = µH

HA: µL > µH

Before proceeding with any computations, we need to check conditions; as with the two-proportion tests, we must do so for each group separately, plus we need to check the Independent Groups Assumption.

Randomness and 10% conditions: For each group, the 15 flights certainly constitute less than 10% of all possible flights, and it is reasonable to believe that the 15 flights are representative of all flights. There is no random selection involved, but the purpose of the 10% and randomness conditions is to check the independence assumption, and for an experiment like this one, which can be repeated endlessly, it is enough to believe that the trials are truly independent. The one concern is that the same planes were flown repeatedly, so they may have become deformed as the number of flights increased, which would affect the distance they travel. A stronger experimental design would have used many copies of each plane design.

Nearly Normal condition: Let's look at boxplots comparing the heavy and light planes:

[Figure: boxplots comparing the heavy and light planes]

We notice that there is a major outlier in the light group (the 48.8-foot Thunderbolt flight) that violates the Nearly Normal condition for this group. This is a serious problem: we can't continue with the outlier included, since the 2-sample t-methods don't work when there is a significant outlier, but we don't have a legitimate reason to omit that one flight from our analysis. One option is to conduct the analysis with the outlier omitted: if we conclude that the mean distance of the light planes is greater than the mean distance for the heavy planes, then we would certainly reach the same conclusion with the outlier included, since a high outlier in the light group only reinforces that the light mean is greater.
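If you want to confirm the outlier numerically rather than by eye, here is a minimal Python sketch. Python is not part of this course, and the sketch assumes the boxplot flags points lying more than 1.5 IQRs beyond the quartiles, with quartiles taken as the medians of the lower and upper halves of the sorted data; the helper name boxplot_outliers is our own.

```python
from statistics import median

# Flight distances (feet) copied from the table above.
light = [48.8, 30.7, 27.6, 16.3, 24.4, 29.3, 18.5, 22.5, 25.9, 16.8,
         12.0, 13.8, 17.6, 9.3, 9.4]
heavy = [17.0, 16.0, 14.5, 12.0, 12.7, 15.3, 11.8, 13.5, 18.7, 15.2,
         8.5, 8.0, 8.5, 7.3, 6.8]


def boxplot_outliers(values):
    """Return the values beyond the 1.5*IQR fences (hypothetical helper)."""
    data = sorted(values)
    half = len(data) // 2
    # Assumed quartile rule: medians of the lower and upper halves,
    # leaving out the middle value when the count is odd.
    q1 = median(data[:half])
    q3 = median(data[-half:])
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in data if x < low_fence or x > high_fence]


print("light:", boxplot_outliers(light))
print("heavy:", boxplot_outliers(heavy))
print("light, 48.8 omitted:", boxplot_outliers([x for x in light if x != 48.8]))
```

Under these assumptions only the 48.8-foot flight is flagged, and nothing is flagged once it is set aside.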

With the outlier temporarily omitted, the boxplots show:

[Figure: boxplots comparing flight distances, with the outlier omitted]

Now both groups appear to be roughly symmetric with no outliers. While it might also be a good idea to construct Normal probability plots for each group (or at least a back-to-back stem-and-leaf display of the data) it appears that it is now safe to proceed.

Independent Groups assumption: We don't expect that the flight distances for the light planes are affected by the flights of the heavy planes.

Notice that in our null hypothesis we are assuming that µL = µH or in other words that µL−µH = 0; we wish to construct a model for the differences of all possible sample means, so the center of this model is given by:

`E(bar y_L - bar y_H) = E(bar y_L) - E(bar y_H) = mu_L - mu_H = 0`

The variance would be given by:

`Var(bar y_L - bar y_H) = Var(bar y_L) + Var(bar y_H) = (sigma_L^2)/(n_L) + (sigma_H^2)/(n_H)`

and thus the standard deviation by:

`SD(bar y_L - bar y_H) = sqrt( (sigma_L^2)/(n_L) + (sigma_H^2)/(n_H) )`

The only problem is that we don't know σL or σH, so we must estimate the SD by:

`SE(bar y_L - bar y_H) = sqrt( (s_L^2)/(n_L) + (s_H^2)/(n_H) )`

We can compute sL ≈ 7.2 and sH ≈ 3.8 using the TI-84 (just enter the light distances, with the outlier omitted, in L1 and the heavy distances in L2, then use 1-VarStats L1 and 1-VarStats L2) or Data Desk; at the same time we can compute the sample means: ȳL ≈ 19.6 and ȳH ≈ 12.4. So:

`SE(bar y_L - bar y_H) = sqrt(((7.2)^2)/(14) + ((3.8)^2)/(15)) approx \ 2.16`
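The same summary statistics can be reproduced away from the calculator. The following is a minimal Python sketch (not a tool used in this course) that works from the raw distances with the 48.8-foot outlier omitted; the variable names are our own. Because it does not round the standard deviations first, its standard error comes out near 2.17 rather than the 2.16 shown above.

```python
from math import sqrt
from statistics import mean, stdev

# Light-paper flights (feet), with the 48.8-foot outlier omitted: 14 values.
light = [30.7, 27.6, 16.3, 24.4, 29.3, 18.5, 22.5, 25.9, 16.8,
         12.0, 13.8, 17.6, 9.3, 9.4]
# Heavy-paper flights (feet): 15 values.
heavy = [17.0, 16.0, 14.5, 12.0, 12.7, 15.3, 11.8, 13.5, 18.7, 15.2,
         8.5, 8.0, 8.5, 7.3, 6.8]

mean_light, mean_heavy = mean(light), mean(heavy)
s_light, s_heavy = stdev(light), stdev(heavy)  # sample SDs (divide by n - 1)

# Standard error of the difference between the two sample means (unpooled).
se = sqrt(s_light ** 2 / len(light) + s_heavy ** 2 / len(heavy))

print(f"means: {mean_light:.1f} and {mean_heavy:.1f}")
print(f"SDs:   {s_light:.1f} and {s_heavy:.1f}")
print(f"SE:    {se:.3f}")
```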

So our t-model is t(0,2.16) and we want to know the probability of observing a sample mean difference of at least 19.6−12.4 = 7.2 feet, which corresponds to a t-score of:

`t = (7.2-0)/(2.16) approx 3.33`

We could at this stage compute the P-value on the TI-84 using tcdf, but we need to know the number of degrees of freedom in our t-model. This is not nearly as easy as it was in the previous chapter; unfortunately to continue our computations "by hand" we have to use the formula in the footnote at the bottom of page 619:

df = `(((s_L^2)/(n_L) + (s_H^2)/(n_H))^2)/((1)/(n_L - 1) ((s_L^2)/(n_L))^2 + (1)/(n_H - 1) ((s_H^2)/(n_H))^2)`

If you spend the time plugging sL ≈ 7.2, sH ≈ 3.8, nL = 14 and nH = 15 into this ugly formula, you should get df ≈ 19.4, so we can then compute P = tcdf(3.33,1E99,19.4) ≈ 0.0017. But it's not intuitive why this formula is the correct one (and it's beyond the scope of this course to prove that it is in fact the right formula) so, as noted above, let's just use 2-SampTTest on the TI-84 to compute the P-value more directly.
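For anyone who would rather not plug into the degrees-of-freedom formula by hand, here is a small Python sketch of that one computation, using the rounded summary values from above. The function name two_sample_df is our own; the formula itself is the one quoted from the footnote, often called the Welch-Satterthwaite approximation.

```python
def two_sample_df(s1, n1, s2, n2):
    """Approximate df for an unpooled 2-sample t-model (hypothetical helper)."""
    v1 = s1 ** 2 / n1  # squared standard error contributed by group 1
    v2 = s2 ** 2 / n2  # squared standard error contributed by group 2
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))


# Rounded summary statistics: light group first, then heavy group.
df = two_sample_df(7.2, 14, 3.8, 15)
print(round(df, 1))  # prints 19.4
```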

Once you have entered the light distances into L1 and the heavy distances into L2 (and have checked the conditions) just press STAT, move the cursor to TESTS, move down to 2-SampTTest... and press ENTER. Since the data is in your calculator, make sure Data is selected (if you don't have the full data set, select Stats and enter the summary statistics instead) then use L1 for List1, L2 for List2, keep both Freq1 and Freq2 set to 1, choose >µ2 for µ1: (since this is the form of our alternative hypothesis), select No for Pooled: and finally move the cursor down to Calculate and press ENTER. You should get a P-value of 0.0018, which agrees fairly closely with our previous answer. (In general the P-value from 2-SampTTest will be more accurate than the result we get "by hand" since in our computations above we rounded our intermediate results at several stages; the calculator also shows that df ≈ 19.4, as computed above.)
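As a cross-check on the calculator output, here is a minimal Python sketch of the same unpooled, one-sided 2-sample t-test, computed from the raw distances with the outlier omitted. Python is not part of this course, and the helper names t_pdf and t_upper_tail are our own. The tail area is approximated with Simpson's rule, which is an assumption of this sketch and not necessarily how the TI-84 does it. The results should land close to the calculator's df ≈ 19.4 and P ≈ 0.0018.

```python
from math import exp, lgamma, log, pi, sqrt
from statistics import mean, variance

# Light-paper flights (feet), outlier omitted, and heavy-paper flights (feet).
light = [30.7, 27.6, 16.3, 24.4, 29.3, 18.5, 22.5, 25.9, 16.8,
         12.0, 13.8, 17.6, 9.3, 9.4]
heavy = [17.0, 16.0, 14.5, 12.0, 12.7, 15.3, 11.8, 13.5, 18.7, 15.2,
         8.5, 8.0, 8.5, 7.3, 6.8]


def t_pdf(x, df):
    """Density of Student's t; df may be fractional."""
    log_const = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_const - (df + 1) / 2 * log(1 + x * x / df))


def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0: one half minus the area from 0 to t (Simpson's rule)."""
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


v_light = variance(light) / len(light)  # s_L^2 / n_L
v_heavy = variance(heavy) / len(heavy)  # s_H^2 / n_H

t_stat = (mean(light) - mean(heavy)) / sqrt(v_light + v_heavy)
df = (v_light + v_heavy) ** 2 / (
    v_light ** 2 / (len(light) - 1) + v_heavy ** 2 / (len(heavy) - 1)
)
p_value = t_upper_tail(t_stat, df)  # one-sided, matching HA: mu_L > mu_H

print(f"t = {t_stat:.2f}, df = {df:.1f}, P = {p_value:.4f}")
```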

Since the P-value is so small, we reject the null hypothesis. There is strong evidence (P = 0.0018) that the mean flight distance of the planes made using light paper is greater than the mean flight distance of the planes made using heavy paper. We did omit a high outlier, but since that would only strengthen the evidence that the light paper airplanes fly farther, on average, our conclusion stands.

Confidence intervals

We could also use a confidence interval to estimate the magnitude of the difference between the mean distances for the two types of planes. The margin of error is given by:

ME = `t_{19.4}^(text(*)) times SE(bar y_L - bar y_H)` 

Unfortunately df = 19.4 is not listed in Table T of the textbook, but we could use df = 19 as a reasonable estimate; for a 95% confidence interval the critical value is 2.093, so

ME = 2.093 × 2.16 ≈ 4.5 feet

We add this to and subtract it from the observed difference between the sample means (7.2 feet) to get confidence interval limits of 7.2−4.5 = 2.7 feet and 7.2+4.5 = 11.7 feet. We can express the confidence interval as:

2.7 < µL − µH < 11.7

We are 95% confident that the mean flight distance of the airplanes made with light paper is between 2.7 feet and 11.7 feet longer than the mean flight distance for the airplanes made with heavy paper.
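The interval arithmetic can also be sketched in a few lines of Python (again, not a tool used in this course). This sketch works from the raw distances with the outlier omitted and simply takes the critical value 2.093 from Table T as given; the variable names are our own.

```python
from math import sqrt
from statistics import mean, variance

# Light-paper flights (feet), outlier omitted, and heavy-paper flights (feet).
light = [30.7, 27.6, 16.3, 24.4, 29.3, 18.5, 22.5, 25.9, 16.8,
         12.0, 13.8, 17.6, 9.3, 9.4]
heavy = [17.0, 16.0, 14.5, 12.0, 12.7, 15.3, 11.8, 13.5, 18.7, 15.2,
         8.5, 8.0, 8.5, 7.3, 6.8]

T_STAR = 2.093  # 95% critical value for df = 19, read from Table T (not computed here)

diff = mean(light) - mean(heavy)
se = sqrt(variance(light) / len(light) + variance(heavy) / len(heavy))
margin = T_STAR * se

lower, upper = diff - margin, diff + margin
print(f"{lower:.1f} < mu_L - mu_H < {upper:.1f}")
```

Rounded to one decimal place, the limits agree with the 2.7 and 11.7 feet found above.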

If you have a TI-84 (as opposed to a TI-83) you can compute the critical value more precisely as invT(0.975,19.4) ≈ 2.090, but this won't change our answer significantly.

In general, rather than performing these computations "by hand" we'll simply use 2-SampTInt on the calculator. The setup is remarkably similar to 2-SampTTest; you should check that this gives us a 95% confidence interval of:

2.7 < µL − µH < 11.7

Sometimes there may be a small discrepancy in the confidence interval limits due to rounding (the calculator answer will be more precise), but in this case the TI-84 result matches the one we got from our "by hand" computations.

Pooling

When dealing with 2-proportion hypothesis tests we pooled our sample data to compute the standard error (although we did not do this when computing the confidence interval limits). There are rare cases where it is advisable to pool sample data to compute the standard error for a 2-sample t-test, but (as mentioned in the text) determining whether or not pooling is allowed requires yet another test, and even when we can pool it rarely affects the P-value in a significant way. Thus, in this course we will never pool sample data for a 2-sample t-test or 2-sample t-interval. All of this is discussed further on pages 630 and 631, but if you're pressed for time you can just skip reading these two pages and remember never to pool in this situation.

Independent Groups

As you work the problems in this chapter, keep in mind that in order to use the techniques demonstrated above, the samples must be independent. If we sampled two groups of people, men and women, but the men in the first sample were brothers of the women in the second sample, then the samples would not be independent and we should not use a 2-sample t-test; we will learn what to do about paired data in Chapter 25.

Homework

Work the following exercises in Chapter 24: 7, 9, 13–19 odd, 25, 31 and 37.

Errata

The For Example on page 621 should read "Table T gives" (not "Table gives").

The conclusion on page 624 should read "captures the true difference between the mean duration of brand-name batteries and the mean duration of generic batteries."

In the For Example on page 634, the second line of equations should read: `8.13sqrt((1)/(27)+(1)/(27))` (not 5.09).

The answer in the Student Solutions Manual to part d of Exercise 9 is wrong.

The stem-and-leaf display in Exercise 17 is missing a key.

Exercise 28 should refer to Exercise 27 (not Exercise 19).

ActivStats

Work through the lessons on pages 24-1 and 24-2 in the ActivStats lesson book, as time permits; you can tread lightly over the discussion of pooled t-tests on page 24-3.

Additional Resources

Hypothesis Test: Two Means, Independent Samples
A Flash tutorial on using the TI-83's 2-SampTTest feature to perform a hypothesis test about means from two independent populations.
Two-sample t-test and paired-data t-test
A Web-based computational tool from GraphPad Software.