Two-sample t-tests and t-intervals

For a science fair project, two second graders were assigned to build several paper airplanes and compare how far planes made with heavier paper flew compared with planes made with lighter paper. The students built two planes (one with heavier paper, one with lighter paper) for each of three designs (Thunderbolt, PL-1 and Firefly). Each plane was flown five times and the distances recorded. The students' data appears in the following table:

model	distance (ft)	paper
Thunderbolt	17.0	heavy
Thunderbolt	16.0	heavy
Thunderbolt	14.5	heavy
Thunderbolt	12.0	heavy
Thunderbolt	12.7	heavy
PL-1	15.3	heavy
PL-1	11.8	heavy
PL-1	13.5	heavy
PL-1	18.7	heavy
PL-1	15.2	heavy
Firefly	8.5	heavy
Firefly	8.0	heavy
Firefly	8.5	heavy
Firefly	7.3	heavy
Firefly	6.8	heavy
Thunderbolt	48.8	light
Thunderbolt	30.7	light
Thunderbolt	27.6	light
Thunderbolt	16.3	light
Thunderbolt	24.4	light
PL-1	29.3	light
PL-1	18.5	light
PL-1	22.5	light
PL-1	25.9	light
PL-1	16.8	light
Firefly	12.0	light
Firefly	13.8	light
Firefly	17.6	light
Firefly	9.3	light
Firefly	9.4	light

The students hypothesized that the planes made with lighter paper fly farther, on average, than those made with heavier paper. Thus, the null and alternative hypotheses are:

H₀: µ_L = µ_H

H_A: µ_L > µ_H

Notice that the null hypothesis assumes that µ_L = µ_H or, in other words, that µ_L−µ_H = 0. We wish to construct a model for the differences of all possible sample means (of the form `bar y_L - bar y_H`), so the center of this model is given by:

$\displaystyle E\left(\overline{y}_L - \overline{y}_H\right) = E\left(\overline{y}_L\right) - E\left(\overline{y}_H\right) = \mu_L - \mu_H = 0$

and, because the two groups are independent of each other, the variance would be given by:

$\displaystyle \mathrm{Var}\left(\overline{y}_L - \overline{y}_H\right) = \mathrm{Var}\left(\overline{y}_L\right) + \mathrm{Var}\left(\overline{y}_H\right) = \frac{\sigma_L^2}{n_L} + \frac{\sigma_H^2}{n_H}$

and thus the standard deviation by:

$\displaystyle \mathrm{SD}\left(\overline{y}_L - \overline{y}_H\right) = \sqrt{\frac{\sigma_L^2}{n_L} + \frac{\sigma_H^2}{n_H}}$

The only problem is that we don't know σ_L or σ_H, so we must estimate the SD using:

$\displaystyle \mathrm{SE}\left(\overline{y}_L - \overline{y}_H\right) = \sqrt{\frac{s_L^2}{n_L} + \frac{s_H^2}{n_H}}$

We can compute s_L ≈ 10.3 and s_H ≈ 3.8 using the TI-84 (just enter the light distances in L1 and the heavy distances in L2, then use 1-Var Stats L1 and 1-Var Stats L2); at the same time we can compute the sample means: `bar y_L` ≈ 21.5 and `bar y_H` ≈ 12.4. So:

`SE = sqrt(10.3^2/15+3.8^2/15) approx 2.83`
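The same summary statistics can be cross-checked away from the calculator. The following is a minimal sketch in Python using only the standard library; the variable names (`light`, `heavy`, `se`, and so on) are our own choices, not part of the original project.

```python
import math
import statistics

# Flight distances in feet, copied from the table above.
light = [48.8, 30.7, 27.6, 16.3, 24.4, 29.3, 18.5, 22.5, 25.9, 16.8,
         12.0, 13.8, 17.6, 9.3, 9.4]
heavy = [17.0, 16.0, 14.5, 12.0, 12.7, 15.3, 11.8, 13.5, 18.7, 15.2,
         8.5, 8.0, 8.5, 7.3, 6.8]

mean_l, mean_h = statistics.mean(light), statistics.mean(heavy)
# statistics.stdev is the sample standard deviation (divides by n - 1).
s_l, s_h = statistics.stdev(light), statistics.stdev(heavy)

# Standard error of the difference between the two sample means.
se = math.sqrt(s_l**2 / len(light) + s_h**2 / len(heavy))

print(f"means: {mean_l:.1f} (light), {mean_h:.1f} (heavy)")
print(f"SDs:   {s_l:.1f} (light), {s_h:.1f} (heavy)")
print(f"SE:    {se:.2f}")
```

The printed values should agree with the rounded figures used above.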

Now, if a Normal model applies to the distances for the light planes and a Normal model applies to the distances for the heavy planes (we'll check both of these conditions shortly), then the sample means for both groups will have a Normal distribution. Furthermore, the difference of two Normally distributed random variables will be Normally distributed (this is a fact we will accept without proof), so the differences of all possible sample means (`bar y_L - bar y_H`) should follow a Normal distribution.

However, when we use SE to estimate SD, a Normal model no longer applies. We might suspect that for the differences a t-model would apply, as it did for a single sample mean, but unfortunately things are more complicated than that. It turns out that we can use a t-model in this situation as long as we use a special (highly non-intuitive) formula for the df:

df = $\displaystyle \frac{\left(\frac{s_L^2}{n_L} + \frac{s_H^2}{n_H}\right)^2}{\frac{1}{n_L - 1}\left(\frac{s_L^2}{n_L}\right)^2 + \frac{1}{n_H - 1}\left(\frac{s_H^2}{n_H}\right)^2}$

If you spend the time plugging s_L ≈ 10.3, s_H ≈ 3.8, n_L = 15 and n_H = 15 into this ugly formula, you should get df ≈ 17.7 (or df ≈ 17.8 if you use the unrounded standard deviations, which is the value we'll use here). So our t-model has 17.8 degrees of freedom and is centered at 0 with an SE of 2.83 (because under the null hypothesis the expected difference of the sample means is 0), and we want to know the probability of observing a sample mean difference of at least 21.5−12.4 = 9.1 feet, which corresponds to a t-score of:

`t = (9.1-0)/2.83 approx 3.22`
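Because the df formula is tedious to evaluate by hand, a short script can help. This is a sketch under our own naming (`welch_df` is a hypothetical helper, not a library function); it works from the raw data rather than the rounded summaries, so the t-score comes out near 3.23 rather than the hand-rounded 3.22.

```python
import math
import statistics

# Flight distances in feet, copied from the table above.
light = [48.8, 30.7, 27.6, 16.3, 24.4, 29.3, 18.5, 22.5, 25.9, 16.8,
         12.0, 13.8, 17.6, 9.3, 9.4]
heavy = [17.0, 16.0, 14.5, 12.0, 12.7, 15.3, 11.8, 13.5, 18.7, 15.2,
         8.5, 8.0, 8.5, 7.3, 6.8]


def welch_df(a, b):
    """Degrees of freedom for the two-sample t-model (unpooled variances)."""
    va = statistics.variance(a) / len(a)  # s_a^2 / n_a
    vb = statistics.variance(b) / len(b)  # s_b^2 / n_b
    return (va + vb) ** 2 / (va**2 / (len(a) - 1) + vb**2 / (len(b) - 1))


df = welch_df(light, heavy)
se = math.sqrt(statistics.variance(light) / len(light)
               + statistics.variance(heavy) / len(heavy))
# t-score: observed difference minus the hypothesized difference (0), over SE.
t = (statistics.mean(light) - statistics.mean(heavy) - 0) / se

print(f"df = {df:.1f}, t = {t:.2f}")
```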

We could then compute P = tcdf(3.22,1E99,17.8) ≈ 0.002. But it's much easier to just use 2-SampTTest on the TI-84 to compute the P-value more directly.

Once you have entered the light distances into L1 and the heavy distances into L2, just press STAT, move the cursor to TESTS, move down to 2-SampTTest... and press ENTER. Since the data is in your calculator, make sure Data is selected (if you don't have the full data set, select Stats and enter the summary statistics instead) then use L1 for List1, L2 for List2, keep both Freq1 and Freq2 set to 1, choose >µ2 for µ1: (since this is the form of our alternative hypothesis), select No for Pooled: and finally move the cursor down to Calculate and press ENTER. You should get a P-value of 0.0023, which agrees with our previous answer. (In general the P-value from 2-SampTTest will be more accurate than the result we get "by hand" anyway, since in our computations above we rounded our intermediate results at several stages; the calculator also shows that df ≈ 17.8, as computed above.)
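For readers without a TI-84, the tcdf step can be approximated with a few lines of Python. The sketch below is not how the calculator works internally; it simply integrates the t-density numerically with Simpson's rule, and the function names `t_pdf` and `t_upper_tail` are our own. It accepts fractional df, which the two-sample t-model requires.

```python
import math


def t_pdf(x, df):
    """Density of Student's t-distribution; df may be fractional."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_upper_tail(t, df, steps=2000):
    """Approximate P(T >= t) by applying Simpson's rule on [0, t].

    steps must be even. The density is symmetric about 0, so the
    area to the right of t is 0.5 minus the area between 0 and t.
    """
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


# The hand-computed t-score and df from the text above.
p_hand = t_upper_tail(3.22, 17.8)
print(f"P = {p_hand:.3f}")
```

The result should be close to the 0.002 obtained from tcdf above.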

We still have some issues to address, however, as we have not yet checked any conditions. First we must check the usual conditions for a t-test, and we must do so for each of the two groups separately.

Independent trials: For each group, the 15 flights certainly constitute a small fraction of all possible flights and it is reasonable to believe the 15 flights are representative of all flights, even though there is no random selection involved. One concern is that each plane was flown five times, so the planes may have become disfigured as the number of flights increased, which would affect the distance that they travel. A stronger experimental design would have used many copies of each plane design.

Normality: Let's look at boxplots comparing the heavy and light planes:

[Figure: boxplots comparing the heavy and light planes]

We notice that there is a major outlier in the light group (the 48.8-foot Thunderbolt flight) that violates the Normality condition for this group. This is a serious problem: we can't continue with the outlier included, since the 2-sample t-methods don't work when there is a significant outlier, but we don't have a legitimate reason to omit that one flight from our analysis (unless we learned that someone opened a door and let in a gust of wind for just that one flight, for example). One option is to conduct the analysis with the outlier omitted: if we conclude that the mean distance of the light planes is greater than the mean distance for the heavy planes, then we would certainly reach the same conclusion with the outlier included, since a high outlier in the light group only reinforces that the light mean is greater.
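The outlier seen in the boxplot can also be checked numerically. The sketch below assumes the boxplot flags outliers with the common 1.5 × IQR fence rule (an assumption on our part; the notes do not say which rule was used), and `outliers` is a hypothetical helper name. Quartile conventions vary between tools, so other software may place the fences slightly differently.

```python
import statistics

# Flight distances in feet, copied from the table above.
light = [48.8, 30.7, 27.6, 16.3, 24.4, 29.3, 18.5, 22.5, 25.9, 16.8,
         12.0, 13.8, 17.6, 9.3, 9.4]
heavy = [17.0, 16.0, 14.5, 12.0, 12.7, 15.3, 11.8, 13.5, 18.7, 15.2,
         8.5, 8.0, 8.5, 7.3, 6.8]


def outliers(values):
    """Return the values lying beyond the 1.5 * IQR fences."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low_fence or v > high_fence]


print("light outliers:", outliers(light))
print("heavy outliers:", outliers(heavy))
```

Under this rule only the 48.8-foot flight in the light group is flagged, and no flight in the heavy group is.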

With the outlier temporarily omitted, the boxplots show:

[Figure: boxplots comparing flight distances, with the outlier omitted]

Now both groups appear to be roughly symmetric with no outliers. While it might also be a good idea to construct Normal probability plots for each group (or at least a back-to-back stem-and-leaf display of the data), it appears that it is now safe to proceed.

We must also check that the groups themselves are independent of each other:

Independent Groups: There is no reason to believe that the flight distances for the light planes are affected by the flights of the heavy planes.

Ideally, we would randomly assign cases to one of the treatment groups, which would make the assumption that the groups are independent reasonable.

So, if we omit the outlier, the conditions will be satisfied. Doing so, we can run 2-SampTTest again and get P = 0.0017.
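To see how little the outlier changes the result, the whole test can be run twice, once with all 15 light flights and once with the 48.8-foot flight removed. This is a sketch with our own helper names (`t_pdf`, `t_upper_tail`, `welch_p_greater`); the tail probability is found by numerical integration rather than by any calculator routine.

```python
import math
import statistics

# Flight distances in feet, copied from the table above.
light = [48.8, 30.7, 27.6, 16.3, 24.4, 29.3, 18.5, 22.5, 25.9, 16.8,
         12.0, 13.8, 17.6, 9.3, 9.4]
heavy = [17.0, 16.0, 14.5, 12.0, 12.7, 15.3, 11.8, 13.5, 18.7, 15.2,
         8.5, 8.0, 8.5, 7.3, 6.8]


def t_pdf(x, df):
    """Density of Student's t-distribution; df may be fractional."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_upper_tail(t, df, steps=2000):
    """Approximate P(T >= t) using Simpson's rule on [0, t] (steps even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


def welch_p_greater(a, b):
    """One-sided P-value for H_A: mean(a) > mean(b), variances not pooled."""
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va**2 / (len(a) - 1) + vb**2 / (len(b) - 1))
    return t_upper_tail(t, df)


light_trimmed = [d for d in light if d != 48.8]  # outlier omitted

p_all = welch_p_greater(light, heavy)
p_trimmed = welch_p_greater(light_trimmed, heavy)
print(f"with outlier:    P = {p_all:.4f}")
print(f"without outlier: P = {p_trimmed:.4f}")
```

Both values should land close to the calculator results quoted in these notes. If SciPy is installed, `scipy.stats.ttest_ind(light, heavy, equal_var=False, alternative="greater")` should give a comparable P-value with less code.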

Because the P-value is so small (and because we get roughly the same P-value with or without the outlier) we reject the null hypothesis. There is strong evidence (P = 0.002) that the mean flight distance of the planes made using light paper is greater than the mean flight distance of the planes made using heavy paper.

Confidence intervals
We could also use a confidence interval to estimate the magnitude of the difference between the mean distances for the two types of planes.

The conditions are the same as above, but we have the outstanding issue of the outlier, and here it's not so easy to work around. If we proceed with the outlier omitted, the margin of error is given by:

ME = `t^(text(*)) times SE(bar y_L - bar y_H)`

To compute t* for a 95% confidence interval, we would need to use invT(0.975,19.4) = 2.090, after using the ugly formula to compute df = 19.4. Using the formula for SE we get 2.16, so:

ME = 2.090 × 2.16 ≈ 4.5 feet

We add this to and subtract it from the observed difference between the sample means (7.2 feet) to get confidence interval limits of 7.2−4.5 = 2.7 feet and 7.2+4.5 = 11.7 feet. We can express the confidence interval as:

2.7 < µ_L − µ_H < 11.7

We are 95% confident that the mean flight distance of the airplanes made with light paper is between 2.7 feet and 11.7 feet longer than the mean flight distance for the airplanes made with heavy paper.

In general, rather than performing these computations "by hand" we'll simply use 2-SampTInt on the calculator. The setup is remarkably similar to 2-SampTTest; you should check that this gives us a 95% confidence interval of:

2.7 < µ_L − µ_H < 11.7
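The interval can also be reproduced without the calculator. The sketch below works from the data with the outlier omitted and finds t* by bisection on a numerically integrated t-distribution; this stands in for invT and is not how the TI-84 computes it. All function names (`t_pdf`, `t_upper_tail`, `t_critical`) are our own.

```python
import math
import statistics

# Flight distances in feet; the 48.8-foot light flight is omitted.
light_trimmed = [30.7, 27.6, 16.3, 24.4, 29.3, 18.5, 22.5, 25.9, 16.8,
                 12.0, 13.8, 17.6, 9.3, 9.4]
heavy = [17.0, 16.0, 14.5, 12.0, 12.7, 15.3, 11.8, 13.5, 18.7, 15.2,
         8.5, 8.0, 8.5, 7.3, 6.8]


def t_pdf(x, df):
    """Density of Student's t-distribution; df may be fractional."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_upper_tail(t, df, steps=2000):
    """Approximate P(T >= t) using Simpson's rule on [0, t] (steps even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


def t_critical(confidence, df):
    """Bisection search for t* such that P(-t* < T < t*) = confidence."""
    target = (1 - confidence) / 2  # area left in each tail
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > target:
            lo = mid  # tail still too large, so t* lies further right
        else:
            hi = mid
    return (lo + hi) / 2


va = statistics.variance(light_trimmed) / len(light_trimmed)
vb = statistics.variance(heavy) / len(heavy)
se = math.sqrt(va + vb)
df = (va + vb) ** 2 / (va**2 / (len(light_trimmed) - 1)
                       + vb**2 / (len(heavy) - 1))

diff = statistics.mean(light_trimmed) - statistics.mean(heavy)
t_star = t_critical(0.95, df)
lower, upper = diff - t_star * se, diff + t_star * se
print(f"df = {df:.1f}, t* = {t_star:.3f}")
print(f"{lower:.1f} < mu_L - mu_H < {upper:.1f}")
```

The limits should match the interval above to one decimal place.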

Independent Groups
As you work the problems below, keep in mind that in order to use the techniques demonstrated above, the samples must be independent. If we sampled two groups of people, men and women, but the men in the first sample were brothers of the women in the second sample, then the samples would not be independent and we should not use a 2-sample t-test; we would need to use another method (such as a paired-data t-test) instead.

Exercises

1. Handwashing. A 2009 WHO report noted that among studies related to health care workers washing their hands to prevent infection, one problem is "the duration of hand treatments[, which] require subjects to treat their hands with the hand hygiene product or a positive control for 30 seconds or 1 minute, despite the fact that the average duration of hand cleansing by HCWs has been observed to be less than 15 seconds in most studies." A registered nurse at a Seattle-area hospital observed 21 health care workers in the Special Care Unit (SCU) and another 21 health care workers in the Intensive Care Unit (ICU) and recorded the number of seconds that each employee washed his or her hands before entering a treatment room. The times appear in the display below:

SCU ICU
76|1|
44|1|4
332222|1|22223333
10000|1|0111
98888|0|889
6|0|6677
|0|5
Key: 1|4 = 14 seconds

a) Is there evidence that the mean time spent washing hands by HCWs in the SCU is different from the mean time spent washing hands in the ICU at this hospital? Use a hypothesis test.

b) Construct a 95% confidence interval to estimate the difference between the mean amount of time HCWs at this hospital spend washing their hands before entering an SCU treatment room and the mean amount of time HCWs at this hospital spend washing their hands before entering an ICU treatment room.
