Hypothesis Tests for Means

So far we've only dealt with confidence intervals and hypothesis tests about proportions (`p`). We now turn to problems that deal with means (`mu`). The basic format of the problems will look familiar, but a few of the details will change.

iPod capacity
An iPod is a portable audio player that allows you to store music from CDs in the form of MP3 files. I'm interested in purchasing an iPod, but different versions are available (each with a different capacity) and I want to know how big the iPod will need to be to hold my music collection.

I've figured out that if the average running time of all my CDs is about 1 hour, then I should be able to fit my entire collection on a 64 GB iPod. But if the average running time is more than an hour, I'll need to get a bigger iPod.

I don't really want to tabulate the running time for 700+ CDs, so from among the classical CDs in my collection, I randomly selected 14 CDs and recorded the total running time (in minutes) along with the name of the composer who wrote the music recorded on the CD.

composer	time
Barber	62.0
Berlioz	50.7
Brahms	74.1
Copland	51.7
Elgar	48.5
Grieg	72.3
Kernis	70.6
Mahler	57.1
Mozart	62.3
Poulenc	66.5
Rorem	69.6
Shostakovich	79.9
Strauss	72.1
Torke	53.5

Here are a histogram, boxplot and Normal probability plot of the running times.

[Figure: histogram, boxplot and Normal probability plot of CD running times]

We can compute the sample mean (`bar y_1` = 63.6 minutes) and sample standard deviation (`s_1` = 10.0 minutes) of the 14 CD times in our one sample using 1-Var Stats on the TI-84.
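For readers without a TI-84 at hand, the same summary statistics can be sketched in Python. This is only an illustration: the variable names are made up here, and the data are copied from the table above.

```python
import statistics

# Running times (in minutes) of the 14 sampled CDs, copied from the table above.
times = [62.0, 50.7, 74.1, 51.7, 48.5, 72.3, 70.6,
         57.1, 62.3, 66.5, 69.6, 79.9, 72.1, 53.5]

n = len(times)
y_bar = statistics.mean(times)   # sample mean
s = statistics.stdev(times)      # sample SD (divides by n - 1, like Sx on the TI-84)

print(f"n = {n}, mean = {y_bar:.1f}, s = {s:.1f}")
```

Note that `statistics.stdev` is the sample standard deviation; `statistics.pstdev` would divide by n instead and is not what we want here.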

I've only taken one sample (and computed one sample mean), but if I were to repeatedly take random samples of 14 CDs, according to the Central Limit Theorem I would expect the sample means to vary with a standard deviation given by:

`SD(bar y) = (sigma)/(sqrt(n))`

Unfortunately, we don't know `sigma` (the standard deviation of the running times for all 700+ CDs). Our only option here is to instead use the estimate:

`SE(bar y) = (s_1)/(sqrt(n)) = (10.0)/(sqrt(14)) approx 2.67`

To investigate whether the average running time of all 700+ CDs exceeds one hour, we could use a hypothesis test with the following hypotheses:

H₀: µ = 60

H_A: µ > 60

Notice that these are very similar to the hypotheses for our one-proportion hypothesis tests, except we're now claiming something about a mean (μ).

If a Normal model applies to the sample means, I could compute the z-score of my observed sample mean:

`z = (63.6-60)/2.67 approx 1.348`

using the hypothesized value of μ for the expected value of the sample means and `SE(bar y) = 2.67` as an approximation of `SD(bar y)`. Using this, we could compute a P-value of:

normalcdf(1.348,1E99) ≈ 0.0888
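As a rough cross-check of the calculator work, the standard error, z-score and Normal-model P-value can be reproduced with Python's standard library. The names below are illustrative, and the standard error is rounded to two decimals only to mirror the rounding used in the text.

```python
from math import sqrt
from statistics import NormalDist

y_bar, s, n = 63.6, 10.0, 14     # rounded summary statistics from the text
mu_0 = 60                        # hypothesized mean under the null hypothesis

se = round(s / sqrt(n), 2)       # standard error, rounded to 2.67 as in the text
z = (y_bar - mu_0) / se          # z-score of the observed sample mean

# Upper-tail area under the standard Normal curve, like normalcdf(z, 1E99).
p_normal = 1 - NormalDist().cdf(z)

print(f"SE = {se}, z = {z:.3f}, P-value = {p_normal:.4f}")
```

Without the intermediate rounding of the standard error, the z-score and P-value change slightly in the third or fourth decimal place.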

Unfortunately, because we've approximated `SD(bar y)`, a Normal model no longer applies to the distribution of the sample means. Instead, we need to use a related distribution, called the t-distribution, which looks similar to the Normal distribution, but with slightly "fatter" tails on each side. These tails change shape slightly as the sample size changes, so there are in fact many different t-distributions, one for each sample size. For technical reasons, instead of specifying the sample size, we need to specify the number of degrees of freedom, given by `text(df) = n-1`. In our current example, df = 14−1 = 13. Otherwise, computing a P-value is similar to what we did above. We first compute a t-score:

`t = (63.6-60)/2.67 approx 1.348`

and then use tcdf on the calculator (in the DISTR menu below invNorm) to compute the P-value:

tcdf(1.348,1E99,13) ≈ 0.1003
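The tcdf computation can also be approximated without a calculator. The sketch below is not how the TI-84 works internally; it simply integrates the t-density numerically with Simpson's rule, using only the standard library. The function names `t_pdf` and `t_upper_tail` are hypothetical helpers introduced for this illustration.

```python
from math import gamma, pi, sqrt

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > t) using Simpson's rule on [0, t] and symmetry.

    The density is symmetric about 0, so the area to the right of 0 is 0.5;
    subtracting the area between 0 and t leaves the upper tail.
    `steps` must be even for Simpson's rule.
    """
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

p_t = t_upper_tail(1.348, 13)    # same inputs as tcdf(1.348, 1E99, 13)
print(f"P-value = {p_t:.4f}")
```

The result should agree with the tcdf value above to about four decimal places. If SciPy happens to be installed, `scipy.stats.t.sf(1.348, 13)` computes the same upper-tail probability directly.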

Because this P-value is fairly large (about 10%), we conclude that there is not enough evidence to support the claim that the average running time for all 700+ CDs is greater than 60 minutes. This does NOT tell us that the mean running time is exactly 60 minutes; in fact, it's possible that the mean is greater than 60 minutes, but we just don't have enough evidence to prove that beyond a reasonable doubt.

Conditions
Before we compute the P-value, we should have checked some conditions. For this t-test, those conditions are:

Independent trials: The trials will rarely be truly independent, but it's reasonable to assume independence if the CDs were randomly sampled (as they were here) and represent a small fraction of the population (`14/700 approx 2%` so we should be OK).

Normality: The Central Limit Theorem requires that we have a large sample size (n > 40 or so) or that the original population is roughly Normal. Here the sample size is fairly small, so we need to check Normality. Unfortunately, we don't have a graph of the entire population (the running times of all 700+ CDs) so our only option is to look at a graph of the 14 CDs we do have. The histogram isn't really unimodal and isn't very symmetric, and the Normal probability plot shows some bends and curves. This condition may not be satisfied.

Because the Normality condition may not be satisfied, we probably shouldn't have computed the P-value as we did above. This is a good reminder to always check conditions before you do computations.
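For readers who want to rebuild the Normal probability plot themselves, here is a minimal sketch that pairs each sorted running time with a theoretical standard Normal quantile. The plotting positions (i − 0.5)/n are one common convention and are an assumption here; the TI-84 and other software may use slightly different positions.

```python
from statistics import NormalDist

# Running times (in minutes) of the 14 sampled CDs, copied from the table above.
times = [62.0, 50.7, 74.1, 51.7, 48.5, 72.3, 70.6,
         57.1, 62.3, 66.5, 69.6, 79.9, 72.1, 53.5]

n = len(times)
ordered = sorted(times)

# Theoretical standard Normal quantiles at positions (i - 0.5)/n (assumed convention).
quantiles = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

# Each printed row is one point of the Normal probability plot.
print("Normal quantile   running time")
for q, y in zip(quantiles, ordered):
    print(f"{q:15.2f}   {y:12.1f}")
```

If the population were roughly Normal, these points would fall close to a straight line; pronounced bends suggest otherwise. With only 14 points the picture is always somewhat noisy, so this is a judgment call rather than a precise test.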

t-tests on the TI-84
We can use the T-Test feature on the TI-84 to compute the P-value more directly. Press STAT, then move the cursor right to TESTS, down to T-Test and press ENTER. If you have the running times of the CDs in your calculator already (say in L1) from when you computed the sample mean and sample SD, select Data; otherwise use Stats. Specify 60 for µ₀. If you've selected Data, make sure List is set to L1 (or wherever you put the data) and leave Freq set to 1; if you've selected Stats, use 63.6 for `bar x`, 10.0 for Sx and 14 for n. Finally, use >µ₀ (the form of the alternative hypothesis) for µ, move down to Calculate and press ENTER. You should see approximately the P-value we computed above (it may differ slightly because of rounding), along with some other information.

While the TI-84 is useful for computing the P-value, it doesn't check assumptions and conditions, nor does it properly interpret the P-value. On an exam, you can use T-Test to compute the P-value, but don't forget the other important components of a hypothesis test.

Exercises

1. Guessing ages A statistics student from Vietnam remembered a game show from that country called Guessing My Age. She decided to try something similar, asking an 18-year-old friend who still lives in Vietnam to send her a picture and then asking 50 Edmonds Community College students to guess the age of the person in the picture. The results are shown below:

2|6
2|44445
2|22233333
2|000011111111
1|888888899999
1|666777777
1|555
Key: 2|6 means 26 years old

If appropriate, use a hypothesis test to test the claim that, on average, people overestimate the friend's age.

2. Handwashing A 2009 WHO report noted that among studies related to health care workers washing their hands to prevent infection, one problem is "the duration of hand treatments[, which] require subjects to treat their hands with the hand hygiene product or a positive control for 30 seconds or 1 minute, despite the fact that the average duration of hand cleansing by HCWs has been observed to be less than 15 seconds in most studies." A registered nurse at a Seattle-area hospital observed 21 health care workers in the Special Care Unit (SCU) and recorded the number of seconds that each employee washed his or her hands before entering a treatment room. The times appear in the display below:

1|67
1|44
1|222233
1|00001
0|88889
0|6
Key: 1|7 = 17 seconds

Does the data gathered by the RN support the claim in the WHO report that health care workers wash their hands, on average, for less than 15 seconds?
