
Hypothesis Test Odds and Ends
We will employ the essential concepts related to the one-proportion hypothesis test throughout the rest of the course.
Significance level
Our general rule of thumb when deciding whether we should reject the null hypothesis or fail to reject the null hypothesis is: reject if P < 0.01 and fail to reject if P > 0.10. But what if the P-value falls between 0.01 and 0.10? This is a gray area where we need to consider the real-life implications of making a Type I error or Type II error.
If rejecting the null hypothesis corresponds to saying that there is evidence that the bolts on an airplane won't fail during flight, we would want to have very strong evidence before we reached that conclusion (especially if we're flying on that plane). So here we might want to reject the null hypothesis only if P < 0.01 (or perhaps even lower).
If, on the other hand, we're evaluating a claim about the popularity of an American Idol contestant among that show's fans, little actual harm would result from making a Type I error, so we might feel comfortable rejecting the null hypothesis when P < 0.05, or even when P < 0.10.
If we decide ahead of time what that threshold will be (0.01, 0.05, and 0.10 are common choices), then we call that threshold the significance level for the hypothesis test and use the Greek letter `alpha` to represent its value. If the P-value is below the significance level, we say that the difference we have observed between our sample statistic and the value stated in the null hypothesis is statistically significant.
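As a rough illustration of how a significance level is applied, here is a minimal Python sketch of the one-proportion z-test using the normal approximation. The function names `one_proportion_z_test` and `is_significant`, and the data (60 successes in 100 trials, tested against `p_0 = 0.5` with a one-sided alternative), are hypothetical and chosen only for illustration.

```python
import math


def one_proportion_z_test(successes, n, p0, alternative="greater"):
    """One-proportion z-test of H0: p = p0; returns (z, P-value)."""
    p_hat = successes / n
    se = math.sqrt(p0 * (1 - p0) / n)                # standard error assuming H0 is true
    z = (p_hat - p0) / se
    upper_tail = 0.5 * math.erfc(z / math.sqrt(2))   # P(Z > z) for standard normal Z
    if alternative == "greater":
        p_value = upper_tail
    elif alternative == "less":
        p_value = 1 - upper_tail                     # P(Z < z)
    else:                                            # two-sided
        p_value = math.erfc(abs(z) / math.sqrt(2))   # P(|Z| > |z|)
    return z, p_value


def is_significant(p_value, alpha):
    """The result is statistically significant when the P-value is below alpha."""
    return p_value < alpha


# Hypothetical data: 60 successes in 100 trials, H0: p = 0.5, H_a: p > 0.5
z, p_value = one_proportion_z_test(60, 100, 0.5)
print(f"z = {z:.2f}, P = {p_value:.4f}")
for alpha in (0.10, 0.05, 0.01):
    verdict = "significant" if is_significant(p_value, alpha) else "not significant"
    print(f"alpha = {alpha}: {verdict}")
```

With these hypothetical numbers, z = 2.0 and P is about 0.023, which falls in the gray area: statistically significant at `alpha = 0.10` and `alpha = 0.05`, but not at `alpha = 0.01`.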
Certain disciplines (especially in the social sciences) have adopted a significance level of `alpha = 0.05` as a kind of "magic number" and automatically reject any null hypothesis when P < 0.05. This is a relic from the days when graphing calculators and computers were not readily available and P-values were difficult to compute, so people used printed tables to approximate P-values and simply determined whether the P-value was bigger or smaller than the significance level. Today we have easy access to technology, so it's best not to bother with significance levels and instead report the precise P-value along with our conclusion.
Practical significance
When we reject a null hypothesis, we're asserting that the results of our study or experiment are statistically significant. But statistical significance is not the same as practical significance in the real world. We might find that providing every statistics student at the college with a personal tutor available 24/7 results in a lower withdrawal rate for the statistics course. But if this program only prevents one or two students from dropping out each quarter at a cost of tens of thousands of dollars, we might decide not to implement the program as a practical matter, even though we know the program does work.
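To see why statistical significance alone does not settle the practical question, consider a hypothetical very large study (the numbers are invented for illustration): a historical withdrawal rate of `p_0 = 0.20` and 39,000 withdrawals among 200,000 tutored students, an observed rate of 0.195. The minimal Python sketch below uses the normal approximation and a hypothetical helper `z_test_less` to show that a drop of half a percentage point can produce a tiny P-value when the sample is huge.

```python
import math


def z_test_less(successes, n, p0):
    """One-proportion z-test of H0: p = p0 vs H_a: p < p0; returns (p_hat, z, P-value)."""
    p_hat = successes / n
    se = math.sqrt(p0 * (1 - p0) / n)              # standard error under H0
    z = (p_hat - p0) / se
    p_value = 0.5 * math.erfc(-z / math.sqrt(2))   # P(Z < z) for standard normal Z
    return p_hat, z, p_value


# Hypothetical large study: 39,000 withdrawals among 200,000 students, H0: p = 0.20
p_hat, z, p_value = z_test_less(39_000, 200_000, 0.20)
print(f"observed rate = {p_hat:.3f}, drop = {0.20 - p_hat:.3f}")
print(f"z = {z:.2f}, P-value = {p_value:.1e}")
```

The P-value is far below any common significance level, yet whether a half-point drop in withdrawals justifies the program's cost is a practical judgment that the P-value cannot make for us.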
Conditions are important!
Before we conduct a hypothesis test, we need to check several conditions:
Two outcomes: If we don't have two outcomes (yes/no, success/failure) we can't perform a one-proportion hypothesis test; we need to use a more sophisticated mathematical technique.
Independent trials: Usually the trials are not truly independent, but if we have gathered the data via a simple random sample (SRS), then it's reasonably safe to assume the trials are independent. If the sample was not selected randomly, we cannot proceed with the hypothesis test. (In some instances, it may not be clear whether or not the sample was random; in those situations we can clearly state that we're assuming the sample is random and proceed, so that if we later learn the sample was not randomly selected, we know to invalidate the results of our hypothesis test.)
Constant probability of success: Usually the probability of success is not constant, but as long as we're selecting a relatively small sample from a relatively large population, the probability of success will be nearly constant and we can proceed. (Usually our sample represents far less than 1% of the population, so this is not a problem; some books use 10% as a threshold, but 1% is safer.)
Normal approximation: We need to check that `np ge 10` and `nq ge 10`. If one or more of these inequalities does not hold, we can still conduct the hypothesis test, but we have to use binomial probability computations to find the P-value (and this only works when the alternative hypothesis is one-sided).
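The conditions that can be checked by arithmetic, along with the binomial fallback, can be sketched in Python as follows. This is a minimal sketch that assumes `p0` is the proportion stated in the null hypothesis (so `q = 1 - p0`); the function names `check_conditions` and `binomial_p_value` and the example numbers are hypothetical. The two-outcome and random-sampling conditions depend on how the data were collected, so code cannot check them for us.

```python
import math


def check_conditions(n, p0, population_size=None):
    """Check the numeric conditions for a one-proportion test of H0: p = p0."""
    q0 = 1 - p0
    results = {
        "np >= 10": n * p0 >= 10,
        "nq >= 10": n * q0 >= 10,
    }
    if population_size is not None:
        # Constant-probability check: sample is at most 1% of the population.
        results["n <= 1% of population"] = n <= 0.01 * population_size
    return results


def binomial_p_value(successes, n, p0, alternative="greater"):
    """Exact one-sided P-value from the binomial distribution under H0: p = p0."""
    def pmf(k):
        # Probability of exactly k successes in n trials when p = p0.
        return math.comb(n, k) * p0**k * (1 - p0) ** (n - k)

    if alternative == "greater":      # H_a: p > p0, so P(X >= successes)
        ks = range(successes, n + 1)
    elif alternative == "less":       # H_a: p < p0, so P(X <= successes)
        ks = range(0, successes + 1)
    else:
        raise ValueError("this sketch handles only one-sided alternatives")
    return sum(pmf(k) for k in ks)


# Hypothetical example: 5 successes in 20 trials, H0: p = 0.1, H_a: p > 0.1
conditions = check_conditions(20, 0.1)
print(conditions)  # np = 2 here, so the normal approximation is not appropriate
if not all(conditions.values()):
    print("Binomial P-value:", round(binomial_p_value(5, 20, 0.1), 4))
```

In this hypothetical case `np = 2` fails the normal-approximation check, so the sketch falls back to the exact binomial tail `P(X ge 5)`, which comes out to roughly 0.043. `math.comb` requires Python 3.8 or later.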
Exercises
1. If you reject a null hypothesis using a significance level of `alpha = 0.05`, would you also reject the null hypothesis when the significance level is:
a) `alpha = 0.10`?
b) `alpha = 0.01`?
2. If you reject a null hypothesis using a significance level of `alpha = 0.01`, would you also reject the null hypothesis when the significance level is:
a) `alpha = 0.05`?
b) `alpha = 0.10`?
3. If you fail to reject a null hypothesis using a significance level of `alpha = 0.05`, would you also fail to reject the null hypothesis when the significance level is:
a) `alpha = 0.10`?
b) `alpha = 0.01`?
In the remaining exercises, conduct a hypothesis test as indicated, or explain why you should not conduct the hypothesis test.
4. The Web site for MSNBC's The Ed Show asked visitors to respond to the question, "Was President Obama’s re-election a victory for the middle class?" Of the 4,686 people who participated, 97.31% responded "yes," while the rest responded "no." Is it reasonable to claim that more than 9 out of 10 Americans think Obama’s re-election was a victory for the middle class?
5. The U.S. Census Bureau reports that 51.2% of the 2,262 residents of Garfield County, Washington, are female. Is it reasonable to claim that a majority of Garfield County residents are female?