Ch. 25 Resources

Chapter 25: Paired Samples and Blocks

As in Chapter 24, we will be looking at situations with two samples in Chapter 25. The difference is that the samples will now be paired rather than independent.

As a quick example of the difference between these two situations, consider the following: One quarter I taught Statistics both online and in the traditional lecture format. For one midterm, I gave the same exam to both sections and compared the results to determine if there was a significant difference between the performance of the online students and the lecture students (there wasn't); for this comparison I used a 2-sample t-test.

I then compared how the online students did on their first exam to how the same students did on the second exam (they did better on the first exam); here I couldn't use a 2-sample t-test, since the two groups were not independent (in fact, they consisted of the same 19 students; I omitted the students who had dropped the class prior to the second exam). Instead, I took the difference between the score on the first midterm and the score on the second midterm for each student (for example, one student scored 98 on the first exam and 84 on the second exam, so her difference was 14). Once I computed the difference for each student, I ended up with a single variable (the differences), so I was then able to use a 1-sample t-test to test the hypothesis that the mean difference was positive (in other words, that the students did better on the first test than the second test).

Wikipedia vs. Encyclopaedia Britannica

For an article published in their December 15, 2005 issue ("Internet encyclopaedias go head to head" by Jim Giles), the editors of the science journal Nature chose 50 entries from the Web sites of Wikipedia and the Encyclopaedia Britannica on subjects that represented a broad range of scientific disciplines. All entries were chosen to be approximately the same length in both encyclopedias. In a small number of cases some material, such as reference lists, was removed to make the lengths of the entries more similar.

Each pair of entries was sent to an expert for peer review. The reviewers, who were not told which article was which, were asked to look for three types of inaccuracy: factual errors, critical omissions and misleading statements. A total of 42 usable reviews were returned. These were examined by Nature’s news reporters, who tallied the total inaccuracies for each entry. The data (taken from this page on the Nature Web site) is displayed in the following table:

Entry                       Britannica  Wikipedia  Difference
Acheulean industry                   1          7           6
Agent Orange                         2          2           0
Aldol reaction                       4          3          -1
Archimedes’ principle                2          2           0
Australopithecus africanus           1          1           0
Bethe, Hans                          1          2           1
Cambrian explosion                  10         11           1
Cavity magnetron                     2          2           0
Chandrasekhar, Subrahmanyan          4          0          -4
CJD                                  2          5           3
Cloud                                3          5           2
Colloid                              3          6           3
Dirac, Paul                         10          9          -1
Dolly                                1          4           3
Epitaxy                              5          2          -3
Ethanol                              3          5           2
Field effect transistor              3          3           0
Haber process                        1          2           1
Kinetic isotope effect               1          2           1
Kin selection                        3          3           0
Lipid                                3          0          -3
Lomborg, Bjorn                       1          1           0
Lymphocyte                           1          2           1
Mayr, Ernst                          0          3           3
Meliaceae                            1          3           2
Mendeleev, Dmitry                    8         19          11
Mutation                             8          6          -2
Neural network                       2          7           5
Nobel prize                          4          5           1
Pheromone                            3          2          -1
Prion                                3          7           4
Punctuated equilibrium               1          0          -1
Pythagoras’ theorem                  1          1           0
Quark                                5          0          -5
Royal Greenwich Observatory          3          5           2
Royal Society                        6          2          -4
Synchrotron                          2          2           0
Thyroid                              4          7           3
Vesalius, Andreas                    2          4           2
West Nile Virus                      1          5           4
Wolfram, Stephen                     2          2           0
Woodward, Robert Burns               0          3           3

We might hypothesize that Wikipedia has more errors than Encyclopaedia Britannica, since the former contains contributions from a multitude of Internet contributors, while the latter features articles written by acknowledged experts overseen by professional editors. We can't use a 2-sample t-test here, because the number of inaccuracies for each Web site is paired by topic (e.g. 1 inaccuracy in the Encyclopaedia Britannica's article on Acheulean industry, compared with 7 inaccuracies in the Wikipedia entry on Acheulean industry, for a difference of 6). So, we first compute the paired differences.

To do this on the TI-84, clear out lists L1, L2, and L3, then enter the Encyclopaedia Britannica data values into L1 and the corresponding Wikipedia data values into L2. Now move the cursor over to L3 and then up so that the name of L3 is highlighted. Press ENTER. The blinking cursor should now be down at the bottom of the screen after L3=. Now type L2-L1, then press ENTER. The differences (shown in the last column of the table above) should appear in L3.
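The same subtraction can be sketched in Python for readers who prefer to check the list on a computer. This is only an illustration, not part of the TI-84 procedure; the variable names `britannica`, `wikipedia` and `differences` are made up here, and the values are copied from the table in row order.

```python
# Inaccuracy counts copied from the table, in row order
# (Acheulean industry first, Woodward last).
britannica = [1, 2, 4, 2, 1, 1, 10, 2, 4, 2, 3, 3, 10, 1, 5, 3, 3, 1, 1, 3, 3,
              1, 1, 0, 1, 8, 8, 2, 4, 3, 3, 1, 1, 5, 3, 6, 2, 4, 2, 1, 2, 0]
wikipedia = [7, 2, 3, 2, 1, 2, 11, 2, 0, 5, 5, 6, 9, 4, 2, 5, 3, 2, 2, 3, 0,
             1, 2, 3, 3, 19, 6, 7, 5, 2, 7, 0, 1, 0, 5, 2, 2, 7, 4, 5, 2, 3]

# Counterpart of L3 = L2 - L1: Wikipedia minus Britannica, pair by pair.
differences = [w - b for b, w in zip(britannica, wikipedia)]

print(len(differences), "paired differences")
print(differences)
```

The order of subtraction matters: because each difference is Wikipedia minus Britannica, a positive value means the Wikipedia entry had more inaccuracies.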

We now proceed as we did in Chapter 23, using only the difference data. Our hypotheses are:

H0: µd = 0

HA: µd > 0

where the subscript "d" refers to "differences." Our null hypothesis is that the mean of the differences in inaccuracies (for all scientific articles, not just the 42 in the study) is 0; in other words, that on average there is no difference between the number of inaccuracies in articles about scientific topics found on these two Web sites. The alternative hypothesis is that the mean of all such differences is positive; since the differences were computed by subtracting the Encyclopaedia Britannica values from the Wikipedia values, this means that on average Wikipedia has more inaccuracies than Encyclopaedia Britannica.

Before performing any computations, we check conditions:

Paired data condition: The 42 articles from each Web site were paired by topic.

Randomness condition: We don't know exactly how the 42 articles were selected, but the Nature editors claim that these articles are representative of a broad range of scientific topics. It seems safe to proceed, although we should draw conclusions only about the articles on scientific topics at these two Web sites, and not extend our conclusions to other areas such as music, history and literature.

10% condition: These 42 topics certainly represent less than 10% of all scientific articles included on these Web sites.

Nearly Normal condition: We don't need to graph the Encyclopaedia Britannica data or the Wikipedia data by itself since our test is about the differences; a histogram of the differences:

[Figure: histogram of the inaccuracy differences]

reveals a roughly unimodal and symmetric shape. The high value (the difference for the article about Dmitry Mendeleev) does not appear to be an extreme outlier (and the sample size is larger than 40) so it should be safe to proceed.
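For a rough look at the shape without a graphing tool, a text tally of the differences is one option. The sketch below is an illustration only (the `differences` list is copied from the last column of the table, and the one-bar-per-integer layout is a choice made here, not the binning of the original histogram).

```python
from collections import Counter

# Differences (Wikipedia minus Britannica) copied from the table, in row order.
differences = [6, 0, -1, 0, 0, 1, 1, 0, -4, 3, 2, 3, -1, 3, -3, 2, 0, 1, 1, 0, -3,
               0, 1, 3, 2, 11, -2, 5, 1, -1, 4, -1, 0, -5, 2, -4, 0, 3, 2, 4, 0, 3]

counts = Counter(differences)

# One row per integer value from the smallest to the largest difference;
# values that never occur print an empty bar, which makes gaps visible.
for value in range(min(differences), max(differences) + 1):
    print(f"{value:>3} | {'*' * counts[value]}")
```

The gap between 6 and 11 in the printed tally is what singles out the Mendeleev entry as a possible outlier.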

We now compute the summary statistics for the difference data (you can use 1-Var Stats L3 on the TI-84): `bar d approx 0.93` and `s_d approx 2.88` with df = 42−1 = 41. We can then compute:

`SE(bar d) = (s_d)/(sqrt(n)) approx (2.88)/(sqrt(42)) approx 0.44`

and then:

`t = (0.93 - 0)/(0.44) approx 2.114`

so P = tcdf(2.114,1E99,41) ≈ 0.02. This P-value is reasonably small, so we conclude that there is evidence (P = 0.02) that on average the articles on scientific topics on the Wikipedia Web site have more inaccuracies than the articles on the Encyclopaedia Britannica Web site. We may also wish to further investigate the Mendeleev outlier; if this was unusual for some reason and can be legitimately omitted, the P-value will be slightly larger and our conclusion may not be as clear-cut.
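The summary statistics, standard error and t statistic can be reproduced with Python's standard `statistics` module. This is a sketch under the assumption that the differences are exactly the last column of the table; the variable names are chosen here for illustration.

```python
from math import sqrt
from statistics import mean, stdev

# Differences (Wikipedia minus Britannica) copied from the table, in row order.
differences = [6, 0, -1, 0, 0, 1, 1, 0, -4, 3, 2, 3, -1, 3, -3, 2, 0, 1, 1, 0, -3,
               0, 1, 3, 2, 11, -2, 5, 1, -1, 4, -1, 0, -5, 2, -4, 0, 3, 2, 4, 0, 3]

n = len(differences)
d_bar = mean(differences)   # sample mean of the differences
s_d = stdev(differences)    # sample standard deviation (divides by n - 1)
se = s_d / sqrt(n)          # standard error of the mean difference
t_stat = (d_bar - 0) / se   # test statistic under H0: mu_d = 0

print(f"n = {n}, df = {n - 1}")
print(f"d_bar = {d_bar:.4f}, s_d = {s_d:.4f}, SE = {se:.4f}, t = {t_stat:.3f}")
```

Carrying unrounded values through gives a t statistic of about 2.09 rather than 2.114, because the hand computation divides the rounded 0.93 by the rounded 0.44. The P-value still rounds to 0.02, so the conclusion is unchanged.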

As we did in Chapter 23, you can check your answer for the P-value on the TI-84 by using T-Test and specifying L3 for the List.
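If no calculator is at hand, the upper-tail probability that tcdf returns can be approximated in plain Python by integrating the t density numerically. The helper names `t_pdf` and `t_upper_tail` are hypothetical, and Simpson's rule is simply one convenient way to do the integration; treat this as a sketch rather than a replacement for a statistics package.

```python
from math import gamma, pi, sqrt
from statistics import mean, stdev

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0: one half minus the area from 0 to t (Simpson's rule).

    steps must be even for Simpson's rule.
    """
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

# Differences (Wikipedia minus Britannica) copied from the table, in row order.
differences = [6, 0, -1, 0, 0, 1, 1, 0, -4, 3, 2, 3, -1, 3, -3, 2, 0, 1, 1, 0, -3,
               0, 1, 3, 2, 11, -2, 5, 1, -1, 4, -1, 0, -5, 2, -4, 0, 3, 2, 4, 0, 3]

n = len(differences)
t_stat = mean(differences) / (stdev(differences) / sqrt(n))
p_value = t_upper_tail(t_stat, n - 1)   # one-sided, matching HA: mu_d > 0

print(f"t = {t_stat:.3f}, one-sided P = {p_value:.3f}")
```

The tail is one-sided because the alternative hypothesis is µd > 0; a two-sided alternative would double this probability.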

A paired-data t-interval

You might have noticed that at first this conclusion seems to be at odds with the subhead of the Nature article, which states, "Wikipedia comes close to Britannica in terms of the accuracy of its science entries." Further down in the article, however, the author does note that Wikipedia on average has more inaccuracies, but that the difference is not that great. Just how big is the difference? Let's construct a confidence interval to find out.

Using a confidence level of 95%, we have `bar d approx 0.93` and `s_d approx 2.88` with df = 42−1 = 41 as before, and t*41 = invT(0.975,41) ≈ 2.02 (from the TI-84, or you can approximate this by t*40 = 2.021 from Table T in the textbook). So:

ME = `t_41^(text(*)) times SE(bar d) approx 2.02 times 0.44 approx 0.89`

Adding this to and subtracting it from the observed sample mean, we have:

0.04 < µd < 1.82

We are 95% confident that (on average) articles about scientific subjects on the Wikipedia Web site have between 0.04 more and 1.82 more inaccuracies than the same articles on the Encyclopaedia Britannica Web site. Note that the lower confidence interval limit is near 0. While the difference in the number of inaccuracies may be statistically significant, it may not be practically significant. (Then again, the upper limit of the confidence interval is near 2, so it may well be practically significant.)

As before you can check your work on the TI-84 by using TInterval and specifying L3 for the data list.
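The interval arithmetic can also be sketched in Python. The critical value 2.02 is taken directly from the invT result quoted above rather than computed, and the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

# Differences (Wikipedia minus Britannica) copied from the table, in row order.
differences = [6, 0, -1, 0, 0, 1, 1, 0, -4, 3, 2, 3, -1, 3, -3, 2, 0, 1, 1, 0, -3,
               0, 1, 3, 2, 11, -2, 5, 1, -1, 4, -1, 0, -5, 2, -4, 0, 3, 2, 4, 0, 3]

t_star = 2.02  # assumed: invT(0.975, 41) as quoted in the text, not computed here

n = len(differences)
d_bar = mean(differences)
se = stdev(differences) / sqrt(n)

margin = t_star * se                          # margin of error
lower, upper = d_bar - margin, d_bar + margin

print(f"ME = {margin:.2f}")
print(f"{lower:.2f} < mu_d < {upper:.2f}")
```

Because this version does not round the mean and standard error before multiplying, the endpoints come out near 0.03 and 1.83 instead of 0.04 and 1.82. The difference is rounding only and does not change the interpretation.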

Homework

Work the following exercises in Chapter 25: 7, 23, 27, 29, 31 and 35.

Errata

The "What" in the sidebar on page 650 should also include the lane designation and the heat number as variables, in addition to the time.

The definition at the top of page 664 should read "mean of pairwise differences between two groups" (not "independent" groups).

In Exercise 6, the questions are stated in the first paragraph and again as parts a and b.

ActivStats

Work through the lessons on page 25-1 in the ActivStats lesson book, as time permits.

Additional Resources

Two-sample t-test and paired-data t-test
Web-based computational tool from Graphpad Software
Paired (or Matched) Samples
A flash tutorial on using the TI-83's TTest feature to perform a hypothesis test about paired data.
Women in the Labor Force
Data set cited in Exercise 5.
"A Bayesian Analysis of a Multiplicative Treatment Effect in Weather Modification" (PDF)
Simpson J, Olsen A, and Eden J.
Technometrics. 17 (1975), 161-166
Article cited in Exercise 6.
"Is Friday the 13th bad for your health?"
Scanlon TJ, Luben RN, Scanlon FL, Singleton N.
BMJ. 1993 Dec 18-25; 307(6919):1584-6
Abstract of article cited in Exercises 7 and 8.
"Intermodal comparison of energy expenditure at exercise intensities corresponding to the perceptual preference range"
Moyna, N.M.; Robertson, R.J.; Meckes, C.L.; Peoples, J.A.; Millich, N.B.; Thompson, P.D.
Medicine & Science in Sports & Exercise. August 2001. 33(8):1404-1410
Abstract of article cited in Exercise 22.
Student's t-test
Gosset's laevohyoscyamine hydrobromide data set, used in Exercise 25.
"The Probable Error of a Mean"
Student [W.S. Gosset]
Biometrika. 6 (1908), pp. 1–25
Gosset's original paper about the t-test, including the laevohyoscyamine hydrobromide data.
"Daily Weight Checks Could Combat Feared 'Freshman 15'"
Article about the research of David Levitsky at Cornell University, mentioned in Exercise 34.