Paired-data t-tests and t-intervals

For an article published in their December 15, 2005 issue ("Internet encyclopaedias go head to head" by Jim Giles), the editors of the science journal Nature chose 50 entries from the Web sites of Wikipedia and the Encyclopaedia Britannica on subjects that represented a broad range of scientific disciplines. All entries were chosen to be approximately the same length in both encyclopedias. In a small number of cases some material, such as reference lists, was removed to make the lengths of the entries more similar.

Each pair of entries was sent to an expert for peer review. The reviewers, who were not told which article was which, were asked to look for three types of inaccuracy: factual errors, critical omissions and misleading statements. A total of 42 usable reviews were returned. These were examined by Nature’s news reporters, who tallied the total inaccuracies for each entry. The data (taken from this page on the Nature Web site) is displayed in the following table:

entry                       Britannica  Wikipedia  difference
Acheulean industry                   1          7           6
Agent Orange                         2          2           0
Aldol reaction                       4          3          -1
Archimedes’ principle                2          2           0
Australopithecus africanus           1          1           0
Bethe, Hans                          1          2           1
Cambrian explosion                  10         11           1
Cavity magnetron                     2          2           0
Chandrasekhar, Subrahmanyan          4          0          -4
CJD                                  2          5           3
Cloud                                3          5           2
Colloid                              3          6           3
Dirac, Paul                         10          9          -1
Dolly                                1          4           3
Epitaxy                              5          2          -3
Ethanol                              3          5           2
Field effect transistor              3          3           0
Haber process                        1          2           1
Kinetic isotope effect               1          2           1
Kin selection                        3          3           0
Lipid                                3          0          -3
Lomborg, Bjorn                       1          1           0
Lymphocyte                           1          2           1
Mayr, Ernst                          0          3           3
Meliaceae                            1          3           2
Mendeleev, Dmitry                    8         19          11
Mutation                             8          6          -2
Neural network                       2          7           5
Nobel prize                          4          5           1
Pheromone                            3          2          -1
Prion                                3          7           4
Punctuated equilibrium               1          0          -1
Pythagoras’ theorem                  1          1           0
Quark                                5          0          -5
Royal Greenwich Observatory          3          5           2
Royal Society                        6          2          -4
Synchrotron                          2          2           0
Thyroid                              4          7           3
Vesalius, Andreas                    2          4           2
West Nile Virus                      1          5           4
Wolfram, Stephen                     2          2           0
Woodward, Robert Burns               0          3           3

We might hypothesize that Wikipedia has more errors than Encyclopaedia Britannica, since the former contains contributions from a multitude of Internet contributors, while the latter features articles written by acknowledged experts overseen by professional editors. Because we have paired data here, what we're really interested in is the difference between the number of errors in the Wikipedia article on a particular subject and the number of errors in the Britannica article on the same subject.

For example, the Encyclopaedia Britannica article on Acheulean industry has 1 inaccuracy and the Wikipedia entry on the same subject has 7, which gives a difference of 7 − 1 = 6. The differences for all 42 subjects appear in the last column of the table above.

To compute paired differences on the TI-84, clear out lists L1, L2, and L3, then enter the Encyclopaedia Britannica data values into L1 and the corresponding Wikipedia data values into L2. Now move the cursor over to L3 and then up so that the name of L3 is highlighted. Press ENTER. The blinking cursor should now be down at the bottom of the screen after L3=. Now type L2-L1, then press ENTER. The differences (the last column of the table above) should appear in L3.
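If you don't have a TI-84 handy, the same step can be sketched in Python. This is only an illustration: the list names `britannica`, `wikipedia`, and `differences` are our own, and the values are typed in from the table above in table order.

```python
# Inaccuracy counts in table order (Acheulean industry ... Woodward, Robert Burns).
britannica = [1, 2, 4, 2, 1, 1, 10, 2, 4, 2, 3, 3, 10, 1, 5, 3, 3, 1, 1, 3, 3,
              1, 1, 0, 1, 8, 8, 2, 4, 3, 3, 1, 1, 5, 3, 6, 2, 4, 2, 1, 2, 0]
wikipedia = [7, 2, 3, 2, 1, 2, 11, 2, 0, 5, 5, 6, 9, 4, 2, 5, 3, 2, 2, 3, 0,
             1, 2, 3, 3, 19, 6, 7, 5, 2, 7, 0, 1, 0, 5, 2, 2, 7, 4, 5, 2, 3]

# Same role as L3 = L2 - L1 on the calculator: Wikipedia minus Britannica,
# one difference per pair of articles on the same subject.
differences = [w - b for w, b in zip(wikipedia, britannica)]

print(differences)
```

The order of subtraction matters: with Wikipedia minus Britannica, a positive difference means the Wikipedia entry had more inaccuracies.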

We now proceed as we did in Chapter 23, using only the difference data. Our hypotheses are:

`H_0: mu_d = 0`

`H_A: mu_d > 0`

where the subscript "d" refers to "differences."

Our null hypothesis is that the mean of the differences in inaccuracies (for all scientific articles, not just the 42 in the study) is 0; in other words, that on average there is no difference between the number of inaccuracies in articles about scientific topics found on these two Web sites. (Sometimes Wikipedia might have more errors, and sometimes Britannica might have more errors, but on average the positive and negative differences cancel each other out.)

The alternative hypothesis is that the mean of all such differences is positive. Since the differences were computed by subtracting the Encyclopaedia Britannica values from the Wikipedia values, this means that on average Wikipedia has more inaccuracies than Encyclopaedia Britannica.

Before performing any computations, we check conditions:

Independent trials: We don't know exactly how the 42 articles were selected, but the Nature editors claim that these articles are representative of a broad range of scientific topics. It's reasonable to think that 42 articles is a small fraction of all scientific articles in these encyclopedias. It seems safe to proceed, although we should draw conclusions only about the articles on scientific topics at these two Web sites, and not extend our conclusions to other areas such as music, history and literature.

Normality: We don't need to graph the Encyclopaedia Britannica data or the Wikipedia data separately, since our test is about the differences. A histogram of the differences:

[Figure: histogram of the inaccuracy differences]

reveals a roughly unimodal and symmetric shape. The high value (the difference of 11 for the article about Dmitry Mendeleev) does not appear to be an extreme outlier, and the sample size is larger than 40, so it should be safe to proceed.
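To draw a rough version of this histogram yourself, you could tally how often each difference occurs. The following is a minimal Python sketch using one bin per integer value (the bin width of the original graph is not stated, so this is an assumption); the `differences` list is typed in from the table above.

```python
from collections import Counter

# Wikipedia minus Britannica, in table order.
differences = [6, 0, -1, 0, 0, 1, 1, 0, -4, 3, 2, 3, -1, 3, -3, 2, 0, 1, 1, 0, -3,
               0, 1, 3, 2, 11, -2, 5, 1, -1, 4, -1, 0, -5, 2, -4, 0, 3, 2, 4, 0, 3]

counts = Counter(differences)

# One row per integer from the smallest to the largest difference;
# a Counter reports 0 for values that never occur, so gaps show as empty rows.
for value in range(min(differences), max(differences) + 1):
    print(f"{value:>3} | {'*' * counts[value]}")
```

The empty rows between 6 and 11 make it easy to see how far the Mendeleev difference sits from the rest of the data.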

We now compute the summary statistics for the difference data (you can use 1-Var Stats L3 on the TI-84): `bar d approx 0.93` and `s_d approx 2.88`, with df = 42 − 1 = 41. We can then compute:

`SE(bar d) = (s_d)/(sqrt(n)) approx (2.88)/(sqrt(42)) approx 0.44`

and then:

`t = (0.93 - 0)/(0.44) approx 2.114`

so P = tcdf(2.114,1E99,41) ≈ 0.02. We can also compute this P-value more directly by using T-Test on the TI-84, specifying L3 for the List.
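As a cross-check on the arithmetic, here is a sketch of the same test in Python using only the standard library. The helper functions `t_pdf` and `t_upper_tail` are our own and are not part of any library; the upper-tail probability is approximated by numerical integration (Simpson's rule), which plays the role of `tcdf` on the calculator.

```python
import math
import statistics

# Wikipedia minus Britannica, in table order (the values stored in L3).
differences = [6, 0, -1, 0, 0, 1, 1, 0, -4, 3, 2, 3, -1, 3, -3, 2, 0, 1, 1, 0, -3,
               0, 1, 3, 2, 11, -2, 5, 1, -1, 4, -1, 0, -5, 2, -4, 0, 3, 2, 4, 0, 3]


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))


def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > t) for t >= 0 using Simpson's rule on [0, t]."""
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    # Half the distribution lies above 0; subtract the area between 0 and t.
    return 0.5 - total * h / 3


n = len(differences)
mean_d = statistics.mean(differences)   # sample mean of the differences
sd_d = statistics.stdev(differences)    # sample standard deviation (n - 1 divisor)
se = sd_d / math.sqrt(n)                # standard error of the mean difference
t_stat = (mean_d - 0) / se              # hypothesized mean difference is 0
p_value = t_upper_tail(t_stat, n - 1)   # one-sided, since HA is mu_d > 0

print(f"mean = {mean_d:.2f}, sd = {sd_d:.2f}, SE = {se:.2f}")
print(f"t = {t_stat:.3f}, df = {n - 1}, P = {p_value:.3f}")
```

Because this sketch carries unrounded values through the calculation, its t statistic comes out near 2.09 rather than the 2.114 obtained from the rounded values 0.93 and 0.44. The P-value still rounds to 0.02. If SciPy is installed, `scipy.stats.ttest_1samp(differences, 0, alternative='greater')` in recent versions should give the same result more directly.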

This P-value is reasonably small, so we conclude that there is evidence (P = 0.02) that, on average, the articles on scientific topics on the Wikipedia Web site have more inaccuracies than the corresponding articles on the Encyclopaedia Britannica Web site. We may also wish to further investigate the Mendeleev outlier; if this was unusual for some reason and can be legitimately omitted, the P-value will be slightly larger and our conclusion may not be as clear-cut.
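One way to see how much the Mendeleev value matters is to recompute the t statistic with that pair left out. The sketch below does this in Python; the function name `t_statistic` is our own. Whether the value can legitimately be omitted is a judgment about the data that the code cannot make for us.

```python
import math
import statistics

# Wikipedia minus Britannica, in table order.
differences = [6, 0, -1, 0, 0, 1, 1, 0, -4, 3, 2, 3, -1, 3, -3, 2, 0, 1, 1, 0, -3,
               0, 1, 3, 2, 11, -2, 5, 1, -1, 4, -1, 0, -5, 2, -4, 0, 3, 2, 4, 0, 3]


def t_statistic(values):
    """One-sample t statistic for the null hypothesis that the mean is 0."""
    se = statistics.stdev(values) / math.sqrt(len(values))
    return statistics.mean(values) / se


# The Mendeleev difference (11) is the largest value and occurs exactly once,
# so filtering on the value removes only that one pair.
without_mendeleev = [d for d in differences if d != 11]

t_all = t_statistic(differences)
t_trimmed = t_statistic(without_mendeleev)

print(f"all 42 articles:   t = {t_all:.2f} (df = {len(differences) - 1})")
print(f"without Mendeleev: t = {t_trimmed:.2f} (df = {len(without_mendeleev) - 1})")
```

The t statistic drops from about 2.09 to about 1.80 when the pair is removed. A smaller t statistic corresponds to a larger one-sided P-value.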

A paired-data t-interval

You might have noticed that, at first, this conclusion seems to be at odds with the subhead of the Nature article, which states: "Wikipedia comes close to Britannica in terms of the accuracy of its science entries." Further down in the article, however, the author notes that Wikipedia on average has more inaccuracies, but that the difference is not that great. Just how big is the difference? Let's construct a confidence interval to find out.

Using a confidence level of 95%, we have `bar d approx 0.93` and `s_d approx 2.88` with df = 42 − 1 = 41 as before, and `t_41^(text(*))` = invT(0.975,41) ≈ 2.02 (from the TI-84, if you have one). So:

`ME = t_41^(text(*)) times SE(bar d) approx 2.02 times 0.44 approx 0.89`

Adding this to and subtracting it from the observed sample mean, we have:

`0.04 < mu_d < 1.82`

We are 95% confident that (on average) articles about scientific subjects on the Wikipedia Web site have between 0.04 more and 1.82 more inaccuracies than the corresponding articles on the Encyclopaedia Britannica Web site. Note that the lower confidence interval limit is near 0. While the difference in the number of inaccuracies may be statistically significant, it may not be practically significant. (Then again, the upper limit of the confidence interval is near 2, so it may well be practically significant.)

As before, you can find the confidence interval limits more directly on the TI-84 by using TInterval and specifying L3 for the data list.
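The interval can also be sketched in Python with the standard library alone. The helpers `t_pdf`, `t_upper_tail`, and `t_critical` are our own names, not library functions. The critical value is found by bisection on a numerically integrated tail probability, which stands in for `invT` on the calculator.

```python
import math
import statistics

# Wikipedia minus Britannica, in table order (the values stored in L3).
differences = [6, 0, -1, 0, 0, 1, 1, 0, -4, 3, 2, 3, -1, 3, -3, 2, 0, 1, 1, 0, -3,
               0, 1, 3, 2, 11, -2, 5, 1, -1, 4, -1, 0, -5, 2, -4, 0, 3, 2, 4, 0, 3]


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))


def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > t) for t >= 0 using Simpson's rule on [0, t]."""
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


def t_critical(confidence, df):
    """Find t* with P(T > t*) = (1 - confidence) / 2 by bisection."""
    target = (1 - confidence) / 2
    lo, hi = 0.0, 10.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > target:
            lo = mid   # tail area still too large, so t* is farther out
        else:
            hi = mid
    return (lo + hi) / 2


n = len(differences)
mean_d = statistics.mean(differences)
se = statistics.stdev(differences) / math.sqrt(n)

t_star = t_critical(0.95, n - 1)
margin = t_star * se
lower, upper = mean_d - margin, mean_d + margin

print(f"t* = {t_star:.2f}, ME = {margin:.2f}")
print(f"{lower:.2f} < mu_d < {upper:.2f}")
```

Carrying unrounded values through gives limits of about 0.03 and 1.83, slightly different from the 0.04 and 1.82 obtained by hand from the rounded values. The interpretation is unchanged.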

Exercises

1. Mac vs. PC. A statistics student wondered whether shoppers paid more for an Apple computer or a similarly equipped PC. He searched completed listings for used computers on eBay on December 9, 2009, then randomly selected 11 Apple computers with Intel-based processors. For each of these 11 computers, he examined the configuration (hard drive size, monitor size, CPU speed) and randomly selected a PC with a similar configuration. The resulting data set, which included the selling price of an Apple and a PC for 11 different configurations, appears in the following table:

config.  Apple   PC
A          630  310
B          620  330
C          909  325
D          860  500
E         1359  537
F          899 1099
G          599  475
H          899  394
J          915  389
K          545  422
L          620  585

a) Use a hypothesis test to evaluate the claim that, on average, people pay more for a used Apple than a similarly configured PC.

b) Construct a 95% confidence interval to estimate the average difference between the price people pay for an Apple and similarly configured PC.

2. Nerf guns. For a class project, a statistics student tested his theory about the regulators found on Nerf guns: that they slow the muzzle velocity of the darts. He collected three Nerf guns that had 22 barrels among them, each barrel individually regulated. He fired a Nerf dart once using each barrel and measured how many inches it traveled, using the same dart on all tests. He then removed the regulators and fired one shot with each of the barrels again. The data he recorded appears below:

barrel  regulator  no regulator
     1        231           252
     2        208           245
     3        202           251
     4        212           265
     5        193           210
     6        201           234
     7        125           155
     8        168           141
     9         38            74
    10        154           231
    11        122           103
    12         77           123
    13        243           262
    14        215           252
    15        239           221
    16        234           268
    17        232           245
    18        237           252
    19        230           254
    20        245           259
    21        218           249
    22        246           262

a) Use a hypothesis test to investigate whether, on average, Nerf darts travel farther without a regulator.

b) Use a 95% confidence interval to estimate how much farther, on average, Nerf darts travel without a regulator.