Ch. 5 Resources

Chapter 5: Understanding and Comparing Distributions

Below I'll concentrate on instructions for using the TI-84 and Data Desk to draw boxplots and to create other graphical displays for comparing two groups.

Boxplots with the TI-84

For our first example, let's work with the house data set previously encountered in the Chapter 2 and 4 Resources:

house  size  assess  lot   taxes  stories
20911  1561  304     0.2   2604   1
20912  1038  297.6   0.2   280    1
20918  1224  289.5   0.17  2353   1
20921  1232  292.8   0.17  756    1
20924  1995  314.6   0.17  2620   2
20927  1714  322.7   0.18  2632   1
20930  1832  336.1   0.18  2779   2
21003  1095  279     0.18  2321   1
21006  2011  319.5   0.18  2663   2
21015  1366  289.3   0.18  2415   1
21018  1292  301.4   0.18  2477   1
21023  1458  314.3   0.18  1386   1
21028  2031  320.9   0.18  2676   2
21105  1366  304     0.18  2473   1

If you haven't already done so, enter the assessed value data into a list, say L1. To draw a boxplot of the assessed value data, follow the instructions in the Chapter 4 Resources for making a histogram, but choose the boxplot (or modified boxplot) icon instead of the histogram icon:

select the modified boxplot icon in the Stat Plots menu then use ZoomStat

Then use ZoomStat to get the boxplot:

boxplot of the assessed values

We can see a bit more clearly from the boxplot that the data is skewed positively (but notice that we can't tell if the data set is unimodal or bimodal from the boxplot, so we should look at both a histogram and a boxplot whenever possible). Note again that the axis isn't labeled and no scale is indicated, so this would not be a satisfactory graph on a HW solution, exam or project.
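As a rough check on what the boxplot displays, here is a minimal Python sketch that computes the five-number summary of the assessed values. Python is not one of the tools used in these resources, and the function name five_number_summary is purely illustrative. The sketch assumes the convention of taking each quartile as the median of the lower or upper half of the sorted data; other software may use slightly different quartile rules, so small differences are possible.

```python
from statistics import median

# Assessed values copied from the house data table above.
assess = [304, 297.6, 289.5, 292.8, 314.6, 322.7, 336.1,
          279, 319.5, 289.3, 301.4, 314.3, 320.9, 304]

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max).

    Assumed convention: Q1 and Q3 are the medians of the lower and
    upper halves of the sorted data (the middle value is left out of
    both halves when the number of values is odd).
    """
    xs = sorted(data)
    half = len(xs) // 2
    lower = xs[:half]    # smallest half of the values
    upper = xs[-half:]   # largest half of the values
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

summary = five_number_summary(assess)
print("min, Q1, median, Q3, max:", summary)
```

The box in the boxplot runs from Q1 to Q3 with a line at the median, so comparing the distance from the median to each quartile (and to each extreme) gives a numerical sense of the skewness seen in the graph.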

Boxplots with Data Desk

To use a computer to make a boxplot, use Data Desk. Import the houses.txt data file (from the preceding link or from the Data Sets folder in the online classroom) into Data Desk, as we did in the Chapter 4 Resources. Click on the assess variable so that the variable's icon has a Y over it:

click the assess variable to designate as Y

then click on Plot and select BoxPlot Side by Side:

click Plot and then Boxplot Side by Side

You should see something like this:

boxplot of assessed value data

You can adjust the plot options by clicking on the hyperview menu (the triangle in the upper-left corner of the boxplot window) and selecting BoxPlot Options:

click the hyperview menu and select BoxPlot Options

If you see some strange shading on your boxplot, I would recommend selecting Do not display 95% C.I.'s for comparing medians:

select options as shown and click OK

since you have no idea what this means yet; you can also select Set Defaults to make this the default display option.

As with the histogram in Chapter 4, you can make the boxplot window larger by clicking on the lower right corner of the window and dragging it across the screen. The variable name in our Data Desk boxplot is labeled and a scale is indicated on the axis, which is better than the TI-84, but the units are still missing. This would be better:

assessed value boxplot with improved labels

although I again had to use an image editor to add the label.

More boxplots

A boxplot of the 2006 property tax data for these homes reveals three outliers:

boxplot of the property tax variable

so we should report the median and IQR for the property tax variable, not the mean and standard deviation. If you do see a major outlier, you should investigate it: if it was the result of a data-entry error, you should correct it; if it was something that never should have been included in the data set in the first place (such as the age of the teacher in a data set consisting of the ages of students in a second-grade class), you can remove it; if it was reported in the wrong units (e.g. someone reporting their height in feet rather than inches) you can convert to the proper units. But you should never remove a data point just because it's an outlier.

You might, however, decide to report the summary statistics both with the outlier included and with it omitted. In the property tax data set, three of the homes are owned by senior citizens who participate in a program that freezes their property taxes (although they or their estate have to pay all of the deferred taxes when the home is sold). This explains the outliers, so we might choose to analyze the remaining 11 homes; if the remaining data is roughly unimodal and symmetric, then we could report the mean and standard deviation for the property taxes of a homeowner in this neighborhood not involved in the deferred-tax program.
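The outlier check and the two sets of summary statistics can be sketched in Python as follows. This is only an illustration under stated assumptions: it uses the 1.5×IQR fence rule with quartiles taken as the medians of the lower and upper halves of the sorted data, and the helper name quartiles is hypothetical. It also assumes that the values flagged by the fences are the frozen-tax homes described above.

```python
from statistics import mean, median, stdev

# Property taxes copied from the house data table above.
taxes = [2604, 280, 2353, 756, 2620, 2632, 2779,
         2321, 2663, 2415, 2477, 1386, 2676, 2473]

def quartiles(data):
    """Return (Q1, Q3) as the medians of the lower and upper halves."""
    xs = sorted(data)
    half = len(xs) // 2
    return median(xs[:half]), median(xs[-half:])

q1, q3 = quartiles(taxes)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Values beyond either fence are flagged as outliers.
outliers = sorted(t for t in taxes if t < lower_fence or t > upper_fence)
kept = [t for t in taxes if lower_fence <= t <= upper_fence]

print("Q1, Q3, IQR:", q1, q3, iqr)
print("fences:", lower_fence, upper_fence)
print("flagged as outliers:", outliers)
print("all homes: median", median(taxes), "IQR", iqr)
print("outliers omitted: n =", len(kept),
      "mean", round(mean(kept), 1), "SD", round(stdev(kept), 1))
```

Reporting both versions side by side lets a reader see how much the flagged homes pull on the summary statistics, without any data point being silently dropped.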

Comparing groups

Use Data Desk to create a histogram of the size data from the houses.txt data set. You should get something like this:

histogram of the size variable

which appears bimodal. We certainly shouldn't report the mean and standard deviation for a variable like this. In fact, there may be two separate groups here.

With the histogram still open, double-click on the stories variable to open up the variable that lists the number of stories in each house.

stories variable open adjacent to size histogram

Now click on Modify and then Palettes to open up the Data Desk palettes (if some things disappear instead of appear, then click this again to make them reappear).

click Modify then Palettes

Click on the knife symbol to select it:

click on the knife symbol on the palette

Next hold down the SHIFT key and click on the rightmost bar of the histogram:

rightmost bar of histogram selected with knife tool

You should see that all the houses in this upper group correspond to the 2-story houses in the data set. Perhaps it would be wise to investigate the 1-story and 2-story houses separately.

Click on the size variable to select it as Y, then hold down the SHIFT key and click on the stories variable to select it as X:

select size as Y and stories as X

Now click on Plot and Boxplot y by x:

click Plot then Boxplot y by x

You should see side-by-side boxplots, like this:

side-by-side boxplots of size variable for 1- and 2-story houses

Clearly the 2-story houses are bigger than the 1-story houses, which is not terribly surprising! You can make side-by-side boxplots on the TI-84 as well, but it takes more work: manually enter the 1-story house sizes into one list and set up a boxplot of it (as described above), then manually enter the 2-story house sizes into another list and set up a second boxplot using Plot2 instead of Plot1. When you press GRAPH you should see both boxplots.
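The same comparison can be made numerically by splitting the sizes by number of stories and summarizing each group. The Python sketch below is only an illustration: the names houses, groups and five_number_summary are my own, and the quartiles are assumed to be the medians of the lower and upper halves of each sorted group, which may differ slightly from what Data Desk reports.

```python
from statistics import median

# (size, stories) pairs copied from the house data table above.
houses = [(1561, 1), (1038, 1), (1224, 1), (1232, 1), (1995, 2),
          (1714, 1), (1832, 2), (1095, 1), (2011, 2), (1366, 1),
          (1292, 1), (1458, 1), (2031, 2), (1366, 1)]

# Split the sizes into one list per number of stories.
groups = {}
for size, stories in houses:
    groups.setdefault(stories, []).append(size)

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using medians of the halves."""
    xs = sorted(data)
    half = len(xs) // 2
    return xs[0], median(xs[:half]), median(xs), median(xs[-half:]), xs[-1]

summaries = {s: five_number_summary(sizes) for s, sizes in groups.items()}
for stories in sorted(summaries):
    print(f"{stories}-story (n = {len(groups[stories])}):", summaries[stories])
```

In this data set the smallest 2-story house is larger than the largest 1-story house, which is why the two boxplots do not overlap at all.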

Homework

Work the following problems in Chapter 5: 11, 13, 15, 23, 27, 29, 33, 43 and 47. (As usual, you are encouraged to work additional problems.)

Errata

Although the text doesn't mention it, the wind speed data set introduced on page 88 is on the DVD.

Likewise, the roller coaster data set introduced on page 90 is also on the DVD...

...as is the coffee mug data set introduced on page 93...

...and the late arrival data on page 95...

...as well as the CEO data on page 101...

...and the cotinine data on page 103.

On page 90, the lower fence computation should read:

Lower fence = Q1 − 1.5×IQR = 1.15 − 1.5×1.78 = −1.52 mph
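The arithmetic in this correction is easy to verify. A minimal Python check, using only the Q1 and IQR values quoted in the line above (the variable names are illustrative):

```python
q1 = 1.15    # first quartile of the wind speeds, in mph
iqr = 1.78   # interquartile range, in mph

# Lower fence: 1.5 IQRs below the first quartile.
lower_fence = q1 - 1.5 * iqr
print(round(lower_fence, 2))  # -1.52
```

A negative fence simply means that no wind speed can be a low outlier, since wind speeds cannot fall below 0 mph.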

On page 111, part b of Exercise 11 should have a question mark (not a period) at the end of the sentence.

The music library data set mentioned in Exercise 50 is on the DVD, even though the orange T symbol is missing.

ActivStats

Work the activities on pages 5-1 through 5-2 in the ActivStats lesson book, as time permits.

Additional Resources

Describing Distributions
Episode 3 from Against All Odds includes a discussion of boxplots.
Carnegie Mellon: Introduction to Statistics
This open source course has a lesson about boxplots.
Boxplot tool
A Java applet for creating boxplots.
TI-83/84 Troubleshooting
Guide to some common errors encountered when using the TI-84.