Ch. 5 Resources

Chapter 5: Understanding and Comparing Distributions

Below I'll concentrate on instructions for using the TI-84 and Data Desk to draw boxplots and to create other graphical displays for comparing two groups.

Boxplots with the TI-84

For our first example, let's work with the house data set previously encountered in the Chapter 2 and 4 Resources:

house	size	assess	lot	taxes	stories
20911	1561	304	0.2	2604	1
20912	1038	297.6	0.2	280	1
20918	1224	289.5	0.17	2353	1
20921	1232	292.8	0.17	756	1
20924	1995	314.6	0.17	2620	2
20927	1714	322.7	0.18	2632	1
20930	1832	336.1	0.18	2779	2
21003	1095	279	0.18	2321	1
21006	2011	319.5	0.18	2663	2
21015	1366	289.3	0.18	2415	1
21018	1292	301.4	0.18	2477	1
21023	1458	314.3	0.18	1386	1
21028	2031	320.9	0.18	2676	2
21105	1366	304	0.18	2473	1

If you haven't already done so, enter the assessed value data into a list, say L1. To draw a boxplot of the assessed value data, follow the instructions in the Chapter 4 Resources for making a histogram, but choose the boxplot (or modified boxplot) icon instead of the histogram icon:

$select the modified boxplot icon in the Stat Plots menu then use ZoomStat$

Then use ZoomStat to get the boxplot:

$boxplot of the assessed values$

We can see a bit more clearly from the boxplot that the data is skewed positively (but notice that we can't tell if the data set is unimodal or bimodal from the boxplot, so we should look at both a histogram and a boxplot whenever possible). Note again that the axis isn't labeled and no scale is indicated, so this would not be a satisfactory graph on a HW solution, exam or project.
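If you'd like to check the calculator's boxplot by hand, here is a minimal Python sketch that computes the five-number summary for the assessed values in the table above. It assumes Q1 and Q3 are the medians of the lower and upper halves of the sorted data, which I believe matches the TI-84's convention; other software may use a slightly different rule, so small differences in the quartiles are possible. The function name five_number_summary is my own invention.

```python
from statistics import median

# Assessed values from the house table above (units as given in the data set).
assess = [304, 297.6, 289.5, 292.8, 314.6, 322.7, 336.1,
          279, 319.5, 289.3, 301.4, 314.3, 320.9, 304]

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max).

    Assumption: Q1 and Q3 are the medians of the lower and upper halves
    of the sorted data, leaving out the middle value when n is odd.
    """
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    upper = data[len(data) - half:]
    return data[0], median(lower), median(data), median(upper), data[-1]

low, q1, med, q3, high = five_number_summary(assess)
print("min:", low, "Q1:", q1, "median:", med, "Q3:", q3, "max:", high)
```

Under this quartile rule the median (304) sits closer to Q1 (292.8) than to Q3 (319.5), which is consistent with the mild positive skew visible in the boxplot.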

Boxplots with Data Desk

To use a computer to make a boxplot, use Data Desk. Import the houses.txt data file (from the preceding link or from the Data Sets folder in the online classroom) into Data Desk, as we did in the Chapter 4 Resources. Click on the assess variable so that the variable's icon has a Y over it:

$click the assess variable to designate as Y$

then click on Plot and select BoxPlot Side by Side:

$click Plot and then Boxplot Side by Side$

You should see something like this:

$boxplot of assessed value data$

You can adjust the plot options by clicking on the hyperview menu (the triangle in the upper-left corner of the boxplot window) and selecting BoxPlot Options:

$click the hyperview menu and select BoxPlot Options$

If you see some strange shading on your boxplot, I would recommend selecting Do not display 95% C.I.'s for comparing medians:

$select options as shown and click OK$

since you have no idea what this means yet; you can also select Set Defaults to make this the default display option.

As with the histogram in Chapter 4, you can make the boxplot window larger by clicking on the lower right corner of the window and dragging it across the screen. The variable name in our Data Desk boxplot is labeled and a scale is indicated on the axis, which is better than the TI-84, but the units are still missing. This would be better:

$assessed value boxplot with improved labels$

although I again had to use an image editor to add the label.

More boxplots

A boxplot of the 2006 property tax data for these homes reveals three outliers:

$boxplot of the property tax variable$

so we should report the median and IQR for the property tax variable, not the mean and standard deviation. If you do see a major outlier, you should investigate it: if it was the result of a data-entry error, you should correct it; if it was something that never should have been included in the data set in the first place (such as the age of the teacher in a data set consisting of the ages of students in a second-grade class), you can remove it; if it was reported in the wrong units (e.g. someone reporting their height in feet rather than inches) you can convert to the proper units. But you should never remove a data point just because it's an outlier.

You might, however, decide to report the summary statistics both with the outlier included and with it omitted. In the property tax data set, three of the homes are owned by senior citizens who participate in a program that freezes their property taxes (although they or their estate have to pay all of the deferred taxes when the home is sold). This explains the outliers, so we might choose to analyze the remaining 11 homes; if the remaining data is roughly unimodal and symmetric, then we could report the mean and standard deviation for the property taxes of a homeowner in this neighborhood not involved in the deferred-tax program.
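The outlier check and the "with and without" comparison can be sketched in Python as follows. This is only an illustration under stated assumptions: it uses the 1.5 × IQR fence rule and takes Q1 and Q3 to be the medians of the lower and upper halves of the sorted data, so a package with a different quartile rule could give slightly different fences. The helper name quartiles is my own.

```python
from statistics import mean, median, stdev

# 2006 property taxes from the house table above.
taxes = [2604, 280, 2353, 756, 2620, 2632, 2779,
         2321, 2663, 2415, 2477, 1386, 2676, 2473]

def quartiles(values):
    """Q1 and Q3 as medians of the lower and upper halves (an assumed convention)."""
    data = sorted(values)
    half = len(data) // 2
    return median(data[:half]), median(data[len(data) - half:])

q1, q3 = quartiles(taxes)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Values beyond either fence are flagged as outliers; the rest are kept.
outliers = sorted(t for t in taxes if t < lower_fence or t > upper_fence)
kept = [t for t in taxes if lower_fence <= t <= upper_fence]

print("fences:", lower_fence, upper_fence)
print("outliers:", outliers)
print("all homes:  median", median(taxes), "IQR", iqr)
print("kept homes: n =", len(kept),
      "mean", round(mean(kept), 1), "sd", round(stdev(kept), 1))
```

With the 14 homes listed above, the three values flagged are the three unusually low tax bills, which leaves 11 homes for the second set of summary statistics.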

Comparing groups

Use Data Desk to create a histogram of the size data from the houses.txt data set. You should get something like this:

$histogram of the size variable$

which appears bimodal. We certainly shouldn't report the mean and standard deviation for a variable like this. In fact, there may be two separate groups here.

With the histogram still open, double-click on the stories variable to open up the variable that lists the number of stories in each house.

$stories variable open adjacent to size histogram$

Now click on Modify and then Palettes to open up the Data Desk palettes (if some things disappear instead of appear, then click this again to make them reappear).

$click Modify then Palettes$

Click on the knife symbol to select it:

$click on the knife symbol on the palette$

Next hold down the SHIFT key and click on the rightmost bar of the histogram:

$rightmost bar of histogram selected with knife tool$

You should see that all the houses in this upper group correspond to the 2-story houses in the data set. Perhaps it would be wise to investigate the 1-story and 2-story houses separately.
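The same check can be done without the knife tool. The Python sketch below picks out the houses in the upper cluster of sizes and looks at their stories values. The cutoff of 1800 is my own assumption, chosen by eye to separate the two clusters; it is not something Data Desk reports.

```python
# (size, stories) pairs from the house table above.
houses = [(1561, 1), (1038, 1), (1224, 1), (1232, 1), (1995, 2),
          (1714, 1), (1832, 2), (1095, 1), (2011, 2), (1366, 1),
          (1292, 1), (1458, 1), (2031, 2), (1366, 1)]

CUTOFF = 1800  # hypothetical boundary between the two clusters of sizes

# Houses in the upper cluster, and whether every one of them has 2 stories.
upper_group = [(size, stories) for size, stories in houses if size >= CUTOFF]
all_two_story = all(stories == 2 for _, stories in upper_group)

# Conversely, check that no 2-story house falls below the cutoff.
two_story_sizes = sorted(size for size, stories in houses if stories == 2)
upper_sizes = sorted(size for size, _ in upper_group)

print("upper group:", upper_group)
print("all 2-story?", all_two_story)
print("same houses as the 2-story group?", upper_sizes == two_story_sizes)
```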

Click on the size variable to select it as Y, then hold down the SHIFT key and click on the stories variable to select it as X:

$select size as Y and stories as X$

Now click on Plot and then Boxplot y by x:

$click Plot then Boxplot y by x$

You should see side-by-side boxplots, like this:

$side-by-side boxplots of size variable for 1- and 2-story houses$

Clearly the 2-story houses are bigger than the 1-story houses, which is not terribly surprising! In fact, in this data set the smallest 2-story house (1832) is larger than the largest 1-story house (1714). You can make side-by-side boxplots on the TI-84 as well, but it takes more work: manually enter the 1-story house sizes into one list and set up a boxplot of it using Plot1 (as described above), then manually enter the 2-story house sizes into another list and set up a second boxplot using Plot2. When you press GRAPH you should see both boxplots.
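To see the numbers behind the side-by-side boxplots, here is a short Python sketch that splits the sizes by number of stories (much like entering them into two separate lists) and prints a five-number summary for each group. As before, it assumes Q1 and Q3 are the medians of the lower and upper halves of the sorted data, so the quartiles may differ slightly from what Data Desk draws. The names sizes_by_stories and five_number_summary are my own.

```python
from collections import defaultdict
from statistics import median

# (size, stories) pairs from the house table above.
houses = [(1561, 1), (1038, 1), (1224, 1), (1232, 1), (1995, 2),
          (1714, 1), (1832, 2), (1095, 1), (2011, 2), (1366, 1),
          (1292, 1), (1458, 1), (2031, 2), (1366, 1)]

# Split the sizes into one list per number of stories.
sizes_by_stories = defaultdict(list)
for size, stories in houses:
    sizes_by_stories[stories].append(size)

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), with quartiles taken as the
    medians of the lower and upper halves (an assumed convention)."""
    data = sorted(values)
    half = len(data) // 2
    return (data[0], median(data[:half]), median(data),
            median(data[len(data) - half:]), data[-1])

for stories in sorted(sizes_by_stories):
    group = sizes_by_stories[stories]
    print(f"{stories}-story (n = {len(group)}):", five_number_summary(group))
```

With only four 2-story houses, the quartiles for that group rest on very little data, so the boxplot for it should be read with some caution.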

Homework

Work the following problems in Chapter 5: 11, 13, 15, 23, 27, 29, 33, 43 and 47. (As usual, you are encouraged to work additional problems.)

Errata

Although the text doesn't mention it, the wind speed data set introduced on page 88 is on the DVD.

Likewise, the roller coaster data set introduced on page 90 is also on the DVD...

...as is the coffee mug data set introduced on page 93...

...and the late arrival data on page 95...

...as well as the CEO data on page 101...

...and the cotinine data on page 103.

On page 90, the lower fence computation should read:

Lower fence = Q1 − 1.5 × IQR = 1.15 − 1.5 × 1.78 = −1.52 mph

On page 111, part b of Exercise 11 should have a question mark (not a period) at the end of the sentence.

The music library data set mentioned in Exercise 50 is on the DVD, even though the orange T symbol is missing.

ActivStats

Work the activities on pages 5-1 through 5-2 in the ActivStats lesson book, as time permits.

Additional Resources

Describing Distributions: Episode 3 from Against All Odds includes a discussion of boxplots.
Carnegie Mellon: Introduction to Statistics: This open source course has a lesson about boxplots.
Boxplot tool: A Java applet for creating boxplots.
TI-83/84 Troubleshooting: Guide to some common errors encountered when using the TI-84.
