Ch. 4 Resources

Chapter 4: Displaying and Summarizing Quantitative Data

This chapter revolves around making three types of plots (histograms, stem-and-leaf displays and dotplots) and computing numerical summaries of quantitative data, such as means and medians. In the notes that follow, I concentrate on entering data into the TI-83 or TI-84, generating a histogram from that data and computing summary statistics, and on doing the same with Data Desk. Stem-and-leaf displays are most easily drawn by hand; see the links below for more information on constructing them. ActivStats provides a lesson on dotplots, but as the textbook notes, there's really not much reason to use them instead of a stem-and-leaf display.

Property taxes

For our first example with a quantitative variable, let's revisit the data about the single-family residences on the street where I live in Edmonds, first presented in the Chapter 2 Resources. For reference, here is the data one more time:

house	size	assess	lot	taxes	stories
20911	1561	304	0.2	2604	1
20912	1038	297.6	0.2	280	1
20918	1224	289.5	0.17	2353	1
20921	1232	292.8	0.17	756	1
20924	1995	314.6	0.17	2620	2
20927	1714	322.7	0.18	2632	1
20930	1832	336.1	0.18	2779	2
21003	1095	279	0.18	2321	1
21006	2011	319.5	0.18	2663	2
21015	1366	289.3	0.18	2415	1
21018	1292	301.4	0.18	2477	1
21023	1458	314.3	0.18	1386	1
21028	2031	320.9	0.18	2676	2
21105	1366	304	0.18	2473	1

Let's graph one of the quantitative variables. We'll start with the 2007 assessed value (the assess column, recorded in thousands of dollars, so 304 means $304,000).

Histograms with the TI-83 and TI-84

Before we can graph data using the calculator, we need to enter the data into a list. The TI-83 and TI-84 have six built-in lists, called L1, L2, L3, L4, L5 and L6. To access the lists, press STAT:

$TI-83 after pressing STAT$

(To complicate matters there is also a LIST menu above the STAT key, but we don't want to use this right now, so ignore it for the time being.)

Next press ENTER and you should be in the list editor:

$TI-83 after pressing STAT and ENTER$

Use the arrow keys to move up and down in a list, or the left and right arrows to move from one list to another.

If a list you want to use already contains data and you want to clear the list, move the cursor up so that the list name (e.g. L1) is highlighted:

$use arrow keys to highlight list name, then press CLEAR and ENTER$

then press CLEAR and ENTER; all of the data in the list should disappear. DO NOT press the DEL key when the list name is highlighted: this will delete the entire list, rather than just its contents. To restore a deleted list, move the cursor to the name of another list, press 2ND and then INS (for insert, above the DEL key), then press 2ND and then L1 (above the 1 key), or whichever key holds the name of the list you want to restore. You can, however, use the DEL key to delete a single entry in a list.

To enter data once the entries in the list have been cleared, move the cursor to the first position in the list, type a number, then press ENTER. Repeat this until you have entered all of the data. (From here on out I'll just write "TI-84" so that I don't have to type "TI-83 or TI-84" repeatedly.) If you enter the 2007 assessed value data into list L1 on your TI-84, your screen should look something like this:

$assessed value data in L1$

Notice that when you move the cursor over the last entry in the list it says L1(14) = 304. This means that the 14th entry in list L1 is 304. Since there are 14 houses in the data set, this is what we want to see.

To exit the list editor, press 2ND then QUIT (above the MODE key).

We can now draw a histogram with the TI-84 using the assessed value data in list L1. Press 2ND then STAT PLOT (above the Y= key) to get to the STAT PLOTS menu:

$press 2ND and Y= to access STAT PLOTS$

then press ENTER to select Plot1:

$menu screen for Plot1$

Move the cursor over On and press ENTER to turn on Plot1:

$move the cursor to On and press ENTER$

Now move the cursor down and then left two spaces so that it highlights the histogram icon, then press ENTER:

$move the cursor to the histogram icon and press ENTER$

If Xlist is not already set to L1 (or the name of the list with your data), move the cursor down, then type L1 (2ND and then the 1 key) and press ENTER to designate L1 as the Xlist.

If Freq is not automatically set to 1 (it usually is) then you'll need to reset it to 1. For some reason the TI-84 defaults to the alpha-lock when this entry is highlighted; if you need to change the frequency to 1 from something else, you'll need to press ALPHA to turn off the alpha-lock, then type the number 1. Your screen should then look something like what you see above.

Now press ZOOM, then move the cursor down to ZoomStat:

$press ZOOM, navigate down to ZoomStat and press ENTER$

and press ENTER. This is usually the most expedient way to choose an appropriate window, but you may still need to adjust the window slightly. Here is the histogram you should get using ZoomStat:

$histogram of assessed value data using ZoomStat$

To manually adjust the window settings, press the WINDOW button, then enter suitable values and press GRAPH. You can experiment with different WINDOW values, which may give you different histograms. If you change the WINDOW values as follows:

$press WINDOW and adjust values as shown$

and press GRAPH you should get a histogram that looks like this:

$histogram of assessed value data using adjusted Window values$

Note that we sometimes get vastly different histograms just by adjusting the starting point and the bar width. We might say that the above histogram appears bimodal, while the original histogram appeared to have an outlier that is not visible in our second histogram.
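To see why the starting point and bar width matter so much, here is a short Python sketch (Python is not required for this course; it is used only for illustration) that counts how many assessed values land in each bar for two arbitrary choices of starting value and bar width. The function name bin_counts is hypothetical, the starting value is assumed to be no larger than the smallest data value, and each bar is assumed to include its left edge but not its right edge, which may differ from how a particular calculator or program handles boundary values.

```python
# 2007 assessed values (thousands of dollars) from the house table
assess = [304, 297.6, 289.5, 292.8, 314.6, 322.7, 336.1,
          279, 319.5, 289.3, 301.4, 314.3, 320.9, 304]

def bin_counts(data, start, width):
    """Count values in each bar [start + k*width, start + (k+1)*width).

    Assumes start <= min(data).
    """
    n_bins = int((max(data) - start) // width) + 1
    counts = [0] * n_bins
    for x in data:
        counts[int((x - start) // width)] += 1
    return counts

# Same data, two different choices of starting point and bar width
print(bin_counts(assess, start=270, width=12))  # [1, 3, 4, 2, 3, 1]
print(bin_counts(assess, start=275, width=10))  # [1, 3, 4, 2, 3, 0, 1]
```

With bars of width 12 starting at 270 the counts rise, dip and rise again, while with bars of width 10 starting at 275 an empty bar separates 336.1 from the rest of the data, so the largest value looks more like an outlier.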

Note that there is no scale indicated in this histogram, nor is the horizontal axis labeled with a variable name or units. Whenever we draw a histogram we should include all of these labels if technology permits.

Histograms from frequency tables

On occasion, quantitative data is displayed in a frequency table, much like those we created for categorical variables in Chapter 3. This usually occurs when the quantitative variable takes on only a few discrete values, or is conveniently rounded to an integer.

For example, in Fall 2006 students in my online class were given an unlimited number of attempts to take a 5-point quiz. We could display the number of attempts like this:

attempts	count
0	3
1	8
2	8
3	4
4	2
5	2
6	1

If we wanted to enter this data into the TI-84 to create a histogram, we could enter 0 three times, then 1 eight times, then 2 eight times, and so on into a list until the list had 28 entries, one for each student. But there's another way. We can enter the number of attempts (the left column) into one list (say L1) and the counts into the next list (say L2):

$enter attempts in L1 and counts in L2$

We then follow the same steps as before, except we type L2 for Freq in the STAT PLOT menu:

$use L2 for Freq$

To get a histogram we can use ZoomStat:

$histogram of attempts on TI-83$
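The two ways of entering the quiz-attempt data can be sketched in Python, using the frequency table above: expanding the table into one entry per student, or building each bar straight from its count, which is the idea behind using L2 as the Freq list. The text_histogram helper is hypothetical and prints bars of asterisks rather than a real graph.

```python
# Quiz-attempt frequency table (Fall 2006): attempts and number of students
attempts = [0, 1, 2, 3, 4, 5, 6]
counts = [3, 8, 8, 4, 2, 2, 1]

# Way 1: expand the table into one entry per student
raw = [a for a, c in zip(attempts, counts) for _ in range(c)]
print(len(raw))  # 28

# Way 2: build each bar directly from its count (the idea behind Freq = L2)
def text_histogram(values, freqs):
    """One text bar per value, with one asterisk per observation."""
    return [f"{v} | {'*' * f}" for v, f in zip(values, freqs)]

for line in text_histogram(attempts, counts):
    print(line)
```

Both routes describe the same 28 observations; the frequency-list route just avoids typing each repeated value.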

Of course, it would have been much simpler in this case to create a stem-and-leaf display by hand:

0	000
0	11111111
0	22222222
0	3333
0	44
0	55
0	6	Key: 0|6 = 6 attempts

Notice that if you rotate your head to the right, this stem-and-leaf display appears to have the same shape as the histogram. The advantage of the stem-and-leaf display is that all of the original data values are still apparent. Generally, when constructing stem-and-leaf displays we put the larger values at the top and the smaller values at the bottom:

0	6	Key: 0|6 = 6 attempts
0	55
0	44
0	3333
0	22222222
0	11111111
0	000
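A stem-and-leaf display is easy to generate once the stems and leaves are defined. The sketch below assumes non-negative data and uses the 2007 assessed values (in thousands of dollars): it truncates each value to a whole number, takes the tens as the stem and the units digit as the leaf, and lists the larger stems at the top, as described above. The stem_and_leaf function is hypothetical, and truncating rather than rounding is a choice made here for illustration.

```python
from collections import defaultdict

# 2007 assessed values (thousands of dollars) from the house table
assess = [304, 297.6, 289.5, 292.8, 314.6, 322.7, 336.1,
          279, 319.5, 289.3, 301.4, 314.3, 320.9, 304]

def stem_and_leaf(values):
    """Return display lines, largest stem first; assumes non-negative values."""
    leaves = defaultdict(list)
    for v in values:
        stem, leaf = divmod(int(v), 10)  # int() truncates: 289.5 -> 289 -> (28, 9)
        leaves[stem].append(leaf)
    lines = []
    # Walk every stem from largest to smallest so empty stems still appear
    for stem in range(max(leaves), min(leaves) - 1, -1):
        digits = "".join(str(d) for d in sorted(leaves.get(stem, [])))
        lines.append(f"{stem} | {digits}".rstrip())
    return lines

for line in stem_and_leaf(assess):
    print(line)
print("Key: 27 | 9 = $279,000 (values truncated to whole thousands)")
```

For these data the display runs from 33 | 6 at the top down to 27 | 9 at the bottom.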

Histograms with Data Desk

If you want to use a computer to make a histogram, use Data Desk. (Excel has something that it calls a histogram, but it's not a histogram; while it is possible to make histograms with Excel, it's complicated and takes about 100 times longer than it does with Data Desk.)

First, save the houses.txt data file to your desktop or USB drive, either by right-clicking the houses.txt link in this paragraph and saving the file, or by grabbing it from the Data Sets folder on the main page of the online classroom.

Now start Data Desk, click File and Import..., navigate to houses.txt, click once on the file name, then click Open. Then click Use these variable names (see the "Introduction to Data Desk" resource for more on importing data into Data Desk). At this point you should see all of the variables from the houses.txt data set displayed like this:

$houses.txt variables displayed in Data Desk$
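If you would rather pull a variable out of houses.txt with a script than with Data Desk, the sketch below reads one column of a tab-delimited file whose first line holds the variable names. That layout is an assumption based on the table shown earlier, so check it against the actual file. load_column is a hypothetical helper built on Python's standard csv module; the demo reads two rows from an in-memory string so that it runs without the file.

```python
import csv
import io

def load_column(file_obj, column):
    """Read one named column from tab-delimited text with a header row, as floats."""
    reader = csv.DictReader(file_obj, delimiter="\t")
    return [float(row[column]) for row in reader]

# Two rows copied from the house table, so the demo runs without the file
sample = io.StringIO(
    "house\tsize\tassess\tlot\ttaxes\tstories\n"
    "20911\t1561\t304\t0.2\t2604\t1\n"
    "20912\t1038\t297.6\t0.2\t280\t1\n"
)
print(load_column(sample, "assess"))

# With the real file (assuming it is tab-delimited and in the working directory):
# with open("houses.txt", newline="") as f:
#     assess = load_column(f, "assess")
```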

Now click on the assess variable so that the variable's icon is designated as Y:

$click on assess to designate as Y$

then click on Plot and select Histograms:

$click on Plot and Histograms$

The following histogram should appear:

$histogram of 2007 assessed value data$

You can make the window larger by clicking on the lower right corner of the window and dragging it across the screen:

$click on the icon in the lower right corner of the histogram and drag to resize$

You can adjust the plot options by clicking on the hyperview menu (the little triangle icon in the upper-left corner of the histogram window) and selecting Plot Scale:

$click on the hyperview menu and select Plot Scale$

If we use 270 for Align bars at and 12 for Bar width:

$change Align bar at: to 270 and Bar width: to 12$

we get this histogram:

$revised histogram of 2007 assessed value data$

which should remind you of one we created with the TI-84. You can also hold down the CTRL key and click and drag the icon in the lower-right corner of the histogram window to adjust the bar widths.

Note that Data Desk includes a scale on each axis and names the variable, which is already an improvement over the TI-84. The units and a label for the vertical axis are still missing, though. Something like this:

$better histogram (with units and vertical axis labeled)$

would be better, although I had to use an image editor to include the new information along the axes.

As before, to copy and paste a Data Desk histogram to another application, make sure the histogram window is selected, then click Edit and Copy Window. Then go to another application (like Microsoft Word) and paste.

Now try using the TI-84 and Data Desk to create displays of the other quantitative variables in the house data set.

Positively Skewed vs. Negatively Skewed

Once we have made a histogram (or stemplot or dotplot), we want to be able to describe the shape, center and spread of the graph. In Chapter 4 we concentrate on the shape (uniform, unimodal, bimodal; symmetric, skewed to the right, skewed to the left); we'll discuss more precise ways of measuring the center and spread in the next chapter.

Sometimes we use the term "positively skewed" in place of "skewed to the right", and "negatively skewed" in place of "skewed to the left"; the terms "positive" and "negative" are especially appropriate when a histogram or stem-and-leaf display is constructed with the quantitative variable along the vertical axis rather than the horizontal axis, as is the case with this stem-and-leaf display of the number of attempts made on a quiz by my students during Fall 2006:

0	6
0	55
0	44
0	3333
0	22222222
0	11111111
0	000	Key: 0|6 means 6 attempts

We would call this data set unimodal and positively skewed, since the longer tail extends in the positive direction. It would not be wrong to say that the data is skewed to the right, but since there is no right or left here (just up and down), the phrase "positively skewed" is more precise.
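The reasoning about which tail is longer can be turned into a rough check: compare how far the largest value sits above the median with how far the smallest value sits below it. This is only a quick heuristic assumed for illustration (a single extreme value can decide it), not a formal measure of skewness, so always look at the display as well. tail_direction is a hypothetical name.

```python
import statistics

# Quiz-attempt data, expanded from the frequency table (28 students)
attempts = [0, 1, 2, 3, 4, 5, 6]
counts = [3, 8, 8, 4, 2, 2, 1]
raw = [a for a, c in zip(attempts, counts) for _ in range(c)]

def tail_direction(data):
    """Rough check: which extreme value sits farther from the median?"""
    med = statistics.median(data)
    upper = max(data) - med   # reach of the tail toward larger values
    lower = med - min(data)   # reach of the tail toward smaller values
    if upper > lower:
        return "positive"
    if lower > upper:
        return "negative"
    return "no clear direction"

print(tail_direction(raw))  # positive
```

Here the median is 2 attempts, the largest value is 4 above it and the smallest is 2 below it, which agrees with calling the display positively skewed.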

More Property Taxes

For our next example, let's compute the summary statistics for the 2007 assessed value variable in the houses.txt data set.

Enter the assessed value data into a list, say L1, then press 2ND and QUIT (above the MODE key), as you did to create a histogram. Now press STAT, move the cursor to the right to highlight CALC and notice that 1-Var Stats is already highlighted:

$with data in L1, press STAT, move cursor to CALC and press ENTER$

Press ENTER and type L1 (2ND and then the 1 key). Your screen should look like this:

$press L1 and then ENTER$

Now press ENTER. The calculator will display many different values:

$output of 1VarStats for assessed value data$

These values are:

x̄ (the mean)
Σx (the sum of the values in L1; you can ignore this for now)
Σx² (the sum of the squares of the values in L1; you can ignore this as well)
Sx (the standard deviation of the values in L1; we simply call it s)
σx (the population standard deviation; we'll talk about this in Chapter 6, but we will NEVER use the TI-84 to calculate this, so always ignore this part of the 1-Var Stats output)
n (the number of data values in L1)

So far we can see that the mean of the 2007 assessed property values for homes in my neighborhood is $306,121, with a standard deviation of $15,898. You could compute these statistics "by hand," but it would take a ridiculously long time: ALWAYS use the calculator or computer to compute summary statistics, especially the standard deviation.
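For reference, the quantities on this first 1-Var Stats screen can be reproduced with Python's standard statistics module: stdev uses the n - 1 divisor that corresponds to Sx, and pstdev uses the n divisor that corresponds to σx. This is a sketch for checking calculator output, not a replacement for it, and the variable names are illustrative.

```python
import math
import statistics

# 2007 assessed values (thousands of dollars) from the house table
assess = [304, 297.6, 289.5, 292.8, 314.6, 322.7, 336.1,
          279, 319.5, 289.3, 301.4, 314.3, 320.9, 304]

n = len(assess)
x_bar = statistics.mean(assess)
sum_x = sum(assess)
sum_x2 = sum(x * x for x in assess)
s_x = statistics.stdev(assess)        # sample SD, n - 1 divisor (TI-84 "Sx")
sigma_x = statistics.pstdev(assess)   # population SD, n divisor (TI-84 "σx")

# Shortcut formula for the sample SD, as a cross-check on statistics.stdev
s_x_check = math.sqrt((sum_x2 - n * x_bar ** 2) / (n - 1))

print(f"x̄ = {x_bar:.3f}, Sx = {s_x:.3f}, σx = {sigma_x:.3f}, n = {n}")
print(math.isclose(s_x, s_x_check, rel_tol=1e-9))
```

Rounded, x̄ ≈ 306.121 and Sx ≈ 15.898 (thousands of dollars), which agree with the $306,121 and $15,898 quoted above.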

But there's more! Use the down cursor to scroll down the screen as far as you can. You should see:

$5-number summary from 1VarStats output$

We can now read off:

minX (the minimum data value in L1)
Q1 (the first quartile of the values in L1)
med (the median of the values in L1)
Q3 (the third quartile of the values in L1)
maxX (the maximum of the values in L1)

We call these five quantities the 5-number summary for the data set. The median 2007 assessed value of a home in this neighborhood is $304,000. The IQR is given by IQR = Q3 − Q1 = 319.5 − 292.8 = 26.7 (thousands of dollars), or $26,700. Note that the TI-84 doesn't report the IQR directly, but it's a simple subtraction problem once we know Q1 and Q3.
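The quartiles in a 5-number summary depend on the rule used to compute them. The sketch below assumes the rule the TI-84 is generally described as using, which reproduces the Q1 = 292.8 and Q3 = 319.5 reported above: Q1 is the median of the lower half of the sorted data and Q3 is the median of the upper half, with the overall median left out of both halves when n is odd. Other software (including Python's statistics.quantiles with its default settings) may use a different rule and report slightly different quartiles. five_number_summary is a hypothetical helper and needs at least two values.

```python
import statistics

# 2007 assessed values (thousands of dollars) from the house table
assess = [304, 297.6, 289.5, 292.8, 314.6, 322.7, 336.1,
          279, 319.5, 289.3, 301.4, 314.3, 320.9, 304]

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max), with quartiles as medians of the halves."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]            # lower half of the sorted data
    upper = xs[half + n % 2:]    # upper half; skips the middle value when n is odd
    return (xs[0], statistics.median(lower), statistics.median(xs),
            statistics.median(upper), xs[-1])

summary = five_number_summary(assess)
q1, q3 = summary[1], summary[3]
print(summary)
print(f"IQR = {q3 - q1:.1f} thousand dollars")
```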

Summary statistics from frequency tables

Recall the example above with data about the number of attempts students in my Fall 2006 online class made on a 5-point quiz. We displayed the number of attempts like this:

attempts	count
0	3
1	8
2	8
3	4
4	2
5	2
6	1

As before, we can enter the number of attempts (the left column) into one list (L1) and the counts into the next list (L2). Now type 1-Var Stats L1 as above, but then type , (a comma, above the 7 key) and then L2:

$type 1-VarStats L1,L2 then press ENTER$

Now press ENTER to get the summary statistics for the quiz attempts by the 28 students:

$summary statistics for the quiz attempt data$
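Supplying L2 as a frequency list tells the calculator to count each value as many times as its frequency. The same idea can be sketched in Python without expanding the table: the mean weights each value by its count, and the median is found by walking through the cumulative counts. The helper names are hypothetical, and the values are assumed to be listed in increasing order, as they are in the table.

```python
import math

attempts = [0, 1, 2, 3, 4, 5, 6]   # number of attempts (must be increasing)
counts = [3, 8, 8, 4, 2, 2, 1]     # number of students with that many attempts

def freq_mean_sd(values, freqs):
    """n, mean and sample SD (n - 1 divisor) of data given as a frequency table."""
    n = sum(freqs)
    x_bar = sum(v * f for v, f in zip(values, freqs)) / n
    ss = sum(f * (v - x_bar) ** 2 for v, f in zip(values, freqs))
    return n, x_bar, math.sqrt(ss / (n - 1))

def freq_value_at(values, freqs, position):
    """Value at a 1-based position in the sorted data, found from cumulative counts."""
    running = 0
    for v, f in zip(values, freqs):
        running += f
        if position <= running:
            return v
    raise ValueError("position is larger than the number of data values")

def freq_median(values, freqs):
    n = sum(freqs)
    if n % 2 == 1:
        return freq_value_at(values, freqs, (n + 1) // 2)
    return (freq_value_at(values, freqs, n // 2)
            + freq_value_at(values, freqs, n // 2 + 1)) / 2

n, x_bar, s_x = freq_mean_sd(attempts, counts)
print(n, round(x_bar, 4), round(s_x, 4), freq_median(attempts, counts))
```

For these 28 students the mean is 60/28 ≈ 2.14 attempts and the median is 2 attempts.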

Summary Statistics with Data Desk

To compute summary statistics of the 2007 assessed value variable, select the assess variable as Y (as before) and click Calc, then Summaries and then Reports:

$click Calc then Summaries then Reports$

You should see output like this:

$summary statistics for the 2007 assessed values$

If you don't see all of the statistics that you want, click the hyperview menu and choose Select Summary Statistics.

$in the hyperview menu, click Select Summary Statistics$

Select or deselect the appropriate checkboxes and click OK:

$select desired summary statistics and click OK$

As we saw from the calculator, the mean assessed value is $306,121 with a standard deviation of $15,898.

Median and IQR vs. Mean and Standard Deviation

Keep in mind that you should never simply compute the summary statistics and report them: you should also draw a picture, such as a histogram or stem-and-leaf display. (This is fairly easy to do if you already have the data in the calculator or computer, and it's a good idea to draw the picture before you compute the summary statistics since a picture is often the easiest way to see that you have made a data entry error.)

If the data is roughly unimodal and symmetric, then the mean and standard deviation are usually the most appropriate measures of center and spread, respectively, for the data set; if, on the other hand, the data is strongly skewed or has one or more major outliers, you should report the median and IQR.
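As an illustration using the taxes column of the house table at the start of these notes, the sketch below compares the mean and the median. Two houses have much smaller tax values (280 and 756) than the rest, and they pull the mean (about 2174) roughly 300 below the median (2475). This is a quick demonstration only; a display of the data should still come first.

```python
import statistics

# taxes column from the house table (same 14 houses, same order)
taxes = [2604, 280, 2353, 756, 2620, 2632, 2779,
         2321, 2663, 2415, 2477, 1386, 2676, 2473]

mean_tax = statistics.mean(taxes)
median_tax = statistics.median(taxes)
print(f"mean = {mean_tax:.2f}, median = {median_tax}")
```

Because the mean is pulled toward the unusually small values while the median is not, the median (with the IQR) gives a better summary of a typical house here.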

Homework

Work the following problems in Chapter 4: 5–13 odd, 21, 25, 27, 33, 37, 41, 49 and 53. (As usual, you are encouraged to work additional problems.)

Errata

On page 49, the Who in the margin should read "1240 earthquakes" (not 2410), as should the first line of the first complete paragraph on that page.

Although it is not noted in the text, the earthquake data set used to create the histogram on page 49 is included on the DVD.

Somewhat less importantly, the first line in the margin on page 50 should read "stem-and-leaf display" and "stemplot" (rather than "Stem-and-Leaf display" and "Stemplot").

Although it is not noted in the text, the credit card data set on page 55 is included on the DVD (look for Ch04_credit_card_expense.txt).

On page 57, in newer printings of the book, the last line of the "Finding Median by Hand" box should read "(13.9+14.1)/2 = 14.0" (not "(13.9+14.1)>2 = 14.0").

On page 58, the penultimate line should read "1.0 Richter scale units" (not "0.9").

On page 59, the text below the histogram should read "IQR =1.0" (not "IRQ = 1.0").

On page 60, the line below the 5-number summary should read "It's a good idea" (not "It's good idea").

Although it is not noted in the text, the flight cancellation data set on page 60 is included on the DVD (look for Ch04_Cancelled_flights.txt).

The flight cancellation data is for the years 1995 through 2005 (not 2003); this error occurs in three places.

On page 67, ignore the discussion of boxplots in the Think and Show of the Step-by-Step Example (boxplots are not introduced until Chapter 5).

On page 68, ignore the discussion of boxplots in the Tell (boxplots are not introduced until Chapter 5).

On page 72, the second line of the definition of "Distribution" should read "equal-width" (not "equal- width").

On page 77, the TI-83/84 Plus and TI-89 instructions are messed up; just see the instructions in these Resources instead.

On page 81, the data set for Exercise 32 is on the DVD, even though the orange T symbol is missing.

On page 82, "NASA" should not be italicized in Exercise 41.

On page 83, Exercise 49 should read "ZIP codes" (not "Zip codes"); this applies elsewhere in the text as well.

ActivStats

Work the activities on pages 4-1 through 4-4 in the ActivStats lesson book, as time permits. These contain further information about constructing and using stem-and-leaf displays, as well as dotplots and histograms.

Additional Resources

Picturing Distributions: Episode 2 from Against All Odds features a discussion of histograms and stemplots.
Describing Distributions: Episode 3 from Against All Odds includes a discussion of means, medians and quartiles.
Carnegie Mellon: Introduction to Statistics: This open source course has a module on stemplots.
Sofia: Elementary Statistics: Lesson 2.1 of the Sofia Open Content Initiative's Elementary Statistics course includes a discussion of stemplots and histograms (ignore the discussion of boxplots until you reach Chapter 5); lessons 2.3 and 2.4 include a discussion of summary statistics.
Histogram tool: A Java applet for creating histograms. View histograms for built-in or user-specified data, and experiment with how the size of the class intervals influences the appearance of a histogram.
Histogram applet: A Java applet that allows you to play with histogram bin widths.
Stem-and-leaf intro: An introduction to stem-and-leaf displays by David M. Lane.
Stem-and-leaf tool: A Java applet for creating stem-and-leaf displays.
Stemplot tutorial: A detailed tutorial on constructing stem-and-leaf displays, from Statistics Canada.
TI-83 Resource: 1-VarStats: Instructions on creating a histogram with the TI-83; check out the link about entering data into lists if you are having difficulty with that part of the process.
Histograms on the TI-83/84: Instructions for creating a histogram on the TI-83.
Summary Statistics on the TI-83/84: Instructions for computing summary statistics on the TI-83.
TI-83/84 Troubleshooting: Guide to some common errors encountered when using the TI-84.
The Median vs. the Mean in the Age of Average: NPR story about appropriate use of mean and median in reporting about an "average" person.
