Ch. 7 Resources

Chapter 7: Scatterplots, Association, and Correlation

In these Resources I'll concentrate on using the TI-84 and Data Desk to create scatterplots and compute correlation.

Scatterplots on the TI-84

As an example, let's use the size and 2007 assessed value data from the property tax data set we previously investigated in the Resources for Chapters 2, 4 and 5:

house  size  assess  lot   taxes  stories
20911  1561  304     0.2   2604   1
20912  1038  297.6   0.2   280    1
20918  1224  289.5   0.17  2353   1
20921  1232  292.8   0.17  756    1
20924  1995  314.6   0.17  2620   2
20927  1714  322.7   0.18  2632   1
20930  1832  336.1   0.18  2779   2
21003  1095  279     0.18  2321   1
21006  2011  319.5   0.18  2663   2
21015  1366  289.3   0.18  2415   1
21018  1292  301.4   0.18  2477   1
21023  1458  314.3   0.18  1386   1
21028  2031  320.9   0.18  2676   2
21105  1366  304     0.18  2473   1

First, enter the data into two lists; here I'll use L1 for size and L2 for assessed value:

enter size data in L1 and assess data in L2

Then go to the STAT PLOT menu, press ENTER to select Plot1, move the cursor to On and press ENTER, move the cursor to the scatterplot icon (the first of the six Type icons) and press ENTER:

turn on Plot1, move cursor to scatterplot icon and press ENTER

Now move to the Xlist line, type L1 and press ENTER, then move to the Ylist line, type L2 and press ENTER. You can choose whichever Mark you like most.

Next press the ZOOM key, move down to ZoomStat and press ENTER. You should see a scatterplot like this:

scatterplot of size vs. assess

We see a reasonably linear association between size and assessed value, with no noticeable outliers and only a moderate amount of scatter. Notice that there are no scales or labels on this scatterplot. If possible, we should always include these on any scatterplot we use for a HW assignment, project or exam.

You can now press the TRACE button and use the left and right arrow keys to move the flashing cursor from point to point on the scatterplot; as you do, you should see the coordinates of each point listed at the bottom of the screen:

press TRACE and use cursors to navigate from point to point

You might also want to use the WINDOW menu to adjust the scale on each axis.
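If you would like a starting point for the WINDOW settings, one simple approach is to extend the range of each list a little past its smallest and largest values. The Python sketch below illustrates that idea for the size and assessed value data. The function name padded_window and the 10% padding are my own choices for illustration; this is not necessarily the rule the calculator's ZoomStat uses.

```python
# Size (square feet) and 2007 assessed value (thousands of dollars)
# copied from the property tax table above.
size = [1561, 1038, 1224, 1232, 1995, 1714, 1832,
        1095, 2011, 1366, 1292, 1458, 2031, 1366]
assess = [304, 297.6, 289.5, 292.8, 314.6, 322.7, 336.1,
          279, 319.5, 289.3, 301.4, 314.3, 320.9, 304]


def padded_window(values, pad_fraction=0.1):
    """Return (low, high) bounds that extend the data range by
    pad_fraction of the range on each side (an illustrative rule)."""
    low, high = min(values), max(values)
    pad = (high - low) * pad_fraction
    return low - pad, high + pad


xmin, xmax = padded_window(size)
ymin, ymax = padded_window(assess)
print(f"Xmin = {xmin:.1f}, Xmax = {xmax:.1f}")
print(f"Ymin = {ymin:.2f}, Ymax = {ymax:.2f}")
```

You can round the results to convenient values before typing them into the WINDOW menu; the only requirement is that every point still falls inside the window.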

Correlation on the TI-84

To find the correlation (r) using the TI-84, we must first draw a scatterplot, as demonstrated above, to check that the pattern of points is reasonably straight; if the data curve noticeably, the correlation will be meaningless (since the correlation measures only the strength of a linear association). In fact, before computing the correlation we should always check the three correlation conditions:

Quantitative Variables Condition: Both the size and the assessed value of a property are quantitative variables (with units in square feet and thousands of dollars, respectively).

Straight Enough Condition: As we can see in the scatterplot created above, the relationship between these variables appears to be reasonably straight, with no obvious bends or curves.

Outlier Condition: While there is a moderate amount of scatter in the data set, there do not appear to be any significant outliers.
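As a reminder of what is being computed once the conditions are met, the correlation can be written in terms of z-scores. This is the standard definition, given here for reference only; I am not claiming that the calculator carries out the arithmetic in this form. Here s_x and s_y denote the sample standard deviations of the two variables and n is the number of data points.

```latex
r = \frac{\sum z_x z_y}{n - 1},
\qquad
z_x = \frac{x - \bar{x}}{s_x},
\quad
z_y = \frac{y - \bar{y}}{s_y}
```

Because z-scores have no units, r has no units either, which is one reason the Quantitative Variables Condition matters: z-scores only make sense for quantitative data.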

After graphing the scatterplot, QUIT the graphing mode (press 2ND and MODE) and then press STAT, move the cursor to CALC and move the cursor down to 8:LinReg(a+bx):

press STAT then move cursor to CALC and down to LinReg(a+bx)

Press ENTER, then type L1 followed by , (a comma) and then L2:

press ENTER then type L1 a comma and L2, then press ENTER

Now press ENTER. You should get a screen that looks like this:

linear regression output for size vs. assess

What does all of this mean? We will learn about the linear equation displayed here (and its coefficients) in the next chapter. The number r is the correlation. Why do we want to know r² too? We'll learn the answer to that in Chapter 8 as well.
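If you want to check the calculator's numbers independently, the same quantities can be computed from the usual least-squares formulas. The Python sketch below does this for the size and assessed value data using only the standard library. The function name linreg is my own, chosen to echo the calculator command, and the code is meant as a cross-check rather than a description of how the TI-84 works internally.

```python
from math import sqrt

# Size (square feet) and 2007 assessed value (thousands of dollars)
# copied from the property tax table above.
size = [1561, 1038, 1224, 1232, 1995, 1714, 1832,
        1095, 2011, 1366, 1292, 1458, 2031, 1366]
assess = [304, 297.6, 289.5, 292.8, 314.6, 322.7, 336.1,
          279, 319.5, 289.3, 301.4, 314.3, 320.9, 304]


def linreg(x, y):
    """Return (a, b, r) for the line y = a + b*x fitted by least squares,
    where r is the correlation between x and y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squared deviations and of cross-products of deviations
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx                # slope
    a = mean_y - b * mean_x      # intercept
    r = sxy / sqrt(sxx * syy)    # correlation
    return a, b, r


a, b, r = linreg(size, assess)
print(f"a = {a:.3f}")
print(f"b = {b:.5f}")
print(f"r^2 = {r * r:.3f}")
print(f"r = {r:.3f}")
```

The printed value of r should agree, to three decimal places, with the r shown on the calculator screen.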

If you don't see r and r² on your calculator, do the following: press 2ND and then 0 to get to the CATALOG menu:

press 2ND and 0 to get to the catalog menu

Now scroll down (or press the x⁻¹ key, which has a green D above it, to skip to the D entries, then scroll down) until you see DiagnosticOn:

scroll down to DiagnosticOn and press ENTER twice

Then press ENTER twice. You should see this:

after pressing ENTER twice this should appear

Now run LinReg(a+bx) again. The calculator should always tell you the values of r and r² from now on, at least until its memory is reset (for example, if you take the batteries out of your calculator).

You may notice that there is a LinReg(ax+b) listed in the CALC menu along with LinReg(a+bx). Both of these will give us the same information, but let's stick with LinReg(a+bx) since it matches the order in which we will write the linear equation that we will learn about in Chapter 8.

Scatterplots and Correlation in Data Desk

You can also make scatterplots and compute the correlation with Data Desk. After importing the houses.txt data set, click on the assess variable so that it is designated as Y, then hold down the SHIFT key and click on the size variable so that it is designated as X:

click on assess to designate as Y, then shift-click on size to designate as X

Now click on Plot and Scatterplots:

click Plot then Scatterplots

You should get a scatterplot:

scatterplot of size vs. assess

While the scales are indicated and the variables are listed, there are still no units indicated. This scatterplot would be better:

better scatterplot, with units labeled

(I had to modify the Data Desk output using an image editing program to add the units.) To compute the correlation, click on the hyperview menu of the scatterplot window and select Correlation of assess vs. size:

click the hyperview menu to access correlation options

As with the TI-84, we see that r = 0.820:

correlation of size vs. assessed value

Homework

Work the following exercises in Chapter 7: 1, 5, 11, 15, 23, 27, 31, 35 and 41. If you're pressed for time you can ignore the section called "Straightening Scatterplots" on page 179; this material will appear again in (the optional) Chapter 10.

Errata

The hurricane prediction data set introduced on page 166 is on the DVD, even though this isn't indicated in the text.

The W's in the margin on page 166 are missing one of the variables for the What (time is also a variable).

The wind speed data set introduced in the For Example on page 168 is on the DVD, even though this isn't indicated in the text.

The Ithaca students data set introduced on page 170 is on the DVD, even though this isn't indicated in the text.

The blood pressure data set introduced on page 175 is on the DVD (look for the file Ch07_SBS_Framingham.txt).

On page 179, the second line of text below the first scatterplot should read "straightens out the curve" (not "line").

Exercise 30 should read: "to see if the three data sources..." (not "two").

The lunchtime data for Exercise 33 is on the DVD; the orange T symbol is missing.

The drug abuse data for Exercise 36 is on the DVD; the orange T symbol is missing.

ActivStats

Work the activities on pages 7-1 through 7-4 in the ActivStats lesson book, as time permits. Pages 7-5 and 7-6 offer a worthwhile discussion of how to compare two or more variables when one or more of them is categorical, or when you have three or more variables. They do not relate directly to the material in Chapter 7 of our text, however, so you may wish to skip them for now and come back to them when you have time.

Additional Resources

Correlation
Episode 9 from Against All Odds features a discussion of correlation.
Carnegie Mellon: Introduction to Statistics
Carnegie Mellon's open source statistics course includes a lesson called "Examining Relations" that includes a discussion of scatterplots and correlation.
Sofia: Elementary Statistics
Lesson 12.2 of the Sofia Open Content Initiative's Elementary Statistics course includes a discussion of scatterplots and Lesson 12.4 discusses correlation. (Some of the terminology may be unfamiliar here since this course covers scatterplots and correlation far later in the game than we do.)
Guessing correlations
A Java applet that offers practice guessing correlations, as in Exercises 11 and 12.
TI-83 Resource: Linear Regression
Instructions for using the TI-83 to compute correlation.
LinReg tutorial
A Flash tutorial on using the TI-83 to compute correlation, using data about the Seattle Mariners. (Ignore the discussion of "critical values in Table A-6.")
Scatterplot, correlation and regression on the TI-83/84
Instructions for graphing scatterplots and computing correlation on the TI-84.