
Review Problems: Week 3
1. Real Estate Craigslist is a Web site that allows users to post online classified advertisements at no charge (with the exception of job postings in certain metropolitan areas). The table below shows the street address, size (in square feet), asking price (in dollars) and number of bedrooms (abbreviated BR) for nine houses located in the city of Lynnwood listed for sale on Craigslist on October 9, 2011.
address size price BR
18712 57th Ave W 1805 349950 3
19014 24th Ave W 2404 329900 5
3203A 204th St SW 1912 250000 4
17112 6th Ave W 3200 509797 4
19631 9th Pl W 2369 339950 4
17402 62nd Ave W 1200 185000 3
21011 54th Ave W 1660 136950 3
14018 20th Pl W 2244 254950 4
14517 40 Ave W 2450 335000 3
a) If appropriate, compute the regression equation for a linear model to predict the price of a house based on its size. (Explain why such a model would, or would not, be appropriate.)
b) If appropriate, predict the price of a 1560 sq. ft. house in the city of Lynnwood.
c) If appropriate, predict the price of a 560 sq. ft. house in the city of Lynnwood.
d) What are the units for the slope of the regression line?
e) Interpret the value of the slope in terms of Lynnwood houses.
f) What are the units for the intercept of the regression line?
g) If appropriate, interpret the meaning of the intercept.
h) Compute the correlation.
i) Compute r^2 and explain what this number represents.
j) Compute the residual for the house at 17402 62nd Ave W.
k) Examine a residuals plot and describe any pattern you see. What does the pattern (or lack thereof) tell you?
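The arithmetic behind parts a, h, i and j can be sketched in a few lines of Python. This is only one possible way to do the computation, not a prescribed method: the helper name `fit_line` and the variable names are illustrative assumptions, and the data are the nine Lynnwood listings from the table above. The sketch does not decide whether a linear model is appropriate; that still requires looking at a scatterplot and a residuals plot.

```python
from math import sqrt

# Size (sq. ft.) and asking price (dollars) for the nine Lynnwood houses.
sizes = [1805, 2404, 1912, 3200, 2369, 1200, 1660, 2244, 2450]
prices = [349950, 329900, 250000, 509797, 339950,
          185000, 136950, 254950, 335000]


def fit_line(x, y):
    """Return (slope, intercept, r) for the least-squares line
    y-hat = intercept + slope * x. (Hypothetical helper name.)"""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r


slope, intercept, r = fit_line(sizes, prices)
r_squared = r ** 2

# Residual = observed price - predicted price, here for the 1200 sq. ft.
# house at 17402 62nd Ave W (asking price $185,000).
residual_1200 = 185000 - (intercept + slope * 1200)

print(f"price-hat = {intercept:.2f} + {slope:.2f} * size")
print(f"r = {r:.3f}, r^2 = {r_squared:.3f}")
print(f"residual for the 1200 sq. ft. house: {residual_1200:.2f}")
```

The slope comes out in dollars per square foot and the intercept in dollars, which is worth keeping in mind for parts d and f.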
2. Toyota Prius The Prius is a popular gas-electric hybrid car manufactured by Toyota. The table below shows the VIN (vehicle identification number), color, age (in years), mileage (in miles) and asking price (in dollars) for 13 used Toyota Prius automobiles advertised for sale on the Web site of The Seattle Times on January 23, 2011.
VIN age color mileage price
JTDKN3DU9A0056349 1 black 9277 28995
JTDKN3DU8A0165157 1 black 4180 28995
JTDKN3DU1A0057303 1 blue 32105 25995
JTDKN3DU9A0147198 1 gray 8129 24995
JTDKB20U197821193 2 pewter 28434 23995
JTDKB20U683348798 3 green 40762 22995
JTDKB20U187716331 3 green 24531 22995
JTDKB20U387727363 3 blue 16262 22995
JTDKN3DU8A0059050 1 silver 32830 21995
JTDKB20U383347267 3 gray 32604 21995
JTDKB20U583417996 3 gray 43827 18995
JTDKB20U697840628 2 white 24632 20995
JTDKB20U297880205 2 white 33651 18680
a) If appropriate, compute the regression equation for a linear model to predict the price of a Prius based on its mileage. (Explain why such a model would, or would not, be appropriate.)
b) If appropriate, predict the price of a Prius with 120,000 miles.
c) If appropriate, predict the price of a Prius with 12,000 miles.
d) What are the units for the slope of the regression line?
e) Interpret the value of the slope in terms of Prius prices.
f) What are the units for the intercept of the regression line?
g) If appropriate, interpret the meaning of the intercept.
h) Compute the correlation.
i) Compute r^2 and explain what this number represents.
j) Compute the residual for the Prius with VIN JTDKB20U683348798.
k) Examine a residuals plot and describe any pattern you see. What does the pattern (or lack thereof) tell you?
l) In addition to the 13 automobiles in the data set provided above, The Seattle Times also listed two significantly older Toyota Prius automobiles:
VIN age color mileage price
JT2BK12U630070267 8 green 83996 10995
JT2BK18U720060613 9 blue 110919 8995
If we included these two cars in the data set, would it be appropriate to construct a linear model to predict price based on mileage?
3. More Priuses On October 17, 2007, the classified ads on the Web site of The Seattle Times listed the following 14 used Toyota Prius automobiles for sale; the data set below shows the year, color, mileage (in miles) and asking price (in U.S. dollars) for each car.
year color mileage price
2006 green 17043 25995
2007 gray 12628 24980
2005 maroon 24039 24885
2005 silver 48226 23995
2006 black 10522 22995
2004 silver 66345 21995
2007 white 5611 21995
2005 gold 24479 21595
2004 white 14618 20995
2005 silver 53699 20980
2001 unknown 171700 8300
2004 silver 47649 17995
2003 white 39600 17500
2005 black 103126 16995
a) If appropriate, compute the regression equation for a linear model to predict the price of a Prius based on its mileage. (Explain why such a model would, or would not, be appropriate.)
b) If appropriate, predict the price of a Prius with 120,000 miles.
c) If appropriate, predict the price of a Prius with 310,000 miles.
d) What are the units for the slope of the regression line?
e) Interpret the value of the slope in terms of Prius prices.
f) What are the units for the intercept of the regression line?
g) If appropriate, interpret the meaning of the intercept.
h) Compute the correlation.
i) Compute r^2 and explain what this number represents.
j) Compute the residual for the maroon Prius.
k) Examine a residuals plot and describe any pattern you see. What does the pattern (or lack thereof) tell you?
l) If we converted the mileage numbers from miles to kilometers and the prices from dollars to euros, how would this affect the correlation?
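One way to explore part l numerically is to compute the correlation before and after converting units. The sketch below is a check under stated assumptions rather than a definitive method: the function name `correlation` is illustrative, and the dollar-to-euro exchange rate is a hypothetical value chosen only for the demonstration (the problem does not supply one). The kilometers-per-mile factor is the standard definition.

```python
from math import sqrt

# Mileage (miles) and asking price (U.S. dollars) for the 14 Priuses above.
mileage_miles = [17043, 12628, 24039, 48226, 10522, 66345, 5611,
                 24479, 14618, 53699, 171700, 47649, 39600, 103126]
price_dollars = [25995, 24980, 24885, 23995, 22995, 21995, 21995,
                 21595, 20995, 20980, 8300, 17995, 17500, 16995]


def correlation(x, y):
    """Pearson correlation coefficient r for paired lists x and y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sqrt(sxx * syy)


KM_PER_MILE = 1.609344       # exact by definition of the international mile
EUROS_PER_DOLLAR = 0.70      # hypothetical rate, assumed only for illustration

mileage_km = [m * KM_PER_MILE for m in mileage_miles]
price_euros = [p * EUROS_PER_DOLLAR for p in price_dollars]

r_original = correlation(mileage_miles, price_dollars)
r_converted = correlation(mileage_km, price_euros)

print(f"r (miles, dollars)     = {r_original:.6f}")
print(f"r (kilometers, euros)  = {r_converted:.6f}")
```

Because r is built from standardized values (z-scores), comparing the two printed numbers shows what rescaling either variable by a positive constant does to the correlation.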
4. Inkjet printers For their May 2005 issue, the editors of Consumer Reports compared the cost and effectiveness of a variety of inkjet printers. The following table lists the model, retail price (in dollars) and the text speed (in pages per minute, or ppm) for the 13 top-ranked models.
model price speed
HP Deskjet 6540 130 11.0
Canon Pixma iP4000 140 10.0
HP PhotoSmart 7760 150 6.0
HP Deskjet 5850 235 6.0
HP PhotoSmart 7960 230 6.0
HP PhotoSmart 8450 245 7.0
Canon Pixma iP5000 190 9.0
Canon Pixma iP2000 80 10.0
Canon Pixma iP8500 345 4.5
HP Deskjet 6127 250 7.0
Lexmark P915 Photo 135 9.0
Epson Stylus Photo R800 375 2.5
Lexmark Color Jetprinter Z816 90 9.5
a) Before computing the correlation between price and speed, we should check three things. List these conditions and in each case indicate whether or not the condition has been satisfied for this data.
b) Regardless of your answer to part a, compute the correlation between price and speed.
c) Compute r^2 and explain the meaning of this number in the context of this problem using a complete sentence or two.
For the remainder of this problem, assume that a linear model is appropriate.
d) Find the equation of the regression line for this data. Use appropriate notation.
e) Explain the meaning of the slope of the line, or state why it is meaningless in this context.
f) Explain the meaning of the intercept of the line, or state why it is meaningless in this context.
g) Compute the residual for the printer that costs $135.
h) If appropriate, predict the text speed for an inkjet printer that costs $800; if it is not appropriate, use a complete sentence to explain your answer.
i) If appropriate, predict the text speed for an inkjet printer that costs $200; if it is not appropriate, use a complete sentence to explain your answer.
j) Examine a residuals plot and describe any pattern you see. What does the pattern (or lack thereof) tell you?
5. [OIS 7.6] Husbands and wives The Great Britain Office of Population Census and Surveys once collected data on a random sample of 170 married couples in Britain, recording the age (in years) and heights (converted here to inches) of the husbands and wives. The scatterplot on the left shows the wife's age plotted against her husband's age, and the plot on the right shows wife's height plotted against husband's height.
a) Describe the relationship between husbands' and wives' ages.
b) Describe the relationship between husbands' and wives' heights.
c) Which plot shows a stronger correlation? Explain your reasoning.
d) Data on heights were originally collected in centimeters, and then converted to inches. Does this conversion affect the correlation between husbands' and wives' heights?
e) What would be the correlation between the ages of husbands and wives if men always married women who were
i) 3 years younger than themselves?
ii) 2 years older than themselves?
iii) half as old as themselves?
f) Would it be appropriate to construct a linear model for the association between husbands' ages and wives' ages?
g) Assuming that such a model would be appropriate, computer output from a regression analysis reports the intercept of the regression line as 1.5740 and the slope as 0.9112. Write down an equation for the regression line.
h) If appropriate, predict the age of the wife of a 55-year old man in Great Britain. (If this is not appropriate, explain.)
i) If appropriate, predict the age of the wife of an 85-year old man in Great Britain. (If this is not appropriate, explain.)
j) If appropriate, predict the age of the wife of a 45-year old man in Canada. (If this is not appropriate, explain.)
k) The regression analysis also reports that r^2 = 0.88 for this data set. Explain the meaning of this number.
l) What is the correlation of ages in this data set?
6. Brendan Ryan The following graphical display, which shows the April 2011 batting average vs. the May 2011 batting average for American League hitters with more than 50 at-bats during each month, comes from The Signal and the Noise: Why So Many Predictions Fail—But Some Don't by Nate Silver (Penguin, 2012).
a) What type of graphical display is this?
b) Describe any association you see in this display.
c) Would it be appropriate to compute the correlation between these two variables? Explain.
d) If you did compute the correlation, would it be positive or negative?
e) If you did compute the correlation, would it be close to 0 or close to 1?
f) Would it be appropriate to construct a linear model for the relationship between these two variables? Explain.
g) Would a linear model for the relationship between these two variables accurately predict a player's batting average in May based on their batting average in April? Explain.
h) Brendan Ryan (a shortstop for the Seattle Mariners), batted .184 in April and .384 in May. What term best applies to Brendan Ryan in this graphical display?
i) If we were to construct a linear model for this relationship, would Brendan Ryan have:
i) high leverage?
ii) a positive residual or a negative residual?
iii) a large residual or a small residual?
j) Are there any other players as unusual as Brendan Ryan?
7. James Bond The following information about the 23 James Bond films produced by Eon Productions over the past 50 years appears in the table below: production order; year of release; total box office gross (in millions of U.S. dollars); budget (in millions of U.S. dollars); adjusted box office gross (in millions of U.S. dollars, adjusted to the 2008 Consumer Price Index); and the duration (in seconds) of the title song for each film. The information comes from Wikipedia and Amazon.com (for the song duration).
no title year gross budget adj BO song
1 Dr. No 1962 59.6 1.2 425.5 107
2 From Russia With Love 1963 78.9 2.5 555.9 153
3 Goldfinger 1964 124.9 3.5 868.7 168
4 Thunderball 1965 141.2 11.0 966.4 182
5 You Only Live Twice 1967 111.6 9.5 720.4 165
6 On Her Majesty's Secret Service 1969 87.4 7.0 513.4 193
7 Diamonds Are Forever 1971 116.0 7.2 617.5 161
8 Live and Let Die 1973 161.8 12.0 785.7 193
9 The Man With the Golden Gun 1974 97.6 13.0 426.8 154
10 The Spy Who Loved Me 1977 187.3 28.0 666.4 208
11 Moonraker 1979 210.3 34.0 624.5 188
12 For Your Eyes Only 1981 202.8 28.0 481.0 183
13 Octopussy 1983 187.5 27.5 405.9 182
14 A View to a Kill 1985 157.8 30.0 316.2 214
15 The Living Daylights 1987 191.2 40.0 362.9 283
16 Licence to Kill 1989 156.2 32.0 271.6 251
17 GoldenEye 1995 353.4 60.0 500.0 209
18 Tomorrow Never Dies 1997 346.6 110.0 465.6 292
19 The World Is Not Enough 1999 390.0 135.0 504.7 237
20 Die Another Day 2002 456.0 142.0 546.5 276
21 Casino Royale 2006 599.2 150.0 640.8 241
22 Quantum of Solace 2008 586.1 230.0 586.1 263
23 Skyfall 2012 ? 150.0 ? 286
a) How many cases are included in this data set?
b) How many variables?
c) List each variable and classify it according to type (categorical, ordinal, quantitative, identifier).
d) Create an appropriate graphical display to investigate a possible relationship between year of release and total box office gross. (You'll need to omit Skyfall, which will not premiere until October 23, 2012.)
e) Which variable was the explanatory variable in your display?
f) Which was the response variable?
g) Describe any association you see in this display.
h) Convert the years to "years since Dr. No" (so that Skyfall, for example, would be 50 years after Dr. No) and create a similar graphical display. Does the strength of the association change?
i) If appropriate, compute the correlation between year and total box office gross. (If not appropriate, explain.)
j) If you computed the correlation between the years since Dr. No (instead of year) and total box office in dollars (instead of millions of dollars) would the correlation change?
k) If appropriate, find a linear regression model for the association between year and total box office gross. (If not appropriate, explain.)
8. James Bond will return Refer to the data set from the previous problem.
a) Create an appropriate graphical display to investigate total box office gross and budget. (You'll need to omit Skyfall, which will not premiere until October 23, 2012.)
b) Which variable was the explanatory variable in your display?
c) Which was the response variable?
d) Describe any association you see in this display.
e) If appropriate, compute the correlation between total box office gross and budget. (If not appropriate, explain.)
f) If you computed the correlation, compute r^2 and use a complete sentence to explain its meaning.
g) If appropriate, find a linear regression model for the association between total box office gross and budget. (If not appropriate, explain.)
h) If appropriate, predict the total box office gross for Skyfall, which has a reported budget of $150,000,000. (If not appropriate, explain.)
i) If a linear model is appropriate, compute the residual for GoldenEye.
9. Connery, Lazenby, Moore, Dalton, Brosnan, Craig Over the past 50 years, six actors have portrayed Agent 007 in the 23 Eon James Bond films: Sean Connery, George Lazenby, Roger Moore, Timothy Dalton, Pierce Brosnan and Daniel Craig. The longest gap between Bond films was between 1989 (Dalton's second—and final—appearance as Bond in Licence to Kill) and 1995 (Pierce Brosnan's debut as 007 in GoldenEye). Refer to the data set used in the previous two exercises.
a) Create a scatterplot to investigate a possible association between year of release and adjusted box office revenue.
b) Describe any association you observe in the scatterplot.
c) Is a linear model appropriate for this association?
d) Would a linear model be appropriate for the pre-GoldenEye Bond films? If so, find the equation of the regression line.
e) If appropriate, use your model from part d to predict the adjusted box office revenue for a Bond film released in 1976, had one been made that year.
f) If appropriate, use your model from part d to predict the adjusted box office revenue for a Bond film released in 1991, had one been made that year.
g) If appropriate, use your model from part d to predict the adjusted box office revenue for Skyfall.
h) Would a linear model be appropriate for the post-Licence to Kill Bond films? If so, find the equation of the regression line.
i) If appropriate, use your model from part h to predict the adjusted box office revenue for Skyfall.
10. This is the end At 00:07 BST on October 5, 2012, British singer Adele released "Skyfall," the title song to the newest James Bond film, on her Web site. Within 10 hours, it became the number one iTunes download, and it later debuted at No. 3 on the Billboard Hot 100 chart. Refer to the data set employed in the previous three problems.
a) Create a scatterplot to investigate a possible association between year of release and duration of title song.
b) Which variable was the explanatory variable in your display?
c) Which was the response variable?
d) Describe any association you see in this display.
e) If appropriate, compute the correlation between these variables. (If not appropriate, explain.)
f) Dr. No, the first James Bond film, did not feature a title song (the data set lists the duration for "The James Bond Theme," an instrumental credited to Monty Norman and arranged by John Barry, as performed by The John Barry Orchestra) and On Her Majesty's Secret Service featured a John Barry instrumental track for the opening credits (the data set lists the duration for "We Have All the Time in the World," written by Barry and Hal David, which Louis Armstrong sings over the film's end titles). Does either of these songs appear to be an outlier? If so, is there a legitimate reason for omitting it from the data set?
g) If you identified any songs as outliers in part f, and had a legitimate reason to omit them, do so and construct another scatterplot.
h) If appropriate, compute the correlation now. (If not appropriate, explain.)
i) If appropriate, find a linear regression model for the association between year and song duration. (If not appropriate, explain.)
j) If appropriate, predict the duration for a James Bond title song released in 1993, had a 007 film been released that year. (If not appropriate, explain.)
k) If a linear model is appropriate, compute the residual for "The Living Daylights," the title song to the film of the same name, written by Paul Waaktaar-Savoy and John Barry, and performed by A-ha.
11. Incorrect statements Sometimes statistics students write down inaccurate or incorrect interpretations of regression analyses and correlation computations. For each statement below, explain what is inaccurate or incorrect and then rewrite the statement to fix the problem.
a) A correlation of r = 0.96 between two variables tells us that a linear model is appropriate for those variables.
b) An r^2 value of 0.81 means that 81% of the response variable is explained by the explanatory variable.
c) A regression equation of `hat(y) = 56.8 + 3.59x` means that each increase of 1 unit in the explanatory variable causes the response variable to increase by 3.59 units.
d) A correlation of r = -0.73 is weaker than a correlation of r = 0.56.
e) A correlation of r = 0.12 between two variables tells us that a linear model is not appropriate for those variables.
12. Woodway real estate The Northwest Multiple Listing Service (NWMLS) operates a database of homes and property for sale throughout Washington State and provides this information to realtors and real estate Web sites. The table below includes the street address, year built, number of bedrooms (BR), number of bathrooms (BA), size (in square feet) and asking price (in thousands of dollars) for the ten houses located in the city of Woodway listed for sale on NWMLS on October 16, 2012.
address year BR BA size price
24323 Timber Lane 1923 4 3 3000 729
22109 Woodway Park Rd 1950 4 2 2430 769
23407 Woodway Park Rd 1921 3 2 4625 995
11402 239th Pl SW 1962 5 3 2868 800
23503 Timber Lane 1973 4 3 4231 1100
23920 115th Pl W 2000 4 5 4577 1025
11312 S Dogwood Lane 1940 5 6 6527 1650
22714 106th Ave W 2003 6 7 7746 1750
22505 Woodway Park Rd 1960 4 4 4339 1840
24120 114th Ave W 1964 5 3 3468 800
a) Create an appropriate graphical display to investigate a possible association between size of these houses and their asking prices.
b) Describe the association you see in the display, noting any unusual features.
c) Is it appropriate to compute the correlation for size and price? Explain.
d) Is the house at 22505 Woodway Park Road an outlier?
e) If you answered yes to the previous question, does that house have high leverage?
f) If you answered yes to part d, is that house an influential point?
g) Create an appropriate graphical display to investigate a possible association between the year each house was built and its size.
h) Which variable is most likely to be the explanatory variable and which the response variable in this relationship? Explain.
i) Describe any association you see in the display, noting any unusual features.
j) Is it appropriate to compute the correlation for year and size? Explain.
13. Return to Woodway In the data set from the previous problem, the property at 22505 Woodway Park Road also includes a guest house, whose size may not be included in the 4,339 square feet listed for this property.
a) Does the information provided above constitute a legitimate reason for omitting this house from further analysis of this data set? Explain.
b) If we omit this house from the data set, would it be appropriate to compute the correlation between size and price?
c) Compute the correlation between size and price with the property at 22505 Woodway Park Road omitted.
d) Compute r^2 and use a complete sentence to explain the meaning of this number in the context of the remaining nine houses.
e) If we omit the house at 22505 Woodway Park Road, would it be appropriate to construct a linear model for the association between size and price? Explain.
f) Find the regression equation relating size and price. Use proper notation.
g) If appropriate, predict the asking price for a 5,300 sq. ft. home in Woodway.
h) If appropriate, predict the asking price for a 1,300 sq. ft. home in Woodway.
i) Compute the residual for the house at 24323 Timber Lane.
14. [EESEE] BAC A study conducted during February 1986 in a student dormitory at Ohio State University had 16 student volunteers blow into a breathalyzer to verify their blood alcohol content (BAC) was 0. They each then drew a number (from 1 through 9) from a bowl and drank that number of beers. Thirty minutes after drinking their last beer, an OSU police officer measured their BAC (in g/dl). The data for these 16 students appears below.
student 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
beers 5 2 9 8 3 7 3 5 3 5 4 6 5 7 1 4
BAC 0.10 0.03 0.19 0.12 0.04 0.095 0.07 0.06 0.02 0.05 0.07 0.10 0.085 0.09 0.01 0.05
a) How many cases are included in this data set?
b) How many variables are included in this data set?
c) Which variable is the explanatory variable in this study?
d) Which variable is the response variable in this study?
e) Should you compute the correlation between student and beers? Explain.
f) Construct an appropriate graphical display to investigate the association being studied.
g) Describe the association apparent in your display.
h) Should you compute the correlation between the two variables used in your display? Explain.
i) If appropriate, compute the correlation.
j) If you computed the correlation, compute r^2 and explain its meaning in the context of this study.
k) If a student drank a number of beers 1.5 SDs above average, what would you predict about his or her BAC?
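Part k can be approached through standardized values: for a least-squares line, the predicted z-score of the response equals r times the z-score of the explanatory variable. The sketch below illustrates that relationship with the 16 students' data from the table above. The helper names (`mean`, `sample_sd`, `correlation`) are illustrative assumptions, and the sketch presumes that a linear model has already been judged appropriate.

```python
from math import sqrt

# Beers consumed and measured BAC (g/dl) for the 16 students above.
beers = [5, 2, 9, 8, 3, 7, 3, 5, 3, 5, 4, 6, 5, 7, 1, 4]
bac = [0.10, 0.03, 0.19, 0.12, 0.04, 0.095, 0.07, 0.06,
       0.02, 0.05, 0.07, 0.10, 0.085, 0.09, 0.01, 0.05]


def mean(values):
    return sum(values) / len(values)


def sample_sd(values):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def correlation(x, y):
    """r as the sum of products of z-scores, divided by n - 1."""
    mx, my = mean(x), mean(y)
    sx, sy = sample_sd(x), sample_sd(y)
    return sum(((xi - mx) / sx) * ((yi - my) / sy)
               for xi, yi in zip(x, y)) / (len(x) - 1)


r = correlation(beers, bac)

z_beers = 1.5                      # 1.5 SDs above the mean number of beers
z_bac_predicted = r * z_beers      # predicted BAC, in SDs above the mean BAC
bac_predicted = mean(bac) + z_bac_predicted * sample_sd(bac)

print(f"r = {r:.3f}")
print(f"predicted z-score for BAC = {z_bac_predicted:.3f}")
print(f"predicted BAC = {bac_predicted:.4f} g/dl")
```

Since r lies strictly between 0 and 1 here, the predicted BAC is fewer than 1.5 SDs above its mean, which is the usual regression-to-the-mean effect.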
15. Another round Refer to the data set from the previous problem.
a) Create a graphical display to investigate the association between number of beers consumed and BAC.
b) Is it appropriate to use a linear regression equation to model the association apparent in the scatterplot? Explain. (If not appropriate, stop here.)
c) Find the regression equation for this data.
d) If appropriate, use your equation to predict the BAC of a student who consumed 14.5 beers. (If not appropriate, explain.)
e) If appropriate, use your equation to predict the BAC of a student who consumed 4.5 beers. (If not appropriate, explain.)
f) Compute the residual for the student who consumed 6 beers.
g) Explain the meaning of the slope in the context of this study, or explain why the slope is meaningless.
h) Explain the meaning of the intercept in the context of this study, or explain why the intercept is meaningless.
i) Construct a residuals plot for this data.
j) Does the residuals plot confirm the appropriateness of a linear model for this data? Explain.
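For parts f, i and j, the residuals themselves are straightforward to compute once the line has been fitted. The following is a minimal sketch using the beers and BAC data from the previous problem; it prints the residuals as text rather than drawing a plot, and the variable names are illustrative assumptions. Plotting each residual against the number of beers (with any graphing tool) gives the residuals plot.

```python
# Beers consumed and measured BAC (g/dl) for the 16 students.
beers = [5, 2, 9, 8, 3, 7, 3, 5, 3, 5, 4, 6, 5, 7, 1, 4]
bac = [0.10, 0.03, 0.19, 0.12, 0.04, 0.095, 0.07, 0.06,
       0.02, 0.05, 0.07, 0.10, 0.085, 0.09, 0.01, 0.05]

# Least-squares slope and intercept for BAC-hat = intercept + slope * beers.
n = len(beers)
mean_x = sum(beers) / n
mean_y = sum(bac) / n
sxx = sum((x - mean_x) ** 2 for x in beers)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(beers, bac))
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Residual = observed BAC - predicted BAC, one per student.
residuals = [y - (intercept + slope * x) for x, y in zip(beers, bac)]

# List the residuals in order of beers consumed; these (beers, residual)
# pairs are the points of the residuals plot.
for x, e in sorted(zip(beers, residuals)):
    print(f"{x:2d} beers  residual {e:+.4f}")
```

Two properties hold for any least-squares fit and make a useful sanity check: the residuals sum to zero, and they have no linear association with the explanatory variable. What matters for part j is whether the plotted residuals show any remaining pattern, such as a curve or a change in spread.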
16. Florida 2000 The U.S. presidential election in the year 2000 resulted in a disputed outcome in the state of Florida, which led to a case before the U.S. Supreme Court (Bush v. Gore) that ultimately decided the winner of the presidency. This tab-delimited text file (which you should be able to open in any text editor or spreadsheet program such as Excel or Google Docs) contains vote totals for George W. Bush (the Republican candidate), Al Gore (the Democratic candidate), Ralph Nader (the Green Party candidate) and Pat Buchanan (the Reform Party candidate).
a) How many cases are in this data set?
b) How many variables are in this data set?
c) Create an appropriate graphical display to investigate a possible association between the number of votes received by Bush and the number received by Gore.
d) Describe any association you see in the display.
e) Does this association mean that increased turnout by Bush voters caused a corresponding turnout by Gore voters?
f) Should you compute the correlation between Bush votes and Gore votes? Explain.
g) Create an appropriate graphical display to investigate a possible association between the number of votes received by Nader and the number received by Buchanan.
h) Describe any association you see in the display, taking care to mention any unusual features.
i) Should you compute the correlation between Nader votes and Buchanan votes? Explain.
17. Facebook A statistics student who reported she used Facebook "almost every day" wanted to know if use of a social network such as Facebook had an effect on college students' GPAs. On June 8, 2012, she asked two people seated at each table in the student lounge to provide answers to three questions: how many hours they spent on Facebook each week, their college GPA and whether or not they felt use of social media had an effect on their grades. The data she collected appears below.
hours GPA effect
7 3.5 yes
14 3.8 yes
28 3.0 yes
16 3.0 yes
21 2.8 yes
40 3.6 yes
9 3.8 yes
15 3.9 no
40 2.0 no
21 3.4 yes
20 3.7 yes
20 3.2 yes
20 2.9 yes
11 3.8 no
15 3.8 yes
8 3.5 yes
3 3.1 yes
13 3.9 no
2 3.8 yes
5 3.94 yes
1 3.8 no
20 3.0 yes
14 3.1 yes
7 3.3 yes
14 3.2 yes
18 3.0 yes
20 3.0 yes
25 3.6 yes
15 3.2 yes
20 3.1 yes
1 3.2 no
2 1.9 no
5 2.5 no
28 3.0 yes
24 2.14 no
1 2.5 no
41 3.3 yes
16 2.5 yes
8 2.8 yes
21 3.8 no
a) How many cases are included in this data set?
b) How many variables are included in this data set? Classify each as categorical, quantitative, ordinal or identifier.
c) Which of these variables might be considered explanatory?
d) Which might be considered response variables?
e) Create an appropriate graphical display to investigate a possible association between use of social media and GPA.
f) Describe any association evident in your display. Be sure to mention any unusual features.
g) Would it be appropriate to compute the correlation between the two variables in your display? Explain.
h) Construct graphical displays using only those who answered "yes" to the third question and only those who answered "no." Are any new associations evident? Would it be appropriate to compute the correlation for the variables in each of these new displays?
18. Track and field A statistics student who participated in the triple jump collected data about the top 15 female athletes (including herself), including their nationally qualified marks (measured in meters) compared to the finishing marks they jumped at the national meet. Some of this data, from the 2011-2012 indoor season, appears below.
surname pre meet
Ouedraogo 12.92 12.84
Zweifel 12.51 12.22
Yingling 12.41 12.03
Danville 12.29 12.10
Hewett 12.24 12.13
Segbor 12.14 12.13
Potter 12.06 12.40
Wyatt 12.05 11.75
Boyd 12.04 11.78
McDaniel 12.03 12.24
Bowens 12.01 11.91
Yates 12.00 11.87
Bemis 11.99 12.20
Bourne 11.95 11.62
Schmidt 11.95 11.91
a) Construct an appropriate graphical display to investigate an association between the pre-meet qualifying distances and the distances jumped at the national meet.
b) Describe any association evident in your display, and mention any unusual features.
c) Would it be appropriate to compute the correlation between the variables in your display? Explain.
d) Is there a significant outlier in the display? If so, do you have a legitimate reason to omit this outlier from your analysis?
e) If you did omit the outlier, would you expect the correlation to be weaker or stronger after omitting the outlier?
f) Compute the correlation with and without the outlier and compare your results to what you expected to find in part e.
g) Compute the differences between qualifying distance and meet distance for each athlete.
h) Construct a graphical display of those differences.
i) Is there a significant outlier in this display?
19. Subaru Outbacks The Web site cars.com contains listings for used cars from throughout the United States. The table below contains information for the 16 Subaru Outback automobiles listed for sale under $15,000 within 10 miles of Lynnwood, Washington, as of October 29, 2012. This information contains the Vehicle Identification Number (VIN), color, model year, mileage (in miles) and price (in dollars).
VIN color year mileage price
4S4BP67C264323292 red 2006 96756 14995
4S4BP61C976312854 silver 2007 91920 13991
4S4BP61C657387115 white 2005 129118 11991
4S4BP61CX67355012 silver 2006 92355 10995
4S3BH686747631189 white 2004 106000 10988
4S4BP62C257301801 silver 2005 150596 9995
4S3BH675537622627 silver 2003 107779 9991
4S3BH686227664422 black cherry 2002 102603 9400
4S3BH806627663958 white 2002 140937 8995
4S3BH6865Y6672979 white 2000 144088 8787
4S3BE686147200243 green 2004 173993 8398
4S3BH6651Y6659124 green 2000 126872 7999
4S3BH665826621154 red 2002 142082 7998
4S3BH806827621324 white 2002 107627 7991
4S3BH675327658539 burgundy 2002 115234 7990
4S3BH686417636362 black 2001 142432 6900
a) Create a scatterplot to investigate a possible association between mileage and price.
b) Which variable did you select as the explanatory variable when creating your graphical display?
c) Which variable did you select as the response variable?
d) Is it appropriate to compute the correlation between mileage and price? Explain.
e) Compute the correlation.
f) Compute R^2 and explain what it means in the context of these automobiles.
g) If you switched the explanatory and response variables, would the correlation change? Explain.
20. Outbacks again Refer to the data set from the previous problem.
a) Create a scatterplot to investigate an association between the mileage and price of these 16 automobiles.
b) Is it appropriate to construct a linear regression model for the association between mileage and price? Explain.
c) Find the linear regression equation that you could use to predict price based upon mileage for Subaru Outbacks.
d) What is the slope of the regression equation?
e) What are the units for the slope?
f) Use a complete sentence or two to explain the meaning of the slope in the context of Subaru Outbacks.
g) What is the intercept of the regression equation?
h) What are the units for the intercept?
i) Use a complete sentence or two to explain the meaning of the intercept in the context of Subaru Outbacks, or explain why such an interpretation of the intercept is meaningless.
j) If appropriate, use the model you constructed to predict the price of a Subaru Outback with 112,000 miles.
k) If appropriate, use the model you constructed to predict the price of a Subaru Outback with 12,000 miles.
21. Outback outliers Refer to the data set used in the previous two exercises. The Web site actually listed one more Subaru Outback for sale in Lynnwood:
VIN color year mileage price
4S3BH6656Y7635735 red 2000 42712 9995
a) Construct a graphical display of mileage vs. price that includes this Outback with the other 16 cars.
b) If you included this Outback in the graphical display with the other 16, would it still be appropriate to compute the correlation? Explain.
c) If you included this Outback in the graphical display with the other 16, would it still be appropriate to construct a linear regression model to predict price based upon mileage? Explain.
d) Would you consider this car to be an outlier? Explain.
e) Would you consider this car to be a high-leverage point? Explain.
f) Would you consider this car to be influential? Explain.
g) This car was the only one among the 17 Outbacks listed for sale in Lynnwood that spent most of its time in Hawaii before being sold and shipped to the mainland for resale. Would this information about the car be sufficient to omit it from the full data set before constructing a linear regression model? Explain.
22. Outback residuals Refer to the data set from #19.
a) Construct a linear regression model to predict price based upon mileage.
b) Compute the residual for the black cherry Outback.
c) If you were selling an Outback, would you prefer that your car have a positive residual or a negative residual? Explain.
d) If you were buying an Outback, would you prefer that your car have a positive residual or a negative residual? Explain.
e) Construct a residuals plot for this data set.
f) What does the residuals plot tell you about the appropriateness of a linear model?
23. Ford Focus The Ford Focus is a compact car introduced to North America in 1999 for model year 2000. The table below shows the model year, mileage and asking price for all 14 used Ford Focus automobiles advertised for sale on the Web site of The Seattle Times on January 31, 2010.
year mileage price
2007 25426 14595
2008 49223 13991
2008 49028 13991
2008 27690 11994
2008 36216 11980
2002 71646 10991
2007 41107 9671
2002 83454 8991
2007 49443 7988
2007 34179 7499
2002 63439 7475
2005 43012 5400
2001 86681 4494
2002 113000 2000
a) Create a scatterplot to investigate a possible association between mileage and price.
b) Which variable did you select as the explanatory variable when creating your graphical display?
c) Which variable did you select as the response variable?
d) Is it appropriate to compute the correlation between mileage and price? Explain.
e) Compute the correlation.
f) Compute R² and explain what it means in the context of these automobiles.
g) If you switched the explanatory and response variables, would the correlation change? Explain.
h) Compute the correlation between age and price.
i) Which association is stronger: mileage and price, or age and price? Explain.
j) What should you have done before computing the correlation between age and price in part h?
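One way to check hand or calculator work on parts e) through g) is the minimal Python sketch below. It is an illustration, not the method the course requires. The function name correlation and the variable names are assumptions made for this example; the data are copied from the table in problem 23.

```python
from math import sqrt

# Mileage (miles) and asking price (dollars), copied from the table in problem 23.
mileage = [25426, 49223, 49028, 27690, 36216, 71646, 41107,
           83454, 49443, 34179, 63439, 43012, 86681, 113000]
price = [14595, 13991, 13991, 11994, 11980, 10991, 9671,
         8991, 7988, 7499, 7475, 5400, 4494, 2000]

def correlation(xs, ys):
    """Pearson correlation r of two equal-length lists (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

r = correlation(mileage, price)
r_swapped = correlation(price, mileage)  # explanatory and response switched
print(f"r = {r:.3f}, R^2 = {r * r:.3f}, r with variables swapped = {r_swapped:.3f}")
```

The formula treats the two variables symmetrically, which is worth keeping in mind when answering part g). The sketch does not replace the scatterplot check asked for in part d).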
24. Focuses (Foci?) again Refer to the data set from the previous problem.
a) Create a scatterplot to investigate an association between the mileage and price of these automobiles.
b) Is it appropriate to construct a linear regression model for the association between mileage and price? Explain.
c) Find the linear regression equation that you could use to predict price based upon mileage.
d) What is the slope of the regression equation?
e) What are the units for the slope?
f) Use a complete sentence or two to explain the meaning of the slope in the context of these automobiles.
g) What is the intercept of the regression equation?
h) What are the units for the intercept?
i) Use a complete sentence or two to explain the meaning of the intercept in the context of these automobiles, or explain why such an interpretation of the intercept is meaningless.
j) If appropriate, use the model you constructed to predict the price of a Ford Focus with 135,000 miles.
k) If appropriate, use the model you constructed to predict the price of a Ford Focus with 35,000 miles.
25. Focus residuals Refer to the data set from the previous two problems.
a) Construct a linear regression model to predict price based upon mileage.
b) Compute the residual for the 2005 Ford Focus.
c) If you were buying a Ford Focus, would you prefer that your car have a positive residual or a negative residual? Explain.
d) If you were selling a Ford Focus, would you prefer that your car have a positive residual or a negative residual? Explain.
e) Construct a residuals plot for this data set.
f) What does the residuals plot tell you about the appropriateness of a linear model?
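For problems 24 and 25, the least-squares line and its residuals can be sketched in a few lines of Python. This is a hedged illustration for checking work: the helper least_squares and the variable names are assumptions introduced here, and a residual is taken to be the observed price minus the predicted price.

```python
# Model year, mileage (miles) and asking price (dollars) from the table in problem 23.
year = [2007, 2008, 2008, 2008, 2008, 2002, 2007,
        2002, 2007, 2007, 2002, 2005, 2001, 2002]
mileage = [25426, 49223, 49028, 27690, 36216, 71646, 41107,
           83454, 49443, 34179, 63439, 43012, 86681, 113000]
price = [14595, 13991, 13991, 11994, 11980, 10991, 9671,
         8991, 7988, 7499, 7475, 5400, 4494, 2000]

def least_squares(xs, ys):
    """Return (slope, intercept) of the least-squares line (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = least_squares(mileage, price)

# Residual = observed price - predicted price, one per car.
residuals = [y - (intercept + slope * x) for x, y in zip(mileage, price)]

i_2005 = year.index(2005)  # the only 2005 model in the table
print(f"predicted price = {intercept:.2f} + ({slope:.4f}) * mileage")
print(f"residual for the 2005 Focus: {residuals[i_2005]:.2f} dollars")
```

Plotting residuals against mileage (for example with a spreadsheet or a plotting library) gives the residuals plot requested in part e).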
26. Honda Odyssey The following data include the year, make, model, mileage (in thousands of miles) and asking price (in US dollars) for each of 13 used Honda Odyssey minivans advertised for sale on the Web site of the Seattle P-I on April 25, 2005.
year make model mileage price
2004 Honda Odyssey EXL 20 26900
2004 Honda Odyssey EX 21 23000
2002 Honda Odyssey 33 17500
2002 Honda Odyssey 41 18999
2001 Honda Odyssey EX 43 17200
2001 Honda Odyssey EX 67 18995
2000 Honda Odyssey LX 46 13900
2000 Honda Odyssey EX 72 15250
2000 Honda Odyssey EX 82 13200
2000 Honda Odyssey 99 11000
1999 Honda Odyssey 71 13900
1998 Honda Odyssey 85 8350
1995 Honda Odyssey EX 100 5800
a) Create a scatterplot to investigate a possible association between mileage and price.
b) Which variable did you select as the explanatory variable when creating your graphical display?
c) Which variable did you select as the response variable?
d) Is it appropriate to compute the correlation between mileage and price? Explain.
e) Compute the correlation.
f) Compute R² and explain what it means in the context of these minivans.
g) If you switched the explanatory and response variables, would the correlation change? Explain.
h) Compute the correlation between age and price.
i) Which association is stronger: mileage and price, or age and price? Explain.
j) What should you have done before computing the correlation between age and price in part h?
27. Another Odyssey Refer to the data set from the previous problem.
a) Create a scatterplot to investigate an association between the mileage and price of these minivans.
b) Is it appropriate to construct a linear regression model for the association between mileage and price? Explain.
c) Find the linear regression equation that you could use to predict price based upon mileage.
d) What is the slope of the regression equation?
e) What are the units for the slope?
f) Use a complete sentence or two to explain the meaning of the slope in the context of these minivans.
g) What is the intercept of the regression equation?
h) What are the units for the intercept?
i) Use a complete sentence or two to explain the meaning of the intercept in the context of these minivans, or explain why such an interpretation of the intercept is meaningless.
j) If appropriate, use the model you constructed to predict the price of a Honda Odyssey with 53,000 miles.
k) If appropriate, use the model you constructed to predict the price of a Honda Odyssey with 153,000 miles.
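Because the Odyssey table records mileage in thousands of miles, a mileage stated in miles has to be converted before it goes into the model. The Python sketch below illustrates that bookkeeping for parts j) and k), under the assumption that the line is fit by ordinary least squares. The helper names (least_squares, predict_price, within_observed_range) are made up for this example.

```python
# Mileage (thousands of miles) and asking price (dollars) from the table in problem 26.
mileage_k = [20, 21, 33, 41, 43, 67, 46, 72, 82, 99, 71, 85, 100]
price = [26900, 23000, 17500, 18999, 17200, 18995, 13900,
         15250, 13200, 11000, 13900, 8350, 5800]

def least_squares(xs, ys):
    """Return (slope, intercept) of the least-squares line (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# The slope comes out in dollars per thousand miles, matching the table's units.
slope, intercept = least_squares(mileage_k, price)

def predict_price(miles):
    """Predicted asking price (dollars) for a mileage given in miles."""
    return intercept + slope * (miles / 1000)  # convert miles to thousands first

def within_observed_range(miles):
    """True if the mileage lies between the smallest and largest mileage in the data."""
    return min(mileage_k) <= miles / 1000 <= max(mileage_k)

for miles in (53000, 153000):
    print(miles, round(predict_price(miles), 2), within_observed_range(miles))
```

The range check is only a rough screen for extrapolation; whether a prediction is appropriate is still a judgment to explain in words.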
28. Odyssey residuals Refer to the data set from the previous two problems.
a) Construct a linear regression model to predict price based upon mileage.
b) Compute the residual for the 1995 Honda Odyssey.
c) If you were buying a Honda Odyssey, would you prefer that your car have a positive residual or a negative residual? Explain.
d) If you were selling a Honda Odyssey, would you prefer that your car have a positive residual or a negative residual? Explain.
e) Construct a residuals plot for this data set.
f) What does the residuals plot tell you about the appropriateness of a linear model?
29. Odyssey ages Refer to the data set used in the previous three problems.
a) Create a scatterplot to investigate an association between the age and price of these minivans.
b) Is it appropriate to construct a linear regression model for the association between age and price? Explain.
c) Find the linear regression equation that you could use to predict price based upon age.
d) What is the slope of the regression equation?
e) What are the units for the slope?
f) Use a complete sentence or two to explain the meaning of the slope in the context of these minivans.
g) What is the intercept of the regression equation?
h) What are the units for the intercept?
i) Use a complete sentence or two to explain the meaning of the intercept in the context of these minivans, or explain why such an interpretation of the intercept is meaningless.
j) If appropriate, use the model you constructed to predict the price of a 2003 Honda Odyssey.
k) If appropriate, use the model you constructed to predict the price of a 2008 Honda Odyssey.
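Problem 29 needs an age for each minivan, which the table does not list directly. The Python sketch below assumes that age is the listing year (2005, from the date in problem 26) minus the model year. That definition is an assumption of this example, as are the helper names, so adjust it if your course defines age differently.

```python
LISTING_YEAR = 2005  # assumption: taken from the April 25, 2005 listing date

# Model year and asking price (dollars) from the table in problem 26.
year = [2004, 2004, 2002, 2002, 2001, 2001, 2000,
        2000, 2000, 2000, 1999, 1998, 1995]
price = [26900, 23000, 17500, 18999, 17200, 18995, 13900,
         15250, 13200, 11000, 13900, 8350, 5800]

def age_of(model_year):
    """Age in years under the assumed definition: listing year minus model year."""
    return LISTING_YEAR - model_year

age = [age_of(y) for y in year]

def least_squares(xs, ys):
    """Return (slope, intercept) of the least-squares line (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Slope is in dollars per year of age; intercept is in dollars.
slope, intercept = least_squares(age, price)

for model_year in (2003, 2008):
    a = age_of(model_year)
    in_range = min(age) <= a <= max(age)
    print(model_year, a, round(intercept + slope * a, 2), in_range)
```

Looking at the computed age for each model year before trusting the predicted price is a useful habit for parts j) and k).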