Review Problems: Week 1

1. Russell Wilson Sportswriters and football fans have expressed surprise since the Seattle Seahawks named Russell Wilson as their starting quarterback at the beginning of the 2012 season, both because he is a rookie recently drafted after playing college football and because at 5 feet 11 inches he is the shortest starting quarterback in the National Football League. This spreadsheet contains the name, team and height (in feet and inches) for each starting quarterback in the NFL as of September 24, 2012. The data was compiled at my request by my 12-year-old son (and resident sports expert), who reported the source for this data as "Google."

a) How many variables are included in this data set?

b) Classify each variable (categorical, quantitative, etc.) and indicate units for quantitative variables.

c) If this data set had included height data for all of the quarterbacks in the NFL and not just the starting quarterback for each team, would you need to change any part of your response from part b)?

d) How many cases are in this data set?

e) Create an appropriate graphical display of this data.

f) Describe the distribution of heights.

g) Does Russell Wilson appear to be an outlier? Explain.

h) Other sources report Russell Wilson's height more exactly as 5 feet 10 5/8 inches, and yet other sources refer to him as being 5 feet 10 inches tall. If you used 5'10'' for his height, would he appear to be an outlier?

i) Compute the mean, median, standard deviation and IQR for the quarterback heights.

j) Which measure of center is most appropriate for this data set?

k) Which measure of variability is most appropriate for this data set?

2. Exam Scores Below are the midterm exam scores from a statistics class.

100 90 80 73 62 50 17 92 89 74 63 51 93 76
 65 52 53 67 98 53 67 98 53 68 99 58 55 58 58

Use these scores to answer the following questions.

a) How many cases are included in this data set?

b) How many variables are included in this data set?

c) Create an appropriate graphical display of the scores.

d) Describe the distribution of the scores.

e) Compute the mean, median, standard deviation and IQR for the exam scores.

f) Which measure of center is most appropriate for this data set?

g) Which measure of variability is most appropriate for this data set?

h) If the student who received a score of 17 dropped the class and was omitted from the data set, what would happen to the:

i) mean score?

ii) median score?

iii) IQR?

iv) standard deviation?

3. Dead Presidents Gerald R. Ford passed away at the age of 93 in 2006. At the time of his death he was the longest-living U.S. president. Find the ages of all 38 deceased presidents (from a reference book or a reliable online source).

a) Create an appropriate graphical display of the ages.

b) Describe the distribution of the ages.

c) Was Gerald Ford unusual?

d) Report an appropriate measure of center and an appropriate measure of variability for the ages.

e) What important variable is not apparent in your graphical display?

4. [OIS 1.37] Histograms and Boxplots Consider the three histograms and three boxplots shown here:

a) Describe the distribution in histogram (a).

b) Describe the distribution in histogram (b).

c) Describe the distribution in histogram (c).

d) Describe the distribution in boxplot (1).

e) Describe the distribution in boxplot (2).

f) Describe the distribution in boxplot (3).

g) Each of the three boxplots graphs the same data as one of the three histograms. Match each histogram with its corresponding boxplot.

h) What (if any) features are apparent in the histograms but not in the boxplots?

i) What (if any) features are apparent in the boxplots but not in the histograms?

j) Estimate the median for the data sets in each of the boxplots.

k) Estimate the IQR for the data sets in each of the boxplots.

l) Estimate the range for the data sets in each of the boxplots.

m) For histogram (c), would the mean be bigger than the median, or smaller? (Or about the same?)

n) For histogram (a), would the mean be bigger than the median, or smaller? (Or about the same?)

5. [OIS 1.38] Air quality Daily air quality is measured by the air quality index (AQI) reported by the Environmental Protection Agency. This index reports the pollution level and what associated health effects might be a concern. The index is calculated for five major air pollutants regulated by the Clean Air Act and takes values from 0 to 300, where a higher value indicates lower air quality. AQI was reported for a sample of 91 days in 2011 in Durham, NC. The relative frequency histogram below shows the distribution of the AQI values on these days.

a) Describe the distribution.

b) Estimate the median AQI.

c) Would you expect the mean to be higher or lower than the median? Explain.

d) Estimate the IQR of the AQI.

6. [CNX 1.15] Distance Learning During the 2010–2011 academic year, 771 distance learning students at Long Beach City College responded to a survey; highlights of the summary report appear in the table below.

Have computer at home 96%
Unable to come to campus for classes 65%
Age 41 or over 24%
Would like LBCC to offer more DL courses 95%
Took DL classes due to a disability 17%
Live at least 16 miles from campus 13%
Took DL courses to fulfill transfer requirements 71%

a) What percentage of the students surveyed live less than 16 miles from campus?

b) About how many students who participated in the survey live at least 16 miles from campus?

c) Would a pie chart be appropriate for this data? Explain.

d) Create an appropriate graphical display of this data.

e) If possible, compute the percentage of students surveyed who are unable to come to campus or live at least 16 miles from campus. (If not possible, explain.)

f) If possible, compute the percentage of students surveyed who are unable to come to campus or took DL courses to fulfill transfer requirements. (If not possible, explain.)
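Parts a) and b) involve only a complement and a percentage of the 771 respondents. One way to check that arithmetic is sketched below in Python; the variable names are illustrative, and the figures come from the table above.

```python
# Figures taken from the survey table in the problem statement.
respondents = 771
pct_at_least_16_miles = 13

# Everyone not in the "at least 16 miles" group lives closer than 16 miles.
pct_less_than_16_miles = 100 - pct_at_least_16_miles

# A percentage of the respondents, rounded to a whole number of students.
approx_count_far = round(respondents * pct_at_least_16_miles / 100)

print("percent living less than 16 miles away:", pct_less_than_16_miles)
print("approximate number living at least 16 miles away:", approx_count_far)
```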

7. Guessing Ages On the first day of the quarter, 37 students in a statistics class took a survey in which they were asked to guess the age of their instructor. These guesses are given below:

47 43 50 38 37 38 37 42 42 41 
40 37 35 33 34 45 45 43 38 39 
37 32 40 30 32 45 46 35 40 40 
33 42 42 40 46 40 35

a) How many cases are in this data set?

b) How many variables are in this data set? (Classify the type of each variable.)

c) Create an appropriate graphical display for this data.

d) Describe the distribution of guesses.

e) Compute the mean, median, standard deviation and IQR for the guesses.

f) Which summary statistics would be most appropriate to summarize the center and variability of the data?

g) If the instructor had asked the students to guess the age of his 5-year-old son, would the standard deviation of guesses about his son be smaller or larger than the standard deviation of guesses about the instructor's age?

h) One other student, not listed above, misread the instructions and answered with his or her own age (20) rather than guessing the instructor's age. If this student's response were included in the data set, what would happen to the:

i) mean

ii) median

iii) standard deviation

iv) IQR 
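For parts e) and h), the summary statistics can be checked with software. The sketch below uses Python's standard statistics module; the helper name summarize is an assumption made for illustration. Quartile conventions differ between textbooks, calculators and software, so the IQR it reports may not match a hand computation exactly.

```python
import statistics

# Guesses copied from the problem statement (37 students).
guesses = [
    47, 43, 50, 38, 37, 38, 37, 42, 42, 41,
    40, 37, 35, 33, 34, 45, 45, 43, 38, 39,
    37, 32, 40, 30, 32, 45, 46, 35, 40, 40,
    33, 42, 42, 40, 46, 40, 35,
]

def summarize(values):
    """Return the mean, median, sample standard deviation and IQR."""
    # quantiles() uses the "exclusive" method by default; other
    # quartile rules can give slightly different values.
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
        "iqr": q3 - q1,
    }

before = summarize(guesses)
after = summarize(guesses + [20])  # include the student who answered 20

for name in before:
    print(f"{name:>6}: {before[name]:.2f} -> {after[name]:.2f}")
```

Comparing the two columns of output shows which statistics are sensitive to the extra response and which are not.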

8. Presidential Race 2012 Anderson Robbins Research and Shaw & Company Research conducted a poll for Fox News during the period of September 24–26, 2012, asking each respondent, "If the presidential election were held today, how would you vote if the candidates were Democrats Barack Obama and Joe Biden, and Republicans Mitt Romney and Paul Ryan?" If the respondent did not select one of these choices initially, the interviewer followed up with: "Well, which way are you leaning?" Of 1,092 likely voters surveyed nationwide, 48% answered "Obama/Biden" and 43% answered "Romney/Ryan."

a) Based on the information provided above, how many cases are in the data set?

b) How many variables? (Specify a type for each.)

c) Approximately how many of the voters surveyed expressed a preference for Romney and Ryan? 

d) Create an appropriate graphical display for this data.

9. Swing States The Marist Poll organization conducted surveys of voters in New Hampshire, Nevada and North Carolina on behalf of NBC News and the Wall Street Journal during the period of September 23–25, 2012. NBC News published an article about the poll results on their Web site, along with PDF documents listing all of the questions asked by the interviewers in each state along with summary statistics for the responses. Download the New Hampshire document and use it to answer the following questions. (The document also contains some information about a previous poll conducted during June 2012; ignore any mention of that earlier poll for the purposes of this exercise.)

a) If possible, determine the number of cases in the New Hampshire data set.

b) List all of the variables in the New Hampshire data set and classify each as categorical, quantitative (be sure to include units), ordinal or identifier.

10. Mac vs. PC A statistics student wondered whether computer shoppers paid more for an Apple computer or a similarly equipped PC. To investigate, he searched completed listings for used computers on eBay on December 9, 2009, then randomly selected 11 Apple computers with Intel-based processors. For each of these 11 computers, he examined the configuration (hard drive size, monitor size, CPU speed) and randomly selected a PC with similar configuration. The resulting data set, which included the selling price of an Apple and a PC for 11 different configurations, appears in the following table:

config. Apple PC
A       630   310
B       620   330
C       909   325
D       860   500
E       1359  537
F       899   1099
G       599   475
H       899   394
J       915   389
K       545   422
L       620   585

a) How many cases and variables are represented in this data set? Specify the type of each variable (including units for quantitative variables).

b) Construct an appropriate graphical display for the Apple prices and another for the PC prices.

c) Based on your displays, does it appear that computer shoppers pay more for Apple computers or PCs?

d) Compute appropriate summary statistics for the Apple and PC prices.

e) Would it make more sense to compare the Apple prices to the PC prices with two separate displays and sets of summary statistics, or would it make more sense to investigate the price differences for each configuration? Explain.

f) Compute the price differences, create an appropriate graphical display of those differences, and compute appropriate summary statistics.
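For part f), each Apple price is paired with the PC price for the same configuration, so the differences are computed row by row. A minimal Python sketch, assuming the difference is taken as Apple minus PC (the opposite sign convention works equally well):

```python
import statistics

# Selling prices copied from the table (configurations A-L; there is no I).
apple = [630, 620, 909, 860, 1359, 899, 599, 899, 915, 545, 620]
pc = [310, 330, 325, 500, 537, 1099, 475, 394, 389, 422, 585]

# Paired differences: Apple price minus PC price for each configuration.
differences = [a - p for a, p in zip(apple, pc)]

print("differences:", differences)
print("mean difference:", round(statistics.mean(differences), 2))
print("median difference:", statistics.median(differences))
print("configurations where the PC cost more:", sum(d < 0 for d in differences))
```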

11. Real Estate Craigslist is a Web site that allows users to post online classified advertisements at no charge (with the exception of job postings in certain metropolitan areas). The table below shows the street address, size (in square feet), asking price (in dollars) and number of bedrooms (abbreviated BR) for nine houses located in the city of Lynnwood listed for sale on Craigslist on October 9, 2011.

address            size  price   BR
18712 57th Ave W   1805  349950  3
19014 24th Ave W   2404  329900  5
3203A 204th St SW  1912  250000  4
17112 6th Ave W    3200  509797  4
19631 9th Pl W     2369  339950  4
17402 62nd Ave W   1200  185000  3
21011 54th Ave W   1660  136950  3
14018 20th Pl W    2244  254950  4
14517 40 Ave W     2450  335000  3

a) How many cases are in this data set?

b) How many variables? List each variable and classify it as categorical, quantitative, ordinal or identifier.

c) Create an appropriate graphical display of the house sizes.

d) Create an appropriate graphical display of the house prices.

e) Compute appropriate summary statistics for the house sizes.

f) Compute appropriate summary statistics for the house prices.

g) If the price of the house on 6th Avenue West was reduced to $499,000, how would that change affect the:

i) mean price?

ii) median price?

iii) IQR?

iv) standard deviation?

v) range?

12. Laundry Detergent An article in the November 2011 issue of Consumer Reports compares the price (in cents per load) and performance (on a scale from 0 to 100) of 34 brands of high-efficiency laundry detergent (used in front-loading machines) and 24 brands of conventional laundry detergent (used in top-loading machines). A graphical display of the price data for the 34 high-efficiency detergents appears below.

a) How many cases are included in the complete data set?

b) List all of the variables included in the complete data set and classify each as categorical, quantitative, ordinal or identifier.

c) What term best describes the graphical display shown above?

d) Describe the distribution of the price variable.

e) Which summary statistics would be most appropriate to report for the price variable?

f) Estimate the median price.

g) Estimate the IQR of the prices.

h) Estimate the range of the prices.

13. Laundry Detergent (Second Load) An article in the November 2011 issue of Consumer Reports compares the price (in cents per load) and performance (on a scale from 0 to 100) of 34 brands of high-efficiency laundry detergent (used in front-loading machines) and 24 brands of conventional laundry detergent (used in top-loading machines). A graphical display of the price data for the 24 conventional detergents appears below.

3|0
2|
2|223
1|556678
1|1123334
0|56789

Key: 3|0 = 30 cents per load

a) What term best describes the graphical display shown above?

b) Describe the distribution of the price variable seen in the graphical display.

c) Compute the median price.

d) Compute the IQR of the prices.

e) Compute the range of the prices.

f) Compute the mean price.

g) Compute the standard deviation of the prices.

h) Which summary statistics would be most appropriate to report for the price variable?

i) Which of the following terms best applies to the detergent that costs 30 cents per load?

14. Exercise A statistics student wondered whether males spend more time exercising on a treadmill at the gym than females. She went to several different gyms on a single day in order to observe 20 males (denoted "m" below) and 20 females ("f") using a treadmill and record the time on the display when they finished running.

52 f   35 f   56 m   25 f
22 m   15 f   41 m   10 f
29 m   38 m    5 m   25 f
55 f   21 m   13 m   48 m
20 f   33 m   48 f   31 m
18 f   10 m   52 m   10 f
 5 f   63 m   60 m   60 f
33 m    5 f   40 m   25 f
33 f   26 m   30 m    5 f
30 m   40 f   14 f   35 f

a) How many cases are included in this data set?

b) List the variables included in this data set and classify each as categorical, quantitative, ordinal or identifier.

c) Create an appropriate graphical display for the male running times.

d) Create an appropriate graphical display for the female running times.

e) Describe and compare the distribution for each group.

f) Compute appropriate summary statistics for each group.

15. Coffee An Edmonds Community College student wanted to investigate whether there is a difference in the number of calories between comparable coffee drinks at Starbucks and Seattle's Best Coffee (SBC), two major coffee retailers. She found caloric information about 10 different drinks and created the table shown below:

drink                  Starbucks  SBC
Cappuccino             120        130
Latte                  190        160
Vanilla Latte          250        240
Iced Latte             130        140
Mocha                  260        390
White Chocolate Mocha  400        310
Peppermint Mocha       330        410
Iced Mocha             200        240
Hot Chocolate          300        320
Chai Tea Latte         240        230

a) How many cases are in this data set?

b) List the variables in this data set and classify each as categorical, quantitative, ordinal or identifier.

c) Create appropriate graphical display(s) to help investigate the answer to the student's question and describe the resulting distribution(s).

d) Compute appropriate summary statistics to accompany your graphical display.

e) Do you think there is a significant difference? Explain.

16. Washing Machines The February 2010 issue of Consumer Reports ranked 76 different washing machine models (43 front-loading machines and 33 top-loading machines) based on a variety of considerations. The article included two tables, one for front-loaders and one for top-loaders. Each table listed: the rank of each machine; its make; model; estimated retail price (in dollars); overall Consumer Reports score (on a scale from 0 to 100); test results for washing, energy efficiency, water efficiency, capacity, gentleness, noise and vibration (each rated as Poor, Fair, Good, Very Good or Excellent); and cycle time (in minutes). Data for the first five machines appears below:

a) How many cases were included in the complete data set?

b) List the variables included in the data set, and classify each as categorical, quantitative, ordinal or identifier.

A graphical display of the cycle times for the 33 top-loading machines appears below; use it to answer the next seven questions.

c) What term best describes this type of graphical display?

d) Describe the distribution of cycle times as they appear in the graphical display.

e) What term best applies to the Kenmore Elite Oasis 2808, with a cycle time of 85 minutes?

f) Which measure of center would be more appropriate to report for the cycle time data?

g) Which measure of spread would be more appropriate to report for the cycle time data?

h) Estimate the median cycle time.

i) Would you expect the mean cycle time to be smaller than the median, larger, or about the same?

17. Washing Machines (Second Load) The February 2010 issue of Consumer Reports ranked 76 different washing machine models (43 front-loading machines and 33 top-loading machines) based on a variety of considerations. The article included two tables, one for front-loaders and one for top-loaders. Each table listed: the rank of each machine; its make; model; estimated retail price (in dollars); overall Consumer Reports score (on a scale from 0 to 100); test results for washing, energy efficiency, water efficiency, capacity, gentleness, noise and vibration (each rated as Poor, Fair, Good, Very Good or Excellent); and cycle time (in minutes). The cycle times for the front-loading machines appear below:

 80  90  65  55  55  70  55  95 105  70 100  75  55  65 100
 85  55  85  70  70  90  70  90  90  60  65 105  60  50 115
 70  95  95  70 100  80  95  65 100  65  70  90 100

a) Create an appropriate graphical display for this data.

b) Describe the distribution of cycle times based on your graphical display.

c) Compute the median cycle time.

d) Compute the mean cycle time.

e) Compute the range of cycle times.

f) Compute the IQR of the cycle times.

g) Compute the standard deviation of the cycle times.

h) Do you notice anything unusual about the data values? What might explain this?
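One possible display for part a) is a stem-and-leaf plot. The sketch below builds one in Python, with the tens as stems and the ones digits as leaves. The function name stem_and_leaf is illustrative, and a histogram or dotplot would be an equally reasonable choice.

```python
from collections import defaultdict

# Cycle times in minutes, copied from the problem statement (43 machines).
cycle_times = [
    80, 90, 65, 55, 55, 70, 55, 95, 105, 70, 100, 75, 55, 65, 100,
    85, 55, 85, 70, 70, 90, 70, 90, 90, 60, 65, 105, 60, 50, 115,
    70, 95, 95, 70, 100, 80, 95, 65, 100, 65, 70, 90, 100,
]

def stem_and_leaf(values):
    """Map each stem (value // 10) to its sorted list of leaves (value % 10)."""
    stems = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        stems[stem].append(leaf)
    return dict(stems)

display = stem_and_leaf(cycle_times)

# Print the largest stem first; stems with no leaves still get a row.
for stem in range(max(display), min(display) - 1, -1):
    leaves = "".join(str(leaf) for leaf in display.get(stem, []))
    print(f"{stem:>2}|{leaves}")
```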

18. Toyota Prius The Prius is a popular gas-electric hybrid car manufactured by Toyota. The table below shows the VIN (vehicle identification number), color, age (in years), mileage (in miles) and asking price (in dollars) for 13 used Toyota Prius automobiles advertised for sale on the Web site of The Seattle Times on January 23, 2011.

VIN               age  color  mileage  price
JTDKN3DU9A0056349   1  black     9277  28995
JTDKN3DU8A0165157   1  black     4180  28995
JTDKN3DU1A0057303   1  blue     32105  25995
JTDKN3DU9A0147198   1  gray      8129  24995
JTDKB20U197821193   2  pewter   28434  23995
JTDKB20U683348798   3  green    40762  22995
JTDKB20U187716331   3  green    24531  22995
JTDKB20U387727363   3  blue     16262  22995
JTDKN3DU8A0059050   1  silver   32830  21995
JTDKB20U383347267   3  gray     32604  21995
JTDKB20U583417996   3  gray     43827  18995
JTDKB20U697840628   2  white    24632  20995
JTDKB20U297880205   2  white    33651  18680

a) How many cases are in this data set?

b) List the variables and classify each as categorical, quantitative, ordinal or identifier.

c) Construct an appropriate graphical display of the mileage values for these cars.

d) Describe the distribution of mileage values based on your graphical display.

e) Compute the mean mileage.

f) Compute the median mileage.

g) Compute the standard deviation of the mileage values.

h) Compute the IQR of the mileage values.

i) Compute the range of the mileage values.

j) What is the best measure of center to report for mileage?

k) What is the best measure of variability to report for mileage?

l) In addition to the 13 automobiles in the data set provided above, The Seattle Times also listed two significantly older Toyota Prius automobiles:

VIN               age  color  mileage  price
JT2BK12U630070267   8  green    83996  10995
JT2BK18U720060613   9   blue   110919   8995

If we included these two cars in the data set,

i) How would that affect the mean?

ii) How would that affect the standard deviation?

m) If you added the 8- and 9-year-old cars to the graphical display, what term would best describe them?

19. Hybrid Accord The Accord is a popular model of car manufactured by Honda. During the model years 2005–2007, Honda produced a gas-electric hybrid version of the Accord that boasted slightly better gas mileage than a traditional gasoline-only Accord. The table below shows the model year, color, mileage (in miles) and asking price (in dollars) for all 12 used Honda Accord hybrid automobiles advertised for sale on the Web site of the Los Angeles Times on October 10, 2010.

year  color        mileage  price
2007  desert mist    34525  21995
2006  white pearl    36929  19999
2007  graphite pearl 57706  18888
2007  white pearl    32924  17888
2006  gold           53567  16998
2007  white          47745  15500
2006  gray           86812  13846
2006  gray           76405  12988
2006  white pearl   141700  11490
2005  white          99938  10000
2005  gold           99267   9987
2005  white         134143   8995

a) How many cases are in this data set?

b) List the variables and classify each as categorical, quantitative, ordinal or identifier.

c) Construct an appropriate graphical display of the mileage values for these cars.

d) Describe the distribution of mileage values based on your graphical display.

e) Compute the mean mileage.

f) Compute the median mileage.

g) Compute the standard deviation of the mileage values.

h) Compute the IQR of the mileage values.

i) Compute the range of the mileage values.

j) What is the best measure of center to report for mileage?

k) What is the best measure of variability to report for mileage?

20. Hybrid Accords (Again) Refer to the data set from the previous problem.

a) Construct an appropriate graphical display of the prices for these cars.

b) Describe the distribution of prices based on your graphical display.

c) Compute the mean price.

d) Compute the median price.

e) Compute the standard deviation of the prices.

f) Compute the IQR of the prices.

g) Compute the range of the prices.

h) What is the best measure of center to report for price?

i) What is the best measure of variability to report for price?

j) Consider the mileage and price variables in the data set. Does there appear to be some sort of relationship between the mileage of a car and its price? Explain.

k) Do there appear to be any exceptions to this relationship? Explain.

21. Liver disease Between 1974 and 1984, the Mayo Clinic collected information about patients with the liver disease primary biliary cirrhosis (PBC). A study of 216 of these patients found a mean serum albumin level of 34.46 g/l with a standard deviation of 5.84 g/l. A graphical display of these levels appears below:

a) What term best describes the graphical display shown above?

b) Describe the shape of the distribution in this graphical display. Be sure to mention any unusual features. 

c) Would it be more appropriate to report the mean or the median for this data set? Explain.

d) Would it be more appropriate to report the range, standard deviation or IQR for this data set? Explain.

e) Approximately what percentage of the people involved in this study have a serum albumin level below 28 g/l?

f) Estimate the median serum albumin level for these 216 patients.

g) Estimate the range of the serum albumin levels for these 216 patients.

h) Estimate the IQR of the serum albumin levels for these 216 patients.

i) Would you expect the mean serum albumin level for these 216 patients to be greater than, less than or about the same as the median?

j) Would it be appropriate to use a stem-and-leaf display to graph this data? Explain. 

22. Left-hand turns On December 2, 2009, a statistics student observed traffic near the corner of 148th and Manor Way in Lynnwood, to investigate whether there is a difference in the duration of time taken to make a left-hand turn between male and female drivers. For 30 drivers (15 male and 15 female), she used a stopwatch to record the gender of each driver and the length of time used to make a left-hand turn. The data she recorded appears below:

male     4.0  3.4  4.3  3.6  2.7  3.8  4.3  3.5  5.3  4.6  3.9  5.1  5.0  4.1  4.8
female   4.7  5.3  5.4  4.1  4.7  3.8  3.1  4.2  4.1  2.9  4.1  4.2  3.7  4.0  4.0

a) How many cases are included in this data set?

b) How many variables are included in this data set?

c) Create a graphical display of the turning times for the male drivers.

d) Describe the distribution of turning times for the male drivers. Be sure to mention any unusual features.

e) Compute the mean turning time for the male drivers.

f) Compute the median turning time for the male drivers.

g) Is it more appropriate to report the mean or the median for the male drivers? Explain. 

h) Compute the range of the turning times for the male drivers.

i) Compute the IQR of the turning times for the male drivers.

j) Compute the standard deviation of the turning times for the male drivers.

k) Is it more appropriate to report the range, IQR or standard deviation for the male drivers? Explain. 

23. More turns Refer to the previous exercise.

a) Create a graphical display of the turning times for the female drivers.

b) Describe the distribution of turning times for the female drivers. Be sure to mention any unusual features.

c) Compute the mean turning time for the female drivers.

d) Compute the median turning time for the female drivers.

e) Is it more appropriate to report the mean or the median for the female drivers? Explain. 

f) Compute the range of the turning times for the female drivers.

g) Compute the IQR of the turning times for the female drivers.

h) Compute the standard deviation of the turning times for the female drivers.

i) Is it more appropriate to report the range, IQR or standard deviation for the female drivers? Explain. 
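A five-number summary for each group makes the male and female turning times easy to compare side by side. The sketch below is one way to compute it in Python. The helper name five_number_summary is an assumption for illustration, and the quartiles follow Python's default rule, which may differ slightly from a textbook method.

```python
import statistics

# Turning times copied from the table in the previous exercise.
male = [4.0, 3.4, 4.3, 3.6, 2.7, 3.8, 4.3, 3.5, 5.3, 4.6, 3.9, 5.1, 5.0, 4.1, 4.8]
female = [4.7, 5.3, 5.4, 4.1, 4.7, 3.8, 3.1, 4.2, 4.1, 2.9, 4.1, 4.2, 3.7, 4.0, 4.0]

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the default quartile rule."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return min(values), q1, median, q3, max(values)

for label, times in (("male", male), ("female", female)):
    low, q1, median, q3, high = five_number_summary(times)
    print(f"{label:>6}: min={low} Q1={q1:.2f} median={median:.2f} "
          f"Q3={q3:.2f} max={high} IQR={q3 - q1:.2f} "
          f"mean={statistics.mean(times):.2f} sd={statistics.stdev(times):.2f}")
```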

24. Nerf guns For a class project, a statistics student tested his theory about the regulators found on Nerf guns: that they slow the muzzle velocity of the darts. He collected three Nerf guns that had 22 barrels among them, each barrel individually regulated. He fired a Nerf dart once using each barrel and measured how many inches it traveled, using the same dart on all tests. He then removed the regulators and fired one shot with each of the barrels again. The data he recorded appears below:

barrel  regulator  no regulator
     1        231           252
     2        208           245
     3        202           251
     4        212           265
     5        193           210
     6        201           234
     7        125           155
     8        168           141
     9         38            74
    10        154           231
    11        122           103
    12         77           123
    13        243           262
    14        215           252
    15        239           221
    16        234           268
    17        232           245
    18        237           252
    19        230           254
    20        245           259
    21        218           249
    22        246           262

a) How many variables are included in this data set? (Specify the variables and classify each.)

b) How many cases are included in this data set?

c) Create a graphical display for the distances resulting from firing the gun with a regulator.

d) Describe the distribution of distances resulting from firing the gun with a regulator. Be sure to mention any unusual features.

e) Compute the mean distance for the firings with a regulator.

f) Compute the median distance for the firings with a regulator.

g) Is it more appropriate to report the mean or the median for the firings with a regulator? Explain. 

h) Compute the range of the distances for the firings with a regulator.

i) Compute the IQR of the distances for the firings with a regulator.

j) Compute the standard deviation of the distances for the firings with a regulator.

k) Is it more appropriate to report the range, IQR or standard deviation for distances for the firings with a regulator? Explain. 

25. Nerf II Refer to the data set from the previous problem.

a) Create a graphical display for the distances resulting from firing the gun without a regulator.

b) Describe the distribution of distances resulting from firing the gun without a regulator. Be sure to mention any unusual features.

c) Compute the mean distance for the firings without a regulator.

d) Compute the median distance for the firings without a regulator.

e) Is it more appropriate to report the mean or the median for the firings without a regulator? Explain. 

f) Compute the range of the distances for the firings without a regulator.

g) Compute the IQR of the distances for the firings without a regulator.

h) Compute the standard deviation of the distances for the firings without a regulator.

i) Is it more appropriate to report the range, IQR or standard deviation for distances for the firings without a regulator? Explain. 

26. Inkjet printers For their May 2005 issue, the editors of Consumer Reports compared the cost and effectiveness of a variety of inkjet printers. The following table lists the model, retail price (in dollars) and the text speed (in pages per minute, or ppm) for the 13 top-ranked models.

model                             price  speed
HP Deskjet 6540                     130   11.0
Canon Pixma iP4000                  140   10.0
HP PhotoSmart 7760                  150    6.0
HP Deskjet 5850                     235    6.0
HP PhotoSmart 7960                  230    6.0
HP PhotoSmart 8450                  245    7.0
Canon Pixma iP5000                  190    9.0
Canon Pixma iP2000                   80   10.0
Canon Pixma iP8500                  345    4.5
HP Deskjet 6127                     250    7.0
Lexmark P915 Photo                  135    9.0
Epson Stylus Photo R800             375    2.5
Lexmark Color Jetprinter Z816        90    9.5

a) How many cases are included in this data set?

b) How many variables are included in this data set?

c) Classify each variable as quantitative, categorical, ordinal or identifier.

d) Construct an appropriate graphical display for the prices of these 13 printers.

e) Compute the mean price for these 13 printers.

f) Compute the median price for these 13 printers.

g) Compute the IQR of the prices for these 13 printers.

h) Compute the standard deviation of the prices for these 13 printers.

i) Compute the range of the prices for these 13 printers.

j) Which measurement of center should you report for the prices of these 13 printers: mean or median? Explain.

k) Which measurement of variation should you report for the prices of these 13 printers: standard deviation, IQR or range? Explain.

27. Printers again Refer to the data from the previous exercise.

a) Construct an appropriate graphical display for the speeds of these 13 printers.

b) Compute the mean speed for these 13 printers.

c) Compute the median speed for these 13 printers.

d) Compute the IQR of the speeds for these 13 printers.

e) Compute the standard deviation of the speeds for these 13 printers.

f) Compute the range of the speeds for these 13 printers.

g) Which measure of center should you report for the speeds of these 13 printers: mean or median? Explain.

h) Which measure of variation should you report for the speeds of these 13 printers: standard deviation, IQR or range? Explain.

28. James Bond The table below lists the following information about the 23 James Bond films produced by Eon Productions over the past 50 years: production order; title; year of release; total box office gross (in millions of US dollars); budget (in millions of US dollars); adjusted box office gross (in millions of US dollars, adjusted to the 2008 Consumer Price Index); and the duration (in seconds) of the title song for each film. The information comes from Wikipedia and, for the song durations, Amazon.com.

no  title                           year  gross budget adj BO song 
 1  Dr. No                          1962   59.6    1.2  425.5  107 
 2  From Russia With Love           1963   78.9    2.5  555.9  153 
 3  Goldfinger                      1964  124.9    3.5  868.7  168 
 4  Thunderball                     1965  141.2   11.0  966.4  182 
 5  You Only Live Twice             1967  111.6    9.5  720.4  165 
 6  On Her Majesty's Secret Service 1969   87.4    7.0  513.4  193 
 7  Diamonds Are Forever            1971  116.0    7.2  617.5  161 
 8  Live and Let Die                1973  161.8   12.0  785.7  193 
 9  The Man With the Golden Gun     1974   97.6   13.0  426.8  154 
10  The Spy Who Loved Me            1977  187.3   28.0  666.4  208 
11  Moonraker                       1979  210.3   34.0  624.5  188 
12  For Your Eyes Only              1981  202.8   28.0  481.0  183 
13  Octopussy                       1983  187.5   27.5  405.9  182 
14  A View to a Kill                1985  157.8   30.0  316.2  214 
15  The Living Daylights            1987  191.2   40.0  362.9  283 
16  Licence to Kill                 1989  156.2   32.0  271.6  251 
17  GoldenEye                       1995  353.4   60.0  500.0  209 
18  Tomorrow Never Dies             1997  346.6  110.0  465.6  292 
19  The World Is Not Enough         1999  390.0  135.0  504.7  237 
20  Die Another Day                 2002  456.0  142.0  546.5  276 
21  Casino Royale                   2006  599.2  150.0  640.8  241 
22  Quantum of Solace               2008  586.1  230.0  586.1  263 
23  Skyfall                         2012      ?  150.0      ?  286

a) How many cases are included in this data set?

b) How many variables are included in this data set?

c) List each variable and classify it according to type (categorical, ordinal, quantitative, identifier). Be sure to include units for quantitative variables.

d) Create an appropriate graphical display for the song duration variable.

e) Describe the distribution of song durations.

f) Compute the mean song duration.

g) Compute the median song duration.

h) Which would be more appropriate to report: mean or median? Explain.

i) Compute the standard deviation of the song durations.

j) Compute the IQR of the song durations.

k) Compute the range of the song durations. 

l) Which would be most appropriate to report: standard deviation, IQR or range? Explain.
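
For part d), a histogram is one reasonable display. The sketch below builds a rough text histogram of the song durations in Python. The helper name bin_counts, the starting value of 100 seconds and the bin width of 25 seconds are all illustrative assumptions; other bin choices are equally defensible and may change how the shape looks.

```python
from collections import Counter

# Title song durations (in seconds) for films 1-23, copied from the table.
song_seconds = [107, 153, 168, 182, 165, 193, 161, 193, 154, 208, 188, 183,
                182, 214, 283, 251, 209, 292, 237, 276, 241, 263, 286]

def bin_counts(values, start, width):
    """Return (low, high, count) for each bin from `start` up to the largest value.

    Assumes integer values that are all >= start.
    """
    counts = Counter((v - start) // width for v in values)
    last_bin = max(counts)
    return [
        (start + b * width, start + (b + 1) * width - 1, counts.get(b, 0))
        for b in range(last_bin + 1)  # include empty bins so gaps stay visible
    ]

bins = bin_counts(song_seconds, start=100, width=25)
for low, high, count in bins:
    print(f"{low:3d}-{high:3d} | {'*' * count}")
```

Keeping the empty bins in the output matters for part e), since gaps and possible outliers are part of describing a distribution.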

29. James Bond will return Refer to the data set from the previous problem.

a) Create an appropriate graphical display for total box office gross. (You'll need to omit Skyfall, which premiered in the USA on November 9, 2012.)

b) Describe the distribution you see in this display.

c) Compute the mean box office gross.

d) Compute the median box office gross.

e) Which would be more appropriate to report: mean or median? Explain.

f) Compute the standard deviation of the box office grosses.

g) Compute the IQR of the box office grosses.

h) Compute the range of the box office grosses. 

i) Which would be most appropriate to report: standard deviation, IQR or range? Explain.

j) Look up the current box office gross for Skyfall (which is still playing in theaters). If you included Skyfall in the data set,

i) Would it be an outlier? Explain.

ii) Would the mean increase, decrease or stay about the same?

iii) Would the median increase, decrease or stay about the same?

iv) Would the standard deviation increase, decrease or stay about the same?

v) Would the IQR increase, decrease or stay about the same?

vi) Would the range increase, decrease or stay about the same?
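
The questions in part j) can be explored by recomputing each measure with one extra value appended. The sketch below does this in Python. The names MEASURES and direction_of_change are illustrative, and hypothetical_gross is a placeholder, not Skyfall's actual gross; replace it with the figure you look up. The quartile rule is the statistics module's default and may differ from your textbook's.

```python
import statistics

# Total box office gross (millions of US dollars) for films 1-22, from the table.
gross = [59.6, 78.9, 124.9, 141.2, 111.6, 87.4, 116.0, 161.8, 97.6, 187.3,
         210.3, 202.8, 187.5, 157.8, 191.2, 156.2, 353.4, 346.6, 390.0,
         456.0, 599.2, 586.1]

def iqr(values):
    """Interquartile range using the module's default quartile rule."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

MEASURES = {
    "mean": statistics.mean,
    "median": statistics.median,
    "stdev": statistics.stdev,  # sample SD (divides by n - 1)
    "IQR": iqr,
    "range": lambda v: max(v) - min(v),
}

def direction_of_change(values, extra, tolerance=1e-9):
    """Say whether each measure rises, falls or stays put when `extra` is added."""
    result = {}
    for name, measure in MEASURES.items():
        before = measure(values)
        after = measure(values + [extra])
        if after > before + tolerance:
            result[name] = "increase"
        elif after < before - tolerance:
            result[name] = "decrease"
        else:
            result[name] = "same"
    return result

# Placeholder only: NOT Skyfall's actual gross. Substitute the value you find.
hypothetical_gross = 1000.0
changes = direction_of_change(gross, hypothetical_gross)
for name, direction in changes.items():
    print(f"{name:>6}: {direction}")
```

The script only reports direction; judging whether a change counts as "about the same" still requires comparing the before and after values yourself.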