
Categorical vs. Quantitative
For an elementary school science fair, a fifth grader built a launching platform from a flat board and a rubber band and used it to launch miniature checkers across a room. He launched 50 checkers with the platform inclined at 15°, then repeated the process with the platform at 30°, 45°, 60°, and 75°. He measured the distance each checker traveled (in cm); the data appear in the table below.
angle dist angle dist angle dist angle dist angle dist
15 260 30 162 45 222 60 510 75 288
15 280 30 245 45 211 60 310 75 330
15 185 30 141 45 190 60 161 75 203
15 305 30 302 45 187 60 343 75 58
15 339 30 271 45 489 60 442 75 31
15 245 30 274 45 204 60 183 75 143
15 152 30 333 45 213 60 415 75 66
15 128 30 253 45 130 60 235 75 70
15 130 30 293 45 165 60 317 75 91
15 235 30 440 45 262 60 355 75 157
15 251 30 376 45 263 60 328 75 124
15 189 30 362 45 180 60 348 75 30
15 246 30 385 45 528 60 285 75 39
15 94 30 296 45 370 60 335 75 105
15 225 30 363 45 296 60 344 75 148
15 215 30 305 45 458 60 77 75 14
15 210 30 189 45 558 60 390 75 126
15 217 30 303 45 491 60 334 75 59
15 217 30 286 45 379 60 419 75 49
15 153 30 268 45 543 60 139 75 206
15 140 30 268 45 695 60 362 75 67
15 230 30 244 45 221 60 460 75 164
15 341 30 379 45 465 60 436 75 130
15 199 30 427 45 531 60 454 75 191
15 163 30 392 45 365 60 360 75 108
15 246 30 362 45 460 60 470 75 323
15 176 30 426 45 568 60 164 75 152
15 331 30 261 45 215 60 374 75 271
15 180 30 422 45 340 60 142 75 180
15 149 30 312 45 350 60 300 75 135
15 305 30 321 45 95 60 339 75 239
15 270 30 316 45 502 60 528 75 48
15 290 30 345 45 603 60 359 75 171
15 338 30 292 45 460 60 282 75 183
15 290 30 398 45 206 60 175 75 211
15 249 30 253 45 345 60 467 75 247
15 250 30 354 45 459 60 320 75 181
15 320 30 352 45 103 60 444 75 168
15 297 30 462 45 477 60 420 75 116
15 295 30 346 45 337 60 390 75 78
15 321 30 365 45 273 60 425 75 152
15 308 30 389 45 270 60 527 75 212
15 277 30 386 45 132 60 387 75 154
15 245 30 239 45 426 60 347 75 204
15 214 30 442 45 506 60 334 75 181
15 194 30 495 45 525 60 522 75 59
15 254 30 460 45 376 60 421 75 166
15 369 30 448 45 479 60 411 75 181
15 307 30 293 45 473 60 78 75 169
15 345 30 452 45 630 60 447 75 159
Distance is certainly a quantitative variable, but what about angle? Theoretically, angle would be a continuous quantitative variable, but in this situation, there were only five possible values for the angle, which might lead us to treat it as a categorical (albeit ordinal) variable. What type of graphical display would be appropriate here?
Let's begin by comparing the distances for two of these five groups: the checkers launched at 15° and those launched at 30°. One option would be to create side-by-side stem-and-leaf displays:
15°                  30°
5|                   5|0
4|                   4|5566
4|                   4|023344
3|57                 3|55556667889999
3|001111223444       3|00011223
2|55555555678899     2|555677779999
2|011222334          2|44
1|555688999          1|69
1|334                1|4
0|9                  0|
Key: 3|5 = 350 cm
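A display like the 15° one above can be rebuilt from the raw distances. The following is a minimal sketch, not the procedure used to produce the displays in this section: the function name `stem_and_leaf` is hypothetical, and it assumes each distance is rounded to the nearest 10 cm (halves rounded up) and each stem is split into a low row (leaves 0 to 4) and a high row (leaves 5 to 9).

```python
# Distances (cm) for the 50 checkers launched at 15 degrees, from the table.
DIST_15 = [
    260, 280, 185, 305, 339, 245, 152, 128, 130, 235,
    251, 189, 246, 94, 225, 215, 210, 217, 217, 153,
    140, 230, 341, 199, 163, 246, 176, 331, 180, 149,
    305, 270, 290, 338, 290, 249, 250, 320, 297, 295,
    321, 308, 277, 245, 214, 194, 254, 369, 307, 345,
]


def stem_and_leaf(values):
    """Return the rows of a split-stem display, highest stem first."""
    # Round to the nearest 10 cm, halves up (assumed rounding rule).
    tens = sorted((v + 5) // 10 for v in values)
    # A row is keyed by (stem, half): half 0 holds leaves 0-4, half 1 holds 5-9.
    rows = {}
    for t in tens:
        key = (t // 10, (t % 10) // 5)
        rows[key] = rows.get(key, "") + str(t % 10)
    key, lowest = max(rows), min(rows)
    lines = []
    while key >= lowest:
        # Empty rows are kept so that gaps in the data stay visible.
        lines.append(f"{key[0]}|{rows.get(key, '')}")
        # Step down one half-stem: high row -> low row -> next stem's high row.
        key = (key[0], 0) if key[1] == 1 else (key[0] - 1, 1)
    return lines


for line in stem_and_leaf(DIST_15):
    print(line)
```

Note that the sketch rounds with integer arithmetic rather than Python's built-in `round`, which rounds halves to the nearest even number and would place a value such as 245 on leaf 4 instead of leaf 5.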
We can see from these displays that checkers launched at 30° typically traveled farther than checkers launched at 15°. We could also display these data using a back-to-back stem-and-leaf display:
           15°   30°
              |5|0
              |4|5566
              |4|023344
            75|3|55556667889999
  444322111100|3|00011223
99887655555555|2|555677779999
     433222110|2|44
     999886555|1|69
           433|1|4
             9|0|
Key: 3|5 = 350 cm
where the leaves for the 15° distances are listed from the center to the left and the leaves for the 30° distances are listed from the center to the right. This display makes it a bit easier to compare one group with the other, but of course a back-to-back display only works with two groups.
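The back-to-back layout can be sketched in code as well. This is an illustrative sketch under the same assumptions as the displays above (distances rounded to the nearest 10 cm with halves rounded up, stems split into low and high rows); the function names `half_stem_rows` and `back_to_back` are hypothetical.

```python
# Distances (cm) from the table for the 15 degree and 30 degree launches.
DIST_15 = [
    260, 280, 185, 305, 339, 245, 152, 128, 130, 235,
    251, 189, 246, 94, 225, 215, 210, 217, 217, 153,
    140, 230, 341, 199, 163, 246, 176, 331, 180, 149,
    305, 270, 290, 338, 290, 249, 250, 320, 297, 295,
    321, 308, 277, 245, 214, 194, 254, 369, 307, 345,
]
DIST_30 = [
    162, 245, 141, 302, 271, 274, 333, 253, 293, 440,
    376, 362, 385, 296, 363, 305, 189, 303, 286, 268,
    268, 244, 379, 427, 392, 362, 426, 261, 422, 312,
    321, 316, 345, 292, 398, 253, 354, 352, 462, 346,
    365, 389, 386, 239, 442, 495, 460, 448, 293, 452,
]


def half_stem_rows(values):
    """Map (stem, half) to a string of leaves in increasing order."""
    rows = {}
    for t in sorted((v + 5) // 10 for v in values):  # nearest 10 cm, halves up
        key = (t // 10, (t % 10) // 5)  # half 0: leaves 0-4, half 1: leaves 5-9
        rows[key] = rows.get(key, "") + str(t % 10)
    return rows


def back_to_back(left_values, right_values):
    """Return display rows with a shared stem column, highest stem first."""
    left, right = half_stem_rows(left_values), half_stem_rows(right_values)
    keys = set(left) | set(right)
    key, lowest = max(keys), min(keys)
    width = max(len(leaves) for leaves in left.values())
    lines = []
    while key >= lowest:
        # Reverse the left leaves so they grow outward from the center.
        left_leaves = left.get(key, "")[::-1]
        lines.append(f"{left_leaves:>{width}}|{key[0]}|{right.get(key, '')}")
        key = (key[0], 0) if key[1] == 1 else (key[0] - 1, 1)
    return lines


for line in back_to_back(DIST_15, DIST_30):
    print(line)
```

Because both groups share one column of stems, the rows must span the lowest and highest stems of either group, which is why the sketch takes the union of the two sets of row keys.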
When comparing all five angles simultaneously, we could create five side-by-side stem-and-leaf displays, but another option would be to use side-by-side boxplots:
Here we can clearly see the skewness and outliers for each group, although we miss the detail of the individual data values and cannot determine anything about the modes of each group. Another option is a stripchart, where we plot the individual data values in a strip for each category.
In a stripchart, however, some data values become obscured when similar values are plotted on top of each other. One remedy is to "jitter" the dots in the stripchart slightly:
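Jittering amounts to adding a small random offset along the axis that carries no information (the position within the strip) while leaving the measured value untouched. The sketch below illustrates the idea on a few of the 15° distances that contain ties. The function name `jitter_strip`, the spread of 0.15, and the fixed seed are illustrative assumptions; the resulting coordinate pairs could be passed to any plotting tool.

```python
import random


def jitter_strip(distances, strip_position, spread=0.15, seed=1):
    """Return (position, distance) pairs with a small random sideways offset.

    Only the position within the strip is perturbed; the distance itself
    is never changed, so the plotted values remain exact.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [
        (strip_position + rng.uniform(-spread, spread), d)
        for d in distances
    ]


# A few of the 15 degree distances (cm), including repeated values.
sample_15 = [210, 217, 217, 245, 245, 246, 246]

# Without jitter, the two 217s would be drawn at exactly the same point.
for position, distance in jitter_strip(sample_15, strip_position=1):
    print(f"{position:.3f}  {distance}")
```

The spread should stay small relative to the gap between strips so that the dots from one category are never mistaken for those of its neighbor.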
Exercises
1. Checkers Refer to the boxplots of the checker data above.
a) Which angle corresponds to the highest median distance?
b) Which angle has the biggest range?
c) Which angle has the smallest range?
d) Which angle has the biggest IQR?
e) Which angle has the smallest IQR?
f) Based on the boxplots, what appears to be the optimal angle for launching a checker?
2. [OIS 1.40] Marathon winners The histogram and boxplot below show the distribution of finishing times for winners of the New York Marathon between 1980 and 1999.
a) What features of the distribution are apparent in the histogram and not the boxplot?
b) What features are apparent in the boxplot but not in the histogram?
c) What may be the reason for the bimodal distribution? Explain.
d) The data set includes times for the fastest male and female finishers of each marathon. Compare the distribution of marathon times for men and women based on the boxplots shown below.
e) What important feature is missing from the boxplots shown above?
f) A time series plot (shown below) is another way to look at this data. Describe what is visible in this plot but not in the others.