
Randomness
We have learned that random selection plays a crucial role in selecting samples for observational studies, and random assignment plays a crucial role in forming treatment groups for experiments. But just how do we randomly select individuals from the population of interest to include in a sample, or randomly assign study participants to a treatment group? One way would be to put each name on a slip of paper, place it in a box, shake up the box, and draw names. But this is highly impractical, especially in situations where the population of interest might include hundreds of millions of people.
Quizzes
Consider the online quizzes here on WAMAP. If you have attempted these quizzes more than once, you may have realized that there are different (but similar) questions that are served randomly to each student. In some cases, there are infinitely many possible problems; in other cases, only a handful. Let's consider a quiz problem with five possible questions that is set up to randomly select one of these five questions whenever a student takes the quiz.
But what does "random" really mean? As you might expect, in this situation it could mean that any time a student attempted the quiz, that student would have an equal likelihood of getting any one of the five possible matching questions.
If instead the quiz was being given on pencil and paper in a lecture class, I might roll a standard six-sided die and give each student the question (1, 2, 3, 4 or 5) corresponding to the number showing on the top face of the die. Of course, this would work a lot better if there were six possible questions, but I could always state in advance that if I rolled a 6 I would roll again. As long as the die wasn't "loaded" we might expect that each face of the die had an equal chance of showing, hence each student would have an equal chance of getting any one of the five quiz questions.
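For readers who like to see such a procedure written out, the roll-again rule can be sketched in Python. This is only an illustration: Python is not part of these notes, the helper name roll_question is made up for the sketch, and the computer's "die" is a pseudorandom generator, not a physical one.

```python
import random

def roll_question(rng=random):
    """Roll a six-sided die, rolling again on a 6; return a question number 1-5."""
    while True:
        face = rng.randint(1, 6)   # one roll of a fair six-sided die
        if face != 6:              # a 6 matches no question, so roll again
            return face

# One question number for each of 30 students.
questions = [roll_question() for _ in range(30)]
print(questions)
```

Because every 6 is simply discarded, the five remaining faces stay equally likely, which is the property we wanted.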
Now suppose I forgot to bring a six-sided die to class. Fortunately I never forget my calculator, so we can use that instead. Press the MATH button:
Then move the cursor over so that PRB (for probability, since that's where we're headed with all this, eventually) is highlighted, then move down to randInt(:
Now press ENTER. Next type 1,5) [a 1 then a comma then a 5 and then a right parenthesis] so that you see randInt(1,5) on the screen and press ENTER. The TI-84 should give you a random integer from 1 to 5:
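Except that for most of you it probably gave you a 5. (This works a lot better in a lecture class when I call out the numbers from 1 to 5 and hardly anyone raises their hand until we get to 5; trust me, it makes an impression.) So, what's up? Well, the TI-84 doesn't give us truly random numbers; rather, it gives us pseudorandom numbers. These are computed by a fixed formula from a starting value called a seed, so calculators that start from the same seed produce the same "random" numbers in the same order.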
To get around this, do the following. Make up a 4- or 5-digit number. This is your seed. Mine will be 12345. (Don't choose this one!) Now type this number into the calculator, then press STO> (this is the "store" button, above the ON key), then MATH, move over to PRB, and press ENTER. You should see 12345→rand (or your seed, an arrow and rand) on the screen.
Now press ENTER again:
This number has been stored as the random seed on your TI-84. Now you won't get the same answers as all of your classmates the next time we use the randInt feature.
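The effect of a seed is easy to see on a computer as well. The sketch below uses Python's standard random module, which is an assumption of this illustration: it is a different pseudorandom generator from the one inside the TI-84, so it will not reproduce your calculator's numbers. The function name draw is also made up for the example.

```python
import random

def draw(seed, count=10):
    """Return `count` pseudorandom integers from 1 to 5, starting from `seed`."""
    rng = random.Random(seed)      # a private generator started from this seed
    return [rng.randint(1, 5) for _ in range(count)]

print(draw(12345))
print(draw(12345))   # same seed, so exactly the same list again
print(draw(54321))   # a different seed will, in general, give a different list
```

Two students who store the same seed are in the position of the first two lines: they will see identical "random" numbers.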
Getting back to our example, if we wanted to assign one of five quiz problems to all 30 students enrolled in the class, we could use randInt(1,5) 30 times in a row. This isn't quite as bad as it seems, because after you enter it the first time you can keep pressing ENTER 29 more times until you have 30 numbers:
Or you could type randInt(1,5,30) to get 30 random numbers from 1 through 5, all at once:
But as you can see, you have to scroll to the right repeatedly to see all 30 numbers. So you might want to use randInt(1,5,6) and press ENTER 4 more times so that you get 6×5 = 30 numbers:
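If you would rather do this step on a computer, a rough equivalent of pressing ENTER five times on randInt(1,5,6) might look like the following. Python's random.randint stands in for the calculator's randInt here as an assumption of the sketch, and the numbers are still pseudorandom.

```python
import random

rng = random.Random()   # no seed given, so Python picks a starting point itself

# Five rows of six integers from 1 to 5: 30 numbers in all.
rows = [[rng.randint(1, 5) for _ in range(6)] for _ in range(5)]
for row in rows:
    print(row)
```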
Remember, though, that the TI-84 doesn't give us truly random numbers. For most of the exercises we work in class that won't really be a problem, but it is for some people, for whom it is vital that the numbers they use be truly random. Where can we get truly random numbers? Where we get everything these days, the Internet!
True random numbers
Go to random.org and you will see a link that says Integer Generator:
Click on this link. We want to generate 30 random integers from 1 to 5 and we can format them in 6 columns:
Click Get Numbers and you should see something like this:
although your numbers shouldn't all be the same as mine!
Simulations
If, in a class of 30 students, no one received question #4 from WAMAP, should we be concerned about WAMAP's "random" quiz generator? We would need to know how likely it would be for the number 4 not to show up in this simulation that we just ran. So I could run a simulation on random.org another 199 times, say, and count the number of times 4 didn't show up.
But the real question I want to ask is, how likely is it that at least one of the five numbers doesn't show up in a group of 30? (After all, I would have been just as concerned if it had been #3 or #5 that was never assigned.) So instead of looking for just #4, I would look at these 200 simulations and count the number of times that at least one of the five numbers was missing from the group of 30 digits.
In fact, I did just that (with the help of a spreadsheet and some time-saving Excel formulas to test if one or more of the 5 digits was missing). Exactly 1 time out of 200 one of the digits was missing. So I will estimate that one of the quiz questions would not be assigned about 0.5% of the time. (In fact, the exact answer, which we'll learn how to compute later, is about 0.62%.) Might this raise my suspicions about WAMAP? Yes, since it seems that something like this should be a relatively rare occurrence, but at the same time I wouldn't conclude with certainty that there was a problem (since I would expect it to happen about 1 out of every 200 times I assigned the quiz problem to a class of 30 students).
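The same kind of simulation can be sketched in a few lines of Python instead of a spreadsheet. This is not the spreadsheet I used. The function names are made up for the illustration, and Python's pseudorandom generator takes the place of random.org, so treat it as a sketch of the idea, not a record of my results.

```python
import random

def some_question_missing(rng, students=30, questions=5):
    """Return True if at least one question number is never assigned."""
    assigned = {rng.randint(1, questions) for _ in range(students)}
    return len(assigned) < questions

def estimate_missing_rate(trials, seed=None):
    """Fraction of simulated classes in which some question never appears."""
    rng = random.Random(seed)
    misses = sum(some_question_missing(rng) for _ in range(trials))
    return misses / trials

print(estimate_missing_rate(200))      # one batch of 200 simulated classes
print(estimate_missing_rate(20_000))   # many more classes, a steadier estimate
```

With only 200 simulated classes the estimate jumps around from run to run (0 or 2 misses would not be surprising), which is why the larger run is included for comparison.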
Above, and in the exercises that follow, we use random numbers to simulate a complicated real-life event. Soon we will learn more rigorous mathematical techniques to compute the likelihood of various outcomes for such situations—at which time we'll use the term "probability." (If you've already learned how to do some simple probability computations, pretend that you haven't for the moment and instead use the random simulation techniques to work the exercises.)
Exercises
1. World Series. From 1997 through 2011, American League (AL) teams beat National League (NL) teams in about 52% of all Major League Baseball interleague games (games where an AL team played an NL team). In the World Series, where the AL champion plays the NL champion, the first team to win four games (out of a possible seven) wins. Assuming that the AL champion wins 52% of all games against the NL champion (this is a mighty big assumption, and not necessarily reasonable), how likely is it that the AL team wins the World Series? Conduct a simulation using random numbers to simulate at least 20 World Series and compute the percentage in which the AL team wins.
2. Undecided voters. Three recent polls found, respectively, that 7%, 8% or 9% of the likely voters in Washington state remain undecided about whether they will support Jay Inslee or Rob McKenna in the race for governor. Assume that 8% of Washington voters are undecided about the governor's race. Conduct a simulation to estimate the number of voters you would need to contact if you randomly called likely voters in Washington state until you found two undecided voters to interview for a news story about the election.