Binomial Probability Models

We previously considered the geometric model, which involved Bernoulli trials, and used it to compute probabilities related to the number of EdCC students we would need to randomly select until we found a female student, assuming that 55% of all EdCC students were female. Now let's consider a different, but related, question involving Bernoulli trials. Suppose we randomly select five EdCC students. What is the probability that none of them are female?

This situation also involves Bernoulli trials, but now there is a fixed number of trials (in this case, 5). Let `Y` be a random variable representing the number of females in our five-person sample. We then have:

`P(Y=0) = (0.45)^5 approx 0.0185`

What is the probability that exactly one student is female? We could compute the probability that only the first student is female:

`(0.55)(0.45)(0.45)(0.45)(0.45) = (0.55)(0.45)^4 approx 0.0226`

but the one female in our group of five might be the second student instead:

`(0.45)(0.55)(0.45)(0.45)(0.45) = (0.55)(0.45)^4 approx 0.0226`

or the third or the fourth or the fifth. These five probabilities are all the same, so the probability that exactly one of the five students is female is given by:

`P(Y=1) = 5 times (0.55)(0.45)^4 approx 0.1128`

After this point it gets more complicated. For the probability that exactly two of the five students are female we need to determine the number of ways we can select two of the five students to be female; if we call the students A, B, C, D and E, then here are the different two-person groups who could be female:

AB, AC, AD, AE, BC, BD, BE, CD, CE, DE

In other words, there are 10 ways to select two people out of a five-person group. So:

`P(Y = 2) = 10 times (0.55)^2(0.45)^3 approx 0.2757`
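The count of 10 used above can be confirmed by listing the pairs mechanically. Here is a minimal sketch in Python (a language these notes do not otherwise use, so treat it as an optional aside); the variable names `students` and `pairs` are made up for illustration.

```python
from itertools import combinations

students = ["A", "B", "C", "D", "E"]

# Every way to choose which two of the five students are female
pairs = ["".join(pair) for pair in combinations(students, 2)]

print(pairs)       # the two-letter groups, in alphabetical order
print(len(pairs))  # how many groups there are
```

Changing the 2 to a 3 lists the three-person groups instead, which is one way to see that there are just as many of those.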

Selecting three people out of a group of five is just like selecting two people to leave behind, so there are also 10 ways to do that, and we have:

`P(Y = 3) = 10 times (0.55)^3 (0.45)^2 approx 0.3369`

Selecting four people is like leaving one behind, so there are five ways to do that:

`P(Y=4) = 5 times (0.55)^4(0.45) approx 0.2059`

and it's relatively straightforward to compute the probability that all five students are female:

`P(Y = 5) = (0.55)^5 approx 0.0503`
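One way to double-check this kind of counting argument is to list every possible five-student sequence and add up the probabilities by brute force. The following is a rough Python sketch under the same assumption as the text (each student is female with probability 0.55, independently); the names `p_female`, `p_male` and `dist` are illustrative only.

```python
from itertools import product

p_female = 0.55
p_male = 1 - p_female

# dist[k] will hold the probability of exactly k females among 5 students
dist = [0.0] * 6

# Walk through all 2^5 = 32 sequences such as ('F', 'M', 'M', 'F', 'M')
for outcome in product("FM", repeat=5):
    count = outcome.count("F")
    # Independent trials: multiply one factor per student
    dist[count] += p_female**count * p_male**(5 - count)

for k, prob in enumerate(dist):
    print(k, round(prob, 4))
```

Each sequence with the same number of females contributes the same product, which is why the hand computation can simply multiply one such product by the number of sequences.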

In general, a formula for this probability distribution, which we call a binomial model (Bernoulli trials where we count the number of successes in a fixed number of trials), would look like this:

`P(Y = k) = quad _n C_k p^k q^{n-k}`

where `_n C_k` represents the number of ways of choosing a group of k people or things out of a group of n people or things, and `q = 1 - p` is the probability of failure on a single trial. It's not difficult to develop a formula for this number (called a combination), and in fact these numbers show up in something called Pascal's triangle:

            1
          1   1
        1   2   1
      1   3   3   1
    1   4   6   4   1
  1   5  10  10   5   1
1   6  15  20  15   6   1
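The triangle above can be generated row by row, since each interior entry is the sum of the two entries directly above it. As a sketch only, here is one way to do that in Python; `pascal_rows` is a hypothetical helper written for this illustration, while `math.comb` is the standard-library function for combinations (available in Python 3.8 and later).

```python
from math import comb

def pascal_rows(num_rows):
    """Return the first num_rows rows of Pascal's triangle as lists."""
    rows = [[1]]
    for _ in range(num_rows - 1):
        prev = rows[-1]
        # Each interior entry is the sum of the two entries above it
        middle = [prev[i] + prev[i + 1] for i in range(len(prev) - 1)]
        rows.append([1] + middle + [1])
    return rows

rows = pascal_rows(7)
for row in rows:
    print(row)

# Counting rows and positions from 0, entry k of row n is the number of
# ways to choose k things out of n, so these two values should agree
print(rows[5][2], comb(5, 2))
```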

But when n gets very large, it can be cumbersome to compute these numbers with the triangle or a formula, so we usually turn to a calculator. And in that case, we might as well turn to the calculator to compute the binomial probabilities directly.

Under the DISTR menu on the TI-84 (not too far above where you found geometpdf) you'll find binompdf. You can use this to compute:

`P(Y=2)` = binompdf(5,0.55,2) ≈ 0.2757

In general, we enter binompdf(n,p,k) where n is the (fixed) number of trials, p is the (fixed) probability of success and k is the number of successes in question. 
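If you don't have a TI-84 handy, the same computation can be sketched in a few lines of Python. The function name `binom_pmf` is our own invention, chosen to mirror the calculator's argument order (n, p, k); it is not a built-in, and it simply applies the binomial formula given earlier.

```python
from math import comb

def binom_pmf(n, p, k):
    """Probability of exactly k successes in n Bernoulli trials
    (hypothetical helper mirroring the calculator's binompdf(n,p,k))."""
    q = 1 - p  # probability of failure on a single trial
    return comb(n, k) * p**k * q**(n - k)

# Exactly 2 females among 5 students when 55% of students are female
print(round(binom_pmf(5, 0.55, 2), 4))
```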

We can check that our other previous answers also agree with the calculator:

`P(Y=0)` = binompdf(5,0.55,0) ≈ 0.0185

`P(Y=1)` = binompdf(5,0.55,1) ≈ 0.1128

`P(Y=3)` = binompdf(5,0.55,3) ≈ 0.3369

`P(Y=4)` = binompdf(5,0.55,4) ≈ 0.2059

`P(Y=5)` = binompdf(5,0.55,5) ≈ 0.0503

You should check that the sum of these probabilities is 1. There are fairly simple formulas for the mean and standard deviation of a binomial distribution:

`mu = E(Y) = np` and `sigma = SD(Y) = sqrt(npq)`

The details of why these formulas work can be found below the exercises. In our EdCC student example:

`mu = E(Y) = np = 5(0.55) = 2.75`

so, if we repeatedly selected 5 EdCC students at random and counted the number of female students in each 5-student group, we would expect to find an average of about 2.75 females per group, with a standard deviation of:

`sigma = sqrt(npq) = sqrt(5*0.55*0.45) approx 1.11`.
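The shortcut formulas can be compared against the definitions of mean and standard deviation applied directly to the six probabilities. The Python sketch below does that for the EdCC example; all variable names are illustrative, and the check covers only this one choice of n and p rather than proving the formulas in general.

```python
from math import comb, sqrt

n, p = 5, 0.55
q = 1 - p

# Binomial probabilities P(Y = 0), ..., P(Y = 5)
probs = [comb(n, k) * p**k * q**(n - k) for k in range(n + 1)]

total = sum(probs)  # should be 1, up to rounding error
mean = sum(k * pk for k, pk in enumerate(probs))
variance = sum((k - mean) ** 2 * pk for k, pk in enumerate(probs))
sd = sqrt(variance)

print("sum of probabilities:", round(total, 4))
print("from the definition: ", round(mean, 4), round(sd, 4))
print("from the formulas:   ", round(n * p, 4), round(sqrt(n * p * q), 4))
```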

If we want to know the probability that we find no more than two females in our five-student sample, we can add the appropriate individual probabilities found above, or we can use the binomcdf feature on the TI-84:

`P(Y leq 2)` = binomcdf(5,0.55,2) ≈ 0.4069

Note that we are using the binomcdf feature here (where again c stands for "cumulative").

If we want to know the probability that we find at least two females in a five-student sample, we could compute:

`P(Y ge 2) =` 1 − binomcdf(5,0.55,1) ≈ 0.8688
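Both cumulative calculations can be reproduced by adding individual binomial probabilities. Here is a small Python sketch; `binom_cdf` is a hypothetical helper named after the calculator feature, not a built-in function, and it takes its arguments in the same (n, p, k) order.

```python
from math import comb

def binom_cdf(n, p, k):
    """Probability of at most k successes in n Bernoulli trials
    (hypothetical helper mirroring the calculator's binomcdf(n,p,k))."""
    q = 1 - p
    return sum(comb(n, j) * p**j * q**(n - j) for j in range(k + 1))

# At most two females: add P(Y=0), P(Y=1) and P(Y=2)
at_most_two = binom_cdf(5, 0.55, 2)

# At least two females: everything except P(Y=0) and P(Y=1)
at_least_two = 1 - binom_cdf(5, 0.55, 1)

print(round(at_most_two, 4))
print(round(at_least_two, 4))
```

The complement is taken at 1 rather than 2 because "at least two" excludes only the outcomes with zero or one female.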

Mean and standard deviation

The expected value (mean) of a binomial distribution is given by the formula:

`E(Y) = np`

where n is the number of trials and p is the probability of success. In our EdCC student example, the expected number of females in a group of 5 is:

np = 5(0.55) = 2.75

The standard deviation is given by the formula:

`SD(Y) = sqrt(npq)`

where q is the probability of failure, so the SD in our EdCC student example would be:

`SD(Y) = sqrt(npq) = sqrt(5(0.55)(0.45)) approx 1.11` females

Exercises

1. According to the 2010 U.S. Census, 13.6% of Snohomish County residents were born in a foreign country.

a) If you randomly select 10 Snohomish County residents, compute the probability that:

i) none of them were foreign born.

ii) exactly 3 of them were foreign born.

iii) 2 or 3 of them were foreign born.

iv) at least one of them was foreign born.

v) at most 3 of them were foreign born.

vi) at least 3 of them were foreign born.

b) If you were to repeatedly select random samples of 10 Snohomish County residents,

i) how many, on average, would you expect to be foreign born?

ii) what would you expect the standard deviation of the number of foreign-born residents to be?

c) What probability model did you use to answer the preceding questions?

d) Explain why that probability model was appropriate.

2. In the 2008 U.S. presidential election, Sen. John McCain received 40.7% of the votes cast for president in the state of Washington. A political consultant wants to interview some of these voters in order to ask them if they plan to support Mitt Romney in the 2012 election. She obtains a list of voters who participated in the 2008 election and begins calling them at random.

a) What is the probability that none of the first three people she calls voted for McCain?

b) What is the probability that at least one of the first three people she calls voted for McCain?

c) What is the probability that the first McCain voter she finds is the third person she calls?

d) On average, how many people should she expect to call before finding a McCain voter?

e) What is the probability that exactly three of the first 12 people she calls voted for McCain?

f) What is the probability that at most three of the first 12 people she calls voted for McCain?

g) What is the probability that at least three of the first 12 people she calls voted for McCain?

h) On average, how many of the first 12 people she calls would you expect to be McCain voters?

i) With what standard deviation?