Conditional Probability

Recall the class survey data we previously analyzed to investigate an association between gender and beverage preference:

         female  male  total
Coke          7     9     16
Pepsi        10     4     14
neither       8     3     11
total        25    16     41

To better compare males to females, we computed percentages instead of counts. Here are the column percentages for this contingency table:

         female  male
Coke        28%   56%
Pepsi       40%   25%
neither     32%   19%
total      100%  100%
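Each column percentage comes from dividing a count by its column total (for example, 7/25 = 28% for female Coke drinkers and 3/16 ≈ 19% for males who prefer neither). A minimal Python sketch of that arithmetic follows; the dictionary layout and the function name `column_percentages` are illustrative choices, not part of the original analysis.

```python
# Class survey counts, organized by column (gender), then row (beverage)
counts = {
    "female": {"Coke": 7, "Pepsi": 10, "neither": 8},
    "male":   {"Coke": 9, "Pepsi": 4,  "neither": 3},
}

def column_percentages(table):
    """Divide each count by its column total and express the result as a percentage."""
    result = {}
    for column, cells in table.items():
        column_total = sum(cells.values())  # 25 females, 16 males
        result[column] = {row: 100 * n / column_total for row, n in cells.items()}
    return result

pct = column_percentages(counts)
for column, cells in pct.items():
    # Round to whole percentages, as in the table above
    print(column, {row: round(p) for row, p in cells.items()})
# prints:
# female {'Coke': 28, 'Pepsi': 40, 'neither': 32}
# male {'Coke': 56, 'Pepsi': 25, 'neither': 19}
```

Each column sums to 100% because every student in a column falls into exactly one beverage row.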

Can we interpret these percentages as probabilities? The 28% at upper left is the percentage of female students in the class who prefer Coke. So if we randomly select one of the female students, the probability that she prefers Coke is 28%. Another way of saying this is: what is the probability that a student prefers Coke, given that she is female? The phrase "given that she is female" is a condition we specify before computing the probability, so we call this a conditional probability and write

P(beverage = Coke | gender = female) = 0.28

where the | symbol means "given that" in this situation.
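Because a conditional probability restricts attention to the students who satisfy the condition, P(Coke | female) can be computed directly from the counts as (female Coke drinkers) / (all female students) = 7/25. Here is a small Python sketch, assuming the counts are stored in a dictionary keyed by (beverage, gender); the helper name `p_beverage_given_gender` is hypothetical.

```python
# Class survey counts, keyed by (beverage, gender)
counts = {
    ("Coke", "female"): 7,    ("Coke", "male"): 9,
    ("Pepsi", "female"): 10,  ("Pepsi", "male"): 4,
    ("neither", "female"): 8, ("neither", "male"): 3,
}

def p_beverage_given_gender(beverage, gender):
    """P(beverage | gender): keep only students of that gender, then take
    the fraction of them who prefer the given beverage."""
    in_condition = sum(n for (b, g), n in counts.items() if g == gender)
    return counts[(beverage, gender)] / in_condition

print(p_beverage_given_gender("Coke", "female"))  # prints 0.28 (= 7/25)
```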

We could also compute P(gender = female | beverage = Coke) = 7/16 ≈ 0.44, because 7 of the 16 Coke drinkers are female. Notice that P(Coke|female) ≠ P(female|Coke). Swapping the condition with the outcome of interest asks a different question and, in general, gives a different answer.
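The two conditional probabilities share the same numerator (the 7 students who are both female and prefer Coke) but divide by different groups: the 25 female students in one case, the 16 Coke drinkers in the other. The sketch below makes that explicit, using Python's `fractions.Fraction` to keep the ratios exact; the `total` helper is a hypothetical name introduced only for illustration.

```python
from fractions import Fraction

# Class survey counts, keyed by (beverage, gender)
counts = {
    ("Coke", "female"): 7,    ("Coke", "male"): 9,
    ("Pepsi", "female"): 10,  ("Pepsi", "male"): 4,
    ("neither", "female"): 8, ("neither", "male"): 3,
}

def total(beverage=None, gender=None):
    """Count students matching the given beverage and/or gender (None means 'any')."""
    return sum(n for (b, g), n in counts.items()
               if (beverage is None or b == beverage)
               and (gender is None or g == gender))

both = total("Coke", "female")  # 7 students are female AND prefer Coke

# Same numerator, different denominators
p_coke_given_female = Fraction(both, total(gender="female"))  # divide by 25 females
p_female_given_coke = Fraction(both, total(beverage="Coke"))  # divide by 16 Coke drinkers

print(p_coke_given_female, float(p_coke_given_female))  # prints 7/25 0.28
print(p_female_given_coke, float(p_female_given_coke))  # prints 7/16 0.4375
```

Rounded to two decimals, 0.4375 is the 0.44 quoted above.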

Recall that, to display the information in the beverage vs. gender contingency table, we created a mosaic plot.

[Figure: completed mosaic plot of beverage preference by gender]

The heights of the rectangles in the female column correspond to the conditional probabilities P(Coke|female), P(Pepsi|female) and P(neither|female). Likewise, the heights of the rectangles in the male column correspond to P(Coke|male), P(Pepsi|male) and P(neither|male).

If beverage were independent of gender, we would expect these heights to be roughly the same in each column, because female students would then have roughly the same beverage preferences as male students. In other words, we would expect P(Coke|female) ≈ P(Coke|male), P(Pepsi|female) ≈ P(Pepsi|male) and P(neither|female) ≈ P(neither|male). But because these approximate equalities do not hold [in particular, P(Coke|female) = 0.28 ≠ 0.56 ≈ P(Coke|male)], we conclude that beverage and gender are not independent. (Remember that we say "not independent" rather than "dependent" in such a circumstance, because we don't know that one variable in fact depends on the other.)
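The column-by-column comparison described above can be laid out side by side. This Python sketch computes P(beverage | female) and P(beverage | male) for each beverage and prints the difference; the helper name `conditional_distribution` is hypothetical. It does not decide how large a difference counts as "not roughly the same"; as in the text, that judgment is made by looking at the numbers.

```python
# Class survey counts, organized by gender, then beverage
counts = {
    "female": {"Coke": 7, "Pepsi": 10, "neither": 8},
    "male":   {"Coke": 9, "Pepsi": 4,  "neither": 3},
}

def conditional_distribution(gender):
    """Return P(beverage | gender) for every beverage (the rectangle heights in one column)."""
    cells = counts[gender]
    n = sum(cells.values())
    return {beverage: k / n for beverage, k in cells.items()}

female = conditional_distribution("female")
male = conditional_distribution("male")

# Under independence, each pair below would be roughly equal
for beverage in female:
    gap = female[beverage] - male[beverage]
    print(f"{beverage:8s} P(.|female)={female[beverage]:.2f}  "
          f"P(.|male)={male[beverage]:.2f}  difference={gap:+.2f}")
```

The Coke row shows the largest gap, matching the comparison of 0.28 and 0.56 in the text.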

Exercises

1. [OIS 1.48] Views on immigration. A SurveyUSA poll conducted January 27–29, 2012, interviewed 910 registered voters from Tampa, Florida, asking each respondent whether they thought workers who have illegally entered the US should be (i) allowed to keep their jobs and apply for US citizenship, (ii) allowed to keep their jobs as temporary guest workers but not allowed to apply for US citizenship, or (iii) lose their jobs and be required to leave the country. The survey also asked each respondent to characterize their political ideology (conservative, moderate, liberal). The results of this survey appear in the table below:

                                                   political ideology
                                           Conservative Moderate Liberal
                 (i) Apply for citizenship           57      120     101
   immigration  (ii) Guest worker                   121      113      28
      response  (iii) Leave the country              179      126      45
                (iv) Not sure                        15        4       1

If we randomly select one of these voters, compute the probability that he or she:

a) self-identified as a conservative.

b) favored the citizenship option.

c) self-identified as a conservative and favored the citizenship option.

d) self-identified as a conservative or favored the citizenship option.

e) self-identified as a conservative, given that they favored the citizenship option.

f) favored the citizenship option, given that they self-identified as a conservative.

g) favored the citizenship option, given that they self-identified as a moderate.

h) favored the citizenship option, given that they self-identified as a liberal.

i) Do political ideology and views on immigration appear to be independent? Explain, using the conditional probabilities you computed above.