
Combining Random Variables
Recall the Washington State Lottery's Daily Game, where you pay $1 and then choose three digits (each may be 0 through 9, with repeated digits allowed). If the random variable `X` represents your net profit when playing this game one time, a probability model for this game is:
outcome | profit `x` | probability `P(X=x)` |
win | $499 | 0.001 |
lose | -$1 | 0.999 |
We computed an expected value of -$0.50 for this game and a variance of 249.75, so that:
`sigma = SD(X) = sqrt(249.75) approx $15.80`
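As a quick check of these numbers, here is a minimal Python sketch that applies the definitions of expected value and variance to the table above. The dictionary name `model` and the variable names are illustrative choices, not part of the original notes.

```python
from math import sqrt

# Probability model for one $1 ticket: profit in dollars -> probability
model = {499: 0.001, -1: 0.999}

# Expected value: sum of (value * probability)
mean = sum(x * p for x, p in model.items())

# Variance: sum of (squared deviation from the mean * probability)
variance = sum((x - mean) ** 2 * p for x, p in model.items())

# Standard deviation is the square root of the variance
sd = sqrt(variance)

print(round(mean, 2), round(variance, 2), round(sd, 2))
```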
Double or nothing
The Daily Game offers the opportunity to wager amounts other than $1 (there is a minimum bet of $0.50) and the prizes are adjusted accordingly. If you double the amount of your original wager (to $2), then the amount of the payout is also doubled (to $1000). Now the possible profits are $998 and -$2, which are just twice the amounts from the previous problem. Thus it makes sense to call the new random variable for the profit on one play of a $2 ticket `2X`.
We could note that the probabilities of winning and losing remain the same and create a new probability distribution for `2X`:
outcome | profit `y` | probability `P(2X=y)` |
win | $998 | 0.001 |
lose | -$2 | 0.999 |
We could then compute the mean as before:
`mu = E(2X) = (998)(0.001) + (-2)(0.999) = -1`
but notice this is the same as:
`E(2X) = 2 times E(X) = 2(-$0.50) = -$1.00`
For the variance we could compute:
`Var(2X) = (998-(-1))^2 cdot 0.001 + (-2-(-1))^2 cdot 0.999 = 999`
and then take the square root to get:
`sigma = SD(2X) = sqrt(999) approx $31.61`
Notice that the mean (expected value) and SD have just doubled (as we saw a few weeks ago when scaling a variable), so we can generalize these results:
`E(aX) = a cdot E(X)`
`Var(aX) = a^2 cdot Var(X)` ⇒ `SD(aX) = |a| cdot SD(X)`
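The scaling rules can be checked numerically. The sketch below is one possible way to do it in Python: it rescales the profit values of the $1 ticket by a constant `a` and recomputes the mean and variance from the definitions. The helper names `mean_and_variance` and `scaled` are hypothetical, and the values of `a` tried (2, -1 and 0.5) are just examples.

```python
from math import sqrt, isclose

def mean_and_variance(model):
    """Return (mean, variance) of a discrete distribution {value: probability}."""
    mean = sum(x * p for x, p in model.items())
    variance = sum((x - mean) ** 2 * p for x, p in model.items())
    return mean, variance

def scaled(model, a):
    """Distribution of aX: multiply every value by a, keep the probabilities."""
    out = {}
    for x, p in model.items():
        out[a * x] = out.get(a * x, 0.0) + p
    return out

x_model = {499: 0.001, -1: 0.999}
mean_x, var_x = mean_and_variance(x_model)

for a in (2, -1, 0.5):
    mean_ax, var_ax = mean_and_variance(scaled(x_model, a))
    # E(aX) = a * E(X)
    assert isclose(mean_ax, a * mean_x)
    # Var(aX) = a^2 * Var(X)
    assert isclose(var_ax, a ** 2 * var_x)
    # SD(aX) = |a| * SD(X)
    assert isclose(sqrt(var_ax), abs(a) * sqrt(var_x))
```

The case `a = -1` shows why the absolute value is needed: flipping the sign of every profit changes the sign of the mean but leaves the spread unchanged.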
Two tickets
Now suppose that instead of doubling the bet on a single lottery ticket, you simply buy two $1 tickets. You now have two independent events: whether the first ticket is a winner has no influence on whether or not the second ticket is also a winner (assuming you let the lottery computer choose the numbers on each ticket). By contrast, when we doubled the cost and prize money our profit was either $998 or -$2; with two tickets we could gain $998, gain $498, or lose $2.
Since we now have two random variables, let's call your profit on the first ticket `X` and your profit on the second ticket `Y`. For any single ticket, we already know the expected value and standard deviation:
`E(X) = -$0.50` and `SD(X) = $15.80`
while
`E(Y) = -$0.50` and `SD(Y) = $15.80`.
Now let's define a third random variable: `T = X + Y`, the total profit for the two tickets combined. We could construct a probability distribution for the new random variable:
outcome | profit `t` | probability `P(T=t)` |
win both | $998 | 0.000001 |
win one, lose one | $498 | 0.001998 |
lose both | -$2 | 0.998001 |
To compute the probabilities in the preceding table we note that
`P(mbox{both win}) = P(mbox{first wins and second wins}) = P(mbox{first wins}) times P(mbox{second wins}) = 0.001 times 0.001 = 0.000001`
and
`P(mbox{both lose}) = P(mbox{first loses and second loses}) = P(mbox{first loses}) times P(mbox{second loses}) = 0.999 times 0.999 = 0.998001`
and finally deduce that
`P(mbox{win one, lose one}) = 1-[0.000001+0.998001] = 0.001998`
because the sum of all the probabilities must be 1.
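The same table can be built mechanically. The following Python sketch assumes the two tickets are independent, so each joint probability is the product of the two individual probabilities; it enumerates all four (first ticket, second ticket) pairs and adds up the probability for each total profit. The names `ticket` and `total` are illustrative.

```python
from itertools import product
from collections import defaultdict

# Probability model for a single $1 ticket: profit -> probability
ticket = {499: 0.001, -1: 0.999}

# Distribution of T = X + Y, assuming X and Y are independent
total = defaultdict(float)
for (x, px), (y, py) in product(ticket.items(), repeat=2):
    total[x + y] += px * py  # independence: multiply the probabilities

for t in sorted(total, reverse=True):
    print(t, round(total[t], 6))
```

Note that the two ways of winning exactly one ticket (win then lose, lose then win) both give a total of $498, which is why that row's probability is `2 times 0.001 times 0.999`.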
We can then compute the mean:
`mu = (998)(0.000001) + (498)(0.001998) + (-2)(0.998001) = -1`
and the variance:
`Var(T) = (998-(-1))^2 (0.000001) + (498-(-1))^2 (0.001998) + (-2-(-1))^2 (0.998001) = 499.50`
and then the standard deviation:
`sigma = SD(T) = sqrt(499.5) approx $22.35`
You might notice that the expected value for the total winnings is just the sum of the individual expected values:
`mu = E(T) = E(X+Y) = E(X) + E(Y) = -$0.50 + (-$0.50) = -$1.00`
This should make sense: if you expect to lose 50 cents on each ticket, you would expect to lose $1 on two tickets.
However, the same doesn't work with the standard deviation: $22.35 ≠ $15.80 + $15.80 = $31.60. Yet this does work with the variance:
`sigma^2 = Var(T) = Var(X+Y) = Var(X) + Var(Y) = 249.75 + 249.75 = 499.50`
from which we can compute the standard deviation:
`sigma = SD(T) = sqrt(Var(T)) = sqrt(499.50) approx $22.35`
In general, for independent random variables `X` and `Y`: `E(X+Y) = E(X) + E(Y)` and `Var(X+Y) = Var(X) + Var(Y)`.
Remember: variances add, standard deviations don't.
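To see this rule in action, the sketch below recomputes everything with exact fractions so that no rounding can blur the comparison. It is only an illustration under the assumption that the two tickets are independent; the helper `mean_var` and the other names are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import sqrt

p_win = Fraction(1, 1000)
ticket = {499: p_win, -1: 1 - p_win}  # one $1 ticket, exact probabilities

def mean_var(model):
    """Return (mean, variance) of a discrete distribution {value: probability}."""
    mean = sum(x * p for x, p in model.items())
    var = sum((x - mean) ** 2 * p for x, p in model.items())
    return mean, var

# Distribution of T = X + Y for two independent tickets
total = {}
for (x, px), (y, py) in product(ticket.items(), repeat=2):
    total[x + y] = total.get(x + y, 0) + px * py

mean_x, var_x = mean_var(ticket)
mean_t, var_t = mean_var(total)

assert mean_t == 2 * mean_x   # expected values add
assert var_t == 2 * var_x     # variances add (independent tickets)

sd_x, sd_t = sqrt(var_x), sqrt(var_t)
assert sd_t < 2 * sd_x        # standard deviations do not add
print(float(var_t), round(sd_t, 2), round(2 * sd_x, 2))
```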
Two players
Finally, suppose that you buy one ticket and your friend buys another ticket; we can now let the random variable `X` represent your profit from placing a single $1 bet and `Y` represent your friend's profit. If you and your friend have a friendly competition to see who has the greater profit, we can consider a new random variable: the difference between your profit and your friend's profit, which we can denote `D = X-Y`. As above we can create a probability distribution listing the possible outcomes:
outcome | difference `d` | probability `P(D=d)` |
both win | $0 | 0.000001 |
you win, she loses | $500 | 0.000999 |
you lose, she wins | -$500 | 0.000999 |
both lose | $0 | 0.998001 |
You should check that you understand where all of the numbers in the table come from. It's not hard to compute the mean, variance and standard deviation using the definitions, as we have in previous cases, but we can also use the following formulas:
`mu = E(D) = E(X-Y) = E(X) - E(Y) = -$0.50 - (-$0.50) = $0`
`sigma^2 = Var(D) = Var(X-Y) = Var(X) + Var(Y) = 249.75 + 249.75 = 499.50`
`sigma = SD(D) = sqrt(Var(D)) = sqrt(499.50) approx $22.35`
Note that we always add variances, even when we're computing the variance of the difference of two independent random variables. It's also important to check, before we use these formulas, that we have two independent random variables.
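The claim that variances add even for a difference can be verified directly from the definitions. The Python sketch below builds the distribution of `D = X - Y`, again assuming the two tickets are independent, and computes its mean and variance. The names `ticket` and `diff` are illustrative; the two $0 rows of the table are merged into a single entry.

```python
from itertools import product
from math import sqrt

# Probability model for a single $1 ticket: profit -> probability
ticket = {499: 0.001, -1: 0.999}

# Distribution of D = X - Y, assuming X and Y are independent
diff = {}
for (x, px), (y, py) in product(ticket.items(), repeat=2):
    diff[x - y] = diff.get(x - y, 0.0) + px * py

mean_d = sum(d * p for d, p in diff.items())
var_d = sum((d - mean_d) ** 2 * p for d, p in diff.items())
sd_d = sqrt(var_d)

print(round(mean_d, 2), round(var_d, 2), round(sd_d, 2))
```

The deviations of `D` are squared before they are weighted, so the minus sign in `X - Y` has no effect on the spread, which is why the variance matches that of the sum.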