Combining Random Variables

Recall the Washington State Lottery's Daily Game, where you pay $1 and then choose three digits (each may be 0 through 9, with repeated digits allowed). If the random variable `X` represents your net profit when playing this game one time, a probability model for this game is:

outcome   profit   probability
          `x`      `P(X=x)`
win       $499     0.001
lose      -$1      0.999

We computed an expected value of -$0.50 for this game and a variance of 249.75, so that:

`sigma = SD(X) = sqrt(249.75) approx $15.80`
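As a quick check on these numbers, here is a minimal Python sketch that recomputes the mean, variance, and standard deviation straight from the table. The names `model`, `mean`, and `variance` are chosen for illustration only, and exact fractions are used so that rounding does not blur the result.

```python
from fractions import Fraction
from math import sqrt

# Probability model for one $1 ticket: profit -> probability
model = {499: Fraction(1, 1000), -1: Fraction(999, 1000)}

def mean(dist):
    """Expected value: sum of (value * probability)."""
    return sum(x * p for x, p in dist.items())

def variance(dist):
    """Variance: sum of (squared deviation from the mean * probability)."""
    mu = mean(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())

mu = mean(model)      # expected profit in dollars
var = variance(model) # variance of the profit
sd = sqrt(var)        # standard deviation in dollars

print(float(mu), float(var), round(sd, 2))
```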

Double or nothing
The Daily Game offers the opportunity to wager amounts other than $1 (there is a minimum bet of $0.50) and the prizes are adjusted accordingly. If you double the amount of your original wager (to $2), then the amount of the payout is also doubled (to $1000). Now the possible profits are $998 and -$2, which are just twice the amounts from the previous problem. Thus it makes sense to call the new random variable for the profit on one play of a $2 ticket `2X`.

We could note that the probabilities of winning and losing remain the same and create a new probability distribution for `2X`:

outcome   profit   probability
          `y`      `P(2X=y)`
win       $998     0.001
lose      -$2      0.999

We could then compute the mean as before:

`mu = E(2X) = (998)(0.001) + (-2)(0.999) = -1`

but notice this is the same as:

`E(2X) = 2 times E(X) = 2(-$0.50) = -$1.00`

For the variance we could compute:

`Var(2X) = (998-(-1))^2 cdot 0.001 + (-2-(-1))^2 cdot 0.999 = 999`

and then take the square root to get:

`sigma = SD(2X) = sqrt(999) approx $31.61`

Notice that the mean (expected value) and SD have both doubled, while the variance has quadrupled (999 = 4 × 249.75), just as we saw a few weeks ago when scaling a variable. These results generalize to any constant `a`:

`E(aX) = a cdot E(X)` 

`Var(aX) = a^2 cdot Var(X)`

`SD(aX) = |a| cdot SD(X)`
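The scaling rules can be checked numerically. The sketch below builds the distribution of `aX` for a few values of `a` (including a negative one and a fraction, which are illustrative choices and not bets the lottery necessarily offers) and compares the results with the formulas. The helper `scale` is a hypothetical name introduced here.

```python
from fractions import Fraction
from math import sqrt, isclose

# Probability model for one $1 ticket: profit -> probability
base = {499: Fraction(1, 1000), -1: Fraction(999, 1000)}

def mean(dist):
    return sum(x * p for x, p in dist.items())

def variance(dist):
    mu = mean(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())

def scale(dist, a):
    """Distribution of aX: every value is multiplied by a, probabilities are unchanged.
    Assumes a != 0 (otherwise all values would collapse to 0)."""
    return {a * x: p for x, p in dist.items()}

for a in (2, -3, Fraction(1, 2)):
    scaled = scale(base, a)
    # E(aX) = a * E(X)
    assert mean(scaled) == a * mean(base)
    # Var(aX) = a^2 * Var(X)
    assert variance(scaled) == a ** 2 * variance(base)
    # SD(aX) = |a| * SD(X)
    assert isclose(sqrt(variance(scaled)), abs(a) * sqrt(variance(base)))

print(float(mean(scale(base, 2))), float(variance(scale(base, 2))))
```

The negative multiplier shows why the SD rule needs `|a|`: the mean changes sign, but a standard deviation is never negative.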

Two tickets
Now suppose that instead of doubling the bet on a single lottery ticket, you simply buy two $1 tickets. You now have two independent events: whether the first ticket is a winner has no influence on whether or not the second ticket is also a winner (assuming you let the lottery computer choose the numbers on each ticket). By contrast, when we doubled the cost and prize money our profit was either $998 or -$2; with two tickets we could gain $998, gain $498, or lose $2.

Since we now have two random variables, let's call your profit on the first ticket `X` and your profit on the second ticket `Y`. For any single ticket, we already know the expected value and standard deviation:

`E(X) = -$0.50` and `SD(X) = $15.80`

while

`E(Y) = -$0.50` and `SD(Y) = $15.80`.

Now let's define a third random variable: `T = X + Y`, the total profit for the two tickets combined. We could construct a probability distribution for the new random variable:

outcome              profit   probability
                     `t`      `P(T=t)`
win both             $998     0.000001
win one, lose one    $498     0.001998
lose both            -$2      0.998001

To compute the probabilities in the preceding table we note that

`P(mbox{both win}) = P(mbox{first wins and second wins}) = P(mbox{first wins}) times P(mbox{second wins}) = 0.001 times 0.001 = 0.000001`

and

`P(mbox{both lose}) = P(mbox{first loses and second loses}) = P(mbox{first loses}) times P(mbox{second loses}) = 0.999 times 0.999 = 0.998001`

and finally deduce that

`P(mbox{win one, lose one}) = 1-[0.000001+0.998001] = 0.001998`

because the sum of all the probabilities must be 1.
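The same probabilities can be obtained by listing every pair of outcomes for the two tickets and multiplying their probabilities, which is valid only because the tickets are assumed independent. The following is a small sketch of that enumeration; the variable names are illustrative.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Probability model for one $1 ticket: profit -> probability
ticket = {499: Fraction(1, 1000), -1: Fraction(999, 1000)}

# Distribution of T = X + Y for two independent tickets
total = defaultdict(Fraction)  # missing keys start at Fraction(0)
for (x, px), (y, py) in product(ticket.items(), repeat=2):
    # Independence lets us multiply the two probabilities
    total[x + y] += px * py

for profit in sorted(total, reverse=True):
    print(profit, float(total[profit]))
```

The "win one, lose one" row collects two separate pairs (first wins and second loses, or the reverse), which is why its probability is 2 × 0.001 × 0.999 = 0.001998.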

We can then compute the mean:

`mu = (998)(0.000001) + (498)(0.001998) + (-2)(0.998001) = -1`

and the variance:

`Var(T) = (998-(-1))^2 (0.000001) + (498-(-1))^2 (0.001998) + (-2-(-1))^2 (0.998001) = 499.50`

and then the standard deviation:

`sigma = SD(T) = sqrt(499.5) approx $22.35`

You might notice that the expected value for the total winnings is just the sum of the individual expected values:

`mu = E(T) = E(X+Y) = E(X) + E(Y) = -$0.50 + (-$0.50) = -$1.00`

This should make sense: if you expect to lose 50 cents on each ticket, you would expect to lose $1 on two tickets.

However, the same doesn't work with the standard deviation: $22.35 ≠ $15.80 + $15.80 = $31.60. Yet this does work with the variance:

`sigma^2 = Var(T) = Var(X+Y) = Var(X) + Var(Y) = 249.75 + 249.75 = 499.50`

from which we can compute the standard deviation:

`sigma = SD(T) = sqrt(Var(T)) = sqrt(499.50) approx $22.35`

In general, for independent random variables `X` and `Y`: `E(X+Y) = E(X) + E(Y)` and `Var(X+Y) = Var(X) + Var(Y)`.

Remember: variances add, standard deviations don't.
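It may help to put the two strategies side by side: one $2 ticket (profit `2X`) versus two independent $1 tickets (profit `X + Y`). The sketch below, with illustrative names, computes both distributions and shows that the means agree while the variances do not.

```python
from fractions import Fraction
from itertools import product
from math import sqrt

# Probability model for one $1 ticket: profit -> probability
ticket = {499: Fraction(1, 1000), -1: Fraction(999, 1000)}

def mean(dist):
    return sum(x * p for x, p in dist.items())

def variance(dist):
    mu = mean(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())

# One $2 ticket: the profit is 2X
doubled = {2 * x: p for x, p in ticket.items()}

# Two independent $1 tickets: the profit is X + Y
two_tickets = {}
for (x, px), (y, py) in product(ticket.items(), repeat=2):
    two_tickets[x + y] = two_tickets.get(x + y, 0) + px * py

for name, dist in [("2X", doubled), ("X + Y", two_tickets)]:
    v = variance(dist)
    print(name, float(mean(dist)), float(v), round(sqrt(v), 2))
```

Both strategies cost $2 and lose $1 on average, but the single doubled bet has twice the variance of the two separate tickets.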

Two players
Finally, suppose that you buy one ticket and your friend buys another ticket; we can now let the random variable `X` represent your profit from placing a single $1 bet and `Y` represent your friend's profit. If you and your friend have a friendly competition to see who has the greater profit, we can consider a new random variable: the difference between your profit and your friend's profit, which we can denote `D = X-Y`. As above we can create a probability distribution listing the possible outcomes:

outcome               difference   probability
                      `d`          `P(D=d)`
both win              $0           0.000001
you win, she loses    $500         0.000999
you lose, she wins    -$500        0.000999
both lose             $0           0.998001

You should check that you understand where all of the numbers in the table come from. It's not hard to compute the mean, variance and standard deviation using the definitions, as we have in previous cases, but we can also use the following formulas:

`mu = E(D) = E(X-Y) = E(X) - E(Y) = -$0.50 - (-$0.50) = $0`

`sigma^2 = Var(D) = Var(X-Y) = Var(X) + Var(Y) = 249.75 + 249.75 = 499.50`

`sigma = SD(D) = sqrt(Var(D)) = sqrt(499.50) approx $22.35`

Note that we always add variances, even when we're computing the variance of the difference of two independent random variables. It's also important to check, before we use these formulas, that we have two independent random variables.
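To see both points at once, the sketch below computes the distribution of `D = X - Y` by enumeration for independent tickets, and then looks at a hypothetical dependent case in which you and your friend play the same three digits in the same drawing, so that `Y = X` and `D` is always 0. That second scenario is an assumption added here for illustration; it is not part of the example above.

```python
from fractions import Fraction
from itertools import product

# Probability model for one $1 ticket: profit -> probability
ticket = {499: Fraction(1, 1000), -1: Fraction(999, 1000)}

def mean(dist):
    return sum(x * p for x, p in dist.items())

def variance(dist):
    mu = mean(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())

# Independent tickets: multiply probabilities for each pair of outcomes
diff = {}
for (x, px), (y, py) in product(ticket.items(), repeat=2):
    diff[x - y] = diff.get(x - y, 0) + px * py

print(float(mean(diff)), float(variance(diff)))

# Hypothetical dependent case: identical numbers, so Y = X and D = 0 always
same_numbers = {0: Fraction(1)}
print(float(variance(same_numbers)))
```

In the independent case the variance of the difference equals `Var(X) + Var(Y)`. In the dependent case the variance is 0, so adding the variances would give the wrong answer.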