Visualizing expectations and variation through simulation

Expected values and variance

I am not a fan of gambling, but I find it interesting to think that life itself is a gamble. Trying to avoid the daily random events we all face seems futile: traffic volume and weather are but two of the most prominent in our daily lives.

Oftentimes, forecasts and averages are rendered useless by the seemingly wild variations that arise from the spatial and temporal oddities inherent to natural phenomena. After all, statistics in general is well known for failing to predict unusual events.

Wagers

As a means of experiencing the impact of expected values and variability, let's follow a slightly modified example of wagers from (Rodriguez and Mendes 2018, 19).

For wager 3: you pay $300 to enter and I roll a die. If it comes up 1, 2, or 3, then I keep your money. If it comes up 4, 5, or 6, then I give you back $600 (your original bet plus a $300 profit).

For wager 4: you pay $300 to enter and I roll a die. If it comes up 1 or 2, then I get to keep your money. If it comes up 3, 4, 5, or 6, then I give you back $450 (your original $300 plus a profit of $150).

Comparing the wagers

Assuming a fair die, wager 3 feels balanced, with a 50% chance of winning or losing the same amount. Wager 4 feels somewhat unbalanced because one can make a 50% profit 2/3 of the time.
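To make the comparison concrete, the short R sketch below tabulates the net profit of each wager for every face of a fair die and checks the two intuitions above: the chance of winning, and the profit on a win as a fraction of the $300 entry cost. The object names (`faces`, `profit_w3`, `profit_w4`) are illustrative choices, not taken from the original reference.

```r
# Faces of a fair six-sided die; each face has probability 1/6
faces <- 1:6
cost_of_entry <- 300

# Net profit (payout minus the $300 entry cost) for each face
profit_w3 <- ifelse(faces >= 4, 600 - cost_of_entry, -cost_of_entry)  # wins on 4, 5, 6
profit_w4 <- ifelse(faces >= 3, 450 - cost_of_entry, -cost_of_entry)  # wins on 3, 4, 5, 6

print(data.frame(face = faces, wager_3 = profit_w3, wager_4 = profit_w4))

# Chance of winning and profit on a win, relative to the entry cost
p_win_w3 <- mean(profit_w3 > 0)   # 3 winning faces out of 6
p_win_w4 <- mean(profit_w4 > 0)   # 4 winning faces out of 6
cat("Wager 3: P(win) =", round(p_win_w3, 3),
    "| return on a win =", max(profit_w3) / cost_of_entry, "\n")
cat("Wager 4: P(win) =", round(p_win_w4, 3),
    "| return on a win =", max(profit_w4) / cost_of_entry, "\n")
```

Wager 3 doubles your stake half of the time, while wager 4 returns 50% on top of your stake two times out of three; the loss on a losing roll is the same $300 in both cases.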

We need to compare the two wagers to assess their relative benefit. Computing the expected profit of each wager, in the statistical sense, is a straightforward way of doing so.

Expected profit from the wagers

The profit of a wager is defined as the payout minus the cost of entry, C. With possible payouts x_1, ..., x_n, the expected profit of each wager is:

\[\begin{align} E(X) &= [x_1 P(X=x_1) + \ldots + x_n P(X=x_n)] - C \tag{1}\\ &= (x_1 - C) P(X=x_1) + \ldots + (x_n - C) P(X=x_n) \tag{2} \end{align}\]

\[\begin{align} E(W_3) &= [0 \times (1/2) + 600 \times (1/2)] - 300 \\ &= (-300) \times (1/2) + 300 \times (1/2) \\ &= 0 \end{align}\]

\[\begin{align} E(W_4) &= [0 \times (1/3) + 450 \times (2/3)] - 300 \\ &= (-300) \times (1/3) + 150 \times (2/3) \\ &= 0 \end{align}\]

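Eq. (2) expresses the expectation directly in terms of the net profit of each outcome. That form is more convenient for computing the variance, so the net profits (-300 and 300 for wager 3, -300 and 150 for wager 4) are what we substitute into the variance formula below.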
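Eqs. (1) and (2) agree because the probabilities sum to 1, so subtracting C once is the same as subtracting it from every outcome. A small R sketch can check this numerically for both wagers; the helper names `expected_profit_eq1` and `expected_profit_eq2` are hypothetical.

```r
# Eq. (1): expected payout minus the cost of entry
expected_profit_eq1 <- function(payouts, probs, cost) {
  sum(payouts * probs) - cost
}

# Eq. (2): expectation of the net profit (payout - cost);
# identical to Eq. (1) whenever sum(probs) == 1
expected_profit_eq2 <- function(payouts, probs, cost) {
  sum((payouts - cost) * probs)
}

cost <- 300
w3 <- list(payouts = c(0, 600), probs = c(1/2, 1/2))
w4 <- list(payouts = c(0, 450), probs = c(1/3, 2/3))

e3 <- c(eq1 = expected_profit_eq1(w3$payouts, w3$probs, cost),
        eq2 = expected_profit_eq2(w3$payouts, w3$probs, cost))
e4 <- c(eq1 = expected_profit_eq1(w4$payouts, w4$probs, cost),
        eq2 = expected_profit_eq2(w4$payouts, w4$probs, cost))
print(rbind(wager_3 = e3, wager_4 = e4))
```

Both forms give zero for each wager (up to floating-point rounding, since 1/3 and 2/3 are not exact in binary).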

Note: I modified the values of the original wagers from the reference in an effort to make the dollar amounts more realistic. This apparently takes away from what is considered a fair bet; however, we will defer that subject for another day, and I strongly recommend that the curious reader consult the source of the wagers for this discussion (Rodriguez and Mendes 2018, chap. 2).

Let’s use R to compute those values.

cost_of_entry <- 300
probability_of_die_eq_or_less_than_three <- 0.5
payout_of_die_eq_or_less_than_three <- 0
payout_of_die_greater_than_three <- 300 + 300
# Eq. (1): expected payout minus the cost of entry
expected_value_wager_3 <- (payout_of_die_eq_or_less_than_three *
                          probability_of_die_eq_or_less_than_three +
                          payout_of_die_greater_than_three *
                          (1 - probability_of_die_eq_or_less_than_three)) -
  cost_of_entry
# all.equal() returns TRUE or a character string, so wrap it in isTRUE()
cat(paste(
  paste("Expected net profit from wager 3:",
        if (isTRUE(all.equal(expected_value_wager_3, 0))) 0 else expected_value_wager_3,
        sep = "\n"),
  " dollars"))

## Expected net profit from wager 3:
## 0  dollars

In a similar fashion, the expected profit for wager 4 is:

cost_of_entry <- 300
probability_of_die_eq_or_less_than_two <- 1 / 3
payout_of_die_eq_or_less_than_two <- 0
payout_of_die_greater_than_two <- 300 + 150
# Eq. (1) again; 1/3 is not exact in floating point, hence all.equal() below
expected_value_wager_4 <- (payout_of_die_eq_or_less_than_two *
                          probability_of_die_eq_or_less_than_two +
                          payout_of_die_greater_than_two *
                          (1 - probability_of_die_eq_or_less_than_two)) -
  cost_of_entry
cat(paste(
  paste("Expected net profit from wager 4:",
        if (isTRUE(all.equal(expected_value_wager_4, 0))) 0 else expected_value_wager_4,
        sep = "\n"),
  " dollars"))

## Expected net profit from wager 4:
## 0  dollars

All that effort to confirm that neither wager will make or lose you money in the long run. So, where does that leave us in terms of the relative merits or drawbacks of each bet? We need other tools to analyze the outcomes from engaging in either wager.

The first thing we will look at is whether the expected value does manifest in reality, albeit only after many repetitions. After all, practice makes perfect, right?

Simulation

In the following animated GIF you will see 30 sets of 2,000 bets using wager 3 in blue and the same number of bets using wager 4 in red. Each of the 30 sets starts with a different random number seed before sampling the die, with replacement, 2,000 times. Because R is a vectorized language, this is done in a single line:

die_results <- sample(x = 1:6, size = 2000, replace = TRUE)

The running profit is the sum of the net profits divided by the number of bets played so far, that is, a running average. That explains why, if you always win, the running profit reaches its maximum, the net profit of a single win: $300 for wager 3 and $150 for wager 4. Conversely, if you always lose, both wagers produce a running loss of $300 per bet.
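The running profit can be sketched in a few lines of R. This is an assumption about how such a simulation might be written, not the code behind the animation; the seed, the helper name `running_profit`, and the reuse of the same die rolls for both wagers are illustrative choices.

```r
set.seed(42)  # arbitrary seed, chosen only for reproducibility
n_bets <- 2000
die_results <- sample(x = 1:6, size = n_bets, replace = TRUE)

# Net profit per bet: wager 3 wins $300 on 4-6, wager 4 wins $150 on 3-6;
# both lose the $300 entry fee otherwise
net_profit_w3 <- ifelse(die_results >= 4, 300, -300)
net_profit_w4 <- ifelse(die_results >= 3, 150, -300)

# Running profit: cumulative net profit divided by the number of bets so far
running_profit <- function(net_profit) cumsum(net_profit) / seq_along(net_profit)

rp3 <- running_profit(net_profit_w3)
rp4 <- running_profit(net_profit_w4)

# Average profit per bet after all 2,000 bets
cat("Wager 3 after", n_bets, "bets:", round(tail(rp3, 1), 2), "\n")
cat("Wager 4 after", n_bets, "bets:", round(tail(rp4, 1), 2), "\n")
```

Plotting `rp3` and `rp4` against the bet number (for example with `plot(rp3, type = "l")`) for several seeds should show both curves wandering early on and then settling near zero.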

Please remember that this does not make wager 3 any more desirable by itself, because both converge to the same long-term expected value: zero profit.

Variance

So, how else can we qualify the different wagers? Are they different in any practical way?

The statistical variance might give us a clue. It is calculated as follows:

\[\begin{align} V(X) &= E\{[X - E(X)]^2\} \tag{3}\\ &= E\{X^2 - 2 X E(X) + [E(X)]^2\}\\ &= E(X^2) - 2 E(X) E(X) + [E(X)]^2\\ &= E(X^2) - [E(X)]^2\\ &= [x_1^2 P(X = x_1) + \ldots + x_n^2 P(X = x_n)] - [x_1 P(X = x_1) + \ldots + x_n P(X = x_n)]^2 \tag{4} \end{align}\]

Substituting for each wager:

\[\begin{align} V(W_3) &= [(-300)^2 \times (1/2) + (300)^2 \times (1/2)] - [(-300) \times (1/2) + 300 \times (1/2)]^2\\ &= 300^2 - 0 \\ &= 90{,}000 \end{align}\]

\[\begin{align} V(W_4) &= [(-300)^2 \times (1/3) + (150)^2 \times (2/3)] - [(-300) \times (1/3) + 150 \times (2/3)]^2\\ &= 30{,}000 + 50 \times 150 \times 2 - 0\\ &= 30{,}000 + 15{,}000 \\ &= 45{,}000 \end{align}\]
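The same arithmetic can be checked in R, both with the definition in Eq. (3) and the shortcut in Eq. (4), plus a Monte Carlo estimate from one million simulated bets. This is a sketch: the helper names `variance_def` and `variance_short` and the seed are arbitrary choices, not part of the original post. The square roots, $300 and roughly $212, are the standard deviations, which are in dollars and therefore easier to interpret than the variances.

```r
# Net profit outcomes and their probabilities for each wager
profits_w3 <- c(-300, 300); probs_w3 <- c(1/2, 1/2)
profits_w4 <- c(-300, 150); probs_w4 <- c(1/3, 2/3)

# Eq. (3): V(X) = E{[X - E(X)]^2}
variance_def <- function(x, p) {
  mu <- sum(x * p)
  sum((x - mu)^2 * p)
}

# Eq. (4): V(X) = E(X^2) - [E(X)]^2
variance_short <- function(x, p) sum(x^2 * p) - sum(x * p)^2

v3 <- c(eq3 = variance_def(profits_w3, probs_w3), eq4 = variance_short(profits_w3, probs_w3))
v4 <- c(eq3 = variance_def(profits_w4, probs_w4), eq4 = variance_short(profits_w4, probs_w4))
print(rbind(wager_3 = v3, wager_4 = v4))

# Standard deviations in dollars: 300 and about 212
print(sqrt(c(wager_3 = v3[["eq3"]], wager_4 = v4[["eq3"]])))

# Monte Carlo check: sample variance of one million simulated bets
set.seed(1)
die <- sample(1:6, 1e6, replace = TRUE)
sim_var <- c(wager_3 = var(ifelse(die >= 4, 300, -300)),
             wager_4 = var(ifelse(die >= 3, 150, -300)))
print(round(sim_var))
```

The simulated variances should land within a fraction of a percent of 90,000 and 45,000.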

This means that the variance of the profit with wager 3 is twice that of wager 4 (a standard deviation of $300 versus about $212 per bet), although both have zero expected profit in the long run. What does that say about the gambling budget necessary to sustain these wagers?

Experiencing variability via simulation

I believe a realistic way of thinking about the variability of the outcomes of these two wagers is in terms of the cash flow necessary to sustain 2,000 bets, the same number we used in the previous simulations of the expected profit.

The following animated image shows 60 such sets of 2,000 bets, one for each wager. The idea is to glance at all of them to get a sense of which wager gives the most extreme variations in the positive or negative cash flow necessary to sustain the 2,000 bets.

Overall, it seems the red histograms span larger areas than the blue ones. This reinforces the results from the calculation of variance showing that wager 3 has more variability than wager 4.

Note that wager 3 may require larger amounts of cash to sustain the 2,000 bets when the cash flow is negative. At this point, it is worth reminding the reader that both wagers have a neutral expected profit.
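To connect the variances with the cash needed to play, the R sketch below simulates many sets of 2,000 bets and records, for each set, the final cash balance and the lowest point the running balance reaches. It is an illustrative reconstruction under stated assumptions (1,000 sets, one die roll per bet shared by both wagers, a hypothetical helper `cash_flow_summary`), not the code behind the animation. Because the variances of independent bets add up, the standard deviation of the final balance should be close to sqrt(2000 × 90,000) ≈ $13,416 for wager 3 and sqrt(2000 × 45,000) ≈ $9,487 for wager 4.

```r
set.seed(2018)   # arbitrary seed
n_bets <- 2000
n_sets <- 1000

# Summarize the cash flow of one set of bets for one wager
cash_flow_summary <- function(net_profit) {
  balance <- cumsum(net_profit)            # running cash position after each bet
  c(final = tail(balance, 1),              # net result after all bets
    worst = min(0, balance))               # deepest shortfall you must cover
}

# Each column is one simulated set; rows are w3.final, w3.worst, w4.final, w4.worst
results <- replicate(n_sets, {
  die <- sample(1:6, n_bets, replace = TRUE)
  c(w3 = cash_flow_summary(ifelse(die >= 4, 300, -300)),
    w4 = cash_flow_summary(ifelse(die >= 3, 150, -300)))
})

# Spread of the final balance versus the theoretical sqrt(n * V)
final_sd <- apply(results[c("w3.final", "w4.final"), ], 1, sd)
print(round(final_sd))
print(round(sqrt(n_bets * c(w3 = 90000, w4 = 45000))))

# Average deepest shortfall across sets (more negative = more cash needed)
print(round(rowMeans(results[c("w3.worst", "w4.worst"), ])))
```

Passing `results["w3.final", ]` and `results["w4.final", ]` to `hist()` gives histograms comparable in spirit to the animation: the wager 3 histogram should be roughly sqrt(2) ≈ 1.41 times wider.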

References

Rodriguez, Abel, and Bruno Mendes. 2018. Probability, Decisions and Games: A Gentle Introduction Using R.