DS 1870: Module 4.2 Homework

Question 1

Blackjack is a card game where the player goes up against the dealer (not other players). A round of blackjack can end in one of three ways: “player” wins, “dealer” wins, or a “push” (tie).

The goal of blackjack is to get a point total as close to 21 as possible without going over (going over is called a bust).

If the player is closer to 21 or the dealer busts, the player wins. If the player busts or the dealer has a higher point total, the dealer wins. If neither busts and both have the same point total, it is a tie (push).
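The win conditions above can be sketched as a small function. This is only an illustration: `round_result()` and its argument names are made up here, it ignores the immediate win on a dealt 21, and it assumes a player bust is checked first because the player plays before the dealer.

```r
# Hypothetical helper (not part of Carl's simulation): decides the outcome
# of one round from the two final point totals.
round_result <- function(player_total, dealer_total) {
  if (player_total > 21) return("dealer")            # player busts
  if (dealer_total > 21) return("player")            # dealer busts
  if (player_total > dealer_total) return("player")  # player is closer to 21
  if (dealer_total > player_total) return("dealer")  # dealer is closer to 21
  "push"                                             # same total: tie
}

round_result(20, 18)  # "player"
round_result(18, 23)  # "player": the dealer busts
round_result(19, 19)  # "push"
```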

Blackjack steps:

1. Both the player and the dealer are dealt two cards. The dealer’s second card is kept face down (while the first is face up). The player’s cards are both face up.
2. If either the player or the dealer gets exactly 21, they immediately win. If both the player and the dealer get 21, it is a push.
3. The player can decide to “hit” (add an additional card to their hand) or “stay”. The player then repeats that decision until they either stay or get a hand over 21.
4. If the player stays, then the dealer does step 3. Unlike the player, the dealer follows a fixed set of instructions for when to hit (take an additional card) and when to stay:
   - If the point total is 16 or less, they must take an additional card.
   - If the point total is 17 or more, they must stay (stop taking cards).
5. Once the dealer is done, a winner is decided.
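The dealer's fixed rule in the steps above can be sketched in a few lines of R. The function name `play_dealer()` and its arguments are hypothetical, the upcoming cards are passed in as plain point values, and aces and other special rules are ignored.

```r
# Hypothetical helper: the dealer keeps drawing while their total is at or
# below `hit_value` (16 under the rule described above).
# `upcoming_cards` is an assumed vector of point values for the next cards.
play_dealer <- function(start_total, upcoming_cards, hit_value = 16) {
  total <- start_total
  for (card in upcoming_cards) {
    if (total > hit_value) break   # above the hit value: must stay
    total <- total + card          # at or below the hit value: must hit
  }
  total
}

play_dealer(12, c(3, 4, 10))                  # 12 -> 15 -> 19, then stay
play_dealer(12, c(3, 4, 10), hit_value = 14)  # 12 -> 15, then stay
```

Changing `hit_value` is the only thing that differs between the five versions of the game compared in this question.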

There are some additional rules, but that’s the part you need to know for the homework.

For a quick explanation, see the video here.

Data description

Monty is running a casino and wants to know if 16 is the “best” choice for the maximum point total on which the dealer must hit. He gets his friend Carl to run a computer simulation that plays 100,000 rounds of blackjack for each of five hit values: 14, 15, 16, 17, 18. The hit value is the highest point total on which the dealer must hit (take another card); if the dealer’s total is more than the hit value, the dealer must stay.

Monty will consider a hit value to be the best choice if it still gives the dealer an advantage (a higher win percentage), but keeps that win percentage as close to 50% as possible. (A casino wants the dealer’s win percentage to be as close to 50% as possible because players will go to another casino if they have a better chance of winning there.)

The data set read in below contains the results of Carl’s simulation. There are 100,000 rows and 5 columns. Each column corresponds to the simulations run for that hit value.

The column names are hits_14, hits_15, hits_16, hits_17, hits_18. Each row shows us the result of the round using the hit total of that column name.

For example, the first row shows that when the dealer had to hit on 14 (hits_14) and stay on 15, the dealer won the first round (“dealer”).

The value in the second column of row one shows the player won the first round (“player”) when the dealer must hit on 15 (hits_15) and must stay on 16.
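To make the layout concrete, here is a toy stand-in for the file with only two rounds and two of the five columns. The first row uses the values described above; the second row is invented purely for illustration. The sketch assumes the tidyr package is installed and shows how a wide table like this can be stacked into one row per round and hit value.

```r
library(tidyr)

# Toy data, NOT the real simulation: row 1 matches the description above,
# row 2 is made up.
toy <- data.frame(
  hits_14 = c("dealer", "push"),
  hits_15 = c("player", "dealer")
)

# One row per (round, hit value) pair instead of one column per hit value
toy_long <- pivot_longer(
  toy,
  cols = hits_14:hits_15,
  names_to = "must_hit",
  values_to = "result"
)

toy_long
```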

# Loading the packages used below (pivot_longer, parse_number, ggplot, ...)
library(tidyverse)
# Reading in the data for question 1
blackjack <- read.csv("blackjack simulation.csv")

Question 1a) Summarized Blackjack data

Create the table seen in Brightspace using the blackjack data. Save it as blackjack_1a.

blackjack_1a <- 
  blackjack |> 
  # Putting all 5 columns (result) into one column
  pivot_longer(cols = hits_14:hits_18,
               names_to = "must_hit",
               values_to = "result") |> 
  # Counting how the 100,000 games ended for each hit amount
  count(must_hit, result, name = "count") |> 
  # Removing "hits_" from the must_hit column and calculating the result proportion
  mutate(
    .by = must_hit,
    must_hit = parse_number(must_hit),
    result_prop = count/sum(count)
  )

blackjack_1a

## # A tibble: 15 × 4
##    must_hit result count result_prop
##       <dbl> <chr>  <int>       <dbl>
##  1       14 dealer 45862      0.459 
##  2       14 player 44138      0.441 
##  3       14 push   10000      0.1   
##  4       15 dealer 46865      0.469 
##  5       15 player 43197      0.432 
##  6       15 push    9938      0.0994
##  7       16 dealer 49389      0.494 
##  8       16 player 41193      0.412 
##  9       16 push    9418      0.0942
## 10       17 dealer 52744      0.527 
## 11       17 player 38499      0.385 
## 12       17 push    8757      0.0876
## 13       18 dealer 58119      0.581 
## 14       18 player 34371      0.344 
## 15       18 push    7510      0.0751

Question 1b) Dumbbell plot for win percentage

Create a dumbbell plot comparing the win percentage of the dealer vs. the player. Add a vertical, dashed line at 50%. See the graph in Brightspace.

blackjack_1a |> 
  filter(result != "push") |> 
  ggplot(
    mapping = aes(
      x = result_prop,
      y = factor(must_hit),
      color = result
    )
  ) + 
  
  geom_line(
    color = "black",
    linewidth = 1
  ) + 
  
  geom_point(
    size = 3
  ) + 
  
  geom_vline(
    xintercept = 0.5,
    linetype = "dashed"
  ) +
   
  labs(
    x = NULL,
    y = "Highest Value that the Dealer Must Choose to Hit",
    color = "Winner",
    title = "Blackjack Simulation Results: What's the 'Best' Choice for the Dealer to 'Hit'"
  ) +
  
  theme_bw() + 
  
  # Add these to the end of your graph
  theme(
    plot.title = element_text(hjust = 0.5),
    legend.position = c(0.925, 0.125)
  ) + 
  
  scale_x_continuous(
    labels = scales::label_percent()
  )

Question 1c) What is the best choice to hit on?

Using the graph created in the previous question, which hit value best fits Monty’s definition of the “best” choice? Briefly explain your answer.

Question 2) Rolling with an advantage

In the game “Dungeons and Dragons”, players are often asked to roll a twenty-sided die (called a d20) to decide the outcome of an action: opening a locked chest, intimidating an NPC, resisting being mind-controlled, or seeing if an attack lands on the target.

Players can sometimes have a bonus when they need to roll a d20, called an advantage. If a player has an advantage, they can roll two d20s and keep the higher result.

For example, if Sam has an advantage and she rolls two d20s and gets a 7 and a 17, only the 17 counts and the 7 is ignored.
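In R, keeping the higher of two rolls is just a `max()`. The helper below is a hypothetical sketch of a single roll with an advantage, assuming two fair, independent d20s; it is not how the simulation file was produced.

```r
# Sam's example: with an advantage, only the higher die counts
max(7, 17)  # 17

# Hypothetical helper: one d20 roll with an advantage
roll_with_advantage <- function() {
  rolls <- sample(1:20, size = 2, replace = TRUE)  # two independent d20s
  max(rolls)                                       # keep the higher result
}

roll_with_advantage()
```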

An additional benefit can occur when getting exactly a 20 on the d20 (referred to as a “natural 20” or “nat 20”).

Dave wants to know how much of a benefit having an advantage provides when trying to roll a natural 20.

His friend Dan has a lot of free time, so he rolled a single d20 until he got a 20 and counted how many attempts it took. He then repeated this 100,000 times.

Dan told Dave what he was doing, and Dave decided to do the same thing. However, Dave rolled with an advantage (rolling two d20s and keeping the higher result) until at least one die came up as a 20, and he also repeated this 100,000 times. (What can they say? They both have a lot of down time since their last job ended.)
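As an aside that is not part of the assignment: if every roll is fair and independent, the number of attempts until the first natural 20 follows a geometric distribution, which gives theoretical values to compare against the simulated counts. The sketch below uses base R's `dgeom()` and `pgeom()`; the object names (`p_dan`, `p_dave`, `theory`) are made up for illustration.

```r
# Chance of success on one attempt
p_dan  <- 1 / 20            # one d20 shows a 20
p_dave <- 1 - (19 / 20)^2   # at least one of two d20s shows a 20

# dgeom()/pgeom() count failures BEFORE the first success,
# so succeeding on attempt k means k - 1 failures.
k <- 1:10
theory <- data.frame(
  n_rolls         = k,
  prob_dan        = dgeom(k - 1, prob = p_dan),
  prob_dave       = dgeom(k - 1, prob = p_dave),
  cumul_prob_dan  = pgeom(k - 1, prob = p_dan),
  cumul_prob_dave = pgeom(k - 1, prob = p_dave)
)

round(theory, 4)
```

With these assumptions, the first-attempt probabilities are 0.05 without an advantage and 0.0975 with one.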

They saved their results in the “d20 advantage simulation.csv” file, read into R in the code chunk below.

d20_results <- read.csv("d20 advantage simulation.csv")

The one_d20 column has how many rolls it took Dan to get a natural 20 rolling only one d20, while the two_d20 column has the results of Dave’s rolls. If it took more than 50 rolls to get a natural 20, they saved the result as “>50”.

Part 2a) Summarizing Dan’s and Dave’s results separately

Create the two separate data frames summarizing the results that can be found in Brightspace. Save them as dans_results and daves_results, respectively.

If you want to rename a column of a data frame, you can use rename(new_name = old_name), which can be part of a pipe chain!

At the end of the pipe chain, pipe your results into arrange(parse_number(n_rolls)) to arrange them in numeric order (1, 2, 3, …) and not alphanumeric order (1, 10, 11, …).
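A small sketch of why this works, assuming the readr package is installed. The vector below is a made-up sample of n_rolls values, deliberately left out of numeric order.

```r
library(readr)

n_rolls <- c("1", "10", "11", "2", "3", ">50")

# parse_number() pulls the number out of each string, so ">50" becomes 50
parse_number(n_rolls)

# Reordering by the parsed values gives numeric order, with ">50" last
n_rolls[order(parse_number(n_rolls))]
```

arrange(parse_number(n_rolls)) applies the same idea to the rows of a data frame while leaving the n_rolls column itself as text.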

Dan’s results

# Create dans_results below:
dans_results <- 
  d20_results |> 
  count(one_d20, name = "attempts") |> 
  rename(n_rolls = one_d20) |> 
  mutate(
    prob = attempts/sum(attempts)
  ) |> 
  arrange(parse_number(n_rolls))

# Use this to show the results in the knitted document
tibble(dans_results)

## # A tibble: 51 × 3
##    n_rolls attempts   prob
##    <chr>      <int>  <dbl>
##  1 1           5003 0.0500
##  2 2           4668 0.0467
##  3 3           4417 0.0442
##  4 4           4326 0.0433
##  5 5           3957 0.0396
##  6 6           3833 0.0383
##  7 7           3736 0.0374
##  8 8           3526 0.0353
##  9 9           3359 0.0336
## 10 10          3200 0.032 
## # ℹ 41 more rows

Dave’s results

# Create daves_results below:
daves_results <- 
  d20_results |> 
  count(two_d20, name = "attempts") |> 
  rename(n_rolls = two_d20) |> 
  mutate(
    prob = attempts/sum(attempts)
  ) |> 
  arrange(parse_number(n_rolls))

# Use this to show the results in the knitted document
tibble(daves_results)

## # A tibble: 51 × 3
##    n_rolls attempts   prob
##    <chr>      <int>  <dbl>
##  1 1           9740 0.0974
##  2 2           8947 0.0895
##  3 3           7870 0.0787
##  4 4           7134 0.0713
##  5 5           6423 0.0642
##  6 6           5774 0.0577
##  7 7           5405 0.0540
##  8 8           4654 0.0465
##  9 9           4295 0.0430
## 10 10          3998 0.0400
## # ℹ 41 more rows

Part 2b) Merging the two data sets together

Create a new data frame called d20_summary (shown in Brightspace) that combines Dan’s and Dave’s results into one data frame and adds the cumulative probabilities for each person.

You’ll need to use the cumsum() function, which stands for “cumulative summation”. For each row, it adds that row’s value to all of the values before it. For example, cumsum(c(2, 5, 9)) will give you c(2, 2 + 5, 2 + 5 + 9).
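A quick check of that example in R, followed by the same idea applied to probabilities. The three values in `prob` are the first three rows of Dan's prob column shown above; the object name is only for illustration.

```r
# Running total: each element is added to everything before it
cumsum(c(2, 5, 9))  # 2 7 16

# For probabilities, the running total is the chance of needing
# at most that many rolls.
prob <- c(0.0500, 0.0467, 0.0442)
cumsum(prob)
```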

d20_summary <- 
  left_join(
    x = dans_results,
    y = daves_results,
    by = "n_rolls",
    suffix = c("_dan", "_dave")
  ) |> 
  mutate(
    cumul_prob_dan  = cumsum(prob_dan),
    cumul_prob_dave = cumsum(prob_dave)
  ) |> 
  dplyr::select(-attempts_dan, -attempts_dave)

tibble(d20_summary)

## # A tibble: 51 × 5
##    n_rolls prob_dan prob_dave cumul_prob_dan cumul_prob_dave
##    <chr>      <dbl>     <dbl>          <dbl>           <dbl>
##  1 1         0.0500    0.0974         0.0500          0.0974
##  2 2         0.0467    0.0895         0.0967          0.187 
##  3 3         0.0442    0.0787         0.141           0.266 
##  4 4         0.0433    0.0713         0.184           0.337 
##  5 5         0.0396    0.0642         0.224           0.401 
##  6 6         0.0383    0.0577         0.262           0.459 
##  7 7         0.0374    0.0540         0.299           0.513 
##  8 8         0.0353    0.0465         0.335           0.559 
##  9 9         0.0336    0.0430         0.368           0.602 
## 10 10        0.032     0.0400         0.400           0.642 
## # ℹ 41 more rows

Part 2c) Line graph for cumulative probabilities

Using the data created in 2b, create the line graph seen in Brightspace. You’ll need to use parse_number() to map n_rolls to the x-axis.

d20_summary |> 
  pivot_longer(
    cols = c(cumul_prob_dan, cumul_prob_dave),
    names_to = "rolling_sit",
    values_to = "cumul_prob"
  ) |> 
  ggplot(
    mapping = aes(
      x = parse_number(n_rolls),
      y = cumul_prob,
      color = rolling_sit
    )
  ) +
  
  geom_line() +
  
  labs(
    x = "Number of Rolls until a Natural Twenty",
    y = "Cumulative Probability",
    color = "Does the Player Have an Advantage?"
  ) +
  theme_bw() +
  
  # Add these to the end to make it look like what is in Brightspace
  scale_x_continuous(
    breaks = c(0, 10, 20, 30, 40, 50),
    labels = c(0, 10, 20, 30, 40, "50+")
  ) +
  
  scale_color_discrete(
    labels = c(cumul_prob_dan = "No",
               cumul_prob_dave = "Yes")
  ) +

  theme(
    legend.position = "top"
  )