Each lab in this course will have multiple components. First, there will be a piece like the document below, which includes instructions, tutorials, and problems to be addressed in your write-up. Any part of the document below marked with a star, \(\star\), is a problem for your write-up. Second, there will be an R script where you will do all of the computations required for the lab. Third, you will complete a write-up, to be turned in on the Friday that you did your lab.
You will use RMarkdown to type your lab write-up. If you’re new to R and/or LaTeX, I’m also happy to work with you to learn RMarkdown! All of your computational work will be done in RStudio Cloud, and both your lab write-up and your R script will be considered when grading your work.
The ready availability of computing power has fundamentally changed how we learn and interact with ideas in probability. In particular, we have the ability to quickly simulate data as a way to investigate ideas in probability.
In this lab, you will get an introduction to simulation in R using the sample function.
As we’ve discussed in class, when we roll a die, we think of it as a random process where each outcome is equally likely, i.e., each number on the die has an equal chance of being rolled.
The sample function in R allows you to randomly sample from a collection of options, with each option equally likely by default. For a die roll, this means randomly choosing an integer between 1 and 6.
Vectors are a convenient way to store these possible outcomes, and we can easily create this integer vector as shown below.
1:6
## [1] 1 2 3 4 5 6
If we want to randomly select one of these values, simulating a single roll of a die, we write
sample(1:6, size = 1)
## [1] 5
where the size argument sets how many values to draw, which here is how many times we roll the die. Increasing size simulates more than one roll, and the output is then a vector with one value per roll, three in this case.
sample(1:6, size = 3)
## [1] 3 5 4
What if we want to roll 10 times?
sample(1:6, size = 10)
## Error in sample.int(length(x), size, replace, prob): cannot take a sample larger than the population when 'replace = FALSE'
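One of the arguments of the sample function is replace, which allows you to sample from the set of outcomes with or without replacement. If we’re thinking about rolling a die, an appropriate simulation is to roll with replacement, since it’s possible to roll the same value more than once. The default is to sample without replacement, which is why the previous code produced an error: there are only six values to draw from, so we cannot draw ten of them without repeating one. For the same reason, the earlier three-roll example could never repeat a value, so it was not a faithful simulation of three die rolls.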
sample(1:6, size = 3, replace = TRUE)
## [1] 1 1 4
sample(1:6, size = 10, replace = TRUE)
## [1] 5 5 4 1 1 2 3 3 3 5
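A quick way to see the difference between the two settings: without replacement, each face can be drawn at most once, so asking for all six values just shuffles the faces; with replacement, values can repeat and size can exceed 6. The sketch below checks both facts (the variable names are illustrative, not part of the lab).

```r
# Without replacement: drawing 6 values from 1:6 gives a shuffled copy of 1:6
shuffled <- sample(1:6, size = 6)
sort(shuffled)             # always 1 2 3 4 5 6

# With replacement: 20 rolls, each still a face of the die
many_rolls <- sample(1:6, size = 20, replace = TRUE)
length(many_rolls)         # 20
all(many_rolls %in% 1:6)   # TRUE

# 20 rolls of a 6-sided die must repeat at least one value
anyDuplicated(many_rolls) > 0   # TRUE
```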
Rather than writing 1:6 over and over again, it’s useful to store this vector as a variable. This way, if you want to switch to rolling a 4-sided die, a 20-sided die, etc., you only need to change one line of your code rather than every line that refers to the outcomes on the die.
die <- 1:6
sample(die, size = 10, replace = TRUE)
## [1] 6 1 1 1 2 6 1 3 1 4
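If you expect to switch between dice often, one further step (a sketch, not something the lab requires) is to wrap the call in a function whose argument is the number of sides. The name roll_die and its arguments are hypothetical choices for illustration, not a built-in R function.

```r
# roll_die is a hypothetical helper, not part of base R.
# It rolls a fair die with faces 1..n_sides, `size` times, with replacement.
roll_die <- function(n_sides, size) {
  sample(1:n_sides, size = size, replace = TRUE)
}

roll_die(6, size = 10)    # ten rolls of a standard die
roll_die(20, size = 10)   # ten rolls of a 20-sided die
```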
We don’t have to sample from a vector of integers; sample works with any vector. For example, to flip coins, we first define a vector of outcomes for a coin and then flip it 10 times with replacement.
coin <- c('H', 'T')
sample(coin, size = 10, replace = TRUE)
## [1] "H" "H" "T" "T" "H" "H" "T" "T" "T" "T"
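Because sample draws at random, the outputs printed in this document come from one particular run, and yours will differ. If you want a simulation to give the same results every time you run it (for example, so that your write-up matches your R script), one option is to set the seed of R’s random number generator with set.seed before sampling. A minimal sketch; the seed value 2024 is arbitrary.

```r
coin <- c('H', 'T')

# Set the same seed before each run to reproduce the same flips
set.seed(2024)
first_run <- sample(coin, size = 10, replace = TRUE)

set.seed(2024)
second_run <- sample(coin, size = 10, replace = TRUE)

identical(first_run, second_run)   # TRUE: same seed, same flips
```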
The main benefit of simulation with R is the ability to quickly generate very large samples. Here we simulate one million rolls of a die. We save the result as a variable because we don’t want R to print out one million values. We can then use the table function to show how many times each number was rolled, or divide the table by the sample size to find the proportion of each value.
n <- 1000000
simulated_rolls <- sample(die, size = n, replace = TRUE)
table(simulated_rolls)
## simulated_rolls
## 1 2 3 4 5 6
## 167039 166494 166401 167388 166024 166654
table(simulated_rolls) / n
## simulated_rolls
## 1 2 3 4 5 6
## 0.167039 0.166494 0.166401 0.167388 0.166024 0.166654
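Since each face of a fair die has probability 1/6 ≈ 0.1667, the proportions above should all be close to that value. A minimal sketch for checking this directly, using base R’s prop.table (equivalent to dividing the table by the sample size) and a smaller, arbitrarily chosen sample of 100,000 rolls:

```r
# Simulate rolls of a fair six-sided die
rolls <- sample(1:6, size = 100000, replace = TRUE)

# prop.table divides each count by the total, giving proportions
props <- prop.table(table(rolls))
props

# Distance of each proportion from the theoretical value 1/6
abs(props - 1/6)
```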
Likewise, we could do the same with coin flips.
simulated_flips <- sample(coin, size = n, replace = TRUE)
table(simulated_flips)
## simulated_flips
## H T
## 500054 499946
table(simulated_flips) / n
## simulated_flips
## H T
## 0.500054 0.499946
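For a single outcome, there is another common idiom: comparing a vector to a value produces a logical vector of TRUE/FALSE values, and R treats TRUE as 1 and FALSE as 0, so sum counts the matches and mean gives their proportion. A short sketch with its own illustrative variable names:

```r
coin <- c('H', 'T')
flips <- sample(coin, size = 1000, replace = TRUE)

is_heads <- flips == 'H'   # TRUE where the flip was heads
sum(is_heads)              # number of heads
mean(is_heads)             # proportion of heads, same as sum(is_heads) / 1000
```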
Finally, we can quickly visualize the simulated results using histograms or bar charts.
hist(simulated_rolls)
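Because die rolls take only the six values 1 through 6, a bar chart of the counts is a natural alternative to a histogram, with one bar per face. A sketch using base R’s barplot on a table; converting to a factor with levels = 1:6 is an optional step that guarantees a face rolled zero times still gets a (zero-height) bar.

```r
rolls <- sample(1:6, size = 1000, replace = TRUE)

# Count each face; the factor levels make all six faces appear in the table
counts <- table(factor(rolls, levels = 1:6))

# One bar per face, labeled 1 through 6
barplot(counts, xlab = "Face", ylab = "Count")
```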
In class, we worked together on the problem of points with the coin replaced by a die. In theory, the game could fail to end, since a roll that matches neither player’s number leaves the score unchanged. To what extent does this happen in a simulated game?
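One way to make “could fail to end” precise, assuming a fair die and, as in the code below, exactly two scoring faces (4 for player one and 1 for player two): each roll scores a point with probability 2/6 = 1/3, independently of earlier rolls, so

```latex
P(\text{no point in } k \text{ consecutive rolls}) = \left(\frac{4}{6}\right)^{k} = \left(\frac{2}{3}\right)^{k} \to 0 \quad \text{as } k \to \infty
```

The number of rolls needed to score each point therefore follows a geometric distribution with mean \(1/(1/3) = 3\). A game that never ends is possible in principle but has probability 0; the simulation shows how long games actually run.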
Consider the following block of code. It is not evaluated in this document, but you can run it in your lab R script to see what happens.
win <- 3
player_one <- 4
player_two <- 1
n <- 10
results <- rep(0, times = n)
for (i in 1:n) {
  iter <- 0
  p1_count <- 0
  p2_count <- 0
  while ((p1_count < win) & (p2_count < win)) {
    roll <- sample(1:6, size = 1, replace = TRUE)
    if (roll == player_one) {
      p1_count <- p1_count + 1
    }
    else if (roll == player_two) {
      p2_count <- p2_count + 1
    }
    iter <- iter + 1
  }
  results[i] <- iter
}
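If the combination of a while loop and a counter is new to you, it may help to study a smaller example first. The sketch below is separate from the lab code and uses illustrative variable names: it keeps rolling a die until it shows a 6 and counts how many rolls that took.

```r
roll_count <- 0   # how many rolls so far
last_roll <- 0    # placeholder so the loop condition can be checked the first time

# Keep rolling as long as we have not yet seen a 6
while (last_roll != 6) {
  last_roll <- sample(1:6, size = 1)
  roll_count <- roll_count + 1
}

roll_count   # number of rolls needed to get the first 6
```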
(\(\star\)) In your R Markdown lab write-up, you will find the same code block as above, but with empty spaces for documentation (type your documentation after the # symbols). Write your own documentation to explain what the code does and how it works.
(\(\star\)) Using the dice example as a model, write code that simulates the game of points for a coin flip. Use the starting information that the game is played to 3 and the current score is heads 2, tails 1. Simulate 10 games. The output of your code should be a vector of length 10 in which each entry records whether heads or tails won that game. In your R Markdown lab write-up, carefully document your code as you did in problem 1 above. [Note: don’t reuse variable names from the code block above; come up with new variable names for your simulation. This avoids confusion for anyone reading your code.]
(\(\star\)) Change the initial conditions of your simulation in at least three different ways, by altering the number of flips needed to win and/or the current score. Display the results as a table of the proportions of games won by heads and by tails. Explain what you observe about these proportions.