In this lab, we are going to look at basic probability and how to conduct basic simulations using R.
With R we can play games of chance - say, rolling a die or flipping a coin. Why, you might ask? Well, R can flip coins and roll dice much faster than we can!
The main command we need to know for this is sample. It has two arguments and two options. Imagine drawing names from a hat. The command sample just picks a certain number of names from the hat. You have to tell R what hat to pick from and how many to pick.
R can sample with and without replacement. With replacement means we pick a name out of the basket and put it back. Without replacement means we pick a name out of the basket and we don't put it back.
So, say we want R to roll a six-sided die once. We are going to use a new format for writing a sequence of numbers, X:Y. This is just shorthand for make a list of numbers from 1 to 6 increasing by 1 each step. Take a look at this command:
1:6
## [1] 1 2 3 4 5 6
So, now we tell R to choose 1 number from the numbers 1:6 and spit it back to us. 1:6 is the “hat” and 1 is the number of picks.
sample(1:6, 1)
## [1] 5
If we want R to roll a die 10 times, we need to tell R to do it with replacement. With replacement means pick a number from 1-6 from the hat. Put the number back in the hat, and pick again. This means our rolls are independent of one another.
sample(1:6, 10, replace = T)
## [1] 5 6 3 2 4 3 2 2 2 5
Below, I'm going to make a new command. Don't worry about the syntax, unless you want to know how functions are defined in R. The result of running this code is that R has a new command called Roll1Die().You pick the number of rolls and put that as the argument to your command.
Roll1Die = function(n) sample(1:6, n, rep = T)
# Now I have a new function that will roll one die n number of times. We use
# this command as follows:
Roll1Die(10)
## [1] 1 6 4 1 5 1 6 5 5 5
How about coin flips? We can do the same thing, but now ask R to choose between heads and tails.
n = 10
sample(c("Heads", "Tails"), n, rep = T)
## [1] "Heads" "Heads" "Tails" "Heads" "Heads" "Tails" "Tails" "Heads"
## [9] "Tails" "Heads"
Flip1Coin = function(n) sample(c("Heads", "Tails"), n, rep = T)
Flip1Coin(n)
## [1] "Tails" "Tails" "Heads" "Heads" "Heads" "Tails" "Tails" "Heads"
## [9] "Tails" "Heads"
We can now do some experiments! Let's flip a coin 1,000 times and count the number of heads.
C = Flip1Coin(1000)
# Count them up. This is done with sum. If we say heads are 1 and tails are
# 0, then:
sum(C == "Heads") #number of heads
## [1] 489
sum(C == "Tails") #number of tails
## [1] 511
sum(C != "Heads") #number of tails
## [1] 511
Now we look at a table:
table(C)
## C
## Heads Tails
## 489 511
prop.table(table(C))
## C
## Heads Tails
## 0.489 0.511
For loops: A handy thing to use when simulating is a for loop. A loop just does a think over and over again while some condition is true. If we want to print the numbers 1 to 100, we could write:
for (i in 1:100) {
print(i)
}
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
## [1] 6
## [1] 7
## [1] 8
## [1] 9
## [1] 10
## [1] 11
## [1] 12
## [1] 13
## [1] 14
## [1] 15
## [1] 16
## [1] 17
## [1] 18
## [1] 19
## [1] 20
## [1] 21
## [1] 22
## [1] 23
## [1] 24
## [1] 25
## [1] 26
## [1] 27
## [1] 28
## [1] 29
## [1] 30
## [1] 31
## [1] 32
## [1] 33
## [1] 34
## [1] 35
## [1] 36
## [1] 37
## [1] 38
## [1] 39
## [1] 40
## [1] 41
## [1] 42
## [1] 43
## [1] 44
## [1] 45
## [1] 46
## [1] 47
## [1] 48
## [1] 49
## [1] 50
## [1] 51
## [1] 52
## [1] 53
## [1] 54
## [1] 55
## [1] 56
## [1] 57
## [1] 58
## [1] 59
## [1] 60
## [1] 61
## [1] 62
## [1] 63
## [1] 64
## [1] 65
## [1] 66
## [1] 67
## [1] 68
## [1] 69
## [1] 70
## [1] 71
## [1] 72
## [1] 73
## [1] 74
## [1] 75
## [1] 76
## [1] 77
## [1] 78
## [1] 79
## [1] 80
## [1] 81
## [1] 82
## [1] 83
## [1] 84
## [1] 85
## [1] 86
## [1] 87
## [1] 88
## [1] 89
## [1] 90
## [1] 91
## [1] 92
## [1] 93
## [1] 94
## [1] 95
## [1] 96
## [1] 97
## [1] 98
## [1] 99
## [1] 100
This will allow us to repeat experiments many times and record the result. For instance. Say we want to roll 2 dice and record the numbers; then we want to repeat this 100 times. Here's how we do it.
roll1 = NULL #This initializes our variable - i.e. it creates a spot in memory for it. We need to do this for any vector, table, matrix, dataframe, but not for single numbers
roll2 = NULL
for (i in 1:100) {
roll1[i] = Roll1Die(1)
roll2[i] = Roll1Die(1)
}
# We can ask how many doubles we came up with:
sum(roll1 == roll2)
## [1] 19
# We can ask for the sum of the two rolls. The breaks=1:12 says we want our
# bins to start at 1, end at 12, and to be 1 unit wide. Bin 1 will be all
# the numbers greater than 1 and upto and including 2, Bin 2 will be all the
# numbers greater than 3 and up to and including 3 and so on.
hist((roll1 + roll2), density = 100, breaks = 1:12, prob = T)
barplot(table(roll1 + roll2), main = "2 Dice Sum, 100 Rolls") #this works better for this case
# What if we roll the dice 10000 times, what would the histogram look like?
roll1 = NULL
roll2 = NULL
for (i in 1:10000) {
roll1[i] = Roll1Die(1)
roll2[i] = Roll1Die(1)
}
barplot(table(roll1 + roll2), density = 100, main = "2 Dice Sum, 10000 Rolls")
# Looks pretty similar to what we calculated, right?
Exercise 1: A) If we flip a coin, what is the expected probability of getting a head? If we flip a coin 10 times, what is the expected number of heads?
B) Have R flip a coin 10 times and count the number of heads. Repeat this 8 times and store the number of heads for each one.
C) Have R flip a coin 10 times, count the number of heads, store the number and repeat 1000 times.
D) How do the results from the experiment in B differ from the results in experiment C? Why?
Exercise 2: A friend proposes that you play a game with dice. You will roll 2 die. If you roll a 7, your friend will pay you $3; if you roll an 11, your friend will pay you $5. If you roll any other combination, you have to pay $0.70
A) What is the probability of rolling a seven?
B) What is the probability of rolling and eleven?
C) What is the probability of rolling a seven or an eleven?
D) Simulate rolling 2 dice using the Roll1Die() function. Simulate rolling 2 dice 100 times and store the results. Calculate A, B, and C from the data.
E) HARD QUESTION: DON'T WORRY IF YOU DON'T GET IT!!! You play the game 10 times and win $30. You think, boy this game is easy to win at! I should keep playing! Is this the correct assumption? Prove it with a simulation.
F) Your friend offers to drop the amount you have to pay to $.68. Should you accept this?