In this lab we will:
Lab Objective 1: Write and organize statistical reports in a clear, readable format.
Lab Objective 2: Use and apply statistical structures for handling data.
Lab Objective 3: Analyze and display numeric data.
Lab Objective 4: Compute and interpret probabilities of discrete variables.
Let \(p_n\) be the proportion of times a certain outcome occurs in \(n\) trials in which that outcome may occur, and let \(p\) be the theoretical probability of that outcome. The Law of Large Numbers states that as \(n\) approaches infinity, \(p_n\) converges to (approaches) \(p\).
Let’s test this out!
The following code runs a simulation of flipping a coin 10 times
using the sample() function. Let the outcome 1
represent heads and the outcome 0 represent tails.
set.seed(1) # This sets the seed for the random number generator (RNG) state
outcomes = c(0,1) # 0 = Tails, 1 = Heads
sam = sample(x = outcomes, size = 10, replace=T) # Randomly draws `size` values from `outcomes`, with replacement
sam
## [1] 0 1 0 0 1 0 0 0 1 1
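(A quick aside: because heads are coded as 1, sum(sam) counts the number of heads and mean(sam) gives the proportion of heads. For the sample printed above:)
sum(sam)  # number of heads among the 10 flips
## [1] 4
mean(sam) # proportion of heads, i.e. the observed p-hat for n = 10
## [1] 0.4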
Let’s increase the size to 100.
set.seed(1) # This sets the seed for the random number generator (RNG) state
outcomes = c(0,1) # 0 = Tails, 1 = Heads
sam = sample(x = outcomes, size = 100, replace=T) # Randomly draws `size` values from `outcomes`, with replacement
table(sam)
## sam
## 0 1
## 49 51
(You can also count the number of heads with the sum() function.) Let’s increase the size to 1,000.
set.seed(1) # This sets the seed for the random number generator (RNG) state
outcomes = c(0,1) # 0 = Tails, 1 = Heads
sam = sample(x = outcomes, size = 1000, replace=T) # Randomly draws `size` values from `outcomes`, with replacement
table(sam)
## sam
## 0 1
## 502 498
What is \(\hat{p}_{1000}\), the proportion of times heads is flipped out of the 1,000 samples?
What do you notice about how \(\hat{p}_n\) changes as \(n\) increases?
Below is a cool function that runs a simulation and plots how the proportion of times an outcome occurs changes as \(n\) grows.
#' Runs a simulation of randomly selecting from a set of outcomes and plots the running proportion of times a specific outcome occurs
#' outcomes = vector of possible outcomes
#' outcome = the outcome of interest
#' n = number of trials
#' p = theoretical probability of the outcome (drawn as a dotted reference line)
#' seed = seed for the random number generator (optional; default 1)
plot_sim = function( outcomes, outcome, n, p, seed = 1 ) {
  set.seed(seed)
  results = sample(x = outcomes, size = n, replace = T)
  phat = c()
  for ( i in 1:n ) {
    # running proportion of `outcome` among the first i trials
    phat[i] = length( which(results[1:i] == outcome) ) / i
  }
  # x-axis on a log scale so the early trials are easy to see
  plot(1:n, phat, type = 'l', col = 'blue', xlab = "n", log = 'x', ylim = c(0, 1))
  abline(a = p, b = 0, lty = "dotted")  # dotted line at the theoretical probability p
}
Let’s test it out for \(n=100\):
plot_sim( outcomes=c(0,1), outcome = 1, n=100, p=0.5)
Run plot_sim for \(n=10000\). What do you observe?
Do the results of this simulation appear to follow the Law of Large Numbers? Why or why not?
Now let’s simulate rolling a six-sided die. Below, the plot_sim function is used to run a simulation of rolling a die 10,000 times and plot the proportion of times a 1 is rolled.
plot_sim( outcomes=c(1,2,3,4,5,6), outcome = 1, n=10000, p=1/6)
Two events are independent if knowing the outcome of one provides no useful information about the outcome of the other (i.e. the outcome of one does not affect the probability of the other).
If events A and B are independent, then the probability of both \(A\) and \(B\) occurring simultaneously is
\[P(A \text{ and } B) = P(A) \cdot P(B)\]
where \(P(A \text{ and } B)\) is the probability of events \(A\) and \(B\) both occurring, \(P(A)\) is the probability of event A occurring, and \(P(B)\) is the probability of event \(B\) occurring.
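As a quick illustration (using dice rather than the coin questions below): two rolls of a fair die are independent, so the probability of rolling a six on both rolls is \((1/6) \cdot (1/6) = 1/36\). A quick check in R:
(1/6) * (1/6)  # P(six on first roll and six on second roll), independent rolls
## [1] 0.02777778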
What is the probability of:
flipping heads 2 times in a row?
flipping heads 3 times in a row?
flipping heads 10 times in a row?
Sometimes probabilities are easier to calculate if we look at their complement.
The complement of an event \(A\) is the event “\(A\) does not happen.” The notation \(\bar{A}\) or \(A^c\) is used for the complement of event \(A\). We can compute the probability of the complement using \(P(A^c) = 1 - P(A)\). (Notice also that the complement of \(A^c\) is the original event \(A\), so \(P(A) = 1 - P(A^c)\).)
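For example (using dice so as not to give away the coin questions below), the probability of rolling at least one six in four rolls of a die is easiest to find through its complement, “rolling no sixes in four rolls”:
1 - (5/6)^4  # P(at least one six in 4 rolls) = 1 - P(no sixes in 4 rolls)
## [1] 0.5177469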
Use: \(A\) = flipping at least one tail
\(A^c\) = ???
Suppose you flip a coin 3 times. What is the probability of flipping at least one tail?
Suppose you flip a coin 10 times. What is the probability of flipping at least one tail?
Expected value provides a way of evaluating a decision that has multiple possible outcomes.
Expected value is defined as the average gain or loss of an event if the procedure is repeated many times. We can compute the expected value by multiplying the value of each outcome by the probability of that outcome, then adding up the products.
For example, if a decision has two possible outcomes, A and B, with values V(A) and V(B) respectively (usually monetary values), then the expected value of the decision is:
\[\text{Expected value} = V(A) \cdot P(A) + V(B) \cdot P(B)\]
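For example, suppose a (hypothetical) game pays you $10 with probability 0.2 and costs you $2 with probability 0.8. Its expected value is:
10 * 0.2 + (-2) * 0.8  # V(win)*P(win) + V(lose)*P(lose)
## [1] 0.4
On average you come out ahead by $0.40 per play, so over many plays this game works in your favor.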
You purchase a raffle ticket to help out a charity. The raffle ticket costs $5. The charity is selling 2000 tickets. One of them will be drawn and the person holding the ticket will be given a prize worth $4000. Compute the expected value for this raffle.
# value of winning
v_win = 0
# probability of winning
p_win = 0
# value of losing
v_lose = 0
# probability of losing
p_lose = 0
#Expected Value
v_win*p_win + v_lose*p_lose
## [1] 0
Check: You should get an expected value of -$3. On average, each person is giving about $3.00 to charity.
The probability that event B occurs, given that event A has happened, is represented by \(P(B|A)\), read “the probability of B given A.”
Conditional probabilities can be used to find the probability of joint events, even when they are not independent:
\[P(A \text{ and } B) = P(A|B) \cdot P(B)\]
This can be solved for \(P(A|B)\):
\[ P(A|B) = \frac{ P(A \text{ and } B) }{P(B)}\]
From the Machine Learning (ML) example (Figure 3.12 in your textbook, also printed in your Lab 4 Canvas assignment):
# Need to find P(ML is pred_fashion | truth is fashion)
# A = ML is pred_fashion
# B = truth is fashion
#P(A|B)
#Check
197/309
## [1] 0.6375405
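In general, when working from a table of counts like Figure 3.12, a conditional probability is the count of the joint event divided by the count of the conditioning event. The check above follows this pattern, assuming 197 is the number of photos that are truly about fashion and predicted as fashion, and 309 is the total number of photos truly about fashion (illustrative variable names below):
n_joint = 197          # photos with truth = fashion AND prediction = fashion
n_truth_fashion = 309  # all photos with truth = fashion
n_joint / n_truth_fashion  # P(pred_fashion | truth is fashion)
## [1] 0.6375405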
Find the probability that the ML prediction was correct, given that the photo was not about fashion.
Find the probability that the ML prediction was wrong, given that the photo was about fashion.
Find the probability that the ML prediction was wrong, given that the photo was not about fashion.
In which case is the ML prediction the most accurate?