Problem 1
Simulate the Monty Hall game in R using the sample() function. Play the game 10,000 times never switching and record how often you win. Then play the game another 10,000 times always switching and record how often you win. Which strategy is better?
doors = 3
games = 10000
# Simulating the random door that the car is behind.
montyHall = sample(doors, games, replace = TRUE)
# Simulating the random player choice.
choice = sample(doors, games, replace = TRUE)
# This is the strategy of picking a door and not switching, as it just checks
# to see if the player's choice matches the door that the car is behind.
winA = 0 #Number of wins for this strategy, in this simulation.
for (i1 in 1:games)
{
if (montyHall[i1] == choice[i1])
{
winA = winA + 1
}
}
# This is the strategy of picking a door and then always switching.
winB = 0 #Number of wins for this strategy, in this simulation.
for (j1 in 1:games)
{
doorA = 0
doorB = 0
doorC = 0
# Determining which doors can be duds (goats)
if (montyHall[j1] == 1) # If car is behind door 1...
{
if (choice[j1] == 2){ # If car is behind door 1, and player chose door 2,
doorC = 1 # Door 3 must be the dud
}
else if (choice[j1] == 3){ # If car is behind door 1, and player chose door 3,
doorB = 1 # Door 2 must be the dud
}
else{ # If car is behind door 1, and player chose door 1,
doorB = 0.5 # Simulate a 50/50 to show a dud door
doorC = 0.5
}
}
else if (montyHall[j1] == 2) # If car is behind door 2...
{
if (choice[j1] == 1){
doorC = 1
}
else if (choice[j1] == 3){
doorA = 1
}
else{
doorA = 0.5
doorC = 0.5
}
}
else # If car is behind door 3...
{
if (choice[j1] == 1){
doorB = 1
}
else if (choice[j1] == 2){
doorA = 1
}
else{
doorA = 0.5
doorB = 0.5
}
}
# Selecting the door the host opens (a guaranteed goat; built for 3 doors only)
goat = sample(doors, 1, prob = c(doorA, doorB, doorC))
# Formula for deciding which door to switch to (built for 3 doors only).
# The door numbers sum to 1 + 2 + 3 = 6, so subtracting the player's choice
# and the revealed goat leaves the door to switch to.
newDoor = 6 - goat - choice[j1]
if (montyHall[j1] == newDoor)
{
winB = winB + 1
}
}
print(winA)
[1] 3352
print(winB)
[1] 6648
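Switching is the better strategy: it won 6,648 of 10,000 games (about 66%), roughly twice as often as staying, which won 3,352 (about 34%). This matches the theoretical win probabilities of 2/3 and 1/3.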
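Because the host always reveals a goat, switching wins exactly when the first pick was wrong, so the loop above can be condensed. A minimal vectorized sketch under that observation (the names car, pick, stay_wins, and switch_wins are illustrative and not part of the original solution; the seed is arbitrary):

```r
set.seed(1)                       # arbitrary seed, only for reproducibility
doors <- 3
games <- 10000

car  <- sample(doors, games, replace = TRUE)  # door hiding the car
pick <- sample(doors, games, replace = TRUE)  # player's first choice

stay_wins   <- sum(pick == car)   # staying wins when the first pick was right
switch_wins <- sum(pick != car)   # switching wins when the first pick was wrong

print(c(stay = stay_wins, switch = switch_wins) / games)
```

The two counts always add up to the number of games, which is why the printed totals in the loop version sum to 10,000.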
Problem 2
NO FORECAST:
u <- function(a,x){return ((-5*(a-x)*(x<=a)) + (-35*(x-a)*(x>a)))}
p <- c(4,15,35,5,5,5,5,20,3,3)/100
milksum = 0
for(ordered in 1:10)
{
for (used in 1:10)
{
milksum = milksum + p[used] * u(ordered, used)
}
print(paste(c(ordered, "ordered gallons: Expected Utility =", milksum), collapse = " "))
milksum = 0
}
[1] "1 ordered gallons: Expected Utility = -128.1"
[1] "2 ordered gallons: Expected Utility = -94.7"
[1] "3 ordered gallons: Expected Utility = -67.3"
[1] "4 ordered gallons: Expected Utility = -53.9"
[1] "5 ordered gallons: Expected Utility = -42.5"
[1] "6 ordered gallons: Expected Utility = -33.1"
[1] "7 ordered gallons: Expected Utility = -25.7"
[1] "8 ordered gallons: Expected Utility = -20.3"
[1] "9 ordered gallons: Expected Utility = -22.9"
[1] "10 ordered gallons: Expected Utility = -26.7"
RAINY FORECAST:
p2 <- c(0,0,0,0,0,1,2,15,1,1)/20
milksum2 = 0
for(ordered2 in 1:10)
{
for (used2 in 1:10)
{
milksum2 = milksum2 + p2[used2] * u(ordered2, used2)
}
print(paste(c(ordered2, "ordered gallons: Expected Utility =", milksum2), collapse = " "))
milksum2 = 0
}
[1] "1 ordered gallons: Expected Utility = -243.25"
[1] "2 ordered gallons: Expected Utility = -208.25"
[1] "3 ordered gallons: Expected Utility = -173.25"
[1] "4 ordered gallons: Expected Utility = -138.25"
[1] "5 ordered gallons: Expected Utility = -103.25"
[1] "6 ordered gallons: Expected Utility = -68.25"
[1] "7 ordered gallons: Expected Utility = -35.25"
[1] "8 ordered gallons: Expected Utility = -6.25"
[1] "9 ordered gallons: Expected Utility = -7.25"
[1] "10 ordered gallons: Expected Utility = -10.25"
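Ordering 8 gallons is optimal both with a rainy forecast and with no forecast. The two cases differ in how costly a mistake is. With a rainy forecast, under-ordering is punished sharply (7 gallons gives -35.25 versus -6.25 for 8), while ordering 9 costs only one extra unit of expected utility (-7.25). With no forecast the curve is flatter on both sides: 7, 8, and 9 gallons give -25.7, -20.3, and -22.9.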
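The two nested loops can also be written as one small function applied to each probability vector. This is a sketch only: expected_utility, p_none, and p_rainy are hypothetical names introduced here, and the utility function is the same as u above, rewritten with ifelse().

```r
# Utility from the solution above: -5 per unused gallon, -35 per gallon short.
u <- function(a, x) ifelse(x <= a, -5 * (a - x), -35 * (x - a))

p_none  <- c(4, 15, 35, 5, 5, 5, 5, 20, 3, 3) / 100   # no forecast
p_rainy <- c(0, 0, 0, 0, 0, 1, 2, 15, 1, 1) / 20      # rainy forecast

# Hypothetical helper: expected utility of each order size 1..10.
expected_utility <- function(p) {
  sapply(1:10, function(a) sum(p * u(a, 1:10)))
}

eu_none  <- expected_utility(p_none)
eu_rainy <- expected_utility(p_rainy)

print(which.max(eu_none))    # best order size with no forecast
print(which.max(eu_rainy))   # best order size with a rainy forecast
```

Using which.max() picks out the optimal order size directly instead of reading it off the printed table.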
Problem 3
Answer the following statements TRUE or FALSE, providing a succinct explanation of your reasoning.
- You roll two fair three-sided dice. The probability the two dice show the same number is 1/3.
TRUE
P(dice show same number) = 3 different numbers * P(any individual number is shown on both dice)
P(dice show same number) = 3 * 1/9 = 3/9 = 1/3
- If events A and B are independent and P(A) > 0 and P(B) > 0, then P(A and B) > 0.
TRUE
Since A and B are independent, P(A and B) = P(A)P(B)
P(A)P(B) > 0, therefore P(A and B) > 0
- If two events A and B are independent and P(A) > 0 and P(B) > 0, then A and B cannot be mutually exclusive.
TRUE
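Mutually exclusive means the events cannot happen at the same time, so P(A and B) = 0. Independence with P(A) > 0 and P(B) > 0 gives P(A and B) = P(A)P(B) > 0, so both cannot hold. Put another way, if A and B were mutually exclusive, knowing that A occurred would tell you that B did not, which makes the events dependent.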
- If P(A and B) ≥ 0.4, then P(A) ≤ 0.40.
FALSE
The contrapositive can be disproven by a counterexample. The contrapositive is: “If P(A) > 0.40, then P(A and B) < .4”. Say that P(A) = .41 > .40, and P(B) = 1. Assuming the events are independent, P(A and B) = P(A)P(B) = .41, which is not < .4. Since the contrapositive is disproven, the statement is also disproven.
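Two of the answers above are easy to check numerically. A small sketch, with illustrative variable names, that enumerates the nine dice outcomes and evaluates the counterexample used for the last statement:

```r
# Enumerate all 9 equally likely outcomes of two fair three-sided dice.
rolls <- expand.grid(die1 = 1:3, die2 = 1:3)
p_same <- mean(rolls$die1 == rolls$die2)   # 3 matching outcomes out of 9
print(p_same)

# Counterexample for the last statement, with A and B independent.
p_A <- 0.41
p_B <- 1
p_A_and_B <- p_A * p_B
print(p_A_and_B >= 0.4 && p_A > 0.40)      # premise holds, conclusion fails
```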
Problem 4

Problem 5

Problem 6

Problem 7

Problem 8

Problem 9

Problem 10

Problem 11

Problem 12
- Find the mean payoffs of the two different drilling strategies.
Standard: .2(20) + .5(30) + .3(50) = 4 + 15 + 15 = $34 million
Horizontal: .2(-20) + .5(40) + .3(90) = -4 + 20 + 27 = $43 million
- Find the variance in payoffs of each strategy.
Var(X) = E(X^2) - (E(X))^2
Standard: E(X^2) = .2(400) + .5(900) + .3(2500) = 80 + 450 + 750 = 1280
Var(Standard) = 1280 - (34)^2 = 1280 - 1156 = 124 (million dollars)^2
Horizontal: E(X^2) = .2(400) + .5(1600) + .3(8100) = 80 + 800 + 2430 = 3310
Var(Horizontal) = 3310 - (43)^2 = 3310 - 1849 = 1,461 (million dollars)^2
- Which strategy would you advocate for and why?
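I would advocate for Standard Drilling. Horizontal Drilling has the higher mean ($43 million versus $34 million), but its variance is almost twelve times larger (standard deviation about $38.2 million versus $11.1 million), and it is the only strategy that can lose money (-$20 million with probability .2). With payoffs measured in millions of dollars, that downside risk is too costly if drilling goes poorly.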
- How much are you willing to pay for a geological evaluation that would determine with certainty the quantity of oil at the site prior to drilling?
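With a perfect evaluation you would pick the better strategy for each oil quantity (assuming the three outcomes of the two strategies correspond to the same three quantities): max(20, -20) = 20, max(30, 40) = 40, and max(50, 90) = 90. The expected payoff with the evaluation is .2(20) + .5(40) + .3(90) = 4 + 20 + 27 = $51 million. A risk-neutral decision maker without the evaluation would choose Horizontal ($43 million), so the evaluation is worth at most 51 - 43 = $8 million. Measured against Standard Drilling ($34 million), the strategy advocated above, it is worth up to 51 - 34 = $17 million. Paying the full amount leaves no expected gain, so you should pay somewhat less.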
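The means and variances in this problem can be checked with a few lines of R. A minimal sketch, where summarize_payoff is a hypothetical helper and the payoffs are in millions of dollars as above:

```r
prob       <- c(0.2, 0.5, 0.3)        # outcome probabilities from the solution
standard   <- c(20, 30, 50)           # payoffs, $ million
horizontal <- c(-20, 40, 90)          # payoffs, $ million

# Hypothetical helper returning mean, variance, and standard deviation.
summarize_payoff <- function(x, p) {
  m <- sum(p * x)
  v <- sum(p * x^2) - m^2             # Var(X) = E(X^2) - (E(X))^2
  c(mean = m, var = v, sd = sqrt(v))
}

s_std <- summarize_payoff(standard, prob)
s_hor <- summarize_payoff(horizontal, prob)
print(s_std)
print(s_hor)
```

The standard deviation is included because it is in the same units as the payoffs, which makes the two strategies easier to compare than the variances alone.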
Problem 13

Problem 14
- Find the probability of a miscarriage, given that you are a low risk individual. To figure this out, write the overall probability of miscarriage using the law of total probability; then solve for P(M | L).
P(M) = P(M | L) * P(L) + P(M | not-L) * P(not-L)
.15 = P(M | L) * (.98) + .80(.02)
P(M | L) = (.15 - .016)/.98
P(M | L) ≈ .1367 = 13.67%
- Use Bayes rule to calculate the probability of being a high risk individual, given that you had a first miscarriage.
P(not-L | M) = (P(M | not-L) * P(not-L))/P(M)
P(not-L | M) = (.80*.02)/.15
P(not-L | M) ≈ .1067 = 10.67%
- Now that you know P(not-L | M) (from above), apply the law of total probability again to determine the probability of a second miscarriage. You may assume that, conditional on mother type (high or low risk), subsequent miscarriages are independent.
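P(M2 | M1) = P(M | L) * P(L | M) + P(M | not-L) * P(not-L | M)
Here P(L | M) = 1 - P(not-L | M) ≈ .8933. The prior weights P(L) and P(not-L) do not appear again, because the posterior probabilities already account for them.
P(M2 | M1) ≈ .1367 * .8933 + .80 * .1067
P(M2 | M1) ≈ .12215 + .08533 = .20748 ≈ 20.75%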
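The first two parts of this problem (solving the law of total probability for P(M | L), then applying Bayes rule) can be reproduced in R. A small sketch with illustrative variable names, using only the numbers given above:

```r
p_M      <- 0.15   # overall miscarriage probability, P(M)
p_L      <- 0.98   # probability of being low risk, P(L)
p_M_high <- 0.80   # P(M | not-L)

# Law of total probability solved for P(M | L).
p_M_low <- (p_M - p_M_high * (1 - p_L)) / p_L

# Bayes rule for P(not-L | M).
p_high_given_M <- p_M_high * (1 - p_L) / p_M

print(round(c(p_M_low, p_high_given_M), 4))
```

As a sanity check, p_M_low * p_L + p_M_high * (1 - p_L) should return the overall rate of 0.15.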
Problem 15
A friend claims she can tell the difference between Evian and Dasani bottled water brands. To test her, you give her a blind taste test at lunch seven days in a row, where you randomly pick a bottle and have her try it blindfolded and she attempts to identify the brand. She gets a correct classification on five of the seven days.
- To evaluate the strength of this evidence towards proving her claim, assume to the contrary that the probability of her getting a correct classification is p = 1/2. What is the probability of her getting five correct in seven tries under this assumption?
Binomial Distribution: P(5 of 7) = (7 choose 5) * (1/2)^5 * (1/2)^2 = 21 * (1/128) ≈ .1641
The probability is about 16.41%
- What is the probability of getting five or more classifications correct, again under the assumption that she really is unable to tell the difference (p = 1/2)?
print(1 - pbinom(4,7,.5))
[1] 0.2265625
The probability is about 22.66%
- Now, suppose you run the experiment an additional three weeks, for a total of 28 tests. Suppose she gets a total of 20 out of 28 correct. Again assuming p = 1/2, what is the probability of getting this many or more classifications correct? (You can use the R command pbinom(y,n,p) to compute your answer, where n is the number of trials, y is the number of successes in n trials with success probability p.)
print(1 - pbinom(19,28,.5))
[1] 0.01784907
The probability is about 1.78%
- Explain in words why the answer to the previous two questions are different, even though in both cases your friend gets 5/7 = 20/28 ≈ 71.4% of her guesses correct. What probability idea does this illustrate?
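Under p = 1/2 the expected proportion correct is 50% at either sample size, but the standard deviation of the sample proportion, sqrt(p(1-p)/n), shrinks as the number of trials grows: about .19 for n = 7 versus about .09 for n = 28. A result of 71.4% is therefore much harder to reach by luck in 28 trials than in 7. This reflects the Law of Large Numbers: as the number of trials increases, the observed proportion tends toward the true probability.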
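The three probabilities in this problem, and the spread of the sample proportion at each sample size, can be computed side by side. A short sketch with illustrative variable names:

```r
# Exact probabilities under p = 1/2.
p_5_of_7   <- dbinom(5, 7, 0.5)            # exactly 5 of 7
p_ge5_of_7 <- 1 - pbinom(4, 7, 0.5)        # 5 or more of 7
p_ge20_28  <- 1 - pbinom(19, 28, 0.5)      # 20 or more of 28

# Standard deviation of the sample proportion, sqrt(p(1-p)/n), for each n.
sd_prop <- sqrt(0.5 * 0.5 / c(7, 28))

print(c(p_5_of_7, p_ge5_of_7, p_ge20_28))
print(sd_prop)
```

Quadrupling the number of trials halves the standard deviation of the proportion, which is why the same 71.4% success rate is so much less likely under guessing with 28 trials.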
Problem 16
Problem 17
85%. The student should not fall victim to the Gambler’s Fallacy: past results do not change the probability of future independent outcomes, so their expected score should not change.
Problem 18
- What is the probability I make my first three putts but miss my fourth?
P(WWWL) = (.7)^3 * (.3) = .1029 = 10.29%
- What is the probability I make exactly three putts in my first four attempts? Why is this answer different from the previous one?
P(X = 3) = (4 choose 3) * (.7)^3 * (.3)^1 = 4 * .1029 = .4116 = 41.16%
The probability is exactly 4 times larger because the one miss can happen in any of the four putts instead of just the last one.
- What is the probability that my brother makes more than half of his first four putts?
P(Y > 2) = P(Y = 3) + P(Y = 4) = (4 choose 3) * (.8)^3 * (.2)^1 + (4 choose 4) * (.8)^4 = 4 * (.8)^3 * (.2) + (.8)^4
P(Y > 2) = .4096 + .4096 = .8192 = 81.92%
- The game calls for us to each attempt 100 putts, after which we settle up at the price of $5 per putt. What is the expected amount I will owe my brother?
For 100 putts each:
E(Y - X) = E(Y) - E(X) = 100(.8) - 100(.7) = 10
On average he will make 10 more putts per 100 than I will, so I expect to owe him 10 * $5 = $50.
- Use the normal approximation to the binomial distribution (applied twice), as well as the fact that linear combinations of normal distributions are again normally distributed, to find the probability that I will win. Determine the answer approximately using the “empirical rule”; compare this to the exact answer found using the pnorm(y,mu,sigma) command in R.
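One way to set up the calculation asked for above is sketched here. It assumes the two players' putts are independent, that "win" means making strictly more putts than my brother, and it ignores any continuity correction; the variable names are illustrative. With X ≈ N(70, 21) for me and Y ≈ N(80, 16) for my brother, the difference X - Y is approximately N(-10, 37).

```r
n     <- 100
p_me  <- 0.7
p_bro <- 0.8

# Mean and standard deviation of X - Y under the normal approximation.
mu_diff <- n * p_me - n * p_bro
sd_diff <- sqrt(n * p_me * (1 - p_me) + n * p_bro * (1 - p_bro))

z     <- (0 - mu_diff) / sd_diff        # how many sd's zero lies above the mean
p_win <- 1 - pnorm(0, mu_diff, sd_diff) # P(X - Y > 0)
print(c(z = z, p_win = p_win))
```

Zero lies a little more than 1.6 standard deviations above the mean difference. By the empirical rule the upper tail beyond 1 standard deviation is about 16% and beyond 2 is about 2.5%, so the chance of winning falls between those two values; pnorm puts it at roughly 5%.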
Problem 19

Problem 20
The p-value is small, coming in at <.00001. This was based on a one-sided test, since the question is whether the drug shifts headache timing in one direction relative to placebo, not whether the timing falls within some acceptable range.
As long as my previous answer was correct, the new drug is statistically significantly better than placebo, since p < .00001, which is < .05.
I would recommend collecting more data to test the drug’s efficacy. A sample size of 100 is too small to judge the efficacy (and also the safety!) of a drug that may be given to millions of patients or over-the-counter customers. The same study needs to be performed with a much larger group.
For the reasons given in the previous answer, I would NOT proceed with the development of the drug yet. Additional tests must be done with more people, perhaps more than once, which should reduce the variability of the estimates.
Problem 21
The scientists are misusing the idea of statistical significance to claim that jelly beans are linked to acne. Almost all of the tests found no link. By testing many variables and reporting only the one that came out statistically significant, they are presenting what is likely a false positive as support for their hypothesis.
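The size of this problem is easy to quantify. The sketch below assumes 20 independent tests at the .05 level; that count is an illustrative assumption, not a figure taken from the answer above.

```r
alpha   <- 0.05   # significance level for each test
n_tests <- 20     # assumed number of independent tests (illustrative only)

# Chance that at least one test is "significant" when no effect exists.
p_any_false_positive <- 1 - (1 - alpha)^n_tests
print(p_any_false_positive)
```

Under these assumptions a "significant" result somewhere is more likely than not even when nothing is going on.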
Problem 22

Problem 23
Problem 24
Problem 25
Problem 26
Problem 27
Problem 28
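library(glmnet)
data(iris)
# Predictors: the four numeric measurements (drop the Species column).
x = as.matrix(iris[,-5])
# One-vs-rest lasso logistic regression (alpha = 1) for each species,
# scored by 5-fold cross-validated misclassification error.
# Species 1: versicolor vs. the rest
y = iris[,5] == "versicolor"
cvob1 = cv.glmnet(x,y,family = "binomial", type.measure = "class", alpha = 1, nfolds = 5)
plot(cvob1)

# Species 2: setosa vs. the rest
y = iris[,5] == "setosa"
cvob1 = cv.glmnet(x,y,family = "binomial", type.measure = "class", alpha = 1, nfolds = 5)
plot(cvob1)

# Species 3: virginica vs. the rest
y = iris[,5] == "virginica"
cvob1 = cv.glmnet(x,y,family = "binomial", type.measure = "class", alpha = 1, nfolds = 5)
plot(cvob1)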

