#Make function to simulate two dice rolls
TwoDiceRolls <- function(){
Dice <- sample (6,size = 2,replace = TRUE)
return(sum(Dice))
}
Repeats <- replicate(1000000, TwoDiceRolls())
TwoRollProbs <- round(table(Repeats)/length(Repeats),4)
TwoRollProbs
## Repeats
## 2 3 4 5 6 7 8 9 10 11
## 0.0278 0.0558 0.0837 0.1109 0.1389 0.1669 0.1383 0.1107 0.0834 0.0557
## 12
## 0.0279
If you roll a pair of fair dice, what is the probability of..
TwoRollProbs["1"][1]
## <NA>
## NA
NA signifies there is no way to yield a sum of 1, so the probability is: 0
TwoRollProbs["5"][1]
## 5
## 0.1109
TwoRollProbs["12"][1]
## 12
## 0.0279
Data collected at elementary schools in DeKalb County, GA suggest that each year roughly 25% of students miss exactly one day of school, 15% miss 2 days, and 28% miss 3 or more days due to sickness.
P_1 <- 0.25
P_2 <- 0.15
P_3P <- 0.28
P_0 <- (1 - P_1 - P_2 - P_3P)
P_0
## [1] 0.32
P_nm1 <- P_0 + P_1
P_nm1
## [1] 0.57
P_al1 <- 1 - P_0
P_al1
## [1] 0.68
P_both0 <- P_0 * P_0
P_both0
## [1] 0.1024
P_bothAL1 <- P_al1 * P_al1
P_bothAL1
## [1] 0.4624
The fist assumption is reasonable due to the fact that the DeKalb County elementary school system likely contains a very large number of students and the probability of each attendance record would not be significantly affected by removing one individual from the calculation. The second is a little more shakey since children in the same household are more likely to have similar health statistics as opposed to random individuals.
The Behavioral Risk Factor Surveillance System (BRFSS) is an annual telephone survey designed to identify risk factors in the adult population and report emerging health trends. The following table displays the distribution of health status of respondents to this survey (excellent, very good, good, fair, poor) and whether or not they have health insurance.
mat = matrix(c(.023, 0.0364, 0.0427, 0.0192, 0.0050, 0.2099, 0.3123, 0.2410, 0.0817, 0.0289), byrow = TRUE, nrow = 2)
colnames(mat) = c("Excellent", "Very Good", "Good", "Fair", "Poor")
rownames(mat) = c("No Coverage", "Coverage")
mat
## Excellent Very Good Good Fair Poor
## No Coverage 0.0230 0.0364 0.0427 0.0192 0.0050
## Coverage 0.2099 0.3123 0.2410 0.0817 0.0289
ExcCovered <- mat["Coverage","Excellent"]
if(ExcCovered > 0){
print("Being in excellent health and having health coverage are not mutually exclusive")
}else { if(ExcCovered == 0){
print("Being in excellent health and having health coverage are mutually exclusive")
}
}
## [1] "Being in excellent health and having health coverage are not mutually exclusive"
Excellent <- sum(mat[,"Excellent"])
Excellent
## [1] 0.2329
Covered <- mat["Coverage",]
Covered["Excellent"]/sum(Covered)
## Excellent
## 0.2402152
NotCovered <- mat["No Coverage",]
NotCovered["Excellent"]/sum(NotCovered)
## Excellent
## 0.1821061
if (mat["Coverage","Excellent"] == (sum(Covered) * sum(Excellent))){
print ("Having excelent health and having health coverage appear to be independent")
} else {
print ("Having excelent health and having health coverage do not appear to be independent")
}
## [1] "Having excelent health and having health coverage do not appear to be independent"
Edison Research gathered exit poll results from several sources for the Wisconsin recall election of Scott Walker. They found that 53% of the respondents voted in favor of Scott Walker. Additionally, they estimated that of those who did vote in favor for Scott Walker, 37% had a college degree, while 44% of those who voted against Scott Walker had a college degree. Suppose we randomly sampled a person who participated in the exit poll and found that he had a college degree. What is the probability that he voted in favor of Scott Walker?
P_SW = 0.53
P_nSW = 1 - P_SW
P_SW_C = 0.37
P_nSW_C = 0.44
Poll = matrix(c(P_SW * P_SW_C, P_nSW * P_nSW_C, P_SW * (1-P_SW_C), P_nSW * (1 - P_nSW_C)), byrow = TRUE, nrow = 2)
colnames(Poll) = c("Scott Walker", "Not Scott Walker")
rownames(Poll) = c("College", "No College")
Poll
## Scott Walker Not Scott Walker
## College 0.1961 0.2068
## No College 0.3339 0.2632
Colls <- sum(Poll["College",])
Poll["College","Scott Walker"]/Colls
## [1] 0.4867213
The table below shows the distribution of books on a bookcase based on whether they are nonfiction or fiction and hardcover or paperback.
mymat2=matrix(c(13,59,15,8),nrow=2,byrow=TRUE)
colnames(mymat2)=c("hard","paper")
rownames(mymat2)=c("fiction","nonfiction")
mymat2
## hard paper
## fiction 13 59
## nonfiction 15 8
HardProb <- sum(mymat2[,"hard"])/sum(mymat2)
PapProb_2 <- sum(mymat2[,"paper"])/(sum(mymat2)-1)
Prob5a <- HardProb * PapProb_2
Prob5a
## [1] 0.2100784
#Probability of pulling a fiction book first
FicProb <- sum(mymat2["fiction",])/sum(mymat2)
#Probability that the fiction book pulled is a hardcover
Hard_Fic_Prob <- mymat2["fiction","hard"]/sum(mymat2["fiction",])
#Probability that the fiction book pulled is a paperback
Pap_Fic_Prob <- (1 - Hard_Fic_Prob)
#Probability of pulling a hardcover second having pulled the hardcover fiction book first
Hard_Fic_Hard_Prob <- (sum(mymat2[,"hard"])-1)/(sum(mymat2)-1)
#Probability of pulling a hardcover having pulled the paperback fiction book first
Pap_Fic_Hard_Prob <- (sum(mymat2[,"hard"]))/(sum(mymat2)-1)
#Overall probability accounts for both the case where the first book pulled was a hardcover and a paperback
Prob5b <- FicProb * (Hard_Fic_Prob * Hard_Fic_Hard_Prob + Pap_Fic_Prob * Pap_Fic_Hard_Prob)
Prob5b
## [1] 0.2243001
FicProb * HardProb
## [1] 0.2233795
The difference between (b) and (c) stems from the fact that the probability of pulling out a hardcover after having taken out a fiction book (27/94 or 28/94 depending on whether the fiction book pulled out was harcover or not) changes from the baseline probability of pulling a hardcover (28/95). The difference isn’t significant since the total number of books (95) is high enough that a change in the amount of books by 1 does not significantly affect the probabilities involved.
Andy is always looking for ways to make money fast. Lately, he has been trying to make money by gambling. Here is the game he is considering playing: The game costs 2 dollars to play. He draws a card from a deck. If he gets a number card (2-10), he wins nothing. For any face card (jack, queen or king), he wins 3 dollars. For any ace, he wins 5 dollars and he wins an extra $20 if he draws the ace of clubs. ### 6a) Create a probability model and find Andy’s expected profit per game.
TotCards <- 52
NumCards <- 36
FaceCards <- 12
BaseAces <- 3
ClubAce <- 1
GameCost <- 2
NumWin <- 0
FaceWin <- 3
BaseAceWin <- 5
ClubAceWin <- 20
NumChance <- NumCards / TotCards
FaceChance <- FaceCards / TotCards
BaseAceChance <- BaseAces / TotCards
ClubAceChance <- ClubAce / TotCards
Winnings <- function(){
Win <- NumChance * NumWin + FaceChance * FaceWin + BaseAceChance * BaseAceWin + ClubAceChance * ClubAceWin
return(Win - GameCost)
}
Winnings()
## [1] -0.6346154
I would not recommend that Andy play this game, as the expected winnings are negative, meaning he would be expected to lose money overall.
Ice cream usually comes in 1.5 quart boxes (48 fluid ounces), and ice cream scoops hold about 2 ounces. However, there is some variability in the amount of ice cream in a box as well as the amount of ice cream scooped out. We represent the amount of ice cream in the box as X and the amount scooped out as Y . Suppose these random variables have the following means, standard deviations, and variances:
mymat3=matrix(c(48,1,1, 2,.25,.0625), nrow=2, byrow=TRUE)
colnames(mymat3)=c("mean", "SD", "Var")
rownames(mymat3)=c("X, In Box","Y, Scooped")
mymat3
## mean SD Var
## X, In Box 48 1.00 1.0000
## Y, Scooped 2 0.25 0.0625
BoxOz <- 48
ScoopOz <- 2
Served <- BoxOz + ScoopOz *3
Served
## [1] 54
ServedStDv <- sqrt(1+ 0.0625 *3)
ServedStDv
## [1] 1.089725
Left <- BoxOz - ScoopOz
Left
## [1] 46
LeftStDv <- sqrt (1 + 0.0625)
LeftStDv
## [1] 1.030776
## [1] "The direction in which the amounts change (increase or decrease) does not affect the effect of the variance of the final result; therefore whether we add or subtract a variable, the variance of the final amount increases."