Data_Science_Math_HW2

Problem 1. Dice Rolls

#Make function to simulate two dice rolls
TwoDiceRolls <- function(){
  Dice <- sample (6,size = 2,replace = TRUE)
  return(sum(Dice))
}
Repeats <- replicate(1000000, TwoDiceRolls())
TwoRollProbs <- round(table(Repeats)/length(Repeats),4)
TwoRollProbs

## Repeats
##      2      3      4      5      6      7      8      9     10     11 
## 0.0278 0.0558 0.0837 0.1109 0.1389 0.1669 0.1383 0.1107 0.0834 0.0557 
##     12 
## 0.0279

If you roll a pair of fair dice, what is the probability of..

1a) getting a sum of 1?

TwoRollProbs["1"][1]

## <NA> 
##   NA

NA signifies there is no way to yield a sum of 1, so the probability is: 0

1b) getting a sum of 5?

TwoRollProbs["5"][1]

##      5 
## 0.1109

1c) getting a sum of 12?

TwoRollProbs["12"][1]

##     12 
## 0.0279

Problem 2. School Absences

Data collected at elementary schools in DeKalb County, GA suggest that each year roughly 25% of students miss exactly one day of school, 15% miss 2 days, and 28% miss 3 or more days due to sickness.

P_1 <- 0.25
P_2 <- 0.15
P_3P <- 0.28

2a) What is the probability that a student chosen at random doesn’t miss any days of school due to sickness this year?

P_0 <- (1 - P_1 - P_2 - P_3P)
P_0

## [1] 0.32

2b) What is the probability that a student chosen at random misses no more than one day?

P_nm1 <- P_0 + P_1
P_nm1

## [1] 0.57

2c) What is the probability that a student chosen at random misses at least one day?

P_al1 <- 1 - P_0
P_al1

## [1] 0.68

2d) If a parent has two kids at a DeKalb County elementary school, what is the probability that neither kid will miss any school? Note any assumption you make.

Assuming that

1) The probabilities of the second child’s attendance aren’t affected by having included the first in the calculations

2) The fact that the two children are in the same household doesn’t affect their attendace record with respect to the overall statistics

P_both0 <- P_0 * P_0
P_both0

## [1] 0.1024

2e) If a parent has two kids at a DeKalb County elementary school, what is the probability that both kids will miss some school, i.e. at least one day? Note any assumption you make.

Assuming that

1) The probabilities of the second child’s attendance aren’t affected by having included the first in the calculations

2) The fact that the two children are in the same household doesn’t affect their attendace record with respect to the overall statistics

P_bothAL1 <- P_al1 * P_al1
P_bothAL1

## [1] 0.4624

2f) If you made an assumption in part (d) or (e), do you think it was reasonable?

The fist assumption is reasonable due to the fact that the DeKalb County elementary school system likely contains a very large number of students and the probability of each attendance record would not be significantly affected by removing one individual from the calculation. The second is a little more shakey since children in the same household are more likely to have similar health statistics as opposed to random individuals.

Problem 3. Health Coverage, Relative Frequencies

The Behavioral Risk Factor Surveillance System (BRFSS) is an annual telephone survey designed to identify risk factors in the adult population and report emerging health trends. The following table displays the distribution of health status of respondents to this survey (excellent, very good, good, fair, poor) and whether or not they have health insurance.

mat = matrix(c(.023, 0.0364, 0.0427, 0.0192, 0.0050, 0.2099, 0.3123, 0.2410, 0.0817, 0.0289), byrow = TRUE, nrow = 2)
colnames(mat) = c("Excellent", "Very Good", "Good", "Fair", "Poor")
rownames(mat) = c("No Coverage", "Coverage")
mat

##             Excellent Very Good   Good   Fair   Poor
## No Coverage    0.0230    0.0364 0.0427 0.0192 0.0050
## Coverage       0.2099    0.3123 0.2410 0.0817 0.0289

3a) Are being in excellent health and having health coverage mutually exclusive?

ExcCovered <- mat["Coverage","Excellent"]
if(ExcCovered > 0){
  print("Being in excellent health and having health coverage are not mutually exclusive")
}else { if(ExcCovered == 0){
  print("Being in excellent health and having health coverage are mutually exclusive")
  }
}

## [1] "Being in excellent health and having health coverage are not mutually exclusive"

3b) What is the probability that a randomly chosen individual has excellent health?

Excellent <- sum(mat[,"Excellent"])
Excellent

## [1] 0.2329

3c) What is the probability that a randomly chosen individual has excellent health given that he has health coverage?

Covered <- mat["Coverage",]
Covered["Excellent"]/sum(Covered)

## Excellent 
## 0.2402152

3d) What is the probability that a randomly chosen individual has excellent health given that he doesn’t have health coverage?

NotCovered <- mat["No Coverage",]
NotCovered["Excellent"]/sum(NotCovered)

## Excellent 
## 0.1821061

3e) Do having excellent health and having health coverage appear to be independent?

if (mat["Coverage","Excellent"] == (sum(Covered) * sum(Excellent))){
print ("Having excelent health and having health coverage appear to be independent")
} else {
print ("Having excelent health and having health coverage do not appear to be independent")
}

## [1] "Having excelent health and having health coverage do not appear to be independent"

Problem 4. Exit Poll

Edison Research gathered exit poll results from several sources for the Wisconsin recall election of Scott Walker. They found that 53% of the respondents voted in favor of Scott Walker. Additionally, they estimated that of those who did vote in favor for Scott Walker, 37% had a college degree, while 44% of those who voted against Scott Walker had a college degree. Suppose we randomly sampled a person who participated in the exit poll and found that he had a college degree. What is the probability that he voted in favor of Scott Walker?

P_SW = 0.53
P_nSW = 1 - P_SW
P_SW_C = 0.37
P_nSW_C = 0.44

Poll = matrix(c(P_SW * P_SW_C, P_nSW * P_nSW_C, P_SW * (1-P_SW_C), P_nSW * (1 - P_nSW_C)), byrow = TRUE, nrow = 2)
colnames(Poll) = c("Scott Walker", "Not Scott Walker")
rownames(Poll) = c("College", "No College")

Poll

##            Scott Walker Not Scott Walker
## College          0.1961           0.2068
## No College       0.3339           0.2632

Colls <- sum(Poll["College",])
Poll["College","Scott Walker"]/Colls

## [1] 0.4867213

Problem 5. Books on a Bookshelf

The table below shows the distribution of books on a bookcase based on whether they are nonfiction or fiction and hardcover or paperback.

mymat2=matrix(c(13,59,15,8),nrow=2,byrow=TRUE)
colnames(mymat2)=c("hard","paper")
rownames(mymat2)=c("fiction","nonfiction")
mymat2

##            hard paper
## fiction      13    59
## nonfiction   15     8

5a) Find the probability of drawing a hardcover book first then a paperback fiction book second when drawing without replacement.

HardProb <- sum(mymat2[,"hard"])/sum(mymat2)
PapProb_2 <- sum(mymat2[,"paper"])/(sum(mymat2)-1)
Prob5a <- HardProb * PapProb_2
Prob5a

## [1] 0.2100784

5b) Determine the probability of drawing a fiction book first and then a hardcover book second when drawing without replacement.

#Probability of pulling a fiction book first
FicProb <- sum(mymat2["fiction",])/sum(mymat2)

#Probability that the fiction book pulled is a hardcover
Hard_Fic_Prob <- mymat2["fiction","hard"]/sum(mymat2["fiction",])

#Probability that the fiction book pulled is a paperback
Pap_Fic_Prob <- (1 - Hard_Fic_Prob)

#Probability of pulling a hardcover second having pulled the hardcover fiction book first
Hard_Fic_Hard_Prob <- (sum(mymat2[,"hard"])-1)/(sum(mymat2)-1)

#Probability of pulling a hardcover having pulled the paperback fiction book first
Pap_Fic_Hard_Prob <- (sum(mymat2[,"hard"]))/(sum(mymat2)-1)

#Overall probability accounts for both the case where the first book pulled was a hardcover and a paperback
Prob5b <- FicProb * (Hard_Fic_Prob * Hard_Fic_Hard_Prob + Pap_Fic_Prob * Pap_Fic_Hard_Prob)

Prob5b

## [1] 0.2243001

5c) Calculate the probability of the scenario in part (b), except this time complete the calculations under the scenario where the first book is placed back on the bookcase before randomly drawing the second book.

FicProb * HardProb

## [1] 0.2233795

5d) The final answers to parts (b) and (c) are very similar. Explain why this is the case.

The difference between (b) and (c) stems from the fact that the probability of pulling out a hardcover after having taken out a fiction book (27/94 or 28/94 depending on whether the fiction book pulled out was harcover or not) changes from the baseline probability of pulling a hardcover (28/95). The difference isn’t significant since the total number of books (95) is high enough that a change in the amount of books by 1 does not significantly affect the probabilities involved.

Problem 6. Is it worth it?

Andy is always looking for ways to make money fast. Lately, he has been trying to make money by gambling. Here is the game he is considering playing: The game costs 2 dollars to play. He draws a card from a deck. If he gets a number card (2-10), he wins nothing. For any face card (jack, queen or king), he wins 3 dollars. For any ace, he wins 5 dollars and he wins an extra $20 if he draws the ace of clubs. ### 6a) Create a probability model and find Andy’s expected profit per game.

TotCards <- 52
NumCards <- 36
FaceCards <- 12
BaseAces <- 3
ClubAce <- 1

GameCost <- 2
NumWin <- 0
FaceWin <- 3
BaseAceWin <- 5
ClubAceWin <- 20

NumChance <- NumCards / TotCards
FaceChance <- FaceCards / TotCards
BaseAceChance <- BaseAces / TotCards
ClubAceChance <- ClubAce / TotCards

Winnings <- function(){
  Win <- NumChance * NumWin + FaceChance * FaceWin + BaseAceChance * BaseAceWin + ClubAceChance * ClubAceWin
  return(Win - GameCost)
}
Winnings()

## [1] -0.6346154

Problem 7. Scooping Ice Cream

Ice cream usually comes in 1.5 quart boxes (48 fluid ounces), and ice cream scoops hold about 2 ounces. However, there is some variability in the amount of ice cream in a box as well as the amount of ice cream scooped out. We represent the amount of ice cream in the box as X and the amount scooped out as Y . Suppose these random variables have the following means, standard deviations, and variances:

mymat3=matrix(c(48,1,1, 2,.25,.0625), nrow=2, byrow=TRUE)
colnames(mymat3)=c("mean", "SD", "Var")
rownames(mymat3)=c("X, In Box","Y, Scooped")
mymat3

##            mean   SD    Var
## X, In Box    48 1.00 1.0000
## Y, Scooped    2 0.25 0.0625

BoxOz <- 48
ScoopOz <- 2

7a) An entire box of ice cream, plus 3 scoops from a second box is served at a party. How much ice cream do you expect to have been served at this party?

Served <- BoxOz + ScoopOz *3
Served

## [1] 54

What is the standard deviation of the amount of ice cream served?

ServedStDv <- sqrt(1+ 0.0625 *3)
ServedStDv

## [1] 1.089725

7b) How much ice cream would you expect to be left in the box after scooping out one scoop of ice cream? That is, find the expected value of X ??? Y.

Left <- BoxOz - ScoopOz
Left

## [1] 46

What is the standard deviation of the amount left in the box?

LeftStDv <- sqrt (1 + 0.0625)
LeftStDv

## [1] 1.030776

7c) Using the context of this exercise, explain why we add variances when we subtract one random variable from another.

## [1] "The direction in which the amounts change (increase or decrease) does not affect the effect of the variance of the final result; therefore whether we add or subtract a variable, the variance of the final amount increases."

Data_Science_Math_HW2_Kollontai

Misha Kollontai

7/22/2019

Problem 1. Dice Rolls

1a) getting a sum of 1?

1b) getting a sum of 5?

1c) getting a sum of 12?

Problem 2. School Absences

2a) What is the probability that a student chosen at random doesn’t miss any days of school due to sickness this year?

2b) What is the probability that a student chosen at random misses no more than one day?

2c) What is the probability that a student chosen at random misses at least one day?

2d) If a parent has two kids at a DeKalb County elementary school, what is the probability that neither kid will miss any school? Note any assumption you make.

Assuming that

1) The probabilities of the second child’s attendance aren’t affected by having included the first in the calculations

2) The fact that the two children are in the same household doesn’t affect their attendace record with respect to the overall statistics

2e) If a parent has two kids at a DeKalb County elementary school, what is the probability that both kids will miss some school, i.e. at least one day? Note any assumption you make.

Assuming that

1) The probabilities of the second child’s attendance aren’t affected by having included the first in the calculations

2) The fact that the two children are in the same household doesn’t affect their attendace record with respect to the overall statistics

2f) If you made an assumption in part (d) or (e), do you think it was reasonable?

Problem 3. Health Coverage, Relative Frequencies

3a) Are being in excellent health and having health coverage mutually exclusive?

3b) What is the probability that a randomly chosen individual has excellent health?

3c) What is the probability that a randomly chosen individual has excellent health given that he has health coverage?

3d) What is the probability that a randomly chosen individual has excellent health given that he doesn’t have health coverage?

3e) Do having excellent health and having health coverage appear to be independent?

Problem 4. Exit Poll

Problem 5. Books on a Bookshelf

5a) Find the probability of drawing a hardcover book first then a paperback fiction book second when drawing without replacement.

5b) Determine the probability of drawing a fiction book first and then a hardcover book second when drawing without replacement.

5c) Calculate the probability of the scenario in part (b), except this time complete the calculations under the scenario where the first book is placed back on the bookcase before randomly drawing the second book.

5d) The final answers to parts (b) and (c) are very similar. Explain why this is the case.

Problem 6. Is it worth it?

Problem 7. Scooping Ice Cream

7a) An entire box of ice cream, plus 3 scoops from a second box is served at a party. How much ice cream do you expect to have been served at this party?

What is the standard deviation of the amount of ice cream served?

7b) How much ice cream would you expect to be left in the box after scooping out one scoop of ice cream? That is, find the expected value of X ??? Y.

What is the standard deviation of the amount left in the box?

7c) Using the context of this exercise, explain why we add variances when we subtract one random variable from another.

Data_Science_Math_HW2_Kollontai

Misha Kollontai

7/22/2019

Problem 1. Dice Rolls

1a) getting a sum of 1?

1b) getting a sum of 5?

1c) getting a sum of 12?

Problem 2. School Absences

2a) What is the probability that a student chosen at random doesn’t miss any days of school due to sickness this year?

2b) What is the probability that a student chosen at random misses no more than one day?

2c) What is the probability that a student chosen at random misses at least one day?

2d) If a parent has two kids at a DeKalb County elementary school, what is the probability that neither kid will miss any school? Note any assumption you make.

Assuming that

1) The probabilities of the second child’s attendance aren’t affected by having included the first in the calculations

2) The fact that the two children are in the same household doesn’t affect their attendace record with respect to the overall statistics

2e) If a parent has two kids at a DeKalb County elementary school, what is the probability that both kids will miss some school, i.e. at least one day? Note any assumption you make.

Assuming that

1) The probabilities of the second child’s attendance aren’t affected by having included the first in the calculations

2) The fact that the two children are in the same household doesn’t affect their attendace record with respect to the overall statistics

2f) If you made an assumption in part (d) or (e), do you think it was reasonable?

Problem 3. Health Coverage, Relative Frequencies

3a) Are being in excellent health and having health coverage mutually exclusive?

3b) What is the probability that a randomly chosen individual has excellent health?

3c) What is the probability that a randomly chosen individual has excellent health given that he has health coverage?

3d) What is the probability that a randomly chosen individual has excellent health given that he doesn’t have health coverage?

3e) Do having excellent health and having health coverage appear to be independent?

Problem 4. Exit Poll

Problem 5. Books on a Bookshelf

5a) Find the probability of drawing a hardcover book first then a paperback fiction book second when drawing without replacement.

5b) Determine the probability of drawing a fiction book first and then a hardcover book second when drawing without replacement.

5c) Calculate the probability of the scenario in part (b), except this time complete the calculations under the scenario where the first book is placed back on the bookcase before randomly drawing the second book.

5d) The final answers to parts (b) and (c) are very similar. Explain why this is the case.

Problem 6. Is it worth it?

6b) Would you recommend this game to Andy as a good way to make money? Explain.

Problem 7. Scooping Ice Cream

7a) An entire box of ice cream, plus 3 scoops from a second box is served at a party. How much ice cream do you expect to have been served at this party?

What is the standard deviation of the amount of ice cream served?

7b) How much ice cream would you expect to be left in the box after scooping out one scoop of ice cream? That is, find the expected value of X ??? Y.

What is the standard deviation of the amount left in the box?

7c) Using the context of this exercise, explain why we add variances when we subtract one random variable from another.