Homework 2
Problem 1. Dice Rolls
If you roll a pair of fair dice, what is the probability of..
dice1 <- 1:6
dice2 <- 1:6
sample.space <- expand.grid(dice1, dice2)
sample.space$total <- sample.space$Var1 + sample.space$Var2
head(sample.space)
## Var1 Var2 total
## 1 1 1 2
## 2 2 1 3
## 3 3 1 4
## 4 4 1 5
## 5 5 1 6
## 6 6 1 7
sum(sample.space$total == 1)/length(sample.space$total)
## [1] 0
The probability of rolling a total of 1 is 0, since the smallest possible total with two dice is 2.
sum(sample.space$total == 5)/length(sample.space$total)
## [1] 0.1111111
The probability of rolling a total of 5 is 0.1111111 (4 of the 36 outcomes).
sum(sample.space$total == 12)/length(sample.space$total)
## [1] 0.02777778
The probability of rolling a total of 12 is 0.02777778 (1 of the 36 outcomes).
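The same sample space gives the whole distribution of the total in one step. The sketch below rebuilds the 36 outcomes and tabulates the share of each total; `total.dist` is a name introduced here for illustration only.

```r
# Rebuild the 36 equally likely outcomes for two fair dice
dice1 <- 1:6
dice2 <- 1:6
sample.space <- expand.grid(dice1, dice2)
sample.space$total <- sample.space$Var1 + sample.space$Var2

# Share of outcomes for each possible total (2 through 12)
total.dist <- table(sample.space$total) / nrow(sample.space)
round(total.dist, 4)
```

Looking up `total.dist["5"]` or `total.dist["12"]` should reproduce the values computed one at a time, and a total of 1 never appears in the table.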
Problem 2. School absences
Data collected at elementary schools in DeKalb County, GA suggest that each year roughly 25% of students miss exactly one day of school, 15% miss 2 days, and 28% miss 3 or more days due to sickness.
p1day <- 25
p2day <- 15
p3daymore <- 28
(100 - (p1day + p2day + p3daymore))/100
## [1] 0.32
The probability that a student chosen at random doesn’t miss any days of school due to sickness this year is 0.32.
1 - (p2day/100 + p3daymore/100)
## [1] 0.57
The probability that a student chosen at random misses no more than one day is 0.57.
(p1day + p2day + p3daymore)/100
## [1] 0.68
The probability that a student chosen at random misses at least one day is 0.68.
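The three answers all come from the same four disjoint categories (0, 1, 2, or 3+ days missed). A minimal sketch that keeps the stated percentages in one named vector; the names `absences`, `p.none`, `p.at.most.one` and `p.at.least.one` are introduced here for illustration.

```r
# Probabilities stated in the problem, as proportions
absences <- c(one = 0.25, two = 0.15, three.plus = 0.28)

# Whatever is left over must be the "no days missed" category
p.none <- 1 - sum(absences)

# Events built from the four disjoint categories
p.at.most.one <- p.none + absences[["one"]]
p.at.least.one <- sum(absences)

c(none = p.none, at.most.one = p.at.most.one, at.least.one = p.at.least.one)
```

Because "no days" and "at least one day" are complements, the first and last values should sum to 1.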
Assume that whether one kid misses school is independent of whether the other kid does. Then:
P(kid1 and kid2) = P(kid1)*P(kid2)
pkid1 <- (100 -(p1day+p2day+p3daymore))/100
pkid2 <- (100 -(p1day+p2day+p3daymore))/100
pkid1*pkid2
## [1] 0.1024
The probability that neither kid will miss any school is 0.1024.
pkid1 <- (p1day+p2day+p3daymore)/100
pkid2 <- (p1day+p2day+p3daymore)/100
pkid1*pkid2
## [1] 0.4624
The probability that both kids will miss some school is 0.4624.
I assumed that the two kids miss school independently of each other. I think it’s reasonable.
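One way to sanity-check the two-kid calculations is to enumerate every case and confirm the probabilities sum to 1. This is only a sketch under the same independence assumption; the "exactly one kid misses school" case is not asked for in the problem and is added here just to complete the enumeration.

```r
p.none <- 0.32               # P(a student misses no days), from above
p.some <- 1 - p.none         # P(a student misses at least one day)

# Joint probabilities for two kids, ASSUMING independence
p.neither <- p.none * p.none
p.both <- p.some * p.some
p.exactly.one <- 2 * p.none * p.some   # kid1 only, or kid2 only

# The three cases are exhaustive, so they should sum to 1
c(neither = p.neither, exactly.one = p.exactly.one, both = p.both,
  total = p.neither + p.exactly.one + p.both)
```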
Problem 3. Health coverage, relative frequencies
The Behavioral Risk Factor Surveillance System (BRFSS) is an annual telephone survey designed to identify risk factors in the adult population and report emerging health trends. The following table displays the distribution of health status of respondents to this survey (excellent, very good, good, fair, poor) and whether or not they have health insurance.
mat=matrix(c(.023, 0.0364, 0.0427, 0.0192, 0.0050,0.2099, 0.3123 ,0.2410 ,0.0817,0.0289), byrow=TRUE, nrow=2)
colnames(mat)=c("Excellent", "Very Good","Good", "Fair","Poor")
rownames(mat)=c("No Coverage","Coverage")
mat
## Excellent Very Good Good Fair Poor
## No Coverage 0.0230 0.0364 0.0427 0.0192 0.0050
## Coverage 0.2099 0.3123 0.2410 0.0817 0.0289
pexcellent <- sum(mat[,"Excellent"])
pcoverage <- sum(mat["Coverage",])
pexcellent*pcoverage
## [1] 0.203508
Being in excellent health and having health coverage are not mutually exclusive: the table gives P(excellent and coverage) = 0.2099, which is greater than 0.
sum(mat[,"Excellent"])/sum(mat)
## [1] 0.2328767
The probability that a randomly chosen individual has excellent health is 0.2328767.
pexccov <- mat[2,1]
pexcgivcov <- pexccov/pcoverage
pexcgivcov
## [1] 0.2402152
The probability that a randomly chosen individual has excellent health given that he has health coverage is 0.2402152.
pexcnocov <- mat[1,1]
pnocov <- sum(mat[1,])
pexcgivnocov <- pexcnocov/pnocov
pexcgivnocov
## [1] 0.1821061
The probability that a randomly chosen individual has excellent health given that he doesn’t have health coverage is 0.1821061.
pexcellent * pcoverage
## [1] 0.203508
mat[2,1]
## [1] 0.2099
Excellent health and having health coverage don’t appear to be independent, since P(excellent and coverage) = 0.2099 is not equal to P(excellent)*P(coverage) = 0.203508.
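The independence check can be extended from the single "Excellent" cell to the whole table. The sketch below rebuilds the table, computes the joint probabilities that independence would imply, and compares them with the observed ones; `p.cov`, `p.health` and `indep` are illustrative names, not part of the original solution.

```r
mat <- matrix(c(0.0230, 0.0364, 0.0427, 0.0192, 0.0050,
                0.2099, 0.3123, 0.2410, 0.0817, 0.0289),
              byrow = TRUE, nrow = 2,
              dimnames = list(c("No Coverage", "Coverage"),
                              c("Excellent", "Very Good", "Good", "Fair", "Poor")))

# Marginal probabilities for coverage (rows) and health status (columns)
p.cov <- rowSums(mat)
p.health <- colSums(mat)

# Joint probabilities we would see if the two variables were independent
indep <- outer(p.cov, p.health)

# Observed minus "independent" joint probabilities; all zeros would mean independence
round(mat - indep, 4)

# Distribution of health status within each coverage group
round(prop.table(mat, margin = 1), 4)
```

If coverage and health status were independent, the two rows of the last table would be roughly the same.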
Problem 4. Exit Poll.
Edison Research gathered exit poll results from several sources for the Wisconsin recall election of Scott Walker. They found that 53% of the respondents voted in favor of Scott Walker. Additionally, they estimated that of those who did vote in favor for Scott Walker, 37% had a college degree, while 44% of those who voted against Scott Walker had a college degree. Suppose we randomly sampled a person who participated in the exit poll and found that he had a college degree. What is the probability that he voted in favor of Scott Walker?
Let A = voted for Scott Walker and B = had a college degree.
PA <- 0.53
PANot <- 1 - PA
PBgivenA <- 0.37
PBgivenNotA <- 0.44
# Law of total probability: P(B) = P(B|A)*P(A) + P(B|not A)*P(not A)
PAandB <- PBgivenA * PA
PAnotandB <- PBgivenNotA * PANot
PB <- PAandB + PAnotandB
# Bayes' theorem: P(A|B) = P(A and B) / P(B)
PAgivenB <- PAandB / PB
PAgivenB
## [1] 0.4867213
The probability that he voted in favor of Scott Walker is 0.4867213.
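A conditional probability like this can be cross-checked with whole-number counts instead of formulas. The sketch below imagines a hypothetical poll of 10,000 respondents (the size is an assumption chosen only to keep the counts round) and asks what share of the degree holders voted for Walker.

```r
n <- 10000                          # hypothetical number of respondents
n.walker <- n * 0.53                # voted for Walker
n.other <- n - n.walker             # voted against Walker

n.walker.degree <- n.walker * 0.37  # Walker voters with a college degree
n.other.degree <- n.other * 0.44    # other voters with a college degree

# Among all degree holders, the share who voted for Walker
p.walker.given.degree <- n.walker.degree / (n.walker.degree + n.other.degree)
p.walker.given.degree
```

The denominator counts every respondent with a degree, whichever way they voted.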
Problem 5. Books on a bookshelf
The table below shows the distribution of books on a bookcase based on whether they are nonfiction or fiction and hardcover or paperback.
mymat2=matrix(c(13,59,15,8),nrow=2,byrow=TRUE)
colnames(mymat2)=c("hard","paper")
rownames(mymat2)=c("fiction","nonfiction")
mymat2
## hard paper
## fiction 13 59
## nonfiction 15 8
sum(mymat2[,"hard"])/sum(mymat2)*sum(mymat2["fiction","hard"])/(sum(mymat2)-1)
## [1] 0.04076148
The probability is 0.04076148.
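# The number of hardcovers left for the second draw depends on whether the first (fiction) book was a hardcover
n <- sum(mymat2)
mymat2["fiction","hard"]/n*(sum(mymat2[,"hard"])-1)/(n-1) + mymat2["fiction","paper"]/n*sum(mymat2[,"hard"])/(n-1)
## [1] 0.2243001
The probability of drawing a fiction book first and a hardcover book second, without replacement, is 0.2243001.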
sum(mymat2["fiction",])/sum(mymat2)*sum(mymat2[,"hard"])/(sum(mymat2))
## [1] 0.2233795
Problem 6. Is it worth it?
Andy is always looking for ways to make money fast. Lately, he has been trying to make money by gambling. Here is the game he is considering playing: The game costs 2 dollars to play. He draws a card from a deck. If he gets a number card (2-10), he wins nothing. For any face card (jack, queen or king), he wins 3 dollars. For any ace, he wins 5 dollars and he wins an extra 20 dollars if he draws the ace of clubs.
total <- 52
number <- 36
xnumber <- -2 + 0
xnumberp <- number/52
face <- 12
facenumber <- -2 + 3
facenumberp <- face/52
ace <- 3
acenumber <- -2 + 5
acenumberp <- ace / 52
clubace <- 1
clubacenum <- -2 + 25
clubacenump <- clubace / 52
dt <- data.frame(profit = c(xnumber, facenumber, acenumber, clubacenum), prob_profit = c(xnumberp, facenumberp, acenumberp, clubacenump))
dt
## profit prob_profit
## 1 -2 0.69230769
## 2 1 0.23076923
## 3 3 0.05769231
## 4 23 0.01923077
sum(dt$profit*dt$prob_profit)
## [1] -0.5384615
The expected profit is -0.5384615 dollars per game, which is a loss on average. I wouldn’t recommend the game.
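Another way to read this result is to separate the payout from the fee. The sketch below assumes, as the calculation above does, that the ace of clubs pays 5 + 20 = 25 dollars; `payout`, `prob`, `fee` and `expected.payout` are illustrative names.

```r
# Payouts before the 2-dollar fee, and the number of cards in each group
payout <- c(number = 0, face = 3, ace = 5, club.ace = 25)
prob <- c(number = 36, face = 12, ace = 3, club.ace = 1) / 52

stopifnot(isTRUE(all.equal(sum(prob), 1)))  # the four groups cover the deck

expected.payout <- sum(payout * prob)  # average winnings per game
fee <- 2
c(expected.payout = expected.payout, expected.profit = expected.payout - fee)
```

Under these assumptions the game would only break even if the fee matched the expected payout, which is 76/52, or about 1.46 dollars.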
Problem 7. Scooping ice cream.
Ice cream usually comes in 1.5 quart boxes (48 fluid ounces), and ice cream scoops hold about 2 ounces. However, there is some variability in the amount of ice cream in a box as well as the amount of ice cream scooped out. We represent the amount of ice cream in the box as X and the amount scooped out as Y . Suppose these random variables have the following means, standard deviations, and variances:
mymat3=matrix(c(48,1,1, 2,.25,.0625), nrow=2, byrow=TRUE)
colnames(mymat3)=c("mean", "SD", "Var")
rownames(mymat3)=c("X, In Box","Y, Scooped")
mymat3
## mean SD Var
## X, In Box 48 1.00 1.0000
## Y, Scooped 2 0.25 0.0625
mymat3[1,"mean"]+3*mymat3[2,"mean"]
## [1] 54
54 ounces of ice cream.
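# Variances of independent amounts add; standard deviations do not
sqrt(mymat3[1,"Var"]+3*mymat3[2,"Var"])
## [1] 1.089725
The standard deviation is 1.089725 ounces, assuming the box and the three scoops vary independently.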
That is, find the expected value of X - Y. What is the standard deviation of the amount left in the box?
mymat3[1,"mean"]-mymat3[2,"mean"]
## [1] 46
The expected amount left in the box is 46 ounces.
sqrt(mymat3[1,"Var"] + mymat3[2,"Var"])
## [1] 1.030776
The variance of the amount left is 1 + 0.0625 = 1.0625, so the standard deviation is 1.030776 ounces.
(c) Using the context of this exercise, explain why we add variances when we subtract one random variable from another.
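Subtracting one random variable from another still combines two sources of uncertainty: the amount in the box varies and the amount scooped out varies, so the amount left is more variable than the amount in the box alone. That’s why we add variances.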
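A quick simulation can illustrate this. The problem only gives means and standard deviations, so the sketch below additionally assumes that X and Y are independent and normally distributed; that distributional choice is an assumption made here for illustration, and `x`, `y` and `left` are illustrative names.

```r
set.seed(1)
n <- 1e5

# ASSUMPTION: independent normal amounts with the stated means and SDs
x <- rnorm(n, mean = 48, sd = 1)     # ounces in the box
y <- rnorm(n, mean = 2, sd = 0.25)   # ounces scooped out
left <- x - y                        # ounces left in the box

# Simulated mean, variance and SD of the amount left
c(mean = mean(left), var = var(left), sd = sd(left))

# Theoretical values under independence, for comparison
c(mean = 48 - 2, var = 1 + 0.0625, sd = sqrt(1 + 0.0625))
```

The simulated variance should land close to the sum of the two variances, not their difference.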