SPS_Bridge_Course_HW2 Euclid Zhang

Problem 1. Dice Rolls

If you roll a pair of fair dice, what is the probability of (a) getting a sum of 1? (b) getting a sum of 5? (c) getting a sum of 12?

Create a function to calculate the probability of a given sum. DiceSum holds the sums of all 36 equally likely outcomes of rolling a pair of dice. The probability of a given sum is the number of outcomes with that sum, divided by the total number of outcomes.

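# Die 1 cycles through 1-6; die 2 repeats each face 6 times, giving all 36 ordered outcomes
Dice1 <- rep(1:6, 6)
Dice2 <- rep(1:6, each = 6)
DiceSum <- Dice1 + Dice2
# P(sum = x) = (number of outcomes with sum x) / (total number of outcomes)
dDiceSum <- function(x) sum(DiceSum == x) / length(DiceSum)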
  (a) The probability of getting a sum of 1 is
dDiceSum(1)
## [1] 0
  (b) The probability of getting a sum of 5 is
dDiceSum(5)
## [1] 0.1111111
  (c) The probability of getting a sum of 12 is
dDiceSum(12)
## [1] 0.02777778
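As a cross-check, the whole distribution of sums can be tabulated in one step. This sketch uses base R's `outer()` to build the 6 x 6 grid of sums and `table()` to count each sum; the name `sum_probs` is just an illustrative choice.

```r
# All 36 equally likely (die 1, die 2) outcomes as a 6 x 6 grid of sums
sums <- outer(1:6, 1:6, "+")
# Relative frequency of each sum that can occur (2 through 12)
sum_probs <- table(sums) / length(sums)
sum_probs
```

A sum of 1 does not appear in the table at all, which is another way of seeing that its probability is 0.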

Problem 2. School absences

Data collected at elementary schools in DeKalb County, GA suggest that each year roughly 25% of students miss exactly one day of school, 15% miss 2 days, and 28% miss 3 or more days due to sickness.

p1 <- 0.25
p2 <- 0.15
p3 <- 0.28
  (a) What is the probability that a student chosen at random doesn’t miss any days of school due to sickness this year?
    P(0 sick days) = 1 - P(at least 1 sick day) = 1 - P(1 sick day) - P(2 sick days) - P(3 or more sick days)
p0 <- 1 - p1 - p2 - p3
p0
## [1] 0.32
  (b) What is the probability that a student chosen at random misses no more than one day?
    P(no more than 1 sick day) = P(0 sick days) + P(1 sick day)
p0 + p1
## [1] 0.57
  (c) What is the probability that a student chosen at random misses at least one day?
    P(at least 1 sick day) = 1 - P(0 sick days)
1-p0
## [1] 0.68
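A minimal sketch that keeps the full distribution of sick days in one named vector, so that parts (a) to (c) become sums of its entries and we can check that the four probabilities add to 1. The names `absence_dist`, `zero`, `one`, `two` and `three_plus` are illustrative.

```r
# Given probabilities of 1, 2, and 3-or-more sick days
absences <- c(one = 0.25, two = 0.15, three_plus = 0.28)
# The 0-day probability is whatever is left over
absence_dist <- c(zero = 1 - sum(absences), absences)
absence_dist

no_more_than_one <- sum(absence_dist[c("zero", "one")])
at_least_one     <- sum(absence_dist[c("one", "two", "three_plus")])
c(no_more_than_one = no_more_than_one, at_least_one = at_least_one)
```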
  (d) If a parent has two kids at a DeKalb County elementary school, what is the probability that neither kid will miss any school? Note any assumption you must make to answer this question.
    Assume that the two kids' absences are independent, i.e. the health of one kid doesn’t affect the health of the other.
    P(Both 0 sick days) = P(0 sick days) * P(0 sick days)
p0 * p0 
## [1] 0.1024
  (e) If a parent has two kids at a DeKalb County elementary school, what is the probability that both kids will miss some school, i.e. at least one day? Note any assumption you make.
    Assume again that the two kids' absences are independent.
    P(both kids miss at least 1 day)
    = 1 - P(both miss 0 days) - P(exactly one kid misses 0 days)
    = 1 - P(0 sick days)^2 - 2 * P(0 sick days) * (1 - P(0 sick days))
    The factor 2 counts which of the two kids is the one with no sick days. This simplifies to (1 - P(0 sick days))^2 = 0.68^2.
1 - p0*p0 - 2*p0*(1-p0)
## [1] 0.4624
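Under the independence assumption, the joint distribution for two kids is the outer product of the single-kid distribution, so parts (d) and (e) are two cells of one 2 x 2 table. This is a sketch that assumes independence; the labels `none` and `some` are hypothetical.

```r
p0 <- 1 - 0.25 - 0.15 - 0.28            # P(a kid misses 0 days)
kid <- c(none = p0, some = 1 - p0)      # misses no days / misses at least one day
# Joint distribution of (kid 1, kid 2), ASSUMING the two kids are independent
joint <- outer(kid, kid)
joint
joint["none", "none"]   # (d) neither kid misses school
joint["some", "some"]   # (e) both kids miss at least one day
```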
  (f) If you made an assumption in part (d) or (e), do you think it was reasonable? If you didn’t make any assumptions, double check your earlier answers.

It is not reasonable. The health of the two kids is unlikely to be independent: they share the same living conditions, and some illnesses can pass from one child to the other.

Problem 3. Health coverage, relative frequencies

The Behavioral Risk Factor Surveillance System (BRFSS) is an annual telephone survey designed to identify risk factors in the adult population and report emerging health trends. The following table displays the distribution of health status of respondents to this survey (excellent, very good, good, fair, poor) and whether or not they have health insurance.

mat=matrix(c(.023, 0.0364, 0.0427, 0.0192, 0.0050,0.2099, 0.3123 ,0.2410 ,0.0817,0.0289), byrow=TRUE, nrow=2)
colnames(mat)=c("Excellent", "Very Good","Good", "Fair","Poor")
rownames(mat)=c("No Coverage","Coverage")
mat
##             Excellent Very Good   Good   Fair   Poor
## No Coverage    0.0230    0.0364 0.0427 0.0192 0.0050
## Coverage       0.2099    0.3123 0.2410 0.0817 0.0289
  (a) Are being in excellent health and having health coverage mutually exclusive?
    No. A person can be in excellent health and have health coverage at the same time (P = 0.2099 > 0), so the two events are not mutually exclusive.

  (b) What is the probability that a randomly chosen individual has excellent health?
    p(excellent health) = p(excellent health with no coverage) + p(excellent health with coverage)

mat["No Coverage","Excellent"] + mat["Coverage","Excellent"]
## [1] 0.2329
  (c) What is the probability that a randomly chosen individual has excellent health given that he has health coverage?
    p(excellent health | coverage) = p(excellent health with coverage) / p(coverage)
mat["Coverage","Excellent"]/sum(mat["Coverage",])
## [1] 0.2402152
  (d) What is the probability that a randomly chosen individual has excellent health given that he doesn’t have health coverage?
    p(excellent health | no coverage) = p(excellent health with no coverage) / p(no coverage)
mat["No Coverage","Excellent"]/sum(mat["No Coverage",])
## [1] 0.1821061
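Parts (c) and (d) both divide a cell by its row total. Base R's `prop.table()` with `margin = 1` does this for every cell at once, giving the conditional distribution of health status within each coverage group. The matrix is re-entered here (with `dimnames`) so the block runs on its own; `cond_health` is an illustrative name.

```r
mat <- matrix(c(0.0230, 0.0364, 0.0427, 0.0192, 0.0050,
                0.2099, 0.3123, 0.2410, 0.0817, 0.0289),
              byrow = TRUE, nrow = 2,
              dimnames = list(c("No Coverage", "Coverage"),
                              c("Excellent", "Very Good", "Good", "Fair", "Poor")))
# Divide each row by its row total: row i is P(health status | coverage group i)
cond_health <- prop.table(mat, margin = 1)
round(cond_health, 4)
```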
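  (e) Do having excellent health and having health coverage appear to be independent?
    No. The probability of being in excellent health is noticeably higher with coverage (about 0.240) than without coverage (about 0.182), and both differ from the overall P(excellent health) = 0.2329. If the events were independent, all three would be equal, so having excellent health and having health coverage do not appear to be independent.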
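Independence can also be checked directly: if the two events were independent, P(excellent and coverage) would equal P(excellent) * P(coverage). This sketch compares the two numbers. It only describes the table; a formal test such as a chi-squared test would also need the number of respondents, which the table does not give.

```r
mat <- matrix(c(0.0230, 0.0364, 0.0427, 0.0192, 0.0050,
                0.2099, 0.3123, 0.2410, 0.0817, 0.0289),
              byrow = TRUE, nrow = 2,
              dimnames = list(c("No Coverage", "Coverage"),
                              c("Excellent", "Very Good", "Good", "Fair", "Poor")))
p_excellent <- sum(mat[, "Excellent"])        # marginal P(excellent health)
p_coverage  <- sum(mat["Coverage", ])         # marginal P(coverage)
joint_ec    <- mat["Coverage", "Excellent"]   # P(excellent AND coverage)
# Under independence these two values would be equal
c(joint = joint_ec, product_of_marginals = p_excellent * p_coverage)
```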

Problem 4. Exit Poll.

Edison Research gathered exit poll results from several sources for the Wisconsin recall election of Scott Walker. They found that 53% of the respondents voted in favor of Scott Walker. Additionally, they estimated that of those who did vote in favor for Scott Walker, 37% had a college degree, while 44% of those who voted against Scott Walker had a college degree. Suppose we randomly sampled a person who participated in the exit poll and found that he had a college degree. What is the probability that he voted in favor of Scott Walker?

p(favor of Scott Walker | college degree)
= p(favor of Scott Walker with college degree) / p(college degree)
= p(favor of Scott Walker with college degree) / [p(favor of Scott Walker with college degree) + p(against Scott Walker with college degree)]
= (0.53)(0.37) / [(0.53)(0.37) + (1 - 0.53)(0.44)]

(0.53*0.37)/((0.53*0.37)+(1-0.53)*0.44)
## [1] 0.4867213
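The calculation above is Bayes' theorem. A small helper makes the structure explicit: prior times likelihood, divided by the total probability of the evidence. `posterior()` is a hypothetical name, and it assumes a binary hypothesis (voted for or against Walker).

```r
# Hypothetical helper: P(H | E) from P(H), P(E | H) and P(E | not H)
posterior <- function(prior, lik_h, lik_not_h) {
  num <- prior * lik_h                      # P(H and E)
  num / (num + (1 - prior) * lik_not_h)     # divide by P(E)
}
p_favor_given_degree <- posterior(prior = 0.53, lik_h = 0.37, lik_not_h = 0.44)
p_favor_given_degree
```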

Problem 5. Books on a bookshelf

The table below shows the distribution of books on a bookcase based on whether they are nonfiction or fiction and hardcover or paperback.

mymat2=matrix(c(13,59,15,8),nrow=2,byrow=TRUE)
colnames(mymat2)=c("hard","paper")
rownames(mymat2)=c("fiction","nonfiction")
mymat2
##            hard paper
## fiction      13    59
## nonfiction   15     8
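  (a) Find the probability of drawing a hardcover book first then a paperback fiction book second when drawing without replacement.
    p(hardcover then paperback fiction) = p(hardcover) * p(paperback fiction | one hardcover book already removed)
    Removing a hardcover book leaves all 59 paperback fiction books, but only 94 books on the shelf.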
(sum(mymat2[, "hard"]) / sum(mymat2)) * (mymat2["fiction", "paper"] / (sum(mymat2) - 1))
## [1] 0.1849944
  (b) Determine the probability of drawing a fiction book first and then a hardcover book second, when drawing without replacement.
    p(fiction then hardcover)
    = p(paperback fiction then hardcover) + p(hardcover fiction then hardcover)
    = p(paperback fiction) * p(hardcover | one paperback fiction book removed) + p(hardcover fiction) * p(hardcover | one hardcover fiction book removed)
(mymat2["fiction", "paper"] / sum(mymat2)) * (sum(mymat2[, "hard"]) / (sum(mymat2) - 1)) +
  (mymat2["fiction", "hard"] / sum(mymat2)) * ((sum(mymat2[, "hard"]) - 1) / (sum(mymat2) - 1))
## [1] 0.2243001
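To double-check parts (a) and (b), this sketch lists all 95 books individually and enumerates every ordered pair of two different books (95 * 94 = 8,930 equally likely draws without replacement). The `books` data frame and its column names are illustrative.

```r
counts <- c(13, 59, 15, 8)   # hard fiction, paper fiction, hard nonfiction, paper nonfiction
books <- data.frame(
  genre = rep(c("fiction", "fiction", "nonfiction", "nonfiction"), times = counts),
  cover = rep(c("hard", "paper", "hard", "paper"), times = counts)
)
n <- nrow(books)
# All ordered (first, second) pairs of DIFFERENT books: drawing without replacement
pairs <- expand.grid(first = seq_len(n), second = seq_len(n))
pairs <- pairs[pairs$first != pairs$second, ]
f <- books[pairs$first, ]
s <- books[pairs$second, ]
# (a) hardcover first, then paperback fiction
p_a <- mean(f$cover == "hard" & s$genre == "fiction" & s$cover == "paper")
# (b) fiction first, then hardcover
p_b <- mean(f$genre == "fiction" & s$cover == "hard")
c(a = p_a, b = p_b)
```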
  (c) Calculate the probability of the scenario in part (b), except this time complete the calculations under the scenario where the first book is placed back on the bookcase before randomly drawing the second book.
    p(fiction then hardcover with replacement) = p(fiction) * p(hardcover)
(sum(mymat2["fiction",])/sum(mymat2))*(sum(mymat2[,"hard"])/sum(mymat2))
## [1] 0.2233795
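  (d) The final answers to parts (b) and (c) are very similar. Explain why this is the case.
    Removing one book from a comparatively large collection of 95 books changes the counts only slightly: without replacement the second draw sees 28 of 94 or 27 of 94 hardcovers, compared with 28 of 95 with replacement. So the probabilities with and without replacement are nearly the same.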
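The explanation in (d) can be illustrated by scaling the bookcase: this sketch multiplies every count by a hypothetical factor k and recomputes parts (b) and (c). The proportions stay the same, so the with-replacement answer does not change, while the gap between the two answers shrinks roughly in proportion to 1/k. The function name `compare_draws` is illustrative.

```r
# Hypothetical scaling: multiply every count in the bookcase by k
compare_draws <- function(k) {
  hard_fic  <- 13 * k
  paper_fic <- 59 * k
  hard_non  <- 15 * k
  paper_non <- 8 * k
  n    <- hard_fic + paper_fic + hard_non + paper_non
  hard <- hard_fic + hard_non
  # Part (b): fiction first, then hardcover, without replacement
  without_repl <- (paper_fic / n) * (hard / (n - 1)) +
                  (hard_fic / n) * ((hard - 1) / (n - 1))
  # Part (c): the same event with replacement
  with_repl <- ((hard_fic + paper_fic) / n) * (hard / n)
  c(k = k, without = without_repl, with = with_repl, gap = without_repl - with_repl)
}
res <- t(sapply(c(1, 10, 100), compare_draws))
res
```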

Problem 6. Is it worth it?

Andy is always looking for ways to make money fast. Lately, he has been trying to make money by gambling. Here is the game he is considering playing: The game costs 2 dollars to play. He draws a card from a deck. If he gets a number card (2-10), he wins nothing. For any face card (jack, queen or king), he wins 3 dollars. For any ace, he wins 5 dollars and he wins an extra $20 if he draws the ace of clubs.

  (a) Create a probability model and find Andy’s expected profit per game.
    A number card pays $0, a face card pays $3, any ace pays $5, and the ace of clubs pays $5 + $20 = $25. Each game costs $2.
# Probability of each outcome in a standard 52-card deck
dPerGame <- c("2-10" = (4*9)/52, "Jack Queen King" = (4*3)/52, "ace of clubs" = 1/52, "other aces" = 3/52)
# Net profit of each outcome = winnings - $2 cost to play
netWinPerGame <- c("2-10" = 0-2, "Jack Queen King" = 3-2, "ace of clubs" = 25-2, "other aces" = 5-2)

The probabilities of the outcomes are

dPerGame
##            2-10 Jack Queen King    ace of clubs      other aces 
##      0.69230769      0.23076923      0.01923077      0.05769231

The net profits of the outcomes are

netWinPerGame
##            2-10 Jack Queen King    ace of clubs      other aces 
##              -2               1              23               3

The expected profit per game is

expectedProfit <- sum(dPerGame*netWinPerGame)
expectedProfit
## [1] -0.5384615
  (b) Would you recommend this game to Andy as a good way to make money? Explain.
    First, calculate the variance of the net profit per game.
sum(((netWinPerGame - expectedProfit)^2)*dPerGame)
## [1] 13.40237

The game has a negative expected profit: Andy loses about $0.54 per game on average, so over many games he should expect to lose money. The large variance (a standard deviation of about $3.66) means a single game can occasionally pay off well, but it does not change the long-run average. I would not recommend this game to Andy as a way to make money.
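As a check on the probability model, this sketch rebuilds it directly from the rules as stated in the problem: a number card pays $0, a face card pays $3, any ace pays $5, the ace of clubs pays an extra $20, and each game costs $2. Every one of the 52 cards is equally likely, so the expected profit and variance are plain averages over the deck. The names `deck`, `winnings` and `profit` are illustrative.

```r
# Every card in a standard deck: 13 ranks x 4 suits = 52 equally likely cards
deck <- expand.grid(rank = c(2:10, "J", "Q", "K", "A"),
                    suit = c("clubs", "diamonds", "hearts", "spades"),
                    stringsAsFactors = FALSE)

# Winnings from the stated rules: $0 for 2-10, $3 for J/Q/K, $5 for any ace
winnings <- ifelse(deck$rank %in% c("J", "Q", "K"), 3,
                   ifelse(deck$rank == "A", 5, 0))
# The ace of clubs pays an extra $20 on top of the $5
winnings[deck$rank == "A" & deck$suit == "clubs"] <- 5 + 20

profit <- winnings - 2   # each game costs $2
expected_profit <- mean(profit)
# Population variance over the 52 cards (var() would divide by n - 1 instead)
pop_var <- mean((profit - expected_profit)^2)
c(expected = expected_profit, variance = pop_var, sd = sqrt(pop_var))
```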

Problem 7. Scooping ice cream.

Ice cream usually comes in 1.5 quart boxes (48 fluid ounces), and ice cream scoops hold about 2 ounces. However, there is some variability in the amount of ice cream in a box as well as the amount of ice cream scooped out. We represent the amount of ice cream in the box as X and the amount scooped out as Y. Suppose these random variables have the following means, standard deviations, and variances:

mymat3=matrix(c(48,1,1, 2,.25,.0625), nrow=2, byrow=TRUE)
colnames(mymat3)=c("mean", "SD", "Var")
rownames(mymat3)=c("X, In Box","Y, Scooped")
mymat3
##            mean   SD    Var
## X, In Box    48 1.00 1.0000
## Y, Scooped    2 0.25 0.0625
  (a) An entire box of ice cream, plus 3 scoops from a second box is served at a party. How much ice cream do you expect to have been served at this party? What is the standard deviation of the amount of ice cream served?
    E[X + Y + Y + Y] = E[X] + 3E[Y]
mymat3["X, In Box","mean"] + 3*mymat3["Y, Scooped","mean"]
## [1] 54

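Assuming the box and the three scoops are independent, and treating the scoops as three separate copies of Y (not 3Y, which would have variance 9*Var[Y]):
Var[X + Y + Y + Y] = Var[X] + Var[Y] + Var[Y] + Var[Y] = Var[X] + 3*Var[Y]
SD[X + Y + Y + Y] = sqrt(Var[X + Y + Y + Y])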

sqrt(mymat3["X, In Box","Var"] + 3*mymat3["Y, Scooped","Var"])
## [1] 1.089725
  (b) How much ice cream would you expect to be left in the box after scooping out one scoop of ice cream? That is, find the expected value of X - Y. What is the standard deviation of the amount left in the box?
    E[X - Y] = E[X] - E[Y]
mymat3["X, In Box","mean"] - mymat3["Y, Scooped","mean"]
## [1] 46

Assuming X and Y are independent,
Var[X - Y] = Var[X] + (-1)^2 * Var[Y] = Var[X] + Var[Y]
SD[X - Y] = sqrt(Var[X - Y])

sqrt(mymat3["X, In Box","Var"] + mymat3["Y, Scooped","Var"])
## [1] 1.030776
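Parts (a) and (b) are both linear combinations of independent random variables, for which E[sum a_i V_i] = sum a_i E[V_i] and Var[sum a_i V_i] = sum a_i^2 Var[V_i]. The hypothetical helper `lin_comb()` applies these two rules and assumes the box and every scoop are independent. The coefficient -1 in part (b) enters the variance squared, as +1.

```r
# Mean and SD of a1*V1 + ... + ak*Vk for INDEPENDENT V1, ..., Vk
lin_comb <- function(a, mu, v) {
  c(mean = sum(a * mu), sd = sqrt(sum(a^2 * v)))
}
mu <- c(X = 48, Y = 2)        # means from the table
v  <- c(X = 1, Y = 0.0625)    # variances from the table

# (a) whole box plus three separate scoops: X + Y1 + Y2 + Y3
served <- lin_comb(a = c(1, 1, 1, 1),
                   mu = mu[c("X", "Y", "Y", "Y")],
                   v = v[c("X", "Y", "Y", "Y")])
# (b) box minus one scoop: X - Y
left <- lin_comb(a = c(1, -1), mu = mu, v = v)
served
left
```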
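  (c) Using the context of this exercise, explain why we add variances when we subtract one random variable from another.
    Both the amount in the box (X) and the size of the scoop (Y) are uncertain. Taking Y away from X does not cancel either source of variation: the amount left can be low because the box was underfilled, because the scoop was large, or both. Each source adds to the spread of X - Y, so the variances add.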
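A quick simulation illustrates part (c). The problem gives only means and standard deviations, so this sketch assumes, purely for illustration, that X and Y are normal and independent. Both X - Y and X + Y come out with variance close to Var[X] + Var[Y] = 1.0625.

```r
set.seed(2024)
n <- 100000
# ASSUMPTION: normal distributions, chosen only for illustration
x <- rnorm(n, mean = 48, sd = 1)     # amount in the box
y <- rnorm(n, mean = 2, sd = 0.25)   # amount scooped, drawn independently of x
var_diff <- var(x - y)
var_sum  <- var(x + y)
c(theory = 1 + 0.0625, var_x_minus_y = var_diff, var_x_plus_y = var_sum)
```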