R Markdown
Problem 1
- getting a sum of 1? The probability of this is 0 because there is no way to get a sum of 1 with two dice. The minimum sum is 2.
- getting a sum of 5? This is 4/36 = 0.1111
- getting a sum of 12? This is 1/36 = 0.0278
dice_combo1=matrix(c(1,1,1,1,1,1,1,2,3,4,5,6), ncol=2)
dice_combo2 <- dice_combo1
dice_combo2[,1] <- dice_combo1[,1] + 1
dice_combo3 <- dice_combo2
dice_combo3[,1] <- dice_combo2[,1] + 1
dice_combo4 <- dice_combo3
dice_combo4[,1] <- dice_combo3[,1] + 1
dice_combo5 <- dice_combo4
dice_combo5[,1] <- dice_combo4[,1] + 1
dice_combo6 <- dice_combo5
dice_combo6[,1] <- dice_combo5[,1] + 1
total_combinations=rbind(dice_combo1,dice_combo2,dice_combo3,dice_combo4,dice_combo5,dice_combo6)
The total combinations are 6*6 or 36
total_combinations
## [,1] [,2]
## [1,] 1 1
## [2,] 1 2
## [3,] 1 3
## [4,] 1 4
## [5,] 1 5
## [6,] 1 6
## [7,] 2 1
## [8,] 2 2
## [9,] 2 3
## [10,] 2 4
## [11,] 2 5
## [12,] 2 6
## [13,] 3 1
## [14,] 3 2
## [15,] 3 3
## [16,] 3 4
## [17,] 3 5
## [18,] 3 6
## [19,] 4 1
## [20,] 4 2
## [21,] 4 3
## [22,] 4 4
## [23,] 4 5
## [24,] 4 6
## [25,] 5 1
## [26,] 5 2
## [27,] 5 3
## [28,] 5 4
## [29,] 5 5
## [30,] 5 6
## [31,] 6 1
## [32,] 6 2
## [33,] 6 3
## [34,] 6 4
## [35,] 6 5
## [36,] 6 6
## Let's get the total sums
## Convert to a list to check for a value and count
dice_sums <- as.list(rowSums(total_combinations))
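## Count how many entries of dice_sum equal the target sum
x <- function(dice_sum, sum)
{
  total_sum_found <- 0
  ## Loop over every entry instead of hard-coding 36
  for(counter in seq_along(dice_sum))
  {
    if(dice_sum[[counter]] == sum)
      total_sum_found <- total_sum_found + 1
  }
  return(total_sum_found)
}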
## Let's find the number of sums for 1
cat("Total sum values for 1 are: ", x(dice_sums, 1 ))
## Total sum values for 1 are: 0
## Let's find the number of sums for 5
cat("Total sum values for 5 are: ", x(dice_sums, 5 ))
## Total sum values for 5 are: 4
## Let's find the number of sums for 12
cat("Total sum values for 12 are: ", x(dice_sums, 12 ))
## Total sum values for 12 are: 1
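The same enumeration can be written more compactly. This is a minimal sketch, assuming two fair six-sided dice; the names `rolls`, `sums`, and `prob_of_sum` are illustrative and not part of the original solution.

```r
# Enumerate all 36 ordered outcomes of two fair six-sided dice
rolls <- expand.grid(die1 = 1:6, die2 = 1:6)
sums <- rolls$die1 + rolls$die2

# Probability of a given sum = matching outcomes / total outcomes
prob_of_sum <- function(target) mean(sums == target)

prob_of_sum(1)   # no outcome sums to 1
prob_of_sum(5)   # 4 of the 36 outcomes
prob_of_sum(12)  # 1 of the 36 outcomes
```

Using `mean()` on a logical vector gives the proportion of matching outcomes directly, so no explicit loop or counter is needed.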
Problem 2. School absences
Data collected at elementary schools in DeKalb County, GA suggest that each year roughly 25% of students miss exactly one day of school, 15% miss 2 days, and 28% miss 3 or more days due to sickness.
(a) What is the probability that a student chosen at random doesn’t miss any days of school due to sickness this year?
Assuming a total of 100 students, the P(student not missing school) = 1 -P(1 day) - P(2 days) - P(3 or more)
1 - .25 - .15 - .28 = .32, or 100(.32) = 32 students
(b) What is the probability that a student chosen at random misses no more than one day?
P(zero days) + P(1 day) = .32 + .25 = .57 or 57%
(c) What is the probability that a student chosen at random misses at least one day?
The probability that a student misses 0 days is .32, so P(at least 1 day) = 1 - .32 = .68 or 68%
(d) If a parent has two kids at a DeKalb County elementary school, what is the probability that neither kid will miss any school? Note any assumption you must make to answer this question.
The probability that neither kid misses school is P(child 1 misses no days) * P(child 2 misses no days). I am assuming that the two children's absences are independent events, so the answer is (.32)(.32) or .1024. Under independence, P(child 2 misses no days | child 1 misses no days) is still .32, so no adjustment is needed for the second child.
(e) If a parent has two kids at a DeKalb County elementary school, what is the probability that both kids will miss some school, i.e. at least one day? Note any assumption you make.
The probability that both students miss school, using P(missing at least one day) = .68, is P(Child 1 Missing) * P(Child 2 Missing) = .68*.68 = .4624. As in part (d), I am treating the two children's absences as independent events.
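The arithmetic for parts (a) through (e) can be checked in a few lines. This is a sketch only: the variable names are illustrative, and parts (d) and (e) rely on the assumption that the two siblings' absences are independent.

```r
# Probabilities stated in the problem
p_one <- 0.25         # misses exactly one day
p_two <- 0.15         # misses exactly two days
p_three_plus <- 0.28  # misses three or more days

p_zero <- 1 - p_one - p_two - p_three_plus  # (a) misses no days
p_at_most_one <- p_zero + p_one             # (b) misses no more than one day
p_at_least_one <- 1 - p_zero                # (c) misses at least one day

# ASSUMPTION: the two children's absences are independent
p_neither <- p_zero^2                       # (d) neither child misses school
p_both <- p_at_least_one^2                  # (e) both children miss some school

round(c(a = p_zero, b = p_at_most_one, c = p_at_least_one,
        d = p_neither, e = p_both), 4)
```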
(f) If you made an assumption in part (d) or (e), do you think it was reasonable? If you didn’t make any assumptions, double check your earlier answers.
Question 3 Health coverage, relative frequencies
The Behavioral Risk Factor Surveillance System (BRFSS) is an annual telephone survey designed to identify risk factors in the adult population and report emerging health trends. The following table displays the distribution of health status of respondents to this survey (excellent, very good, good, fair, poor) and whether or not they have health insurance.
mat=matrix(c(.023, 0.0364, 0.0427, 0.0192, 0.0050,0.2099, 0.3123 ,0.2410 ,0.0817,0.0289), byrow=TRUE, nrow=2)
colnames(mat)=c("Excellent", "Very Good","Good", "Fair","Poor")
rownames(mat)=c("No Coverage","Coverage")
mat
## Excellent Very Good Good Fair Poor
## No Coverage 0.0230 0.0364 0.0427 0.0192 0.0050
## Coverage 0.2099 0.3123 0.2410 0.0817 0.0289
## Add a row-total column first, then a column-total row
## (taking the column sums last also yields the grand total, so no correction is needed)
rTotal <- rowSums(mat)
mat <- cbind(mat, rTotal)
cTotal <- colSums(mat)
mat <- rbind(mat, cTotal)
mat
## Excellent Very Good Good Fair Poor rTotal
## No Coverage 0.0230 0.0364 0.0427 0.0192 0.0050 0.1263
## Coverage 0.2099 0.3123 0.2410 0.0817 0.0289 0.8738
## cTotal 0.2329 0.3487 0.2837 0.1009 0.0339 1.0001
## Assume a population of 100
mat <- 100* mat
mat
## Excellent Very Good Good Fair Poor rTotal
## No Coverage 2.30 3.64 4.27 1.92 0.50 12.63
## Coverage 20.99 31.23 24.10 8.17 2.89 87.38
## cTotal 23.29 34.87 28.37 10.09 3.39 100.01
(a) Are being in excellent health and having health coverage mutually exclusive?
No, because based on the table a proportion of 0.2099 of respondents are both in excellent health and have health coverage. Mutually exclusive events would have a joint probability of 0.
(b) What is the probability that a randomly chosen individual has excellent health?
The probability of choosing someone who has excellent health is 23.3%
(c) What is the probability that a randomly chosen individual has excellent health given that he has health coverage?
### the P(Excellent Health | Health Coverage) is 20.99/87.38 ~ 24%
0.2099/0.8738
## [1] 0.2402152
(d) What is the probability that a randomly chosen individual has excellent health given that he doesn’t have health coverage?
The total P(Excellent Health | No Coverage) = 2.30/12.63 or ~ 18.2%
0.023/0.1263
## [1] 0.1821061
(e) Do having excellent health and having health coverage appear to be independent?
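No. If the two were independent, P(Excellent Health | Health Coverage) would equal P(Excellent Health). Here P(Excellent Health) = 0.2329, while P(Excellent Health | Coverage) is about 0.2402 and P(Excellent Health | No Coverage) is about 0.1821. Because the conditional probabilities differ from the unconditional one, excellent health and health coverage do not appear to be independent.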
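The independence check can be sketched numerically from the joint probabilities in the table. The object names below (`joint`, `p_excellent`, and so on) are illustrative. Whether the gap is larger than sampling noise is not assessed here.

```r
# Joint probabilities from the table in the problem
joint <- matrix(c(0.0230, 0.0364, 0.0427, 0.0192, 0.0050,
                  0.2099, 0.3123, 0.2410, 0.0817, 0.0289),
                nrow = 2, byrow = TRUE,
                dimnames = list(c("No Coverage", "Coverage"),
                                c("Excellent", "Very Good", "Good", "Fair", "Poor")))

p_excellent <- sum(joint[, "Excellent"])   # marginal P(Excellent)
p_coverage <- sum(joint["Coverage", ])     # marginal P(Coverage)
p_joint <- joint["Coverage", "Excellent"]  # P(Excellent and Coverage)

# Independence would require P(A and B) = P(A) * P(B)
p_joint
p_excellent * p_coverage

# Equivalent check: P(Excellent | Coverage) compared with P(Excellent)
p_joint / p_coverage
p_excellent
```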
Problem 4. Exit Poll.
Edison Research gathered exit poll results from several sources for the Wisconsin recall election of Scott Walker. They found that 53% of the respondents voted in favor of Scott Walker. Additionally, they estimated that of those who did vote in favor for Scott Walker, 37% had a college degree, while 44% of those who voted against Scott Walker had a college degree. Suppose we randomly sampled a person who participated in the exit poll and found that he had a college degree. What is the probability that he voted in favor of Scott Walker?
### Assuming a total of 100, the following table was constructed
### 37% of the 53 who voted for = 19.61 with a degree; 44% of the 47 who voted against = 20.68 with a degree
exit_poll <-matrix(c(19.61,33.39,53,20.68,26.32,47,40.29,59.71,100), nrow=3, byrow=TRUE)
colnames(exit_poll) <-c("Degree", "No-Degree", "Total")
rownames(exit_poll) <-c("Voted For", "Voted Against", "Total")
exit_poll
## Degree No-Degree Total
## Voted For 19.61 33.39 53
## Voted Against 20.68 26.32 47
## Total 40.29 59.71 100
Question: Suppose we randomly sampled a person who participated in the exit poll and found that he had a college degree. What is the probability that he voted in favor of Scott Walker?
The 37% applies only to the 53 who voted for Walker, so 19.61 of every 100 respondents voted for him and have a degree. The total of those with a degree is 40.29, so the P(Voted for Scott Walker | Degree) = 19.61/40.29 or ~ 48.67%
19.61/40.29
## [1] 0.4867213
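The conditional probability can also be computed directly with Bayes' rule from the three percentages given in the problem, without building a table. This is a minimal sketch and the variable names are illustrative.

```r
# Quantities stated in the problem
p_for <- 0.53                   # P(voted for Walker)
p_degree_given_for <- 0.37      # P(degree | voted for)
p_degree_given_against <- 0.44  # P(degree | voted against)

p_against <- 1 - p_for

# Law of total probability: overall share of respondents with a degree
p_degree <- p_for * p_degree_given_for + p_against * p_degree_given_against

# Bayes' rule: P(voted for | degree)
p_for_given_degree <- p_for * p_degree_given_for / p_degree
p_for_given_degree
```

The numerator is the joint probability of voting for Walker and holding a degree (0.53 * 0.37), which is why the 37% cannot be used as a count out of 100 on its own.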
Problem 5. Books on a bookshelf
The table below shows the distribution of books on a bookcase based on whether they are nonfiction or fiction and hardcover or paperback.
mymat2=matrix(c(13,59,15,8),nrow=2,byrow=TRUE)
colnames(mymat2)=c("hard","paper")
rownames(mymat2)=c("fiction","nonfiction")
## Totals Column
mymat2<-cbind(mymat2,rowSums(mymat2))
colnames(mymat2)[3] = "Total"
mymat2 <-rbind(mymat2, colSums(mymat2))
rownames(mymat2)[3] = "Total"
mymat2
## hard paper Total
## fiction 13 59 72
## nonfiction 15 8 23
## Total 28 67 95
(a) Find the probability of drawing a hardcover book first then a paperback fiction book second when drawing without replacement.
P(hardcover) = 28/95 or .2947. P(paperback fiction second | hardcover first) = 59/94 ~ 0.6277. P(hardcover, then paperback fiction) = .2947*.6277, or roughly 18.50%
(b) Determine the probability of drawing a fiction book first and then a hardcover book second, when drawing without replacement.
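The second draw depends on whether the first fiction book was itself a hardcover, so the two cases are added: P(fiction hardcover first) * P(hardcover second) + P(fiction paperback first) * P(hardcover second) = (13/95)(27/94) + (59/95)(28/94) = 0.2243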
(c) Calculate the probability of the scenario in part (b), except this time complete the calculations under the scenario where the first book is placed back on the bookcase before randomly drawing the second book.
In this scenario, since the fiction book is placed back, P(fiction book) = 72/95 (~75.8%) and P(hardcover) = 28/95 (~29.5%). The probability of drawing them in this order is (72/95)*(28/95) = ~22.3%
(d) The final answers to parts (b) and (c) are very similar. Explain why this is the case.
Removing one book from a shelf of 95 barely changes the probabilities for the second draw (28/94 versus 28/95), so the answers with and without replacement are nearly the same.
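The three draw probabilities can be recomputed from the counts in the table. This is a sketch with illustrative names (`books`, `p_a`, `p_b`, `p_c`). Part (b) splits on whether the first fiction book was a hardcover, since that changes how many hardcovers remain.

```r
# Book counts from the table in the problem
books <- matrix(c(13, 59, 15, 8), nrow = 2, byrow = TRUE,
                dimnames = list(c("fiction", "nonfiction"), c("hard", "paper")))
n <- sum(books)              # 95 books in total
n_hard <- sum(books[, "hard"])

# (a) hardcover first, then paperback fiction, without replacement
p_a <- n_hard / n * books["fiction", "paper"] / (n - 1)

# (b) fiction first, then hardcover, without replacement
p_b <- books["fiction", "hard"] / n * (n_hard - 1) / (n - 1) +
  books["fiction", "paper"] / n * n_hard / (n - 1)

# (c) fiction first, then hardcover, with replacement
p_c <- sum(books["fiction", ]) / n * n_hard / n

round(c(a = p_a, b = p_b, c = p_c), 4)
```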
Problem 6. Is it worth it?
Andy is always looking for ways to make money fast. Lately, he has been trying to make money by gambling. Here is the game he is considering playing: The game costs 2 dollars to play. He draws a card from a deck. If he gets a number card (2-10), he wins nothing. For any face card (jack, queen or king), he wins 3 dollars. For any ace, he wins 5 dollars and he wins an extra $20 if he draws the ace of clubs.
P(losing the 2 dollars by drawing a card from 2-10) = 36/52 or 0.692
P(winning 3 dollars by drawing a face card) = 12/52 ~ 0.231
P(winning 5 dollars by drawing any ace) = 4/52 = 0.0769
P(drawing the ace of clubs, for the extra 20 dollars) = 1/52 ~ 0.019
odds_of_winning=matrix(c(36/52,12/52,3/52,1/52))
prize = c(-2,1,3,23)
### prize is the net result after paying 2 dollars to play: 0-2, 3-2, 5-2 and (5+20)-2
colnames(odds_of_winning) <- c("Chance of Winning")
rownames(odds_of_winning) <- c("2-10", "Face Card", "Ace - Not Clubs", "Ace of Clubs")
odds_of_winning <-t(odds_of_winning)
odds_of_winning <- rbind(odds_of_winning,prize)
round(odds_of_winning,4)
## 2-10 Face Card Ace - Not Clubs Ace of Clubs
## Chance of Winning 0.6923 0.2308 0.0577 0.0192
## prize -2.0000 1.0000 3.0000 23.0000
### The expected return is the sum of each probability * its net prize
### -2(36/52)+1(12/52)+3(3/52)+23(1/52) = -28/52, about -0.54 dollars for a single game. Over the long run, Andy should lose money
-2*(36/52)+1*(12/52)+3*(3/52)+23*(1/52)
## [1] -0.5384615
barplot(odds_of_winning[1,],ylim = c(0,1.0))
(b) Would you recommend this game to Andy as a good way to make money? Explain.
No. The expected net return is -2(36/52)+1(12/52)+3(3/52)+23(1/52) = -0.54, so Andy should expect to lose about 54 cents per game on average when the cost is 2 dollars. The expected winnings before the cost are 76/52, about 1.46 dollars, so the game would only be favorable if it cost less than that.
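The expected value can be cross-checked with exact card counts rather than rounded probabilities. This is a sketch under one stated assumption: the ace of clubs pays 5 + 20 = 25 dollars in total. The names `prob`, `winnings`, and `cost` are illustrative.

```r
# Card categories and their probabilities in a standard 52-card deck
prob <- c(number = 36, face = 12, other_ace = 3, ace_of_clubs = 1) / 52

# Gross winnings per category
# ASSUMPTION: the ace of clubs pays the 5 dollar ace prize plus the extra 20
winnings <- c(number = 0, face = 3, other_ace = 5, ace_of_clubs = 25)
cost <- 2  # price of one play

expected_winnings <- sum(prob * winnings)  # before paying to play
expected_net <- expected_winnings - cost   # after paying to play

expected_winnings
expected_net
```

Keeping the cost separate from the winnings makes it easy to see the break-even price: the game is favorable only when `cost` is below `expected_winnings`.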
Problem 7. Scooping ice cream.
Ice cream usually comes in 1.5 quart boxes (48 fluid ounces), and ice cream scoops hold about 2 ounces. However, there is some variability in the amount of ice cream in a box as well as the amount of ice cream scooped out. We represent the amount of ice cream in the box as X and the amount scooped out as Y. Suppose these random variables have the following means, standard deviations, and variances:
could not answer this one.