Graded: 2.6, 2.8, 2.20, 2.30, 2.38, 2.44
For die 1: n_dice_1 = number showing on die 1; p_dice_1 = probability of each number showing on die 1
n_dice_1 <- c(1,2,3,4,5,6)
p_dice_1 <- c(1/6,1/6,1/6,1/6,1/6,1/6)
For die 2: n_dice_2 = number showing on die 2; p_dice_2 = probability of each number showing on die 2
n_dice_2 <- c(1,2,3,4,5,6)
p_dice_2 <- c(1/6,1/6,1/6,1/6,1/6,1/6)
sum_dices: sum of die 1 and die 2; p_sum: probability of each sum
sum_dices <- c(2,3,4,5,6,7,8,9,10,11,12)
p_sum<- c(1/36, 2/36, 3/36, 4/36, 5/36, 6/36, 5/36, 4/36, 3/36, 2/36, 1/36)
Let’s build the table
dicef <- data.frame(sum_dices, p_sum)
names(dicef) <- c("Sums", "Probability")
dicef
##    Sums Probability
## 1     2  0.02777778
## 2     3  0.05555556
## 3     4  0.08333333
## 4     5  0.11111111
## 5     6  0.13888889
## 6     7  0.16666667
## 7     8  0.13888889
## 8     9  0.11111111
## 9    10  0.08333333
## 10   11  0.05555556
## 11   12  0.02777778
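As a cross-check on the hand-entered probabilities, the sums can also be enumerated directly. This is only a minimal sketch; the names `faces`, `all_sums`, `p_check`, and `p_sum_expected` are illustrative and not part of the original solution.

```r
# Enumerate all 36 equally likely outcomes of two fair six-sided dice.
faces <- 1:6
all_sums <- outer(faces, faces, "+")   # 6 x 6 matrix of sums

# Relative frequency of each sum, from 2 through 12.
p_check <- as.numeric(table(all_sums)) / length(all_sums)

# Compare against the triangular pattern 1/36, 2/36, ..., 6/36, ..., 1/36.
p_sum_expected <- c(1:6, 5:1) / 36
all.equal(p_check, p_sum_expected)
```

If the comparison returns TRUE, the table of sums and probabilities is consistent with a full enumeration.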
4/36
## [1] 0.1111111
(1/6)*(1/6)
## [1] 0.02777778
Are living below the poverty line and speaking a foreign language at home disjoint? Answer: No, because 4.2% of people fall into both categories.
Draw a Venn diagram summarizing the variables and their associated probabilities.
library(VennDiagram)
## Loading required package: grid
## Loading required package: futile.logger
venn.plot <- draw.pairwise.venn(14.6, 20.7, 4.2, category = c("% Living below poverty line", "% Speaking foreign language"), cat.pos = 180, fill = c("orange","blue"))
grid.draw(venn.plot)
grid.newpage()
4.2
## [1] 4.2
(14.6+20.7)-4.2
## [1] 31.1
(100-14.6)-(20.7-4.2)
## [1] 68.9
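The three calculations above can be written with named quantities so that each region of the Venn diagram is explicit. This is a sketch with illustrative variable names; all values are percentages taken from the exercise.

```r
pct_poverty <- 14.6   # living below the poverty line
pct_foreign <- 20.7   # speaking a foreign language at home
pct_both    <- 4.2    # in both categories

# Below the poverty line but not speaking a foreign language at home.
pct_only_poverty <- pct_poverty - pct_both

# General addition rule: P(A or F) = P(A) + P(F) - P(A and F).
pct_either <- pct_poverty + pct_foreign - pct_both

# Complement of "either": in neither category.
pct_neither <- 100 - pct_either

round(c(only_poverty = pct_only_poverty,
        either = pct_either,
        neither = pct_neither), 1)
```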
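Let’s check the independence condition, where A = living below the poverty line and F = speaking a foreign language at home: P(A and F) = P(A) * P(F). Here P(A and F) = 0.042, while P(A) * P(F) = 0.146 * 0.207 ≈ 0.0302. Since 0.042 ≠ 0.0302, P(A and F) ≠ P(A) * P(F). Because the multiplication rule for independent events is not satisfied, we conclude that these events are not independent.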
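The independence check can be reproduced numerically. The variable names below are illustrative; the probabilities are the ones given in the exercise.

```r
p_poverty <- 0.146   # P(A): living below the poverty line
p_foreign <- 0.207   # P(F): speaking a foreign language at home
p_both    <- 0.042   # P(A and F)

# Value the joint probability would take if A and F were independent.
p_if_independent <- p_poverty * p_foreign
round(p_if_independent, 4)

# TRUE would indicate the multiplication rule holds; here it does not.
isTRUE(all.equal(p_both, p_if_independent))
```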
Extract Data
books <- read.csv("https://raw.githubusercontent.com/jbryer/DATA606Fall2016/master/Data/Data%20from%20openintro.org/Ch%202%20Exercise%20Data/books.csv")
table(books)
##             format
## type         hardcover paperback
##   fiction           13        59
##   nonfiction        15         8
(sum(books$format == "hardcover")/nrow(books)) * (sum(books$format == "paperback" & books$type == "fiction")/(nrow(books)-1) )
## [1] 0.1849944
(sum(books$type == "fiction")/nrow(books)) * (sum(books$format == "hardcover")/(nrow(books)-1) )
## [1] 0.2257559
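The product above treats all 28 hardcovers as still available on the second draw. When drawing without replacement, if the first fiction book happens to be a hardcover, only 27 hardcovers remain, so splitting into two cases gives a slightly different value. The sketch below uses the counts from the table instead of the CSV, and its variable names are illustrative.

```r
n_total   <- 95   # all books
fic_hard  <- 13   # fiction hardcovers
fic_paper <- 59   # fiction paperbacks
n_hard    <- 28   # all hardcovers (13 fiction + 15 nonfiction)

# Case 1: first book is a fiction hardcover, so 27 hardcovers remain.
p_case_hard <- (fic_hard / n_total) * ((n_hard - 1) / (n_total - 1))

# Case 2: first book is a fiction paperback, so all 28 hardcovers remain.
p_case_paper <- (fic_paper / n_total) * (n_hard / (n_total - 1))

p_fiction_then_hardcover <- p_case_hard + p_case_paper
round(p_fiction_then_hardcover, 4)
```

The two approaches differ only in the third decimal place, which is why the simpler product is often used as an approximation.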
(sum(books$type == "fiction")/nrow(books)) * (sum(books$format == "hardcover")/nrow(books) )
## [1] 0.2233795
BagFee <- c(0, 25, 60)
Prob <- c(.54, .34, .12)
# Expected value: sum of each fee weighted by its probability
Expected.Value <- sum(BagFee * Prob)
# Variance: probability-weighted squared deviations from the expected value
Variance <- sum((BagFee - Expected.Value)^2 * Prob)
round(Expected.Value,2)
## [1] 15.7
round(Variance, 2)
## [1] 398.01
SD <- sqrt(Variance)
round(SD,2)
## [1] 19.95
EX120 <- 120 * Expected.Value
Var120 <- 120 * SD^2
SD120 <- sqrt(Var120)
round(EX120,2)
## [1] 1884
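The script computes SD120 but does not print it. Assuming the fees of the 120 passengers are independent (the assumption behind multiplying the variance by 120), the standard deviation of total revenue can be displayed as follows. The block restates the inputs so it runs on its own; `ev`, `v`, and `sd_total` are illustrative names.

```r
BagFee <- c(0, 25, 60)
Prob   <- c(0.54, 0.34, 0.12)

ev <- sum(BagFee * Prob)              # expected fee per passenger
v  <- sum((BagFee - ev)^2 * Prob)     # variance per passenger

# For independent passengers, variances add, so Var(total) = 120 * v.
sd_total <- sqrt(120 * v)
round(sd_total, 2)
```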
income <- c("$1 to $9,999","$10,000 to $14,999","$15,000 to $24,999","$25,000 to $34,999","$35,000 to $49,999","$50,000 to $64,999","$65,000 to $74,999","$75,000 to $99,999","$100,000 or more")
total <- c(2.2,4.7,15.8,18.3,21.2,13.9,5.8,8.4,9.7)
dist <- data.frame(income, total)
dist
##               income total
## 1       $1 to $9,999   2.2
## 2 $10,000 to $14,999   4.7
## 3 $15,000 to $24,999  15.8
## 4 $25,000 to $34,999  18.3
## 5 $35,000 to $49,999  21.2
## 6 $50,000 to $64,999  13.9
## 7 $65,000 to $74,999   5.8
## 8 $75,000 to $99,999   8.4
## 9   $100,000 or more   9.7
barplot(dist$total, names.arg = dist$income, las = 2, cex.names = 0.6, main = "Income Distribution", ylab = "% of Population")
Answer: unimodal with a slight right skew.
(2.2 + 4.7 + 15.8 + 18.3 + 21.2) / 100
## [1] 0.622
(2.2 + 4.7 + 15.8 + 18.3 + 21.2) / 100*0.41
## [1] 0.25502
(2.2 + 4.7 + 15.8 + 18.3 + 21.2) / 100*0.718
## [1] 0.446596
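Part C assumes that income and gender are independent, which gives a probability of 0.255 that a person is female and makes less than $50,000. Part D states that 71.8% of females make less than $50,000, compared with 62.2% of the population overall. If the two variables were independent, these two percentages would be equal. The discrepancy shows that the independence assumption in part C is not valid.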
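A short sketch of this comparison: under independence, P(under $50k | female) would equal P(under $50k), and the joint probability implied by the stated 71.8% is P(female) * P(under $50k | female). The variable names are illustrative; the inputs are the figures quoted in the exercise.

```r
p_under_50k              <- 0.622   # overall share making less than $50k
p_female                 <- 0.41    # share of the sample that is female
p_under_50k_given_female <- 0.718   # stated in part D

# Joint probability if income and gender were independent (part C).
joint_if_independent <- p_under_50k * p_female

# Joint probability implied by the conditional figure from part D.
joint_from_part_d <- p_under_50k_given_female * p_female

round(c(independent = joint_if_independent,
        from_part_d = joint_from_part_d), 3)

# Independence would require these two values to match.
isTRUE(all.equal(p_under_50k_given_female, p_under_50k))
```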