The probability of getting a sum of 1 is 0, because the smallest possible sum of two dice is 2.
Number of ways of getting a sum of 5: 4
Outcomes:
1 + 4
2 + 3
3 + 2
4 + 1
Total number of possible outcomes is 6 x 6 = 36
So the probability of getting a sum of 5 is
4/36
## [1] 0.1111111
#or
1/9
## [1] 0.1111111
Number of ways of getting a sum of 12: 1
Outcome: 6 + 6
Total number of possible outcomes is 6 x 6 = 36
So the probability of getting a sum of 12 is
1/36
## [1] 0.02777778
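The counts above can be cross-checked by brute force. The snippet below is a minimal sketch, assuming two fair six-sided dice; `sums` and `probSum` are illustrative names, not part of the original solution.

```r
# All 36 equally likely outcomes of two fair six-sided dice (assumed fair).
sums <- outer(1:6, 1:6, "+")

# Probability of a target sum = share of the 36 outcomes that hit it.
# probSum is a hypothetical helper introduced for this illustration.
probSum <- function(target) mean(sums == target)

probSum(1)   # no outcome sums to 1
probSum(5)   # four outcomes: (1,4), (2,3), (3,2), (4,1)
probSum(12)  # one outcome: (6,6)
```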
(a) Are living below the poverty line and speaking a foreign language at home disjoint?
No, they are not disjoint. A person can live below the poverty line and also speak a foreign language at home (4.2% fall in both categories).
library(VennDiagram)
belowPovertyLine <- 14.6
speakForeignLanguage <- 20.7
bothCategories <- 4.2
venn.plot <- draw.pairwise.venn(belowPovertyLine, speakForeignLanguage, cross.area=bothCategories, c("Below Poverty", "Speak Only Foreign Language"), fill=c("green", "yellow"), cat.dist=-0.08, ind=FALSE)
grid.draw(venn.plot)
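#Below the poverty line only (not in both categories)
povertyOnly <- belowPovertyLine - bothCategories
povertyOnly
## [1] 10.4
#Below the poverty line or speak a foreign language (at least one category)
speakForeignLanguage + belowPovertyLine - bothCategories
## [1] 31.1
#In neither category
100 - speakForeignLanguage - povertyOnly
## [1] 68.9
#Product of the two marginal probabilities; equals P(both) only under independence
(belowPovertyLine/100) * (speakForeignLanguage / 100)
## [1] 0.030222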
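Whether the two events are independent can be checked by comparing the observed joint percentage with the product of the marginals. A minimal sketch using the same percentages; the variable names and the tolerance-based comparison are assumptions made for illustration.

```r
# Percentages from the problem, converted to probabilities.
pPoverty <- 14.6 / 100   # below the poverty line
pForeign <- 20.7 / 100   # speak a foreign language at home
pBoth    <- 4.2 / 100    # both

# Under independence, P(A and B) would equal P(A) * P(B).
pBothIfIndependent <- pPoverty * pForeign

# all.equal() compares with a small numeric tolerance.
looksIndependent <- isTRUE(all.equal(pBoth, pBothIfIndependent))
c(observed = pBoth, ifIndependent = pBothIfIndependent)
looksIndependent
```

Here the observed 0.042 differs from the product 0.030222, so the two events are not independent.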
General addition rule: P(A or B) = P(A) + P(B) - P(A and B)
((114+108)/204) - (78/204)
## [1] 0.7058824
Conditional probability: P(A|B) = P(A and B) / P(B)
78/114
## [1] 0.6842105
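The two formulas can be wrapped in small helpers that work directly from counts. This is a sketch only: `pUnion` and `pGiven` are hypothetical names, and the counts are the ones used in the calculations above (114 and 108 in the two events, 78 in both, 204 in total).

```r
# General addition rule from counts: P(A or B) = P(A) + P(B) - P(A and B).
pUnion <- function(nA, nB, nBoth, nTotal) (nA + nB - nBoth) / nTotal

# Conditional probability from counts: P(A | B) = P(A and B) / P(B).
pGiven <- function(nBoth, nB, nTotal) (nBoth / nTotal) / (nB / nTotal)

pUnion(nA = 108, nB = 114, nBoth = 78, nTotal = 204)
pGiven(nBoth = 78, nB = 114, nTotal = 204)
```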
#probability that a randomly chosen male respondent with brown eyes has a partner with blue eyes
19/54
## [1] 0.3518519
#probability of a randomly chosen male respondent with green eyes having a partner with blue eyes
11/36
## [1] 0.3055556
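The eye colors of male respondents and their partners do not appear to be independent. The probability that the partner has blue eyes changes with the male respondent's eye color (0.684 for blue, 0.352 for brown, 0.306 for green), and the counts for matching eye colors appear larger in every category.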
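One way to make the independence check concrete is to compare each conditional probability with the overall share of blue-eyed partners. The sketch below assumes that share is 108/204 (reading 108 as the count of blue-eyed partners from the earlier calculation); the names are illustrative.

```r
# P(partner has blue eyes | male respondent's eye color), from the values above.
pBlueGivenMale <- c(blue = 78 / 114, brown = 19 / 54, green = 11 / 36)

# Assumed marginal P(partner has blue eyes): 108 of the 204 couples.
pBluePartner <- 108 / 204

# Under independence every difference would be close to zero.
round(pBlueGivenMale - pBluePartner, 3)
```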
                    Format
Type         Hardcover  Paperback  Total
Fiction             13         59     72
Nonfiction          15          8     23
Total               28         67     95
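#Hardcover book first, then a paperback fiction book, without replacement
(28/95) * (59/94)
## [1] 0.1849944
#Fiction book first, then a hardcover book, without replacement.
#If the first fiction book was a hardcover, only 27 hardcovers remain.
(13/95) * (27/94) + (59/95) * (28/94)
## [1] 0.2243001
#Fiction book first, then a hardcover book, with replacement
((72/95)*(28/95))
## [1] 0.2233795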
The two results are close because the collection is large (95 books), so removing one book changes the probabilities very little. In the example below, with only 5 items, the difference is larger.
((2/5)*(3/4))
## [1] 0.3
((2/5)*(3/5))
## [1] 0.24
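The effect of the collection size can be seen by computing both versions for any size. A rough sketch, assuming the two draws target different categories with `k1` and `k2` items out of `n`; the helper names are made up for this example.

```r
# First item from a group of k1, second from a different group of k2, n items total.
withoutReplacement <- function(k1, k2, n) (k1 / n) * (k2 / (n - 1))
withReplacement    <- function(k1, k2, n) (k1 / n) * (k2 / n)

# Absolute gap between the two sampling schemes.
gap <- function(k1, k2, n) withoutReplacement(k1, k2, n) - withReplacement(k1, k2, n)

gap(28, 59, 95)  # 95 books: hardcover, then paperback fiction
gap(2, 3, 5)     # small example with 5 items
```

The ratio of the two probabilities is n / (n - 1), which approaches 1 as n grows.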
#Probability of checked luggage - 0 bags 54%, 1 bag - 34%, 2 bags - 12%
probabilityCheckedLuggage <- c(0.54, 0.34, 0.12)
#Number of bags
bagQuantity <- c(0, 1, 2)
#Baggage Fee
baggageFee <- c(0, 25, 25 + 35)
dfCheckinBags <- data.frame(probabilityCheckedLuggage, bagQuantity, baggageFee)
dfCheckinBags$probabilityFee <- dfCheckinBags$probabilityCheckedLuggage * dfCheckinBags$baggageFee
dfCheckinBags
## probabilityCheckedLuggage bagQuantity baggageFee probabilityFee
## 1 0.54 0 0 0.0
## 2 0.34 1 25 8.5
## 3 0.12 2 60 7.2
#Revenue per passenger
revenuPassenger <- sum(dfCheckinBags$probabilityFee)
revenuPassenger
## [1] 15.7
#Variance: squared deviation of each fee from the mean, weighted by its probability
dfCheckinBags$MeanDf <- dfCheckinBags$baggageFee - revenuPassenger
dfCheckinBags$Sqr <- dfCheckinBags$MeanDf ^ 2
dfCheckinBags$sp <- dfCheckinBags$Sqr * dfCheckinBags$probabilityCheckedLuggage
dfCheckinBags
## probabilityCheckedLuggage bagQuantity baggageFee probabilityFee MeanDf
## 1 0.54 0 0 0.0 -15.7
## 2 0.34 1 25 8.5 9.3
## 3 0.12 2 60 7.2 44.3
## Sqr sp
## 1 246.49 133.1046
## 2 86.49 29.4066
## 3 1962.49 235.4988
#SD
var <- sum(dfCheckinBags$sp)
stdDev <- sqrt(var)
stdDev
## [1] 19.95019
#120 passengers
noOfPassengers <- 120
avgerageInc <- revenuPassenger * noOfPassengers
avgerageInc
## [1] 1884
#SD: assuming passengers are independent, the variances add, so Var = n * var
var120 <- noOfPassengers * var
sd120 <- sqrt(var120)
sd120
## [1] 218.5434
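The mean and standard deviation of a discrete random variable can also be computed directly from the probability and fee vectors, without intermediate data frame columns. This is a compact sketch, assuming passengers check bags independently of one another; the variable names are illustrative.

```r
p   <- c(0.54, 0.34, 0.12)   # probability of checking 0, 1, 2 bags
fee <- c(0, 25, 25 + 35)     # fee paid in each case

mu    <- sum(p * fee)                 # expected revenue per passenger
sigma <- sqrt(sum(p * (fee - mu)^2))  # standard deviation per passenger

# Cross-check with the identity Var(X) = E[X^2] - (E[X])^2.
stopifnot(isTRUE(all.equal(sigma^2, sum(p * fee^2) - mu^2)))

n <- 120
totalMean <- n * mu            # expectations add
totalSd   <- sqrt(n) * sigma   # variances add under the independence assumption
c(mu = mu, sigma = sigma, totalMean = totalMean, totalSd = totalSd)
```

Deviations are taken from the fee values themselves, and the total standard deviation scales with the square root of the number of passengers rather than with the number itself.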
income <- c("$1 - $9,999 or loss","$10,000 to $14,999", "$15,000 to $24,999", "$25,000 to $34,999", "$35,000 to $49,999", "$50,000 to $64,000", "$65,000 to $74,999", "$75,000 to $99,999", "$100,000 or more")
total <- c(2.2,4.7,15.8,18.3,21.2,13.9,5.8,8.4,9.7)
#incomelowerrange <- c(1,10000, 15000, 25000, 35000, 50000, 65000, 75000, 100000)
#incomeupperrange <- c(9999,14999, 24999, 34999, 49999, 64000, 74999, 99999, 199999)
#incomeuppermiddle <- (incomelowerrange + incomeupperrange) /2
#incometotal <- incomeuppermiddle * total
df_incomegender <- data.frame(income, total)
df_incomegender
## income total
## 1 $1 - $9,999 or loss 2.2
## 2 $10,000 to $14,999 4.7
## 3 $15,000 to $24,999 15.8
## 4 $25,000 to $34,999 18.3
## 5 $35,000 to $49,999 21.2
## 6 $50,000 to $64,000 13.9
## 7 $65,000 to $74,999 5.8
## 8 $75,000 to $99,999 8.4
## 9 $100,000 or more 9.7
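#Histogram of the nine percentage values (not of the incomes themselves)
hist(df_incomegender$total)
#Bar plot of the percentage in each income bracket, which shows the distribution
barplot(df_incomegender$total, names.arg=income)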
#First 5 rows / total
probab50000 <- sum(df_incomegender[1:5,]$total) / sum(df_incomegender$total)
probab50000
## [1] 0.622
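#Assuming income and gender are independent: P(female and < $50,000) = P(female) * P(< $50,000)
probab50000female <- 0.41 * probab50000
probab50000female
## [1] 0.25502
#Using P(< $50,000 | female) = 0.718: P(female and < $50,000) = P(< $50,000 | female) * P(female)
female50000 <- 0.718 * .41
female50000
## [1] 0.29438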
This result, 0.294, differs from the 0.255 obtained in (c) under the independence assumption, so the assumption is not valid: income and gender are not independent.
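The comparison can be written out explicitly. A minimal sketch using the figures above (62.2% below $50,000, and reading 0.41 as the share of females and 0.718 as the share of females below $50,000); variable names are illustrative.

```r
pBelow50k            <- 0.622  # P(income < $50,000)
pFemale              <- 0.41   # P(female), as read from the calculation above
pBelow50kGivenFemale <- 0.718  # P(income < $50,000 | female)

jointIfIndependent <- pFemale * pBelow50k             # joint under independence
jointObserved      <- pFemale * pBelow50kGivenFemale  # joint from the conditional

# Independence would require P(< $50,000 | female) to equal P(< $50,000).
c(ifIndependent = jointIfIndependent, observed = jointObserved)
isTRUE(all.equal(pBelow50kGivenFemale, pBelow50k))
```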