P(A) = 0
#### (b) Getting a sum of 5
P(B) = 4/36 ≈ 0.111
#### (c) Getting a sum of 12
P(C) = 1/36 ≈ 0.028
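The dice answers can be checked by listing all 36 equally likely outcomes of two fair six-sided dice. This is a minimal sketch, assuming part (a) asks for a sum of 1 (which would match P(A) = 0); the names `sums` and `probSum` are illustrative and not part of the original work.

```r
# All 36 equally likely sums from two fair six-sided dice
sums = outer(1:6, 1:6, "+")

# Probability of a given sum = share of the 36 outcomes with that sum
probSum = function(s) mean(sums == s)

probSum(1)   # no pair of dice can sum to 1
probSum(5)   # outcomes (1,4), (2,3), (3,2), (4,1)
probSum(12)  # only outcome (6,6)
```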
These events are not disjoint. A person can both speak a foreign language at home and live below the poverty line; 4.2% of Americans fall into both categories.
#### (b) Draw a Venn diagram summarizing the variables and their associated probabilities
library(VennDiagram) # library() loads one package per call; grid is attached as a dependency
## Loading required package: grid
## Loading required package: futile.logger
belowPoverty = 14.6
foreignLanguage = 20.7
joint = 4.2
grid.newpage()
vD = draw.pairwise.venn(belowPoverty,foreignLanguage,cross.area=joint,category = c(" Below PL","Speak FL"), fill = c('yellow','blue'))
The Venn diagram shows us that this percentage is 10.4 (14.6 - 4.2).
#### (d) What percent of Americans live below the poverty line or speak a foreign language at home?
Adding the three regions of the Venn diagram (10.4 + 4.2 + 16.5), or equivalently 14.6 + 20.7 - 4.2, gives 31.1%.
#### (e) What percent of Americans live above the poverty line and only speak English at home?
These are the people outside both circles of the Venn diagram. We add the totals of those below the PL and those who speak a FL, subtract the overlap so it is not counted twice, and subtract the result from 100.
belowOrForeign = belowPoverty + foreignLanguage - joint
answer = 100 - belowOrForeign
answer
## [1] 68.9
Realistically, we would expect these events to be dependent, and the numbers confirm it. If they were independent, the multiplication rule would give P(A and B) = P(A) × P(B) = 0.146 × 0.207 ≈ 0.030, but the observed joint probability is 0.042. Note that overlapping circles in the Venn diagram do not settle the question: overlap only shows that the events are not disjoint, and independent events with nonzero probabilities always overlap.
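The multiplication-rule check can be written out in a few lines. This is a minimal sketch using the three percentages from the problem; the variable names are illustrative only.

```r
# Percentages from the problem, converted to probabilities
pBelowPoverty = 14.6 / 100
pForeignLanguage = 20.7 / 100
pJoint = 4.2 / 100

# Joint probability that independence would imply
pJointIfIndependent = pBelowPoverty * pForeignLanguage

pJointIfIndependent  # value implied by independence
pJoint               # value actually observed

# TRUE only if the two agree up to floating-point tolerance
isTRUE(all.equal(pJoint, pJointIfIndependent))
```

Because the observed joint probability is larger than the product, living below the poverty line and speaking a foreign language at home occur together more often than independence would predict.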
femBlue = c(78,19,11)
femBrown = c(23,23,9)
femGreen = c(13,12,16)
df = data.frame(femBlue,femBrown,femGreen)
row.names(df) = c("maleBlue","maleBrown","maleGreen")
cbind(df, Total = rowSums(df))
## femBlue femBrown femGreen Total
## maleBlue 78 23 13 114
## maleBrown 19 23 12 54
## maleGreen 11 9 16 36
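P(A or B) = P(A) + P(B) - P(A and B), with 114 blue-eyed males, 108 blue-eyed partners, and 78 couples in which both have blue eyes, out of 204 couples:
114/204 + 108/204 - 78/204
## [1] 0.7058824
There is about a 71% probability that a randomly chosen male or his partner has blue eyes.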
P(partner blue | male blue)
78/114
## [1] 0.6842105
There is a 68% probability that a randomly chosen male with blue eyes has a partner with blue eyes.
P(partner blue | male brown)
19/54
## [1] 0.3518519
There is a 35% probability that a randomly chosen male with brown eyes has a partner with blue eyes.
P(partner blue | male green)
11/36
## [1] 0.3055556
There is about a 31% probability that a randomly chosen male with green eyes has a partner with blue eyes.
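The eye colors of partners do not appear to be independent. If they were, the probability that the partner has blue eyes would be about the same for every male eye color, close to the overall rate of 108/204 ≈ 53%. Instead it is 68% for blue-eyed males but only about 35% and 31% for brown-eyed and green-eyed males.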
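The conditional probabilities can also be read off the whole table at once. This is a minimal sketch that re-enters the counts as a matrix; the names `eyes`, `condOnMale`, and `margPartner` are illustrative, and `prop.table()` with `margin = 1` divides each row by its row total.

```r
# Counts of couples: rows = male eye color, columns = partner eye color
eyes = matrix(c(78, 23, 13,
                19, 23, 12,
                11,  9, 16),
              nrow = 3, byrow = TRUE,
              dimnames = list(male = c("blue", "brown", "green"),
                              partner = c("blue", "brown", "green")))

# P(partner color | male color): each row divided by its row total
condOnMale = prop.table(eyes, margin = 1)

# Marginal P(partner color), ignoring the male's eye color
margPartner = colSums(eyes) / sum(eyes)

round(condOnMale[, "blue"], 3)  # P(partner blue | male color)
round(margPartner["blue"], 3)   # P(partner blue) overall
```

Under independence every entry in the first column of `condOnMale` would be close to the marginal value, so a large spread across rows is evidence against independence.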
# 13 fiction and 15 nonfiction hardcovers, so 28 hardcovers and 95 books in total
hardcover = c(13,15)
paperback = c(59,8)
df2 = data.frame(hardcover,paperback)
row.names(df2)= c('Fiction','nonFiction')
cbind(df2, Total = rowSums(df2))
## hardcover paperback Total
## Fiction 13 59 72
## nonFiction 15 8 23
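# Hardcover first (28 of 95), then paperback fiction (59 of the remaining 94), without replacement
probDraw1 = 28/95
probDraw2 = 59/94
prob2Given1 = probDraw1*probDraw2 # joint probability of both draws
prob2Given1
## [1] 0.1849944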
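# Fiction first, then hardcover, without replacement.
# Split on the cover of the first book: a hardcover fiction first draw leaves only 27 hardcovers.
probHardFictionFirst = (13/95)*(27/94)
probPaperFictionFirst = (59/95)*(28/94)
prob2Given1 = probHardFictionFirst + probPaperFictionFirst
prob2Given1
## [1] 0.2243001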
# Fiction first, then hardcover, with replacement: the second draw is again from all 95 books
probDraw1 = 72/95
probDraw2 = 28/95
prob2Given1 = probDraw1*probDraw2
prob2Given1
## [1] 0.2233795
The two answers are close because removing a single book from a shelf of 95 barely changes its makeup: the second draw is made from 94 books instead of 95, so its probability changes only slightly. The gap between sampling with and without replacement grows when the number of draws is large relative to the number of items.
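To illustrate how the gap depends on the size of the collection, the sketch below computes the probability of "fiction first, then hardcover" for shelves that are 1, 10, and 100 times larger with the same proportions. The function name `fictionThenHardcover` and the scaling factor `k` are hypothetical helpers, and the counts (13 hardcover fiction, 59 paperback fiction, 28 hardcovers, 95 books) are taken from the table and calculations above. The without-replacement branch conditions on the cover of the first book.

```r
# Probability of "fiction first, then hardcover" for a shelf k times larger
# with the same proportions (k = 1 is the shelf of 95 books).
# Assumed counts at k = 1: 13 hardcover fiction, 59 paperback fiction,
# 28 hardcovers in total, 95 books in total.
fictionThenHardcover = function(k = 1, replace = FALSE) {
  hardFiction = 13 * k
  paperFiction = 59 * k
  hardTotal = 28 * k
  n = 95 * k
  if (replace) {
    # Second draw sees the full shelf again
    return((hardFiction + paperFiction) / n * hardTotal / n)
  }
  # Without replacement: split on the cover of the first (fiction) book
  hardFiction / n * (hardTotal - 1) / (n - 1) +
    paperFiction / n * hardTotal / (n - 1)
}

# Gap between the two sampling schemes for growing shelves
sizes = c(1, 10, 100)
gap = sapply(sizes, function(k)
  fictionThenHardcover(k) - fictionThenHardcover(k, replace = TRUE))
data.frame(books = 95 * sizes, gap = gap)
```

The gap shrinks as the shelf grows, which is why sampling without replacement from a large population is often treated as if the draws were independent.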
amtBags = c(0,1,2)
costOfBaggage = c(0,25,35)
probabilityOfBagge = c(.54,.34,.12)
df3 = data.frame(amtBags,costOfBaggage,probabilityOfBagge)
row.names(df3) = c('0 bags','1 bag','2 bags')
df3$eV = df3$costOfBaggage*df3$probabilityOfBagge
eVpp = sum(df3$eV)
# Variance of a discrete random variable: squared deviations from the mean, weighted by probability
variancepp = sum((df3$costOfBaggage - eVpp)^2*df3$probabilityOfBagge)
sdpp = sqrt(variancepp)
print( 'Average revenue per passenger')
## [1] "Average revenue per passenger"
eVpp ##Average revenue per passenger
## [1] 12.7
print('Variance of revenue per passenger')
## [1] "Variance of revenue per passenger"
variancepp ##Variance of revenue per passenger
## [1] 198.21
print('Standard deviation of revenue per passenger')
## [1] "Standard deviation of revenue per passenger"
sdpp ##Standard deviation of revenue per passenger
## [1] 14.07871
print('You can expect the avg revenue per flight to be the eV per passenger, multiplied by number of passengers')
## [1] "You can expect the avg revenue per flight to be the eV per passenger, multiplied by number of passengers"
eVpp*120
## [1] 1524
print('Assuming passengers are independent, variances add, so the SD per flight is the SD per passenger multiplied by the square root of the number of passengers')
## [1] "Assuming passengers are independent, variances add, so the SD per flight is the SD per passenger multiplied by the square root of the number of passengers"
sdpp*sqrt(120)
## [1] 154.2245
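A simulation offers an independent check on the per-passenger and per-flight figures. For a discrete random variable the variance is the probability-weighted sum of squared deviations from the mean, which here gives a per-passenger SD of sqrt(198.21) ≈ 14.08 and, for 120 independent passengers, a per-flight SD of about 154. The sketch below assumes passengers choose their number of bags independently; the seed, the number of simulated flights, and all variable names are arbitrary choices for illustration.

```r
set.seed(1)  # arbitrary seed so the run is reproducible

fees = c(0, 25, 35)          # fee for 0, 1, or 2 checked bags
probs = c(0.54, 0.34, 0.12)  # probability of each outcome

nFlights = 10000
nPassengers = 120

# Each row is one simulated flight of 120 independent passengers
draws = matrix(sample(fees, nFlights * nPassengers, replace = TRUE, prob = probs),
               nrow = nFlights)
flightRevenue = rowSums(draws)

mean(draws)            # per-passenger mean, theory: 12.7
sd(as.vector(draws))   # per-passenger SD, theory: sqrt(198.21)
mean(flightRevenue)    # per-flight mean, theory: 120 * 12.7
sd(flightRevenue)      # per-flight SD, theory: sqrt(120 * 198.21)
```

The simulated values will differ slightly from the theoretical ones because of sampling noise, but they should land close to them.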
income = c("$1 to $9,999 or less",
"$10,000 to $14,999",
"$15,000 to $24,999",
"$25,000 to $34,999",
"$35,000 to $49,999",
"$50,000 to $64,999",
"$65,000 to $74,999",
"$75,000 to $99,999",
"$100,000 or more")
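# Short label for each of the nine income brackets
briefIncome = c('<10k','10-15k','15-25k','25-35k','35-50k','50-65k','65-75k','75-100k','100k+')
briefPercent = c(2.2,4.7,15.8,18.3,21.2,13.9,5.8,8.4,9.7)
# barplot()'s second positional argument is the bar width, so the labels are passed by name
barplot(briefPercent,names.arg=briefIncome,xlab='Income distribution',ylab='Percent of residents')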
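The distribution is right-skewed, with its main peak in the $35,000 to $49,999 bracket and a second, smaller rise in the top two brackets, which gives it a bimodal look. Part of that second rise may come from the brackets having unequal widths (the last one is open-ended), so it is only weak evidence of a vanishing middle class.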
result = sum(briefPercent[1:5])
result
## [1] 62.2
We are assuming income and sex are independent. We don’t know the gender composition of the individual brackets, so we assume every bracket has the same split of 59% male to 41% female. That lets us use P(A and B) = P(A) × P(B) = 0.622 × 0.41 ≈ 0.255. The probability that a randomly chosen US resident makes less than $50,000 a year and is female is approximately 25.5%.
result*.41
## [1] 25.502