library(RCurl)        # getURL()
library(VennDiagram)  # draw.pairwise.venn(); also attaches grid for grid.newpage()
url<-getURL("https://raw.githubusercontent.com/jbryer/DATA606Fall2017/master/Data/Data%20from%20openintro.org/Ch%202%20Exercise%20Data/assortive_mating.csv")
df<-read.csv(text=url,header=TRUE)
url<-getURL("https://raw.githubusercontent.com/jbryer/DATA606Fall2017/master/Data/Data%20from%20openintro.org/Ch%202%20Exercise%20Data/books.csv")
dfbooks<-read.csv(text=url,header=TRUE)
Dice rolls. If you roll a pair of fair dice, what is the probability of
getting a sum of 1? Each die has sample space S = {1,2,3,4,5,6}, so the smallest possible sum is 1 + 1 = 2.
P(sum of 1) = 0
getting a sum of 5?
To get a sum of 5, the first die must show 1, 2, 3 or 4 and the second die the matching value 4, 3, 2 or 1. Each of the 36 equally likely outcomes has probability 1/36.
P(sum of 5) = P(1,4) + P(2,3) + P(3,2) + P(4,1) = 4/36
P(sum of 5) = 0.1111111
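The dice probabilities can also be checked by brute force. The sketch below, in base R, lists all 36 equally likely outcomes with `expand.grid()` and counts the ones with the required sum. The names `rolls`, `sums` and `p_sum` are illustrative and not part of the original solution.

```r
# Sketch: all 36 equally likely outcomes of rolling two fair six-sided dice
rolls <- expand.grid(die1 = 1:6, die2 = 1:6)
sums <- rolls$die1 + rolls$die2

# P(sum = s) = (number of outcomes with that sum) / 36;
# mean() of a logical vector gives exactly that proportion
p_sum <- function(s) mean(sums == s)

p_sum(1)  # 0, because the smallest possible sum is 2
p_sum(5)  # 4/36 = 0.1111111: (1,4), (2,3), (3,2), (4,1)
```

The same function gives `p_sum(12)` = 1/36, since only (6,6) adds up to 12.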
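getting a sum of 12?
Only the outcome (6,6) gives a sum of 12, and the two dice are independent:
P(6) * P(6) = 1/6 * 1/6
P(6) * P(6) = 0.0277778
Answer: No, living below the poverty line and speaking a foreign language at home are not disjoint, because they can occur at the same time: 4.2% of Americans fall into both groups.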
grid.newpage()
draw.pairwise.venn(area1 = 14.6, area2 = 20.7, cross.area = 4.2, category = c("Persons below poverty", "Other language"))
Answer: 10.4% of Americans live below the poverty line and speak only English at home (14.6% - 4.2%).
Answer: 31.1% of Americans live below the poverty line or speak a foreign language at home (14.6% + 20.7% - 4.2%).
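Both Venn diagram answers follow from the three percentages passed to `draw.pairwise.venn()`. A minimal sketch of the arithmetic, with the percentages written as probabilities (the variable names are illustrative):

```r
p_below   <- 0.146  # lives below the poverty line
p_foreign <- 0.207  # speaks a language other than English at home
p_both    <- 0.042  # both of the above

# Below the poverty line and only English: remove the overlap from p_below
below_only <- p_below - p_both
# General addition rule: P(A or B) = P(A) + P(B) - P(A and B)
p_either <- p_below + p_foreign - p_both

below_only  # 0.104
p_either    # 0.311
```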
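Answer: Living above the poverty line and speaking only English at home means that neither event occurs, so use the complement of the previous answer:
P(above poverty line and only English at home) = 1 - P(below poverty line or foreign language) = 1 - 0.311 = 0.689
Multiplying P(above poverty line) = 0.854 by P(only English at home) = 0.793 gives 0.677 instead, which would only be valid if the two events were independent.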
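Living above the poverty line and speaking only English at home is the complement of "below the poverty line or foreign language at home", so the complement rule gives it directly without assuming independence. The sketch below (illustrative names) compares that value with the product of the two marginal probabilities.

```r
p_below   <- 0.146
p_foreign <- 0.207
p_both    <- 0.042

# Complement rule: neither event = 1 - P(at least one event)
p_neither <- 1 - (p_below + p_foreign - p_both)
# Product of the marginals; it only equals p_neither if the events are independent
p_product <- (1 - p_below) * (1 - p_foreign)

p_neither  # 0.689
p_product  # 0.677222
```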
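Answer: No. If the events were independent, P(below poverty line and foreign language) would equal 0.146 * 0.207 = 0.030, but the observed value is 0.042, so living below the poverty line is not independent of speaking a foreign language at home.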
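Two events A and B are independent exactly when P(A and B) = P(A) * P(B), or equivalently P(A | B) = P(A). A small sketch of both checks with the exercise's percentages (names are illustrative):

```r
p_below   <- 0.146
p_foreign <- 0.207
p_both    <- 0.042

# Joint probability we would expect if the events were independent
expected_if_independent <- p_below * p_foreign  # 0.030222
# Probability of living below the poverty line given a foreign language at home
p_below_given_foreign <- p_both / p_foreign      # about 0.203

# Neither check matches, so the events are dependent
isTRUE(all.equal(p_both, expected_if_independent))  # FALSE
isTRUE(all.equal(p_below_given_foreign, p_below))   # FALSE
```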
head(df)
## self_male partner_female
## 1 blue blue
## 2 blue blue
## 3 blue blue
## 4 blue blue
## 5 blue blue
## 6 blue blue
TotalCases=nrow(df)
TotalMalesBlueEyes=nrow(subset(df,self_male=="blue"))
TotalMalesBrownEyes=nrow(subset(df,self_male=="brown"))
TotalMalesGreenEyes=nrow(subset(df,self_male=="green"))
TotalMalesGreenEyesFemalesBlueEyes=nrow(subset(df,partner_female=="blue" & self_male=="green"))
TotalMalesBrownEyesFemalesBlueEyes=nrow(subset(df,partner_female=="blue" & self_male=="brown"))
TotalFemaleBlueEyes=nrow(subset(df,partner_female=="blue"))
TotalMaleFemaleBlueEyes=nrow(subset(df,partner_female=="blue" & self_male=="blue"))
TotalCases
## [1] 204
TotalMalesBlueEyes
## [1] 114
TotalFemaleBlueEyes
## [1] 108
TotalMaleFemaleBlueEyes
## [1] 78
Answer: P(male or female has blue eyes) = P(male has blue eyes) + P(female has blue eyes) - P(both have blue eyes) = (114 + 108 - 78)/204
P(male or female has blue eyes) = 0.7058824
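The same addition rule can be applied directly to the counts printed above (204 couples, 114 blue-eyed males, 108 blue-eyed females, 78 couples where both have blue eyes). A minimal sketch with illustrative names:

```r
total_cases <- 204  # couples in the survey
male_blue   <- 114  # male respondent has blue eyes
female_blue <- 108  # female partner has blue eyes
both_blue   <- 78   # both have blue eyes

# Add the two groups, subtract the overlap counted twice, divide by the total
p_either_blue <- (male_blue + female_blue - both_blue) / total_cases
p_either_blue  # 144/204 = 0.7058824
```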
This is a conditional probability: restrict attention to the 114 males with blue eyes; within that subset, 78 have partners with blue eyes.
P(female blue | male blue) = P(female blue and male blue)/P(male blue) = 78/114 = 0.6842105
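Because both probabilities share the denominator 204, the conditional probability reduces to a ratio of counts. A short sketch showing that both forms agree (names are illustrative):

```r
total_cases <- 204
male_blue   <- 114
both_blue   <- 78

# Definition: P(A | B) = P(A and B) / P(B)
p_cond_from_probs <- (both_blue / total_cases) / (male_blue / total_cases)
# Same value once the common denominator cancels
p_cond_from_counts <- both_blue / male_blue

p_cond_from_counts                                # 0.6842105
all.equal(p_cond_from_probs, p_cond_from_counts)  # TRUE
```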
Answer: P(partner has blue eyes | male has brown eyes) = 0.3518519
P(partner has blue eyes | male has green eyes) = 0.3055556
Answer: No, the eye colors of respondents and their partners do not appear to be independent. Under independence, the probability that the partner has blue eyes would be the same whatever the respondent's eye color, but it is 0.684 for blue-eyed respondents, 0.352 for brown-eyed respondents and 0.306 for green-eyed respondents, compared with 108/204 = 0.529 overall.
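A small sketch that lines up the conditional probabilities reported above against the overall proportion of blue-eyed partners, 108/204; under independence all of them would be equal. The names are illustrative.

```r
# Conditional probabilities that the partner has blue eyes, by respondent eye color
p_partner_blue_given <- c(blue = 78 / 114, brown = 0.3518519, green = 0.3055556)
# Marginal probability that a partner has blue eyes
p_partner_blue <- 108 / 204

round(p_partner_blue_given, 3)  # blue 0.684, brown 0.352, green 0.306
round(p_partner_blue, 3)        # 0.529
# Largest gap between a conditional and the marginal probability
max(abs(p_partner_blue_given - p_partner_blue))
```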
head(dfbooks)
## type format
## 1 fiction hardcover
## 2 fiction hardcover
## 3 fiction hardcover
## 4 fiction hardcover
## 5 fiction hardcover
## 6 fiction hardcover
Answer: The two draws are not independent, because sampling is without replacement: after the first book is removed, the second draw is made from the remaining 94 books.
TotalSamples=nrow(dfbooks)
TotalHardCover=nrow(subset(dfbooks,format=="hardcover"))
TotalPaperback_fiction=nrow(subset(dfbooks,format=="paperback" & type=="fiction"))
TotalFiction=nrow(subset(dfbooks,type=="fiction"))
TotalSamplesMinus1=TotalSamples-1
TotalSamples
## [1] 95
TotalHardCover
## [1] 28
The probability of first drawing a hardcover book and then a paperback fiction book is (28/95) * (59/94) = 0.1849944.
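A minimal sketch of the without-replacement calculation. The 28 hardcover books come from the output above and the 59 paperback fiction books from the proportion table for this data set (0.62105263 * 95); the variable names are illustrative.

```r
total_books       <- 95
hardcover         <- 28
paperback_fiction <- 59

# First draw: a hardcover out of 95 books.
# Second draw: 94 books remain, and none of the 59 paperback fiction books was removed.
p <- (hardcover / total_books) * (paperback_fiction / (total_books - 1))
p  # 1652/8930 = 0.1849944
```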
Answer:
table(dfbooks)/nrow(dfbooks)
## format
## type hardcover paperback
## fiction 0.13684211 0.62105263
## nonfiction 0.15789474 0.08421053
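FictionHardcover=nrow(subset(dfbooks,type=="fiction" & format=="hardcover"))
FictionPaperback=TotalFiction-FictionHardcover
# Case 1: the first (fiction) book is a paperback, so all 28 hardcovers remain among 94 books
FirstCase= (FictionPaperback/TotalSamples) * (TotalHardCover/TotalSamplesMinus1)
# Case 2: the first (fiction) book is itself a hardcover, leaving 27 hardcovers among 94 books
SecondCase=(FictionHardcover/TotalSamples) * ((TotalHardCover-1)/TotalSamplesMinus1)
print('The case when the first fiction book drawn is a paperback')
## [1] "The case when the first fiction book drawn is a paperback"
FirstCase
## [1] 0.1849944
print('The case when the first fiction book drawn is a hardcover')
## [1] "The case when the first fiction book drawn is a hardcover"
SecondCase
## [1] 0.03930571
FirstCase + SecondCase
## [1] 0.2243001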
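With replacement, the probability of drawing a fiction book and then a hardcover book is (72/95) * (28/95) = 0.2233795.
The answers with and without replacement are very close because the shelf holds 95 books and only two are drawn: removing a single book barely changes the proportions for the second draw (for example, 28/94 = 0.298 versus 28/95 = 0.295 for a hardcover).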
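To see why the gap is small, one can scale the shelf up while keeping its proportions (72 fiction, 28 hardcover and 13 fiction hardcover books per 95) and compare the two answers. This is a hypothetical illustration: the scale factor `k` and the function `compare_draws()` are not part of the exercise.

```r
compare_draws <- function(k) {
  n          <- 95 * k  # books on the hypothetical shelf
  fiction    <- 72 * k
  hardcover  <- 28 * k
  fiction_hc <- 13 * k  # fiction books that are also hardcover

  # Without replacement: split on whether the first (fiction) book was a hardcover
  p_without <- ((fiction - fiction_hc) / n) * (hardcover / (n - 1)) +
               (fiction_hc / n) * ((hardcover - 1) / (n - 1))
  # With replacement the two draws are independent
  p_with <- (fiction / n) * (hardcover / n)

  c(n = n, without = p_without, with = p_with, gap = p_without - p_with)
}

# One row per shelf size; the gap shrinks roughly in proportion to 1/n
round(t(sapply(c(1, 10, 100), compare_draws)), 6)
```

For these proportions the gap works out to (72 * 28 / 95^2 - 13/95) / (n - 1), which is why it falls by about a factor of ten each time the shelf grows tenfold.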
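baggageFees<-c(0,25,60)         # fee for 0, 1 or 2 checked bags
passengers<-c(.54,.34,.12)      # share of passengers in each group
X<-baggageFees*passengers       # contribution of each group to the expected value
mu<-sum(X)                      # expected revenue per passenger
v<-(baggageFees-mu)^2 * passengers  # weighted squared deviations from the mean
df=data.frame(baggageFees,passengers,X)
sqrt(sum(v))
## [1] 19.95019
df
## baggageFees passengers X
## 1 0 0.54 0.0
## 2 25 0.34 8.5
## 3 60 0.12 7.2
The average revenue per passenger is $15.70.
The corresponding standard deviation is $19.95.
For a flight of 120 passengers, the expected total revenue is the average revenue per passenger multiplied by the number of passengers: 120 * 15.7 = $1,884.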
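As a cross-check, the variance can also be computed with the shortcut Var(X) = E(X^2) - E(X)^2. A minimal sketch, assuming the same fee schedule and passenger shares; the flight size of 120 passengers matches 1884 / 15.7, and the names are illustrative.

```r
fees  <- c(0, 25, 60)         # revenue from 0, 1 or 2 checked bags
probs <- c(0.54, 0.34, 0.12)  # share of passengers in each group
stopifnot(isTRUE(all.equal(sum(probs), 1)))  # a valid probability distribution

mu    <- sum(fees * probs)                 # E(X)
sigma <- sqrt(sum(fees^2 * probs) - mu^2)  # sqrt(E(X^2) - E(X)^2)

n_passengers   <- 120
expected_total <- n_passengers * mu        # expected revenue for the flight

mu              # 15.7
sigma           # 19.95019
expected_total  # 1884
```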
Income <- c(9999,10000,15000,25000,35000,50000,65000,75000,100000)
Total <- c(2.2,4.7,15.8,18.3,21.2,13.9,5.8,8.4,9.7)
df<-data.frame(Income,Total)
df
## Income Total
## 1 9999 2.2
## 2 10000 4.7
## 3 15000 15.8
## 4 25000 18.3
## 5 35000 21.2
## 6 50000 13.9
## 7 65000 5.8
## 8 75000 8.4
## 9 100000 9.7
IncomeLessthan50000=sum(subset(df,Income<50000)$Total)
summary(df$Income)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 9999 15000 35000 42780 65000 100000
Answer: The probability is the sum of the bracket probabilities below $50,000: 2.2 + 4.7 + 15.8 + 18.3 + 21.2 = 62.2%.
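Assuming that half of US residents are female and that gender is independent of income, P(less than $50,000 and female) = 0.622 * 0.5 = 0.311, i.e. 31.1%.
If 71.8% of females make less than $50,000 a year, then P(less than $50,000 | female) = 0.718 differs from P(less than $50,000) = 0.622, so income and gender are not independent and the assumption above is not valid. Using the conditional probability instead, P(less than $50,000 and female) = 0.718 * 0.5 = 0.359, which is higher than the 0.311 obtained under independence.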
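A short sketch of the independence check for income and gender, using the figures above (62.2% overall, 71.8% among females) and the assumption that half of residents are female; the names are illustrative.

```r
p_under_50k        <- 0.622  # all residents: sum of the brackets below $50,000
p_under_50k_female <- 0.718  # among females, from the same data source
p_female           <- 0.5    # assumption carried over from the previous answer

# Independence would require P(under 50k | female) == P(under 50k)
isTRUE(all.equal(p_under_50k_female, p_under_50k))  # FALSE

p_joint_assumed <- p_under_50k * p_female         # 0.311 under independence
p_joint_data    <- p_under_50k_female * p_female  # 0.359 using the conditional
```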