library(RCurl)  # provides getURL()
url<-getURL("https://raw.githubusercontent.com/jbryer/DATA606Fall2017/master/Data/Data%20from%20openintro.org/Ch%202%20Exercise%20Data/assortive_mating.csv")

df<-data.frame(read.csv(text=url,header=TRUE))

url<-getURL("https://raw.githubusercontent.com/jbryer/DATA606Fall2017/master/Data/Data%20from%20openintro.org/Ch%202%20Exercise%20Data/books.csv")

dfbooks=data.frame(read.csv(text=url,header=TRUE))

Chapter 2 Probability

Problem 2.6

Dice rolls. If you roll a pair of fair dice, what is the probability of

  1. getting a sum of 1? Each die shows a value in S = {1,2,3,4,5,6}, so the smallest possible sum is 1 + 1 = 2.

    A sum of 1 would require one of the dice to show 0, which is impossible.

    P(sum of 1) = 0

  2. getting a sum of 5?

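In order to get a sum of five, the first die must show 1, 2, 3, or 4, and the second die must show the matching value. Each ordered pair has probability 1/36.

P(Sum of 5) = P(1,4) + P(2,3) + P(3,2) + P(4,1) = 4/36

P(Sum of 5) = 0.1111111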

  3. getting a sum of 12?

      The only outcome with a sum of 12 is (6,6): P(6)*P(6) = 1/6 * 1/6

      P(6)*P(6) = 0.0277778
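
The three answers above can be checked by enumerating all 36 equally likely outcomes. This is only a minimal sketch in base R; the names `rolls` and `p_sum` are illustrative and not part of the original solution.

```r
# Enumerate every (first die, second die) pair for two fair dice
rolls <- expand.grid(first = 1:6, second = 1:6)
rolls$total <- rolls$first + rolls$second

# Probability of a given sum = number of favourable outcomes / 36
p_sum <- function(s) mean(rolls$total == s)

p_sum(1)   # 0, because the smallest possible sum is 2
p_sum(5)   # 4/36
p_sum(12)  # 1/36
```

Counting outcomes this way avoids having to list the favourable pairs by hand.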

Problem 2.8 Poverty and Language

  1. Are living below the poverty line and speaking a foreign language at home disjoint?

Answer: No, they are not disjoint, because a person can both live below the poverty line and speak a foreign language at home (4.2% of Americans do).

  1. Draw a Venn diagram summarizing the variables and their associated probabilities.

library(grid)         # provides grid.newpage()
library(VennDiagram)  # provides draw.pairwise.venn()
grid.newpage()
draw.pairwise.venn(area1 = 14.6, area2 = 20.7, cross.area = 4.2, category = c("Persons below poverty", "Other language"))

  1. What percent of Americans live below the poverty line and only speak English at home?

Answer: 14.6% - 4.2% = 10.4% of Americans live below the poverty line and speak only English at home.

  1. What percent of Americans live below the poverty line or speak a foreign language at home?

Answer: 14.6% + 20.7% - 4.2% = 31.1% live below the poverty line or speak a foreign language at home.

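  1. What percent of Americans live above the poverty line and only speak English at home?

Answer: Let A = lives above the poverty line, with P(A) = 1 - 0.146 = 0.854, and B = only speaks English at home, with P(B) = 1 - 0.207 = 0.793. Multiplying P(A) by P(B) would be valid only under independence, which the data do not support (0.146 * 0.207 is about 0.030, while the observed overlap is 0.042). Instead, "A and B" is the complement of "below the poverty line or foreign language at home", which has probability 0.311.

    P(A and B) = 1 - 0.311 = 0.689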
    
  1. Is the event that someone lives below the poverty line independent of the event that the person speaks a foreign language at home?

Answer: No. If the events were independent, P(below poverty line and foreign language) would equal 0.146 * 0.207, which is about 0.030. The observed value is 0.042, so the two events are dependent.
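
The arithmetic for this problem can be reproduced from the three percentages used in the Venn diagram call. The sketch below is illustrative only; the variable names are assumptions and do not appear in the original solution.

```r
# Percentages from the exercise, converted to proportions
p_poverty <- 0.146   # below the poverty line
p_foreign <- 0.207   # speak a language other than English at home
p_both    <- 0.042   # both

p_poverty_english_only <- p_poverty - p_both         # below poverty, English only
p_either  <- p_poverty + p_foreign - p_both          # general addition rule
p_neither <- 1 - p_either                            # above poverty, English only

c(poverty_english_only = p_poverty_english_only,
  either = p_either,
  neither = p_neither)

# Independence would require P(A and B) to equal P(A) * P(B)
c(product = p_poverty * p_foreign, observed = p_both)
```

Comparing the product of the marginals with the observed overlap is the standard check for independence of two events.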

Problem 2.20 Assortative Mating

head(df)
##   self_male partner_female
## 1      blue           blue
## 2      blue           blue
## 3      blue           blue
## 4      blue           blue
## 5      blue           blue
## 6      blue           blue
TotalCases=nrow(df)
TotalMalesBlueEyes=nrow(subset(df,self_male=="blue"))
TotalMalesBrownEyes=nrow(subset(df,self_male=="brown"))
TotalMalesGreenEyes=nrow(subset(df,self_male=="green"))
TotalMalesGreenEyesFemalesBlueEyes=nrow(subset(df,partner_female=="blue" & self_male=="green"))
TotalMalesBrownEyesFemalesBlueEyes=nrow(subset(df,partner_female=="blue" & self_male=="brown"))
TotalFemaleBlueEyes=nrow(subset(df,partner_female=="blue"))
TotalMaleFemaleBlueEyes=nrow(subset(df,partner_female=="blue" & self_male=="blue"))
TotalCases
## [1] 204
TotalMalesBlueEyes
## [1] 114
TotalFemaleBlueEyes
## [1] 108
TotalMaleFemaleBlueEyes
## [1] 78

(a) What is the probability that a randomly chosen male respondent or his partner has blue eyes?

Answer: P(Male has blue eyes or Female has blue eyes) = P(Male has blue) + P(Female has blue) - P(Both have blue) = (114 + 108 - 78)/204

P(Male or Female has blue eyes) = 0.7058824

(b) What is the probability that a randomly chosen male respondent with blue eyes has a partner with blue eyes?

This is a conditional probability: the cases are restricted to the 114 males with blue eyes, and within this subset 78 have partners with blue eyes.

So the probability is P(Female Blue Eyes and Male Blue Eyes)/P(Male Blue Eyes) = 78/114 = 0.6842105

(c) What is the probability that a randomly chosen male respondent with brown eyes has a partner with blue eyes? What about the probability of a randomly chosen male respondent with green eyes having a partner with blue eyes?

Answer: P(partner has blue eyes | male has brown eyes) = TotalMalesBrownEyesFemalesBlueEyes/TotalMalesBrownEyes = 0.3518519

P(partner has blue eyes | male has green eyes) = TotalMalesGreenEyesFemalesBlueEyes/TotalMalesGreenEyes = 0.3055556

(d) Does it appear that the eye colors of the respondents and their partners are independent?

Answer: No, the eye colors do not appear to be independent. The probability that the partner has blue eyes is 0.6842105 when the male has blue eyes, but only 0.3518519 and 0.3055556 when he has brown or green eyes. Under independence, all three conditional probabilities would be close to the overall rate of blue-eyed partners, 108/204 = 0.5294118.
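
The independence question can be checked directly from the counts printed earlier in this problem. This is a small sketch that hard-codes those counts so it runs without the data frame; the variable names are illustrative.

```r
# Counts printed earlier (TotalCases, TotalMalesBlueEyes, ...)
n_total       <- 204
n_male_blue   <- 114
n_female_blue <- 108
n_both_blue   <- 78

p_partner_blue            <- n_female_blue / n_total     # marginal probability
p_partner_blue_given_blue <- n_both_blue / n_male_blue   # conditional on male blue

# Under independence these two values would be approximately equal
c(marginal = p_partner_blue, given_male_blue = p_partner_blue_given_blue)
```

A conditional probability well above the marginal one suggests that blue-eyed males are more likely than average to have blue-eyed partners.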

2.30 Books on a bookshelf

head(dfbooks)
##      type    format
## 1 fiction hardcover
## 2 fiction hardcover
## 3 fiction hardcover
## 4 fiction hardcover
## 5 fiction hardcover
## 6 fiction hardcover

(a) Find the probability of drawing a hardcover book first then a paperback fiction book second when drawing without replacement.

Answer: These events are dependent. Drawing without replacement means the hardcover book taken first is no longer on the shelf, so the second book is drawn from the remaining 94 books.

TotalSamples=nrow(dfbooks)
TotalHardCover=nrow(subset(dfbooks,format=="hardcover"))
# Paperback fiction: the format must be "paperback", not "hardcover"
TotalPaperback_fiction=nrow(subset(dfbooks,format=="paperback" & type=="fiction"))
TotalFiction=nrow(subset(dfbooks,type=="fiction"))
TotalSamplesMinus1=TotalSamples-1
TotalSamples
## [1] 95
TotalHardCover
## [1] 28

The probability of first drawing a hardcover book and then a paperback fiction book is (TotalHardCover/TotalSamples) * (TotalPaperback_fiction/TotalSamplesMinus1) = (28/95) * (59/94) = 0.1849944
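
Part (a) can also be worked from a table of counts. The sketch below assumes the counts 13, 59, 15 and 8, which are recovered by multiplying the proportions in the probability distribution table of part (b) by the 95 books; the names `counts` and `p_a` are illustrative.

```r
# Counts implied by the proportion table (proportion * 95 books); an assumption
# derived from the table, not read from the raw data here
counts <- matrix(c(13, 59,
                   15,  8),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(type = c("fiction", "nonfiction"),
                                 format = c("hardcover", "paperback")))

n           <- sum(counts)                  # total number of books
n_hardcover <- sum(counts[, "hardcover"])   # hardcover books of either type

# A hardcover first leaves every paperback fiction book among the remaining n - 1
p_a <- (n_hardcover / n) * (counts["fiction", "paperback"] / (n - 1))
p_a
```

Because the first book is a hardcover, the number of paperback fiction books is unchanged for the second draw; only the denominator shrinks.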

(b) Determine the probability of drawing a fiction book first and then a hardcover book second, when drawing without replacement.

Answer :

Probability Distribution Table

table(dfbooks)/nrow(dfbooks)
##             format
## type          hardcover  paperback
##   fiction    0.13684211 0.62105263
##   nonfiction 0.15789474 0.08421053

Answer

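# Split on the format of the first (fiction) book and weight each case by its own probability
FictionPaperback=nrow(subset(dfbooks,type=="fiction" & format=="paperback"))
FictionHardCover=nrow(subset(dfbooks,type=="fiction" & format=="hardcover"))
FirstCase=  (FictionPaperback/TotalSamples) * (TotalHardCover/TotalSamplesMinus1)
SecondCase=(FictionHardCover/TotalSamples) * ((TotalHardCover-1)/TotalSamplesMinus1)

print('The Case When the first book drawn is not a hard cover')
## [1] "The Case When the first book drawn is not a hard cover"
FirstCase
## [1] 0.1849944
print('The Case When the first book drawn is a hard cover')
## [1] "The Case When the first book drawn is a hard cover"
SecondCase
## [1] 0.03930571
# The two cases are mutually exclusive, so the answer is their sum
FirstCase+SecondCase
## [1] 0.2243001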


(c) Do (b) with replacement.

Answer

With replacement the two draws are independent, so the probability would be (72/95) * (28/95) = 0.2233795

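(d) The final answers to parts (b) and (c) are very similar. Explain why this is the case.

This is because only two books are drawn from a shelf of 95. Removing a single book changes the remaining proportions very little (the denominator goes from 95 to 94 and the hardcover count changes by at most one), so sampling with and without replacement give nearly the same answer.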
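
The comparison between parts (b) and (c) can be made concrete with a short calculation. The sketch below assumes the counts implied by the probability distribution table (13 fiction hardcovers and 59 fiction paperbacks out of 95 books, 28 hardcovers in total); the variable names are illustrative.

```r
n             <- 95   # books on the shelf
fiction_hard  <- 13   # fiction hardcovers (0.13684211 * 95)
fiction_paper <- 59   # fiction paperbacks (0.62105263 * 95)
hardcover     <- 28   # all hardcovers
fiction       <- fiction_hard + fiction_paper

# Without replacement: condition on whether the first fiction book was a hardcover
p_without <- (fiction_paper / n) * (hardcover / (n - 1)) +
             (fiction_hard  / n) * ((hardcover - 1) / (n - 1))

# With replacement: the two draws are independent
p_with <- (fiction / n) * (hardcover / n)

c(without = p_without, with = p_with, difference = p_without - p_with)
```

The difference is on the order of one part in a thousand, which illustrates why the distinction matters little when the number of draws is small relative to the population.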

2.38 Baggage Fees

(a) Build a probability model, compute the average revenue per passenger, and compute the corresponding standard deviation.

baggageFees<-c(0,25,60)
passengers<-c(.54,.34,.12)
X<-baggageFees*passengers
# Deviations are measured from the mean, sum(X), not from the individual terms of X
v<-(baggageFees-sum(X))^2 * passengers
df=data.frame(baggageFees,passengers,X)

sqrt(sum(v))
## [1] 19.95019

The probability model is

df
##   baggageFees passengers   X
## 1           0       0.54 0.0
## 2          25       0.34 8.5
## 3          60       0.12 7.2

The average revenue per passenger is sum(X) = 15.7

The corresponding standard deviation is 19.95019

(b) About how much revenue should the airline expect for a flight of 120 passengers? With what standard deviation? Note any assumptions you make and if you think they are justified.

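The expected total revenue is the average revenue per passenger multiplied by the number of passengers: 120 * 15.7 = 1884. Assuming passengers decide how many bags to check independently of one another, the variances add, so the standard deviation of the total is sqrt(120 * 398.01), about 218.54, where 398.01 is the per-passenger variance of the probability model. The independence assumption is only approximately justified, since passengers travelling together may make similar baggage choices.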
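
A short sketch of the scaling from one passenger to a full flight, assuming the 120 passengers check bags independently. The fee and probability vectors repeat the model from part (a); the remaining names are illustrative.

```r
fees  <- c(0, 25, 60)          # revenue for 0, 1, or 2 checked bags
probs <- c(0.54, 0.34, 0.12)   # share of passengers in each group

mu     <- sum(fees * probs)            # expected revenue per passenger
sigma2 <- sum((fees - mu)^2 * probs)   # variance per passenger

n_passengers <- 120
total_mean <- n_passengers * mu              # expectations always add
total_sd   <- sqrt(n_passengers * sigma2)    # variances add only under independence

c(mean = total_mean, sd = total_sd)
```

Note that the standard deviation grows with the square root of the number of passengers, not linearly.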

2.44 Income and Gender.

Income <- c(9999,10000,15000,25000,35000,50000,65000,75000,100000)
Total <- c(2.2,4.7,15.8,18.3,21.2,13.9,5.8,8.4,9.7)

df<-data.frame(Income,Total)

df
##   Income Total
## 1   9999   2.2
## 2  10000   4.7
## 3  15000  15.8
## 4  25000  18.3
## 5  35000  21.2
## 6  50000  13.9
## 7  65000   5.8
## 8  75000   8.4
## 9 100000   9.7
IncomeLessthan50000=sum(subset(df,Income<50000)$Total)

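(a) Describe the distribution of total personal income.

summary(df$Income)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    9999   15000   35000   42780   65000  100000

Note that summary() here describes only the nine bracket values, not incomes weighted by the Total percentages. Reading the table itself, the cumulative share is 41.0% below 35000 and 62.2% below 50000, so the median falls in the bracket labelled 35000, and the distribution is right-skewed with a tail extending to the 100000 bracket.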

(b) What is the probability that a randomly chosen US resident makes less than $50,000 per year?

It is the sum of the percentages for all brackets below $50,000, which is 62.2%, that is, a probability of 0.622.

(c) What is the probability that a randomly chosen US resident makes less than $50,000 per year and is female? Note any assumptions.

Assuming that half of US residents are female and that income is independent of gender, the probability is 0.622 * 0.5 = 0.311, or 31.1%.

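(d) The same data source indicates that 71.8% of females make less than $50,000 a year. Use this value to determine whether or not the assumption you made in part (c) is valid.

If income and gender were independent, the share of females making less than $50,000 would equal the overall share, 62.2%. The reported 71.8% is noticeably higher, so the independence assumption made in part (c) is not valid. Keeping the 50% assumption for the share of females, the joint probability would be 0.718 * 0.5 = 0.359 rather than 0.311.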
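
The check in part (d) can be written out numerically. In this sketch the 50% share of females is the assumption from part (c), not a value taken from the data; the variable names are illustrative.

```r
p_under_50k              <- 0.622   # from part (b)
p_female                 <- 0.5     # assumption made in part (c)
p_under_50k_given_female <- 0.718   # given in part (d)

# Joint probability if income and gender were independent
joint_if_independent <- p_under_50k * p_female

# Joint probability using the conditional probability from part (d)
joint_from_conditional <- p_under_50k_given_female * p_female

c(independent = joint_if_independent, conditional = joint_from_conditional)

# Independence requires P(<50k | female) to equal P(<50k)
p_under_50k_given_female - p_under_50k
```

A nonzero gap between the conditional and marginal probabilities is what rules out independence, whatever the true share of females is.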