Question 2.6
Dice rolls. If you roll a pair of fair dice, what is the probability of
(a) getting a sum of 1? 0. The smallest possible sum of two dice is 2, so a sum of 1 is impossible.
(b) getting a sum of 5? Let Y1 and Y2 be the values of the two dice.
Outcome 1 = P(Y1=1) * P(Y2=4) = 1/6 * 1/6 = 1/36
Outcome 2 = P(Y1=2) * P(Y2=3) = 1/6 * 1/6 = 1/36
Outcome 3 = P(Y1=3) * P(Y2=2) = 1/6 * 1/6 = 1/36
Outcome 4 = P(Y1=4) * P(Y2=1) = 1/6 * 1/6 = 1/36
Probability of getting a sum of 5 = 4/36 = 1/9
(c) getting a sum of 12? The only outcome is (6, 6), so the probability of getting a sum of 12 = 1/36
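The counts above can be double-checked by enumerating all 36 equally likely outcomes. This is only an illustrative sketch, not part of the original solution; the names `rolls`, `sums`, and `prob_sum` are introduced here for the example.

```r
# Enumerate every ordered outcome of two fair six-sided dice (36 rows).
rolls <- expand.grid(die1 = 1:6, die2 = 1:6)
sums <- rolls$die1 + rolls$die2

# Probability of a given sum = share of outcomes that produce it.
prob_sum <- function(s) mean(sums == s)

prob_sum(1)   # no outcome sums to 1
prob_sum(5)   # outcomes (1,4), (2,3), (3,2), (4,1)
prob_sum(12)  # only outcome (6,6)
```

Because every ordered pair is equally likely, counting rows gives the same answers as multiplying the individual 1/6 probabilities.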
Question 2.8
Define Variables:
P(PL) = 14.6% P(FL) = 20.7% P(PL and FL) = 4.2%
PL <- 14.6/100
FL <- 20.7/100
PLandFL <- 4.2/100
Are living below the poverty line and speaking a foreign language at home disjoint?
Living below the poverty line and speaking a foreign language at home are not disjoint, because both can occur together: P(PL and FL) = 4.2%, which is greater than 0.
Draw a Venn diagram summarizing the variables and their associated probabilities.
library(VennDiagram)
## Loading required package: grid
## Loading required package: futile.logger
grid.newpage()
draw.pairwise.venn(area1 = 14.6, area2 = 20.7, cross.area = 4.2, category = c("Below PL",
"ForeignLang"))
## (polygon[GRID.polygon.1], polygon[GRID.polygon.2], polygon[GRID.polygon.3], polygon[GRID.polygon.4], text[GRID.text.5], text[GRID.text.6], text[GRID.text.7], text[GRID.text.8], text[GRID.text.9])
PLEng <- paste(round((PL - PLandFL)*100, digits = 1), "%", sep="")
PLEng
## [1] "10.4%"
10.4% of Americans live below the poverty line and speak only English at home.
#General Addition Rule
PLorFL <- paste(round((PL + FL - PLandFL)*100, digits = 1), "%", sep="")
PLorFL
## [1] "31.1%"
31.1% of Americans live below the poverty line or speak a foreign language at home.
Find the complements of P(PL) and P(FL), then find the probability that neither event occurs.
P(PL or PLc) = P(PL) + P(PLc) = 1, so P(PLc) = 1 - P(PL)
P(FL or FLc) = P(FL) + P(FLc) = 1, so P(FLc) = 1 - P(FL)
#complement of PL and FL
PLc <- 1 - PL
PLc
## [1] 0.854
FLc <- 1 - FL
FLc
## [1] 0.793
#complement of (PL or FL); the product PLc * FLc would only be valid if the events were independent
PLcFLc <- paste(round((1 - (PL + FL - PLandFL))*100, digits=1), "%", sep="")
PLcFLc
## [1] "68.9%"
68.9% of Americans live above the poverty line and speak only English at home.
The event that someone lives below the poverty line is not independent of the event that someone speaks a foreign language at home: P(PL) * P(FL) = 0.146 * 0.207 = 0.030 (rounded), which does not equal P(PL and FL) = 0.042.
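One way to check independence numerically is to compare the product of the two marginal probabilities with the joint probability from the problem statement. The sketch below is an illustration under that definition; the variable names (`p_pl`, `p_fl`, `p_product`, and so on) are made up for this example.

```r
# Probabilities taken from the problem statement.
p_pl <- 0.146         # lives below the poverty line
p_fl <- 0.207         # speaks a foreign language at home
p_pl_and_fl <- 0.042  # both at once

# Under independence the joint probability would equal this product.
p_product <- p_pl * p_fl
independent <- isTRUE(all.equal(p_product, p_pl_and_fl))

# Probability of neither event, via the General Addition Rule.
p_neither <- 1 - (p_pl + p_fl - p_pl_and_fl)

round(p_product, 4)
independent
round(p_neither, 3)
```

If the product and the joint probability differ, the multiplication rule for independent processes should not be used for the "neither" case, and the complement of the union is the safer route.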
General Addition Rule:
P(MBlue or FBlue) = P(MBlue) + P(FBlue) - P(MBlue and FBlue)
MBlue <- 114/204
FBlue <- 108/204
FMBlue <- 78/204
ForMblue <- paste(round((MBlue + FBlue - FMBlue)*100, digits=1), "%", sep = "")
ForMblue
## [1] "70.6%"
There is a 70.6% probability that the male respondent or his partner has blue eyes.
Probability that a male respondent with blue eyes has a partner with blue eyes is P(FBlue | MBlue) = P(MBlue and FBlue) / P(MBlue) = 78/114
FgivenMblue <- paste(round((FMBlue / MBlue)*100, digits = 1), "%", sep = "")
FgivenMblue
## [1] "68.4%"
19/204
What about the probability of a randomly chosen male respondent with green eyes having a partner with blue eyes?
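11/204 is the joint probability of a green-eyed male respondent with a blue-eyed partner. The conditional probability is P(FBlue | MGreen) = (11/204) / P(MGreen).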
Question 2.30
(a) Find the probability of drawing a hardcover book first then a paperback fiction book second when drawing without replacement.
A <- paste(round(((28/95) * (67/94))*100, digits = 2), "%", sep = "")
A
## [1] "21.01%"
B <- paste(round(((72/95) * (28/94))*100, digits = 2), "%", sep = "")
B
## [1] "22.58%"
C <- paste(round(((72/95) * (28/95))*100, digits = 2), "%", sep = "")
C
## [1] "22.34%"
The results are similar because when the sample size is only a small fraction of the population (under 10%), observations are nearly independent even when sampling without replacement.
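To see why the two answers are close, the sketch below compares the second-draw probability with and without replacement as the shelf grows. It is only an illustration: `second_draw_gap` is a hypothetical helper, and the 9,500-book shelf is an invented scale-up that keeps the same 28/95 proportion.

```r
# Gap between the second-draw probability without and with replacement.
# Simplifying assumption (as in the calculation above): the first draw
# does not remove one of the books counted in `successes`.
second_draw_gap <- function(successes, total) {
  p_without <- successes / (total - 1)  # first book stays off the shelf
  p_with <- successes / total           # first book is put back
  p_without - p_with
}

gap_small <- second_draw_gap(28, 95)      # the 95-book shelf
gap_large <- second_draw_gap(2800, 9500)  # hypothetical shelf, 100x larger
c(gap_small, gap_large)
```

The gap shrinks as the population grows relative to the sample, which is the intuition behind treating small samples as nearly independent.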
Question 2.38
Baggage fees. An airline charges the following baggage fees: $25 for the first bag and $35 for the second. Suppose 54% of passengers have no checked luggage, 34% have one piece of checked luggage and 12% have two pieces. We suppose a negligible portion of people check more than two bags.
bags <- c(0,1,2)
cost <- c(0,25,35+25)
percent <- c(.54, .34, .12)
baggagefees <- data.frame(bags, cost, percent)
baggagefees
##   bags cost percent
## 1    0    0    0.54
## 2    1   25    0.34
## 3    2   60    0.12
#compute expected value
Avgrev <- sum(cost * percent)
Avgrev
## [1] 15.7
The average revenue per customer is $15.70.
#compute variance
variability <- ((cost - Avgrev)^2) * percent
totalvar <- sum(variability)
totalvar
## [1] 398.01
#compute standard deviation (square root of variance)
sd <- round(sqrt(totalvar),2)
sd
## [1] 19.95
The standard deviation is $19.95.
expectedrev <- Avgrev * 120
expectedrev
## [1] 1884
The expected revenue for 120 passengers is $1884.
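#variance of the total for 120 independent passengers (uses the unrounded variance)
var120 <- (120 * totalvar)
var120
## [1] 47761.2
sd120 <- round(sqrt(var120),2)
sd120
## [1] 218.54
The standard deviation of the total revenue for 120 passengers is $218.54, assuming passengers' baggage choices are independent of one another.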
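The same calculation generalizes to any number of passengers: for n independent passengers the expected total is n times the per-passenger mean, and the variance of the total is n times the per-passenger variance. The sketch below is a minimal illustration of that rule; `revenue_summary` is a hypothetical helper, and independence between passengers is an assumption.

```r
# Fee distribution from the problem statement.
cost <- c(0, 25, 60)
prob <- c(0.54, 0.34, 0.12)

# Per-passenger mean and variance.
mu <- sum(cost * prob)
sigma2 <- sum((cost - mu)^2 * prob)

# Mean and standard deviation of total revenue for n passengers,
# assuming passengers check bags independently of one another.
revenue_summary <- function(n) {
  c(mean = n * mu, sd = sqrt(n * sigma2))
}

round(revenue_summary(120), 2)
```

Note that the standard deviation grows with the square root of n, so it is not 120 times the per-passenger standard deviation.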
Income and gender. The relative frequency table below displays the distribution of annual total personal income (in 2009 inflation-adjusted dollars) for a representative sample of 96,420,486 Americans. These data come from the American Community Survey for 2005-2009. This sample is composed of 59% males and 41% females.
income <- c("$1 to $9,999","$10,000 to $14,999","$15,000 to $24,999","$25,000 to $34,999","$35,000 to $49,999","$50,000 to $64,999","$65,000 to $74,999","$75,000 to $99,999","$100,000 or more")
total <- c(2.2,4.7,15.8,18.3,21.2,13.9,5.8,8.4,9.7)
d <- data.frame(income, total)
d
##               income total
## 1       $1 to $9,999   2.2
## 2 $10,000 to $14,999   4.7
## 3 $15,000 to $24,999  15.8
## 4 $25,000 to $34,999  18.3
## 5 $35,000 to $49,999  21.2
## 6 $50,000 to $64,999  13.9
## 7 $65,000 to $74,999   5.8
## 8 $75,000 to $99,999   8.4
## 9   $100,000 or more   9.7
barplot(d$total, names.arg = income, xlab = "Personal Income", ylab = "Total")
The distribution is unimodal, with its peak in the $35,000 to $49,999 bracket (21.2%).
Pless50 <- paste(round(((2.2 + 4.7 + 15.8 + 18.3 + 21.2)/ sum(total))*100, digits = 2), "%", sep ="")
Pless50
## [1] "62.2%"
The probability that a randomly chosen US resident makes less than $50,000 per year is 62.2%.
(c) What is the probability that a randomly chosen US resident makes less than $50,000 per year and is female? Note any assumptions you make.
femless50 <- ((2.2 + 4.7 + 15.8 + 18.3 + 21.2)/ sum(total)) * 41/100
femless50
## [1] 0.25502
The probability that a US resident makes less than $50,000 per year and is female is about 25.5%. This calculation assumes that income and gender are independent. That assumption is doubtful: men tend to have higher salaries than women, so the actual percentage of women making less than $50,000 may be higher.
(d) The same data source indicates that 71.8% of females make less than $50,000 per year. Use this value to determine whether or not the assumption you made in part (c) is valid.
The assumption is not valid. If income and gender were independent, the share of females making less than $50,000 would match the overall share of 62.2%. Since 71.8% of females actually make less than $50,000 per year, women's salaries sit toward the lower end of the personal income distribution and men tend to make more money than women do.
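The comparison in part (d) can be made concrete by computing the joint probability two ways: once under the independence assumption and once from the 71.8% figure. This is a sketch for illustration only; the variable names are introduced here and are not from the original solution.

```r
# Values from the exercise.
p_less50 <- 0.622               # P(income < $50,000), all respondents
p_female <- 0.41                # P(female)
p_less50_given_female <- 0.718  # P(income < $50,000 | female)

# Joint probability if income and gender were independent.
joint_if_independent <- p_less50 * p_female

# Joint probability from the conditional: P(A and B) = P(A | B) * P(B).
joint_actual <- p_less50_given_female * p_female

round(c(joint_if_independent, joint_actual), 3)
```

The two values differ by roughly four percentage points, which is another way of seeing that P(income < $50,000 | female) does not equal P(income < $50,000).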