Chapter 2 - Probability
Practice: 2.5, 2.7, 2.19, 2.29, 2.43 Graded: 2.6, 2.8, 2.20, 2.30, 2.38, 2.44
2.6 Dice rolls. If you roll a pair of fair dice, what is the probability of (a) getting a sum of 1? (b) getting a sum of 5? (c) getting a sum of 12?
a. 0. Lowest possible sum is 1+1 = 2
b. Possible combinations that add up to 5: {4,1},{1,4},{2,3},{3,2}. That is 4 of the 36 equally likely outcomes.
# different outcomes from 2 fair dice rolls
6 * 6
## [1] 36
# convert to probability
4/(6*6)
## [1] 0.1111111
c. Possible combinations that add up to 12: {6,6}.
1/(6*6)
## [1] 0.02777778
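As a cross-check on the hand counts above, here is a small sketch that enumerates every outcome instead of listing combinations by hand. The names `rolls` and `p_sum` are made up for this illustration and are not part of the exercise.

```r
# Enumerate all 36 equally likely outcomes of two fair six-sided dice
rolls <- expand.grid(die1 = 1:6, die2 = 1:6)
rolls$total <- rolls$die1 + rolls$die2

# Probability of a given sum = share of outcomes with that sum
# (p_sum is a hypothetical helper defined only for this check)
p_sum <- function(s) mean(rolls$total == s)

p_sum(1)   # no outcome sums to 1
p_sum(5)   # 4 of 36 outcomes
p_sum(12)  # 1 of 36 outcomes
```

The same helper works for any other sum, e.g. `p_sum(7)`.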
2.8 Poverty and language. The American Community Survey is an ongoing survey that provides data every year to give communities the current information they need to plan investments and services. The 2010 American Community Survey estimates that 14.6% of Americans live below the poverty line, 20.7% speak a language other than English (foreign language) at home, and 4.2% fall into both categories.
a. Are living below the poverty line and speaking a foreign language at home disjoint?
Nope - people can fall into both categories (4.2% do), so the events overlap.
b. Draw a Venn diagram summarizing the variables and their associated probabilities.
Trying in R:
#Venn Diagram help: https://cran.r-project.org/web/packages/VennDiagram/VennDiagram.pdf
library(VennDiagram)
## Loading required package: grid
## Loading required package: futile.logger
venn.plot <- draw.pairwise.venn(.146,.207,.042, c("Poverty","Foreign Language"),fill = c("red", "blue"));
grid.draw(venn.plot)
c. What percent of Americans live below the poverty line and only speak English at home?
#Total below poverty line - both poverty and foreign language at home
.146 - .042
## [1] 0.104
d. What percent of Americans live below the poverty line or speak a foreign language at home?
#P(poverty) + P(foreign language) - P(both)
.146 + .207 - .042
## [1] 0.311
e. What percent of Americans live above the poverty line and only speak English at home?
#1 - (P(poverty OR foreign language))
1- (.146 + .207 - .042)
## [1] 0.689
f. Is the event that someone lives below the poverty line independent of the event that the person speaks a foreign language at home?
#If independent, P(poverty) * P(foreign language) = known P(both)
.146 * .207
## [1] 0.030222
No, not independent, because .030 differs from .042, the known P(both). In other words, those who speak a foreign language at home are more likely to live below the poverty line than the population as a whole.
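Another way to look at part (f) is through conditional probabilities: under independence, the poverty rate would be the same whether or not someone speaks a foreign language at home. A rough sketch using only the three percentages from the exercise (the variable names are made up for this illustration):

```r
# Given in the exercise
p_poverty <- 0.146
p_foreign <- 0.207
p_both    <- 0.042

# P(poverty | foreign language at home) = P(both) / P(foreign)
p_poverty_given_foreign <- p_both / p_foreign

# P(poverty | English only at home) = P(poverty and not foreign) / P(not foreign)
p_poverty_given_english <- (p_poverty - p_both) / (1 - p_foreign)

p_poverty_given_foreign  # about 0.203
p_poverty_given_english  # about 0.131
```

Both differ from the overall 14.6%, which is the same conclusion as the product check.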
2.20
Assortative mating. Assortative mating is a nonrandom mating pattern where individuals with similar genotypes and/or phenotypes mate with one another more frequently than what would be expected under a random mating pattern. Researchers studying this topic collected data on eye colors of 204 Scandinavian men and their female partners. The table below summarizes the results. For simplicity, we only include heterosexual relationships in this exercise.
Blue_m <- c(78,23,13)
Brown_m <- c(19,23,12)
Green_m <- c(11,9,16)
df_eye <- data.frame(Blue_m, Brown_m, Green_m)
#each vector above holds one male eye color, so columns are males and rows are female partners
cn_eye <- c("Blue_m", "Brown_m", "Green_m")
colnames(df_eye) <- cn_eye
rn_eye <- c("Blue_f", "Brown_f", "Green_f")
rownames(df_eye) <- rn_eye
df_eye
##         Blue_m Brown_m Green_m
## Blue_f      78      19      11
## Brown_f     23      23       9
## Green_f     13      12      16
a. What is the probability that a randomly chosen male respondent or his partner has blue eyes?
(sum(df_eye[1,]) + sum(df_eye[,1]) - df_eye[1,1])/sum(rowSums(df_eye))
## [1] 0.7058824
#don't want to store the total in a df but will need it for future calculations
df_eye_total <- sum(rowSums(df_eye))
b. What is the probability that a randomly chosen male respondent with blue eyes has a partner with blue eyes?
Probability that the female partner has blue eyes given a male with blue eyes:
P(A given B) = P(A and B) / P(B), where A = partner has blue eyes and B = male has blue eyes
#blue-eyed couple / all blue-eyed males
(df_eye[1,1])/sum(df_eye[,1])
## [1] 0.6842105
c. What is the probability that a randomly chosen male respondent with brown eyes has a partner with blue eyes? What about the probability of a randomly chosen male respondent with green eyes having a partner with blue eyes?
Given brown-eyed male - blue-eyed female
#couple / all brown-eyed males
(df_eye[1,2])/sum(df_eye[,2])
## [1] 0.3518519
Given green-eyed male - blue-eyed female
(df_eye[1,3])/sum(df_eye[,3])
## [1] 0.3055556
d. Does it appear that the eye colors of male respondents and their partners are independent? Explain your reasoning.
If eye colors were independent, then P(A and B) = P(A) * P(B), or equivalently P(A given B) = P(A). In other words, a blue-eyed male would have a blue-eyed partner at the same rate as a brown-eyed or green-eyed male. Parts (b) and (c) already suggest otherwise (0.68 vs. 0.35 and 0.31). Testing the product rule for blue-eyed couples.
#blue + blue expected if independent
(sum(df_eye[1,])/df_eye_total) * (sum(df_eye[,1])/df_eye_total)
## [1] 0.2958478
#blue + blue in the dating wild
df_eye[1,1]/df_eye_total
## [1] 0.3823529
Not equal. More blue-eyed couples than you’d expect by chance. There are probably regional and ethnic factors here that complicate this.
Let’s try the same with brown-eyed couples.
#brown + brown expected if independent
(sum(df_eye[2,])/df_eye_total) * (sum(df_eye[,2])/df_eye_total)
## [1] 0.07136678
#brown in the dating wild
df_eye[2,2]/df_eye_total
## [1] 0.1127451
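Same pattern: more brown-eyed couples than independence would predict. Eye colors of male respondents and their partners do not appear to be independent.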
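The two spot checks above can be extended to all nine cells at once. This is only a sketch: it re-enters the counts as a plain matrix (labels omitted, since only the numbers matter here) and compares observed joint proportions with the products of the marginals. The names `counts`, `observed` and `expected` are made up for this illustration.

```r
# Same counts as df_eye, filled column by column
counts <- matrix(c(78, 23, 13,
                   19, 23, 12,
                   11,  9, 16), nrow = 3)
n <- sum(counts)

# Observed joint proportions
observed <- counts / n

# Joint proportions expected under independence: product of the marginals
expected <- outer(rowSums(counts), colSums(counts)) / n^2

# Positive entries = more couples than independence predicts
round(observed - expected, 3)
```

The diagonal (same-color couples) comes out positive in every case, which matches the assortative mating story.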
2.30
Books on a bookshelf. The table below shows the distribution of books on a bookcase based on whether they are nonfiction or fiction and hardcover or paperback.
a. Find the probability of drawing a hardcover book first then a paperback fiction book second when drawing without replacement.
#hardcover then paperback fiction w/o replacement
28/95 * 59/94
## [1] 0.1849944
b. Determine the probability of drawing a fiction book first and then a hardcover book second, when drawing without replacement.
#format and type are not mutually exclusive
#the second draw depends on whether the first (fiction) book was a hardcover:
#of the 72 fiction books, 59 are paperback (28 hardcovers remain) and 13 are hardcover (27 remain)
72/95 * (((59/72)*(28/94)) + ((13/72)*(27/94)))
## [1] 0.2243001
c. Calculate the probability of the scenario in part (b), except this time complete the calculations under the scenario where the first book is placed back on the bookcase before randomly drawing the second book.
72/95 * 28/95
## [1] 0.2233795
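One way to cross-check parts (b) and (c) is to work from counts rather than nested fractions. This is a sketch under the assumption that the counts used in the calculations above are right (95 books, 72 fiction, 28 hardcover, 59 paperback fiction); the hardcover fiction count is derived from those, and the variable names are made up for this illustration.

```r
# Counts taken from the calculations above
total      <- 95
fiction    <- 72
hard_total <- 28
fic_paper  <- 59

# Derived: fiction books that are hardcover
fic_hard <- fiction - fic_paper

# (b) Without replacement: split on the format of the first (fiction) book.
# If it was a hardcover, one fewer hardcover remains for the second draw.
p_b <- (fic_hard / total) * ((hard_total - 1) / (total - 1)) +
  (fic_paper / total) * (hard_total / (total - 1))

# (c) With replacement: the two draws are independent
p_c <- (fiction / total) * (hard_total / total)

p_b
p_c
```

With only one book removed from 95, the two answers should be very close.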
2.38
Baggage fees. An airline charges the following baggage fees: $25 for the first bag and $35 for the second. Suppose 54% of passengers have no checked luggage, 34% have one piece of checked luggage and 12% have two pieces. We suppose a negligible portion of people check more than two bags.
a. Build a probability model, compute the average revenue per passenger, and compute the corresponding standard deviation.
luggage_fees <- c(25,35)
fee_names <- c("One bag", "Two bags")
names(luggage_fees) <- fee_names
passenger_luggage_rates <- c(.54,.34,.12)
no_of_bag_rates <- c("No bags", "One bag", "Two bags")
names(passenger_luggage_rates) <- no_of_bag_rates
#check vectors
luggage_fees
## One bag Two bags
## 25 35
passenger_luggage_rates
## No bags One bag Two bags
## 0.54 0.34 0.12
#Average revenue per passenger - expected value
#1 bag
X1 <- (luggage_fees[1] * passenger_luggage_rates[2])
#2 bags
X2 <- ((luggage_fees[1] + luggage_fees[2]) * passenger_luggage_rates[3])
#Expected value, as no bags = 0
X <- X1 + X2
X
## One bag
## 15.7
The bastards are getting $15.70 per passenger.
#General variance formula: sum of (x - E[X])^2 * P(x)
V <- (0 - X)^2 * passenger_luggage_rates[1] +
  (luggage_fees[1] - X)^2 * passenger_luggage_rates[2] +
  ((luggage_fees[1] + luggage_fees[2]) - X)^2 * passenger_luggage_rates[3]
LSD <- sqrt(V)
LSD
## One bag
## 19.95019
b. About how much revenue should the airline expect for a flight of 120 passengers? With what standard deviation? Note any assumptions you make and if you think they are justified.
#Expected revenue for 120 passengers
X120 <- X * 120
X120
## One bag
## 1884
#SD for expectation: variances of independent passengers add, so V scales by 120
V120 <- V * 120
LSD120 <- sqrt(V120)
LSD120
## One bag
## 218.5434
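#Assuming the 120 passengers' baggage choices are independent of each other
Expecting $1884 in luggage revenue for the flight with a standard deviation of $218.54. The independence assumption is questionable, since passengers traveling together (families, groups) probably make similar baggage choices.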
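The same probability model can be written more compactly with one revenue vector and one probability vector. This is just a sketch of the same arithmetic; `revenue`, `prob`, `ev` and `sd1` are names made up for this illustration, and the 120-passenger step still assumes independent passengers.

```r
# Revenue per passenger for 0, 1 and 2 checked bags
revenue <- c(0, 25, 25 + 35)
prob    <- c(0.54, 0.34, 0.12)

# Expected value and standard deviation for a single passenger
ev  <- sum(revenue * prob)
sd1 <- sqrt(sum((revenue - ev)^2 * prob))

# For n independent passengers the mean scales by n and the SD by sqrt(n)
n <- 120
c(mean = n * ev, sd = sqrt(n) * sd1)
```

Scaling the SD by sqrt(n) is equivalent to multiplying the variance by 120 and then taking the square root.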
2.44
Income and gender. The relative frequency table below displays the distribution of annual total personal income (in 2009 inflation-adjusted dollars) for a representative sample of 96,420,486 Americans. These data come from the American Community Survey for 2005-2009. This sample is comprised of 59% males and 41% females.
a. Describe the distribution of total personal income.
The income distribution is right skewed, which you can see in the data by the big group in the right tail ($100k+). We know from other studies that the mean income substantially exceeds the median.
b. What is the probability that a randomly chosen US resident makes less than $50,000 per year?
ILT50 <- .022+.047+.158+.183+.212
ILT50
## [1] 0.622
c. What is the probability that a randomly chosen US resident makes less than $50,000 per year and is female? Note any assumptions you make.
#Making two bad assumptions
#1. That income distribution is not affected by gender
#2. That the gender representation in the sample matches the US population.
ILT50 * .41
## [1] 0.25502
d. The same data source indicates that 71.8% of females make less than $50,000 per year. Use this value to determine whether or not the assumption you made in part (c) is valid.
#Females in sample * income-specific data from same source
.41 * .718
## [1] 0.29438
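No, this was not a valid assumption. If income were independent of gender, P(under $50k and female) would be .622 * .41 = .255, but the actual figure is .718 * .41 = .294. Put another way, 71.8% of females make less than $50,000 versus 62.2% of the whole sample, so income distribution depends on gender, with females more likely to fall below $50,000. This is separate from the fact that females were underrepresented in the sample.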
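To make the gap concrete, the figures above also pin down the implied rate for males via the law of total probability. A rough sketch, assuming the sample's 59/41 split and the 71.8% figure from part (d); the variable names are made up for this illustration.

```r
# From parts (b) to (d)
p_lt50              <- 0.622
p_female            <- 0.41
p_lt50_given_female <- 0.718

# Joint probability: under $50k and female
p_lt50_and_female <- p_female * p_lt50_given_female

# Law of total probability, solved for the male rate:
# P(<50k) = P(<50k and female) + P(<50k | male) * P(male)
p_lt50_given_male <- (p_lt50 - p_lt50_and_female) / (1 - p_female)

p_lt50_given_male  # about 0.555
```

Under independence both conditional rates would equal 0.622. Here they sit on either side of it.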