Dice rolls. (3.6, p. 92) If you roll a pair of fair dice, what is the probability of getting a sum of 1? A sum of 5? A sum of 12?
Solution
\(P(sum=1)=\frac{0}{36}=0\)
\(P(sum=5)=\frac{4}{36}=\frac{1}{9}=0.11111\)
\(P(sum=12)=\frac{1}{36}=0.02778\)
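These values can be checked by listing all 36 equally likely outcomes and counting the ones that give each sum. The following is a minimal sketch in base R; the names `rolls`, `sums`, and `p_sum` are illustrative and not part of the original solution.

```r
# Enumerate every ordered outcome of two fair six-sided dice (36 rows)
rolls <- expand.grid(die1 = 1:6, die2 = 1:6)
sums <- rolls$die1 + rolls$die2

# Probability of a given sum = number of matching outcomes / 36
p_sum <- function(s) mean(sums == s)

p_sum(1)   # no outcome sums to 1
p_sum(5)   # (1,4), (2,3), (3,2), (4,1)
p_sum(12)  # only (6,6)
```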
Poverty and language. (3.8, p. 93) The American Community Survey is an ongoing survey that provides data every year to give communities the current information they need to plan investments and services. The 2010 American Community Survey estimates that 14.6% of Americans live below the poverty line, 20.7% speak a language other than English (foreign language) at home, and 4.2% fall into both categories.
Solution (a) Living below the poverty line and speaking a foreign language at home are not disjoint. There are some Americans who fall into both categories at the same time.
library(VennDiagram)
## Loading required package: grid
## Loading required package: futile.logger
grid.newpage()
# Assign the result so the returned list of grid objects is not printed
venn <- draw.pairwise.venn(area1 = 20.7, area2 = 14.6, cross.area = 4.2,
                           category = c("A", "B"), fill = c("green", "yellow"))
Venn diagram description. A represents Americans who speak a foreign language at home (20.7%); 16.5% of Americans are in A only, that is, they speak a foreign language at home but do not live below the poverty line. B represents Americans who live below the poverty line (14.6%); 10.4% of Americans are in B only, that is, they live below the poverty line but speak only English at home. The intersection (4.2%) represents Americans who both speak a foreign language at home and live below the poverty line.
10.4% of Americans live below the poverty line and only speak English at home.
20.7% + 14.6% - 4.2% = 31.1% of Americans live below the poverty line or speak a foreign language at home.
14.6% live below the poverty line, so 85.4% live above it. Of all Americans, 16.5% live above the poverty line and speak a foreign language at home. Therefore, 85.4% - 16.5% = 68.9% of Americans live above the poverty line and speak only English at home.
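If the two events were independent, the probability of both occurring would equal the product of the individual probabilities: 0.146 * 0.207 = 0.0302, or about 3.0%. The observed joint probability is 4.2%, so knowing that one event occurred changes the probability of the other. Therefore, the two events are not independent.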
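The arithmetic for this exercise can be reproduced in a few lines of R. This is only a sketch built from the three percentages given in the problem; the variable names are illustrative. The last comparison applies the product rule for independent events.

```r
# Percentages from the problem statement, written as probabilities
p_poverty <- 0.146   # live below the poverty line
p_foreign <- 0.207   # speak a foreign language at home
p_both    <- 0.042   # both categories

p_poverty_only <- p_poverty - p_both              # below poverty line, English only
p_either       <- p_poverty + p_foreign - p_both  # general addition rule
p_neither      <- 1 - p_either                    # above poverty line, English only

# Under independence the joint probability would equal this product
p_if_independent <- p_poverty * p_foreign

c(p_poverty_only, p_either, p_neither, p_if_independent, p_both)
```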
Assortative mating. (3.18, p. 111) Assortative mating is a nonrandom mating pattern where individuals with similar genotypes and/or phenotypes mate with one another more frequently than what would be expected under a random mating pattern. Researchers studying this topic collected data on eye colors of 204 Scandinavian men and their female partners. The table below summarizes the results. For simplicity, we only include heterosexual relationships in this exercise.
Solution
\(P = \frac{114}{204} + \frac{108}{204} - \frac{78}{204} = \frac{144}{204} = \frac{12}{17} = 0.7059\)
\(P = \frac{78}{114} = \frac{13}{19} = 0.6842\)
\(P = \frac{19}{54} = 0.3519\)
\(P = \frac{11}{36} = 0.3056\)
Consider two events: A = the male respondent has blue eyes, and B = his partner has blue eyes. Then \(P(A) = \frac{114}{204} = \frac{19}{34}\) and \(P(B) = \frac{108}{204} = \frac{9}{17}\), so \(P(A)*P(B) =\frac{171}{578} \approx 0.2958\), whereas \(P(A\ and\ B) = \frac{78}{204} = \frac{13}{34} \approx 0.3824\). Since \(P(A\ and\ B)\neq P(A)*P(B)\), the eye colors of male respondents and their partners are not independent.
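The independence check can also be run numerically. The sketch below uses only the counts that appear in the solution (204 couples, 114 blue-eyed men, 108 blue-eyed partners, 78 couples where both have blue eyes); the variable names are illustrative.

```r
# Counts taken from the solution above
n_couples    <- 204
male_blue    <- 114
partner_blue <- 108
both_blue    <- 78

p_a  <- male_blue / n_couples     # P(A): male has blue eyes
p_b  <- partner_blue / n_couples  # P(B): partner has blue eyes
p_ab <- both_blue / n_couples     # P(A and B)

p_a * p_b    # joint probability if A and B were independent
p_ab         # observed joint probability
p_ab / p_a   # P(B | A), to compare with P(B)
```

The conditional probability \(P(B|A)\) is noticeably larger than \(P(B)\), which is another way to see the dependence.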
Books on a bookshelf. (3.26, p. 114) The table below shows the distribution of books on a bookcase based on whether they are nonfiction or fiction and hardcover or paperback.
Solution
\(P = \frac{28}{95}*\frac{59}{95-1} = \frac{28}{95}*\frac{59}{94} = 0.18499\)
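\(P = \frac{13}{95}*\frac{27}{94} + \frac{59}{95}*\frac{28}{94} = \frac{2003}{8930} = 0.22430\). The first draw must be split by cover type: 72 - 59 = 13 of the fiction books are hardcover, and drawing one of them first leaves only 27 hardcovers for the second draw.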
\(P = \frac{72}{95}*\frac{28}{95} = 0.22338\)
With 95 books on the bookcase, removing one book changes the denominator for the second draw only from 95 to 94, so the probabilities with and without replacement are very close. That is why the final answers are so similar.
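To see how little replacement matters with 95 books, the first calculation (a hardcover book, then a paperback fiction book) can be computed both ways. This is a sketch: the with-replacement variant of that calculation is added here only for comparison and is not part of the original exercise, and the variable names are illustrative.

```r
# Counts taken from the solution above
total_books       <- 95
hardcover         <- 28
paperback_fiction <- 59

# Second draw comes from 94 books (no replacement) or 95 books (replacement)
without_repl <- (hardcover / total_books) * (paperback_fiction / (total_books - 1))
with_repl    <- (hardcover / total_books) * (paperback_fiction / total_books)

without_repl
with_repl
without_repl / with_repl  # ratio is 95/94, about a 1% difference
```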
Baggage fees. (3.34, p. 124) An airline charges the following baggage fees: $25 for the first bag and $35 for the second. Suppose 54% of passengers have no checked luggage, 34% have one piece of checked luggage and 12% have two pieces. We suppose a negligible portion of people check more than two bags.
Solution
# (a) Build the probability model
# A passenger with two checked bags pays $25 for the first
# and $35 for the second, for a total of $60
fees <- c(0, 25, 60)
# Proportion of passengers with 0, 1, and 2 checked bags
p <- c(0.54, 0.34, 0.12)
# Average revenue per passenger is the expected value sum(p * fees);
# the proportions already sum to 1, so no normalization is needed
Avg_rev <- sum(p * fees)
Avg_rev
## [1] 15.7
# Standard deviation
StdDev <-sqrt(0.54*(0-Avg_rev)^2 + 0.34*(25-Avg_rev)^2 + 0.12*(60-Avg_rev)^2)
StdDev
## [1] 19.95019
# (b) Expected revenue and standard deviation for a flight of 120 passengers,
# assuming passengers check bags independently of one another
n <- 120
# Expected values add, so the expected revenue is n times the per-passenger average
Exp_revenue <- n * Avg_rev
Exp_revenue
## [1] 1884
# Variances of independent passengers add, so the standard deviation scales with sqrt(n)
StdDev_Exp_revenue <- sqrt(n) * StdDev
StdDev_Exp_revenue
## [1] 218.5434
Income and gender. (3.38, p. 128) The relative frequency table below displays the distribution of annual total personal income (in 2009 inflation-adjusted dollars) for a representative sample of 96,420,486 Americans. These data come from the American Community Survey for 2005-2009. This sample is comprised of 59% males and 41% females.
Solution
# Plot barplot for distribution
Income <- c("$1 to $9,999","$10,000 to $14,999","$15,000 to $24,999",
"$25,000 to $34,999","$35,000 to $49,999",
"$50,000 to $64,999","$65,000 to $74,999",
"$75,000 to $99,999","$100,000 or more")
Total_percent <- c(2.2,4.7,15.8,18.3,21.2,13.9,5.8,8.4,9.7)
summary_table <- data.frame(Income, Total_percent)
summary_table
barplot(Total_percent, names.arg = Income,
col = "light blue", xlab = "Income range", ylab ="Total_percent")
The distribution is unimodal and peaks in the $35,000 to $49,999 bracket, which holds 21.2% of the sample, the largest share of any bracket. More residents make less than $50,000 (62.2%) than make $50,000 or more (37.8%).
(21.2 + 18.3 + 15.8 + 4.7 + 2.2)% = 62.2%. The probability that a randomly chosen US resident makes less than $50,000 per year is 0.622, or 62.2%.
Assume that income and gender are independent. Since 41% of the sample is female, 0.41 * 0.622 = 0.25502. Under this assumption, the probability that a randomly chosen US resident makes less than $50,000 per year and is female is about 25.50%.
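The same source indicates that 71.8% of females make less than $50,000 per year. If income and gender were independent, this conditional probability would equal the overall 62.2% from (b). Because 71.8% differs from 62.2%, the independence assumption I made in (c) does not hold, so personal income and gender are not independent. Using the conditional probability instead, the probability of being female and making less than $50,000 is 0.41 * 0.718 = 0.29438, or about 29.44%.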
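The comparison between the independence-based estimate and the value implied by the 71.8% figure can be sketched in R as follows. Only numbers quoted in the solution are used; the variable names are illustrative.

```r
# Values quoted in the solution
p_under_50k              <- 0.622  # P(income < $50,000)
p_female                 <- 0.41   # P(female)
p_under_50k_given_female <- 0.718  # P(income < $50,000 | female)

# Joint probability if income and gender were independent
joint_if_independent <- p_female * p_under_50k

# Joint probability from the general multiplication rule
joint_from_conditional <- p_female * p_under_50k_given_female

joint_if_independent
joint_from_conditional
```

The two joint probabilities differ by about four percentage points, which is the numerical signature of the dependence between income and gender.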