2.6 Dice rolls

  1. P(1) = 0 (the minimum in the range for two dice is 2)
  2. P(5) = P(1 & 4) + P(2 & 3) + P(3 & 2) + P(4 & 1) = .1111
Answer2.6b <- 4 * (1 / 6 * 1 /6)
Answer2.6b
## [1] 0.1111111
  1. P(12) = P(6 & 6) = .2778
Answer2.6c <- 1 / 6 * 1 / 6
Answer2.6c
## [1] 0.02777778

2.8 Poverty and language

  1. No, they are not disjoint.
BelowPoverty <- .146
NotEngHome <- .207
PovAndNotEng <- .042
  1. Venn diagram summarizing variables / probabilities.
# uploading Venn2.jpeg to github in the interest of time (gave troubleshooting 90min but couldn't get the image embedded using knitr, readjpeg, imager, raster, etc.)
# ![Venn Diagram](/Users/jlobr/OneDrive/Learning/_CUNY_SPS_MSDS/2018_Spring/DATA 606/Chapter 2/Homework 2/Venn2.jpg)
  1. 10.4% of Americans live below poverty line and only speak English at home.
Answer2.8c <- BelowPoverty - PovAndNotEng
Answer2.8c
## [1] 0.104
  1. 31.1% of Americans live below poverty line or speak English at home.
Answer2.8d <- BelowPoverty + NotEngHome - PovAndNotEng
Answer2.8d
## [1] 0.311
  1. 68.9% of American live above the poverty line and only speak English at home.
Answer2.8e <- 1 - (BelowPoverty + NotEngHome - PovAndNotEng) 
Answer2.8e
## [1] 0.689
  1. The events are not independent.
# If indepedent, then P(A or B) = P(A) * P(B) should be true
Answer2.8f <- (BelowPoverty + NotEngHome - PovAndNotEng) == BelowPoverty * NotEngHome
Answer2.8f
## [1] FALSE

2.20 Assortative mating

Self.Blue <- c(78, 23, 13)
Self.Brown <- c(19, 23, 12)
Self.Green <- c(11, 9, 16)
Assort.Mate <- rbind(Self.Blue, Self.Brown, Self.Green)
Assort.Mate <- cbind(Assort.Mate, rowSums(Assort.Mate))
colnames(Assort.Mate) <- c("Partner.Blue", "Partner.Brown", "Partner.Green", "Partner.Total")
Assort.Mate <- rbind(Assort.Mate, colSums(Assort.Mate))
rownames(Assort.Mate)[4] <- "Self.Total"
AM <- data.frame(Assort.Mate)
str(AM)
## 'data.frame':    4 obs. of  4 variables:
##  $ Partner.Blue : num  78 19 11 108
##  $ Partner.Brown: num  23 23 9 55
##  $ Partner.Green: num  13 12 16 41
##  $ Partner.Total: num  114 54 36 204
  1. 70.6% is the probability that randomly chosen male respondent or his partner have blue eyes.
Answer2.20a <- (AM["Self.Blue", "Partner.Total"] + AM["Self.Total", "Partner.Blue"] - AM["Self.Blue", "Partner.Blue"]) / AM["Self.Total", "Partner.Total"]
Answer2.20a
## [1] 0.7058824
  1. 68.4% is the probability that randomly chosen male respondent with blue eyes has partners with blue eyes.
Answer2.20b <- AM["Self.Blue", "Partner.Blue"] / AM["Self.Blue", "Partner.Total"]
Answer2.20b
## [1] 0.6842105
  1. 35.1% is the probability that random chosen male respondent with brown eyes has partner with blue eyes.
Answer2.20c1 <- AM["Self.Brown", "Partner.Blue"] / AM["Self.Brown", "Partner.Total"]
Answer2.20c1
## [1] 0.3518519

(c cont’d) 30.6% is the probability that random chosen male respondent with green eyes has partner with blue eyes.

Answer2.20c2 <- AM["Self.Green", "Partner.Blue"] / AM["Self.Green", "Partner.Total"]
Answer2.20c2
## [1] 0.3055556

(d) [the colors of male respondents and partners are independent…]

2.30 Books on a bookshelf

Type.Fiction <- c(13, 59, 72)
Type.Nonfiction <- c(15, 8, 23)
Type.Total <- c(28, 67, 95)
Bookshelf <- rbind(Type.Fiction, Type.Nonfiction, Type.Total)
colnames(Bookshelf) <- c("Format.Hardcover", "Format.Paperback", "Format.Total")
BS <- data.frame(Bookshelf)
str(BS)
## 'data.frame':    3 obs. of  3 variables:
##  $ Format.Hardcover: num  13 15 28
##  $ Format.Paperback: num  59 8 67
##  $ Format.Total    : num  72 23 95
  1. 18.5% is the probability of drawing a hardcover book first then a paperback fiction book second when drawing without replacement.
Answer2.30a <- (BS["Type.Total", "Format.Hardcover"] / BS["Type.Total", "Format.Total"]) * (BS["Type.Fiction", "Format.Paperback"] / (BS["Type.Total", "Format.Total"] - 1))
Answer2.30a
## [1] 0.1849944
  1. 22.6% is the probability of drawing a fiction book first and then a hardcover book second, when drawing without replacement.
Answer2.30b <- (BS["Type.Fiction", "Format.Total"] / BS["Type.Total", "Format.Total"]) * (BS["Type.Total", "Format.Hardcover"] / (BS["Type.Total", "Format.Total"] - 1))
Answer2.30b
## [1] 0.2257559
  1. 22.3% is the probability of the scenario in part (b), except this time complete the calculations under the scenario where the first book is placed back on the bookcase before randomly drawing the second book.
Answer2.30c <- (BS["Type.Fiction", "Format.Total"] / BS["Type.Total", "Format.Total"]) * (BS["Type.Total", "Format.Hardcover"] / BS["Type.Total", "Format.Total"])
Answer2.30c
## [1] 0.2233795
  1. The results for (b) and (c) are similar because the only change is that the sample space for the second choice in (b) has shrunk by 1, slightly improving the odds of chosing a hardcover book in that second choice.

2.38 Baggage fees

  1. Here’s a probability table:
x <- c(0, 25, 35)
Px <- c(.54, .34, .12)
Ex <- x * Px
ProbTable2.38 <- rbind(x, Px, Ex)
colnames(ProbTable2.38) <- c("No bag", "One checked", "Two checked")
ProbTable2.38 <- data.frame(ProbTable2.38)
ProbTable2.38
##    No.bag One.checked Two.checked
## x    0.00       25.00       35.00
## Px   0.54        0.34        0.12
## Ex   0.00        8.50        4.20

(a cont’d) $12.70 is the average revenue per passenger.

Answer2.38a1 <- sum(Ex)
Answer2.38a1
## [1] 12.7

(a cont’d) $14.37 is the standard deviation.

xLessEx <- (x - Ex)
xLessExSquared <- xLessEx^2
xLessExSquaredbyProb <- xLessExSquared * Px
ProbTable2.38 <- rbind(ProbTable2.38, xLessEx, xLessExSquared, xLessExSquaredbyProb)
rownames(ProbTable2.38)[4:6] <- c("(x - Ex)", "(x - Ex)^2", "(x - Ex)^2 * Px")
ProbTable2.38
##                 No.bag One.checked Two.checked
## x                 0.00      25.000     35.0000
## Px                0.54       0.340      0.1200
## Ex                0.00       8.500      4.2000
## (x - Ex)          0.00      16.500     30.8000
## (x - Ex)^2        0.00     272.250    948.6400
## (x - Ex)^2 * Px   0.00      92.565    113.8368
Varx <- sum(xLessExSquaredbyProb)
Answer2.38a2 <- sqrt(Varx)
Answer2.38a2
## [1] 14.36669
  1. For 120 passengers, the expected revenue is $1,524 with a standard deviation of $1,724. This assumes the proportion of those checking no, a single, and two bags remain consistent for that population of passengers.
x.b <- c(0, 120*25, 120*35)
Px.b <- c(.54, .34, .12)
Ex.b <- x.b * Px.b
ProbTable2.38b <- rbind(x.b, Px.b, Ex.b)
colnames(ProbTable2.38b) <- c("No bag", "One checked", "Two checked")
ProbTable2.38b <- data.frame(ProbTable2.38b)
ProbTable2.38b
##      No.bag One.checked Two.checked
## x.b    0.00     3000.00     4200.00
## Px.b   0.54        0.34        0.12
## Ex.b   0.00     1020.00      504.00
xLessEx.b <- (x.b - Ex.b)
xLessExSquared.b <- xLessEx.b^2
xLessExSquaredbyProb.b <- xLessExSquared.b * Px.b
ProbTable2.38b <- rbind(ProbTable2.38b, xLessEx.b, xLessExSquared.b, xLessExSquaredbyProb.b)
rownames(ProbTable2.38b)[4:6] <- c("(x - Ex)", "(x - Ex)^2", "(x - Ex)^2 * Px")
ProbTable2.38b
##                 No.bag One.checked Two.checked
## x.b               0.00     3000.00     4200.00
## Px.b              0.54        0.34        0.12
## Ex.b              0.00     1020.00      504.00
## (x - Ex)          0.00     1980.00     3696.00
## (x - Ex)^2        0.00  3920400.00 13660416.00
## (x - Ex)^2 * Px   0.00  1332936.00  1639249.92
Varx.b <- sum(xLessExSquaredbyProb.b)
Answer2.38b <- sqrt(Varx.b)
Answer2.38b
## [1] 1724.003

2.44 Income and gender

  1. The distribution of total personal income appears slightly leptokurtic, with a rightward skew as the $100,000+ bin has no ceiling and is nearly 10% of total.
  2. 62.2% is the probability that a randomly chosen person makes less than $50,000 per annum.
Answer2.44b <- .022 + .047 + .158 + .183 + .212
Answer2.44b
## [1] 0.622
  1. 25.5% is the probability that a randomly chosen US resident makes less than $50,000 per year and is female. This rests on the assumption that the 59% M / 41% F survey respondent composition is consistent and projectable across all personal income brackets.
Answer2.44c <- Answer2.44b * .41
Answer2.44c
## [1] 0.25502
  1. The assumption in (c) of consistent distribution of sex would mean that 62.2% of females make less than $50,000 per annum. If that value is actually 71.8% of females, then the male-female distribution is not consistent across personal income brackets, and in fact a larger proportion of females have lower personal income.