Graded: 2.6, 2.8, 2.20, 2.30, 2.38, 2.44

2.6

2.8

poverty <- 14.6
ESL <- 20.7
both <- 4.2
library(VennDiagram)
## Loading required package: grid
## Loading required package: futile.logger
venn.plot <- draw.pairwise.venn(poverty, ESL, both, c("Poverty", "Not English"))
grid.draw(venn.plot)

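grid.newpage()
# A: speaks a language other than English at home; B: lives below the poverty line
# Percentages are converted to proportions so P(A) and P(A|B) are on the same scale
p_a = 0.207
p_b = 0.146
p_a_and_b = 0.042
p_a_given_b = p_a_and_b/p_b
p_a_given_b
## [1] 0.2876712
p_a
## [1] 0.207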

Since \[ P(A|B) \neq P(A) \] the two events are not independent.
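The same conclusion can be cross-checked with the product rule: if the events were independent, the joint probability would equal the product of the marginals. The sketch below assumes the percentages given above; the names `p_poverty`, `p_esl`, `p_both`, `p_product`, and `independent` are illustrative and not part of the original solution.

```r
# Proportions from the exercise (percentages divided by 100)
p_poverty <- 0.146   # below the poverty line
p_esl     <- 0.207   # speaks a language other than English at home
p_both    <- 0.042   # both at once

# Under independence the joint probability would equal this product
p_product <- p_poverty * p_esl

# TRUE only if the observed joint probability matches the product (within tolerance)
independent <- isTRUE(all.equal(p_both, p_product))
independent
```

The product is about 0.030, noticeably below the observed 0.042, which agrees with the conditional-probability comparison.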

2.20

df <- data.frame(matrix(c(78,19,11,108, 23,23,9,55,13,12,16,41,114,54,36,204),nrow=4,ncol=4))

rownames(df) <- c("mblue", "mbrown", "mgreen", "totals")
colnames(df) <- c("fblue", "fbrown", "fgreen", "totals")

df
##        fblue fbrown fgreen totals
## mblue     78     23     13    114
## mbrown    19     23     12     54
## mgreen    11      9     16     36
## totals   108     55     41    204
mb = 114
fb = 108
bfm = 78
# Addition rule: P(male blue or female blue) = P(male blue) + P(female blue) - P(both blue)
answer = (mb + fb - bfm)/204
answer
## [1] 0.7058824
# Divide counts by the 204 couples so p_a is a probability comparable to p_a_given_b
p_a = fb/204
p_b = mb/204
p_a_and_b = bfm/204
p_a_given_b = p_a_and_b/p_b
p_a_given_b
## [1] 0.6842105
p_a == p_a_given_b
## [1] FALSE
pfb = 108/204
pmbr = 54/204
fbmbr = 19/204
a = pfb
b = pmbr
ab = fbmbr
pagb = ab/b
pagb
## [1] 0.3518519
a == pagb
## [1] FALSE
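As a cross-check, the conditional probabilities can be read straight off the count matrix instead of hand-copied totals. This is a minimal sketch assuming the counts in the table above; the names `eyes`, `n`, `p_female_blue`, and `p_fblue_given_male` are illustrative and not part of the original solution.

```r
# Counts from the eye-color table; rows are male partners, columns are female partners
eyes <- matrix(c(78, 19, 11,
                 23, 23,  9,
                 13, 12, 16),
               nrow = 3,
               dimnames = list(male   = c("blue", "brown", "green"),
                               female = c("blue", "brown", "green")))

n <- sum(eyes)                               # total number of couples
p_female_blue <- sum(eyes[, "blue"]) / n     # marginal P(female blue)

# P(female blue | male eye color), one value per male eye color
p_fblue_given_male <- eyes[, "blue"] / rowSums(eyes)
p_fblue_given_male
```

If eye colors of partners were independent, every entry of `p_fblue_given_male` would be close to `p_female_blue`.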

2.30

df <- data.frame(matrix(c(13,15,28,59,8,67,72,23,95),nrow=3,ncol=3))
rownames(df) <- c("fiction", "non-fiction", "totals")
colnames(df) <- c("hardcover", "paperback", "totals")
df
##             hardcover paperback totals
## fiction            13        59     72
## non-fiction        15         8     23
## totals             28        67     95
(28/95)*(67/94)
## [1] 0.2100784
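# First book is fiction: split by whether it is a hardcover or a paperback
fiction_and_hardcover = 13/95
fiction_and_paperback = 59/95
# Chance the second book is a hardcover, given which book was removed first
second_hardcover_given_fh = 27/94
second_hardcover_given_fp = 28/94
h_after_fh = second_hardcover_given_fh*fiction_and_hardcover
h_after_fp = second_hardcover_given_fp*fiction_and_paperback
answer = h_after_fp + h_after_fh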
answer
## [1] 0.2243001
(72/95)*(28/95)
## [1] 0.2233795
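Written out, the sum stored in `answer` is the law of total probability, splitting on whether the first (fiction) book is a hardcover or a paperback:

\[ P(\text{fiction, then hardcover}) = \frac{13}{95}\cdot\frac{27}{94} + \frac{59}{95}\cdot\frac{28}{94} = \frac{2003}{8930} \approx 0.2243 \]

The with-replacement product is

\[ \frac{72}{95}\cdot\frac{28}{95} = \frac{2016}{9025} \approx 0.2234 \]

The two values are close because removing one book out of 95 changes the second-draw probabilities only slightly.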

2.38

Revenue is $25 for the first checked bag and $35 for the second, so a passenger who checks two bags pays $60. We also know that 54% of passengers check no bags, 34% check one, and 12% check two. The distribution of revenue per passenger is

df <- data.frame(matrix(c(0,25,60,.54,.34,.12),nrow=3,ncol=2))
colnames(df) <- c("x","P(x)")
df
##    x P(x)
## 1  0 0.54
## 2 25 0.34
## 3 60 0.12

The expected value is

expected = 0*.54 + 25*.34+.12*60
expected
## [1] 15.7
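The calculation above is the usual expected-value formula for a discrete random variable, with X denoting the revenue per passenger:

\[ E(X) = \sum_i x_i P(x_i) = 0(0.54) + 25(0.34) + 60(0.12) = 15.7 \]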

Likewise, the standard deviation can be found by

prices = c(0,25,60)
probs = c(.54,.34,.12)
# Squared deviations from the expected value, weighted by their probabilities
vec = prices - expected
squares = vec * vec
squares %*% probs
##        [,1]
## [1,] 398.01
(squares %*% probs)^.5
##          [,1]
## [1,] 19.95019
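The same variance can be cross-checked with the shortcut form, E(X^2) minus the squared mean. This is a minimal sketch assuming the revenue distribution above; the names `x`, `p`, `mu`, `var_definition`, `var_shortcut`, and `sd_revenue` are illustrative and not part of the original solution.

```r
x <- c(0, 25, 60)          # revenue per passenger in dollars
p <- c(0.54, 0.34, 0.12)   # probability of each revenue value

mu <- sum(x * p)                        # expected revenue
var_definition <- sum((x - mu)^2 * p)   # variance from the definition
var_shortcut   <- sum(x^2 * p) - mu^2   # same variance via E(X^2) - mu^2
sd_revenue <- sqrt(var_definition)      # standard deviation in dollars

c(mu, var_definition, var_shortcut, sd_revenue)
```

Using plain vectorized `sum()` returns ordinary numbers rather than the 1x1 matrices produced by `%*%`.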

2.44

df <- data.frame(matrix(c(9999, 14999, 24999, 34999, 49999, 64999, 74999, 99999, 100000,.022,.047,.158,.183,.212,.139,.058,.084,.097),nrow=9,ncol=2))
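# Each value is the upper bound of a bracket; the last row (100000) stands for the open-ended top bracket
colnames(df) <- c("income up to","total")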
df
##   income up to total
## 1         9999 0.022
## 2        14999 0.047
## 3        24999 0.158
## 4        34999 0.183
## 5        49999 0.212
## 6        64999 0.139
## 7        74999 0.058
## 8        99999 0.084
## 9       100000 0.097
barplot(df$total)

Personal income tends to cluster in the $35,000 to $49,999 bracket (midpoint about $43k), which holds the largest share at 21.2%, while a sizable share of people (18.1%) also make $75,000 or more. In fact, 62.2% of people make less than $50,000.

sum(df$total[1:5])
## [1] 0.622
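A running total of the bracket shares gives the same 62.2% figure and also shows which bracket contains the median. This is a minimal sketch assuming the shares in the table above; the names `share`, `cumulative`, and `median_bracket` are illustrative and not part of the original solution.

```r
# Bracket shares copied from the table above, lowest income bracket first
share <- c(0.022, 0.047, 0.158, 0.183, 0.212, 0.139, 0.058, 0.084, 0.097)

# Running share of people at or below each bracket
cumulative <- cumsum(share)

# Index of the first bracket where the running share reaches 50%
median_bracket <- which(cumulative >= 0.5)[1]

cumulative
median_bracket
```

The fifth bracket (up to $49,999) is where the running share first passes one half, which is consistent with incomes clustering in that range.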

If we assume that gender is independent of income, then the probability that a woman makes less than $50,000 is the same as the probability for the whole population. Assuming that half of the population is female, we multiply the two probabilities to get

.622 * .5
## [1] 0.311
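Written out, this is the multiplication rule for independent events, using the assumed one-half share of women:

\[ P(\text{woman and under } \$50\text{k}) = P(\text{woman}) \times P(\text{under } \$50\text{k}) = 0.5 \times 0.622 = 0.311 \]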

However, since we know that the probability of a person being a woman given that they make less than $50k a year is 71.8%, these events are not independent, because \[ P(A) \neq P(A|B) \] where A is being a woman and B is making less than $50k a year.