DATA 606 Chapter 2 Probability

Assignments

Chapter 2 - Probability

Practice: 2.5, 2.7, 2.19, 2.29, 2.43
Graded: 2.6, 2.8, 2.20, 2.30, 2.38, 2.44

1. The lowest value we can get from a pair of dice is 1+1 = 2, so there is no chance of getting a sum of 1.
1. 0.11
- total number of equally likely outcomes for 2 dice: 6 * 6 = 36
- outcomes with a sum of 5:
  - 1 + 4
  - 2 + 3
  - 3 + 2
  - 4 + 1

4 / 36

## [1] 0.1111111

1. 0.028
- 6 + 6

1 / 36

## [1] 0.02777778
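
As a cross-check, the three dice answers above can be reproduced by brute force. This is only a sketch and not part of the original solution: it assumes two fair six-sided dice, and the names `rolls` and `sums` are made up for illustration.

```r
# Enumerate all 36 equally likely outcomes of two fair six-sided dice
rolls <- expand.grid(die1 = 1:6, die2 = 1:6)
sums <- rolls$die1 + rolls$die2

# Probability of an event = share of outcomes in which it occurs
p_sum_1  <- mean(sums == 1)
p_sum_5  <- mean(sums == 5)
p_sum_12 <- mean(sums == 12)

c(sum_1 = p_sum_1, sum_5 = p_sum_5, sum_12 = p_sum_12)
```

Counting outcomes this way avoids listing the favourable cases by hand, which helps when the target sum has many combinations.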

1. No. There are people who both live below the poverty line and speak a language other than English at home (4.2%), so the two events are not disjoint.

#install.packages('VennDiagram')
library(VennDiagram)

# in %
below_poverty <- 14.6
foreign_language <- 20.7
cross <- 4.2

venn <- draw.pairwise.venn(below_poverty, foreign_language, cross, 
                           c("BelowPoverty", "ForeignLanguage"))
grid.draw(venn)

1. As the Venn diagram above shows, 10.4% of Americans live below the poverty line and speak only English at home. 14.6% live below the poverty line, 4.2% both live below the poverty line and speak a language other than English at home, and 14.6% - 4.2% = 10.4%.
1. By using the general addition rule, the answer is 31.1%.

# P(below poverty line or speak foreign language) = P(bp) + P(fl) − P(both)
bp_or_fl <- below_poverty + foreign_language - cross
bp_or_fl

## [1] 31.1

1. Subtracting the answer to (d) above from 100% gives the percentage of Americans who live above the poverty line and speak only English at home: 68.9%.

# P(neither bp nor fl) = 1 - P(bp or fl)
100 - bp_or_fl

## [1] 68.9
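
The 10.4%, 31.1%, and 68.9% figures above can be cross-checked by splitting the population into four disjoint groups. The sketch below is an illustration only; the vector `cells` and its labels are hypothetical names, and the three input percentages are the ones given in the exercise.

```r
# Percentages taken from the problem statement
below_poverty    <- 14.6
foreign_language <- 20.7
both             <- 4.2

# Split the population into four disjoint groups (in %)
cells <- c(
  poverty_only  = below_poverty - both,
  language_only = foreign_language - both,
  both          = both,
  neither       = 100 - (below_poverty + foreign_language - both)
)

cells
sum(cells)  # the four disjoint groups should add up to 100
```

Because the groups do not overlap, "or" probabilities are plain sums of cells, which is the general addition rule written another way.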

1. Multiplication rule
- If the two events were independent, P(below poverty) * P(speak FL) = 3.02% would equal P(below poverty line and speak foreign language).
- The actual joint probability is 4.2%, so the two values do not match.
- This implies that living below the poverty line and speaking a foreign language at home are dependent.

(below_poverty*foreign_language) / 100

## [1] 3.0222

1. Conditional probability: if the 2 events are independent, then the following should be true: P(below poverty line | speaks foreign language) = P(below poverty line)
- Since the two values are not equal (20.3% != 14.6%), we can say they are dependent.

cross/foreign_language

## [1] 0.2028986

below_poverty

## [1] 14.6
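
The two independence checks above mix proportions and percentages in their printed output. A minimal sketch that puts everything on the same 0 to 1 scale is shown here; the variable names are hypothetical and the inputs are the three percentages from the exercise.

```r
# Work in proportions so every quantity is on the same 0-1 scale
p_bp   <- 0.146   # P(below poverty line)
p_fl   <- 0.207   # P(speaks a foreign language at home)
p_both <- 0.042   # P(both)

# Conditional probability P(bp | fl) = P(bp and fl) / P(fl)
p_bp_given_fl <- p_both / p_fl

# Under independence each of these pairs would contain two equal values
c(conditional = p_bp_given_fl, marginal = p_bp)
c(joint = p_both, product = p_bp * p_fl)
```

Both comparisons test the same condition, so they always agree on whether the events are independent.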

# A = Female Blue
# B = Male Blue
total <- 204
female_blue <- 108
male_blue <- 114
both_blue <- 78

1. female blue or male blue: 70.59%
- P(A U B) = P(A) + P(B) - P(A ∩ B)

#probability of female blue
p_fb <- female_blue/total
#probability of male blue
p_mb <- male_blue/total
#probability of male and female both blue
p_bnb <- both_blue/total
#probability of male or female blue
p_bub <- p_fb + p_mb - p_bnb

1. female blue given that male blue: 68.42%
- P (A|B) = P (A ∩ B) / P (B)

p_bnb / p_mb

## [1] 0.6842105

- P (Female Blue | Male Brown) = 35.19%
- P (Female Blue | Male Green) = 30.56%

19 / 54

## [1] 0.3518519

11 / 36

## [1] 0.3055556

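1. Not independent.
- If the eye colors were independent, the probability that the female partner has blue eyes would be the same regardless of the eye color of the male partner.
- Since P(Female Blue | Male Brown) = 35.19% != P(Female Blue) = 52.94%, the eye color of the male partner is not independent of the eye color of the female partner.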

# P(Female Blue | Male Brown)
19/54

## [1] 0.3518519

# P(Female Blue)
p_fb

## [1] 0.5294118
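
The same comparison can be made for all three male eye colors at once. The sketch below rebuilds the counts from the numbers already used above (78 of 114, 19 of 54, 11 of 36); the vector names are hypothetical.

```r
# Couples where the female partner has blue eyes, by male eye color
female_blue_by_male <- c(blue = 78, brown = 19, green = 11)
# Total number of couples for each male eye color
male_totals <- c(blue = 114, brown = 54, green = 36)

# P(female blue | male eye color), one value per male eye color
p_fb_given_male <- female_blue_by_male / male_totals

# Marginal P(female blue), ignoring the male partner's eye color
p_fb_marginal <- sum(female_blue_by_male) / sum(male_totals)

round(p_fb_given_male, 4)
round(p_fb_marginal, 4)
```

If eye colors were independent, all three conditional values would sit close to the marginal value; here they range widely around it.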

1. 18.50%

#marginal probability for hard cover
hc <- 28/95

#conditional probability of paperback fiction on the second draw, given that a hardcover was removed (without replacement)
pf <- 59/94

hc*pf

## [1] 0.1849944

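1. 22.43%
- The number of hardcovers left for the second draw depends on whether the fiction book drawn first was itself a hardcover, so the two cases are added.

#marginal probability for fiction
f <- 72/95

# of the 72 fiction books, 59 are paperback, so 72 - 59 = 13 are hardcover
# first draw paperback fiction: all 28 hardcovers remain among 94 books
# first draw hardcover fiction: 27 hardcovers remain among 94 books
(59/95) * (28/94) + (13/95) * (27/94)

## [1] 0.2243001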

(c): 22.34%

# marginal probability for fiction
f

## [1] 0.7578947

# probability of a hardcover on the second draw, with replacement
hf_rep <- 28/95

f * hf_rep

## [1] 0.2233795

(d): The final answers to (b) and (c) are very similar because removing one book barely changes the pool. The second draw is made from 94 books without replacement and from 95 books with replacement. With this many books on the shelves, a change of one book is too small to have much influence on the calculation.
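
To see how small the effect is, the second-draw probability of a hardcover can be compared with and without replacement. This is a rough sketch: it assumes the book removed first was not a hardcover, and the 10-book shelf is a hypothetical example that does not come from the exercise.

```r
# Second-draw probability of a hardcover on the 95-book shelf
n_books     <- 95
n_hardcover <- 28

p_without <- n_hardcover / (n_books - 1)  # first book (not a hardcover) removed
p_with    <- n_hardcover / n_books        # first book put back
ratio_large <- p_without / p_with         # equals n_books / (n_books - 1)

# Hypothetical small shelf for contrast: 10 books, 3 hardcovers
small_n  <- 10
small_hc <- 3
ratio_small <- (small_hc / (small_n - 1)) / (small_hc / small_n)

c(large_shelf = ratio_large, small_shelf = ratio_small)
```

The ratio is n / (n - 1), so it approaches 1 as the shelf grows; on a small shelf, replacement matters much more.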

1. The average revenue per passenger is $15.70 and the corresponding standard deviation is $19.95.

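luggages <- c("0", "1", "2")      # number of checked bags
price <- c(0, 25, 25+35)          # fee paid: $25 for the first bag, $35 for the second
p_pax <- c(0.54, 0.34, 0.12)      # share of passengers with 0, 1, 2 bags

# Expected Value
EV <- sum(price * p_pax)
EV

## [1] 15.7

# SD: square root of the probability-weighted squared deviations from EV
sd_rev <- sqrt(sum(p_pax * (price - EV)^2))
sd_rev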

## [1] 19.95019
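
The same calculation can be wrapped in a small helper that works for any discrete distribution. The function name `discrete_summary` is hypothetical and only meant as a sketch; the fees and probabilities are the ones used above.

```r
# Mean and standard deviation of a discrete random variable
discrete_summary <- function(values, probs) {
  # Probabilities must match the values and sum to 1
  stopifnot(length(values) == length(probs),
            isTRUE(all.equal(sum(probs), 1)))
  mu       <- sum(values * probs)           # E[X]
  variance <- sum(probs * (values - mu)^2)  # Var(X) = E[(X - mu)^2]
  c(mean = mu, sd = sqrt(variance))
}

fees  <- c(0, 25, 60)          # revenue for 0, 1, 2 checked bags
probs <- c(0.54, 0.34, 0.12)   # share of passengers in each group
res <- discrete_summary(fees, probs)
res
```

The `stopifnot` guard catches the common mistake of entering probabilities that do not sum to 1.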

1. Assuming the passengers travel independently of one another, the expected revenue for a flight of 120 passengers is about $1,884, with a standard deviation of about $218.54.

pax <- 120
pax * EV

## [1] 1884

# sqrt of pax * variance
sqrt(pax*(19.95^2))

## [1] 218.5413
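
A quick simulation gives a rough sanity check of these two numbers. This is a sketch under the same independence assumption; the seed and the number of simulated flights (`n_sim`) are arbitrary choices, not values from the exercise, and the simulated results will only be close to the analytic ones.

```r
set.seed(606)  # arbitrary seed, used only to make the run repeatable

fees  <- c(0, 25, 60)          # revenue for 0, 1, 2 checked bags
probs <- c(0.54, 0.34, 0.12)   # share of passengers in each group
n_pax <- 120                   # passengers per flight
n_sim <- 10000                 # hypothetical number of simulated flights

# Total baggage revenue for each simulated flight of independent passengers
totals <- replicate(
  n_sim,
  sum(sample(fees, n_pax, replace = TRUE, prob = probs))
)

c(sim_mean = mean(totals), sim_sd = sd(totals))
```

The simulated mean and standard deviation should land near 1884 and 218.5; a large gap would suggest an error in the formula or in the inputs.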

library(ggplot2)
income <- c("$1 to $9,999 or loss", 
            "$10,000 to $14,999", 
            "$15,000 to $24,999",
            "$25,000 to $34,999",
            "$35,000 to $49,999",
            "$50,000 to $64,999",
            "$65,000 to $74,999",
            "$75,000 to $99,999",
            "$100,000 or more")
total <- c(.022, .047, .158, .183, .212, .139, .058, .084, 0.097)
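# fix the factor levels so the bars follow income order, not alphabetical order
income_gender <- data.frame(income = factor(income, levels = income), total)

ggplot(income_gender, aes(income, total)) +
  geom_bar(stat = "identity") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))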

(a): The distribution is right skewed, since the open-ended top bracket ($100,000 or more) allows for outliers on the high end. The median falls in the $35,000 to $49,999 bracket, and the IQR is roughly $30,000.
(b): 62.2%

less_than_50 <- sum(income_gender$total[1:5])
less_than_50

## [1] 0.622
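
The median bracket in (a) and the sum in (b) can both be read off the cumulative distribution. The sketch below is one way to do that; `brackets`, `share`, and `quartile_bracket` are hypothetical names, and the shares are the ones listed above.

```r
brackets <- c("$1 to $9,999 or loss", "$10,000 to $14,999",
              "$15,000 to $24,999", "$25,000 to $34,999",
              "$35,000 to $49,999", "$50,000 to $64,999",
              "$65,000 to $74,999", "$75,000 to $99,999",
              "$100,000 or more")
share <- c(.022, .047, .158, .183, .212, .139, .058, .084, .097)

# Running total of the shares, from the lowest bracket upward
cum_share <- cumsum(share)
names(cum_share) <- brackets

# First bracket whose cumulative share reaches each quartile
quartile_bracket <- sapply(
  c(Q1 = 0.25, median = 0.50, Q3 = 0.75),
  function(q) brackets[which(cum_share >= q)[1]]
)

cum_share
quartile_bracket
```

Since the data are binned, this only identifies the bracket that contains each quartile, so the IQR can be estimated only roughly.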

(c): Assuming that gender and income are independent, 25.5%

less_than_50 * 0.41

## [1] 0.25502

(d): If the variables were independent, the percentage of females who make less than $50,000 per year (71.8%) would equal the percentage of all respondents who make less than $50,000 (62.2%). Since these two values differ, gender and income are dependent, and the independence assumption used in (c) is not valid.
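
To show how much the independence assumption matters, the joint probability from (c) can be compared with the one implied by the 71.8% figure. This is a sketch with hypothetical variable names; the three inputs are the values quoted in (b), (c), and (d).

```r
p_female                <- 0.41    # share of the sample that is female
p_under_50              <- 0.622   # share of all respondents earning < $50,000
p_under_50_given_female <- 0.718   # share of females earning < $50,000

# Joint probability under the independence assumption used in (c)
joint_if_independent <- p_female * p_under_50

# Joint probability from the general multiplication rule
joint_observed <- p_female * p_under_50_given_female

c(if_independent = joint_if_independent, observed = joint_observed)
```

The general multiplication rule, P(A and B) = P(A) * P(B | A), holds whether or not the events are independent, so the second value is the one to rely on here.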


Rose Koh

2/15/2018
