title: "Data 606: Chapter 3 - Probability"
author: "Sufian"
RPubs Link: http://rpubs.com/ssufian/528292
output:
  html_document:
    df_print: paged
  pdf_document:
    extra_dependencies:
      - geometry
      - multicol
      - multirow
Dice rolls. (3.6, p. 92) If you roll a pair of fair dice, what is the probability of (a) getting a sum of 1? (b) getting a sum of 5? (c) getting a sum of 12?
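Ans (a), sum of 1:
Not possible. The smallest sum two dice can show is 2 (1 + 1), so the probability is 0.
Ans (b), sum of 5:
0.11030 (simulation of 100 thousand trials, see below). The exact value is 4/36 ≈ 0.1111, from the outcomes (1,4), (2,3), (3,2) and (4,1).
Ans (c), sum of 12:
0.02863 (same simulation). The exact value is 1/36 ≈ 0.0278, since only (6,6) gives 12.
Simulated values vary slightly from run to run, so the printed table below (a separate run) differs a little from these figures.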
require(dice)
## Loading required package: dice
## Loading required package: gtools
two.dice <- function(){
dice <- sample(1:6, size = 2, replace = TRUE)
return(sum(dice))
}
sims <- replicate(100000, expr=two.dice())
# A table of the simulated sums and their associated probabilities
#table(sims)
table(sims)/length(sims)
## sims
## 2 3 4 5 6 7 8 9 10
## 0.02775 0.05656 0.08280 0.11237 0.13848 0.16703 0.13706 0.11122 0.08264
## 11 12
## 0.05625 0.02784
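As a cross-check on the simulation, the exact probabilities can be found by listing all 36 equally likely outcomes of two dice. A minimal base-R sketch, assuming the three parts ask about sums of 1, 5 and 12 (consistent with the answers above); `sums` and `exact` are names introduced here for illustration:

```r
# Exact distribution of the sum of two fair dice by full enumeration.
# Assumes the exercise asks about sums of 1, 5 and 12.
sums  <- outer(1:6, 1:6, "+")                        # 6 x 6 matrix of all sums
exact <- table(factor(sums, levels = 1:12)) / length(sums)

p_sum1  <- as.numeric(exact["1"])    # 0: the smallest possible sum is 2
p_sum5  <- as.numeric(exact["5"])    # 4/36: (1,4), (2,3), (3,2), (4,1)
p_sum12 <- as.numeric(exact["12"])   # 1/36: only (6,6)

round(c(sum1 = p_sum1, sum5 = p_sum5, sum12 = p_sum12), 5)
```

Enumeration gives the exact values 0, 4/36 and 1/36, which the 100,000-trial simulation approximates to about two decimal places.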
Poverty and language. (3.8, p. 93) The American Community Survey is an ongoing survey that provides data every year to give communities the current information they need to plan investments and services. The 2010 American Community Survey estimates that 14.6% of Americans live below the poverty line, 20.7% speak a language other than English (foreign language) at home, and 4.2% fall into both categories.
No. The events are not disjoint, because 4.2% of Americans fall into both categories.
require(VennDiagram)
## Loading required package: VennDiagram
## Loading required package: grid
## Loading required package: futile.logger
##
## Attaching package: 'futile.logger'
## The following object is masked from 'package:gtools':
##
## scat
# Pairwise Venn diagram
venn.plot <- draw.pairwise.venn(area1 = 0.146, area2 = 0.207, cross.area = 0.042, category = c("Below poverty line", "Foreign language"))
grid.draw(venn.plot)
grid.newpage()
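ans:
P(Foreign Language) = 20.7%
P(English only) = 1 - 20.7% = 79.3%
P(poverty) = 14.6% and P(poverty and Foreign Language) = 4.2%
P(poverty and English only) = P(poverty) - P(poverty and Foreign Language) = 14.6% - 4.2% = 10.4%
(Multiplying P(poverty) × P(English only) = 14.6% × 79.3% would only be valid if the two events were independent, and they are not.)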
Ans:
P(Poverty or Foreign Language) = P(Foreign Language) + P(poverty) - P(poverty and Foreign Language)
= 14.6% + 20.7% - 4.2% = 31.1%
Ans:
P(above poverty and English only) = 1 - P(poverty or Foreign Language)
= 1 - 31.1%
= 68.9%
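Each region of the Venn diagram can be computed directly from the three ACS estimates, with no independence assumption. A short base-R sketch; the variable names are illustrative:

```r
# 2010 ACS estimates quoted in the exercise
p_pov  <- 0.146   # P(below poverty line)
p_for  <- 0.207   # P(foreign language at home)
p_both <- 0.042   # P(below poverty line AND foreign language)

# Below poverty, English only: poverty circle minus the overlap
p_pov_english <- p_pov - p_both
# Below poverty OR foreign language: inclusion-exclusion
p_union <- p_pov + p_for - p_both
# Above poverty AND English only: everything outside both circles
p_neither <- 1 - p_union

round(c(pov_english = p_pov_english, union = p_union, neither = p_neither), 3)
```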
Ans:
P(poverty) = 14.6%
P(Foreign Language) = 20.7%
P(Poverty and Foreign Language) = 4.2%
Test of independence: A and B are independent if P(A and B) = P(A) × P(B).
In this case:
P(poverty) × P(Foreign Language) = 0.146 × 0.207 ≈ 0.0302, but P(poverty and Foreign Language) = 0.042.
0.042 ≠ 0.0302
Therefore these two events are dependent.
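The same test written as R arithmetic, together with the equivalent conditional-probability view. The tolerance used to call two probabilities "equal" is an arbitrary assumption chosen for illustration:

```r
p_pov  <- 0.146   # P(poverty)
p_for  <- 0.207   # P(foreign language)
p_both <- 0.042   # P(poverty and foreign language)

# Independence would require P(A and B) = P(A) * P(B)
product <- p_pov * p_for

# Arbitrary tolerance (an assumption) for treating two probabilities as equal
tol <- 0.001
is_independent <- abs(p_both - product) < tol

# Conditional view: if independent, P(poverty | foreign language) = P(poverty)
p_pov_given_for <- p_both / p_for

round(c(joint = p_both, product = product,
        pov_given_for = p_pov_given_for, pov = p_pov), 4)
is_independent
```

About 20.3% of foreign-language speakers live below the poverty line, compared with 14.6% of everyone, which is another way of seeing the dependence.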
Assortative mating. (3.18, p. 111) Assortative mating is a nonrandom mating pattern where individuals with similar genotypes and/or phenotypes mate with one another more frequently than what would be expected under a random mating pattern. Researchers studying this topic collected data on eye colors of 204 Scandinavian men and their female partners. The table below summarizes the results. For simplicity, we only include heterosexual relationships in this exercise.
ans:
P(male blue or female partner blue) = P(male blue) + P(female partner blue) - P(both blue) =
114/204 + 108/204 - 78/204 = 144/204 = 70.59%
ans:
P(female blue eyes | male blue eyes) = P(female blue eyes and male blue eyes) / P(male blue eyes)
= (78/204)/(114/204)
= 78/114 = 68.42%
ans:
P(female blue eyes | male brown eyes) = P(female blue and male brown eyes) / P(male brown eyes)
= (19/204)/(54/204)
= 19/54 = 35.19%
P(female blue eyes | male green eyes) = P(female blue and male green eyes) / P(male green eyes)
= (11/204)/(36/204)
= 11/36 = 30.56%
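A sketch of the eye-color calculations using only the counts quoted in these answers (the full table from the exercise is not reproduced here); the vector names are hypothetical:

```r
# Counts quoted in the answers; 204 couples in total
n_total        <- 204
male_counts    <- c(blue = 114, brown = 54, green = 36)  # male eye color
female_blue_by <- c(blue = 78,  brown = 19, green = 11)  # female blue, by male color

# P(male blue OR female blue) by inclusion-exclusion
p_female_blue_count <- sum(female_blue_by)               # 108 blue-eyed females
p_either_blue <- (male_counts[["blue"]] + p_female_blue_count -
                  female_blue_by[["blue"]]) / n_total

# P(female blue | male color) = joint count / male marginal count
p_fblue_given_m <- female_blue_by / male_counts

round(p_either_blue, 4)
round(p_fblue_given_m, 4)
```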
ans:
No, see illustration below.
Two events A and B are independent if P(A|B) = P(A), i.e. knowing B does not change the probability of A.
Using the example from above:
P(female blue eyes | male green eyes) = P(female blue and male green eyes) / P(male green eyes)
= 30.56%
If the events were independent, this would equal P(female blue eyes) = 108/204 = 52.94%.
Because 30.56% ≠ 52.94%, a male's eye color changes the probability that his partner has blue eyes, so the eye colors are dependent.
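Extending the independence check to all three male eye colors, using the same quoted counts. The female blue-eyed total is taken as 78 + 19 + 11 = 108, which matches the 108/204 used above:

```r
n_total        <- 204
male_counts    <- c(blue = 114, brown = 54, green = 36)
female_blue_by <- c(blue = 78,  brown = 19, green = 11)

p_fblue         <- sum(female_blue_by) / n_total   # marginal P(female blue)
p_fblue_given_m <- female_blue_by / male_counts    # conditional on male color

# Under independence every conditional would equal the marginal (gap of 0)
gap <- p_fblue_given_m - p_fblue
round(rbind(conditional = p_fblue_given_m, marginal = p_fblue, gap = gap), 4)
```

Every gap is well away from zero, so the conclusion does not depend on which male eye color is used as the example.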
Books on a bookshelf. (3.26, p. 114) The table below shows the distribution of books on a bookcase based on whether they are nonfiction or fiction and hardcover or paperback.
ans:
28/95*59/94 = 18.5%
ans:
P(first: hardcover fiction) × P(second: hardcover) + P(first: paperback fiction) × P(second: hardcover) =
13/95 × 27/94 + 59/95 × 28/94 = 22.4%
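The first two answers as R arithmetic. This assumes only the counts used in the answers (95 books, 28 hardcover, 13 hardcover fiction, 59 paperback fiction); the variable names are illustrative:

```r
n_books   <- 95
hard      <- 28   # all hardcover books
hard_fic  <- 13   # hardcover fiction
paper_fic <- 59   # paperback fiction

# Hardcover first, then paperback fiction, without replacement
p_a <- hard / n_books * paper_fic / (n_books - 1)

# Fiction first, then hardcover, without replacement. Split on the first book:
# if it was hardcover fiction, one hardcover has already been removed.
p_b <- hard_fic / n_books * (hard - 1) / (n_books - 1) +
       paper_fic / n_books * hard / (n_books - 1)

round(c(a = p_a, b = p_b), 4)
```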
ans:
With replacement
P(first: hardcover fiction) × P(second: hardcover) + P(first: paperback fiction) × P(second: hardcover) =
13/95 × 28/95 + 59/95 × 28/95 = 22.3%
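ans:
With 95 books, removing one before the second draw barely changes the second-draw probabilities: P(second: hardcover) is 27/94 ≈ 0.287 or 28/94 ≈ 0.298 without replacement, versus 28/95 ≈ 0.295 with replacement. So the two answers are nearly the same.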
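A quick simulation comparing drawing with and without replacement for the fiction-then-hardcover question. The bookcase is rebuilt from the counts in the answers (13 hardcover fiction, 59 paperback fiction, 28 hardcover in total, 95 books), which implies 15 hardcover nonfiction and 8 paperback nonfiction; the seed and trial count are arbitrary choices:

```r
set.seed(606)  # arbitrary seed, only so the run is reproducible

# HF = hardcover fiction, PF = paperback fiction,
# HN = hardcover nonfiction, PN = paperback nonfiction
books <- c(rep("HF", 13), rep("PF", 59), rep("HN", 15), rep("PN", 8))

# One trial: draw two books; success = fiction first, hardcover second
fiction_then_hard <- function(replace) {
  draw <- sample(books, size = 2, replace = replace)
  draw[1] %in% c("HF", "PF") && draw[2] %in% c("HF", "HN")
}

n_trials    <- 100000
sim_without <- mean(replicate(n_trials, fiction_then_hard(FALSE)))
sim_with    <- mean(replicate(n_trials, fiction_then_hard(TRUE)))

# Exact values for comparison
exact_without <- (13 * 27 + 59 * 28) / (95 * 94)
exact_with    <- (13 + 59) * 28 / 95^2

round(c(sim_without = sim_without, exact_without = exact_without,
        sim_with = sim_with, exact_with = exact_with), 4)
```

The gap between the two exact values (about 0.001) is smaller than the run-to-run noise of a 100,000-trial simulation, which illustrates why replacement hardly matters here.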
Baggage fees. (3.34, p. 124) An airline charges the following baggage fees: $25 for the first bag and $35 for the second. Suppose 54% of passengers have no checked luggage, 34% have one piece of checked luggage and 12% have two pieces. We suppose a negligible portion of people check more than two bags.
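ans:
| Bags checked | P(X) | Fee X | X × P(X) | X - mean | (X - mean)² | (X - mean)² × P(X) |
|---|---|---|---|---|---|---|
| 0 | 0.54 | $0 | 0 | -15.7 | 246.49 | 133.10 |
| 1 | 0.34 | $25 | 8.5 | 9.3 | 86.49 | 29.41 |
| 2 | 0.12 | $60 ($25 + $35) | 7.2 | 44.3 | 1962.49 | 235.50 |

Expected revenue per passenger = 0 + 8.5 + 7.2 = $15.70
Var = 133.10 + 29.41 + 235.50 = 398.01
SD = √398.01 ≈ $19.95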
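The per-passenger mean, variance and standard deviation can be checked in a few lines of R, assuming the fees stated in the exercise ($25 for the first bag and $35 for the second, so $60 for two bags):

```r
# Fee per passenger: no bags, one bag, two bags ($25 + $35)
fee  <- c(0, 25, 25 + 35)
prob <- c(0.54, 0.34, 0.12)

mu     <- sum(fee * prob)              # expected revenue per passenger
sigma2 <- sum((fee - mu)^2 * prob)     # variance
sigma  <- sqrt(sigma2)                 # standard deviation

round(c(mean = mu, var = sigma2, sd = sigma), 2)
```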
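ans:
| Bags | P(X) | Expected passengers (of 120) | Fee per passenger | Expected revenue |
|---|---|---|---|---|
| 0 | 0.54 | 64.8 | $0 | $0 |
| 1 | 0.34 | 40.8 | $25 | $1,020 |
| 2 | 0.12 | 14.4 | $60 | $864 |

Expected rev = $1,884 (= 120 × $15.70)
Assuming passengers check bags independently of one another, SD = √(120 × 398.01) ≈ $218.5, where 398.01 is the per-passenger variance with fees of $0, $25 and $60.
The independence assumption is questionable, since families and groups traveling together tend to check bags together. The per-passenger distribution is also strongly right-skewed (most passengers pay $0), so this standard deviation should be read with caution.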
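A sketch of the 120-passenger totals, assuming passengers check bags independently of one another. The simulation draws each passenger's fee from the stated probabilities; the seed and number of simulated flights are arbitrary:

```r
fee    <- c(0, 25, 60)
prob   <- c(0.54, 0.34, 0.12)
n_pass <- 120

mu    <- sum(fee * prob)
sigma <- sqrt(sum((fee - mu)^2 * prob))

# Assumes independence between passengers
total_mean <- n_pass * mu            # E(total) = n * mu
total_sd   <- sqrt(n_pass) * sigma   # SD(total) = sqrt(n) * sigma

# Monte Carlo check under the same independence assumption
set.seed(606)
totals <- replicate(20000, sum(sample(fee, n_pass, replace = TRUE, prob = prob)))

round(c(mean = total_mean, sd = total_sd,
        sim_mean = mean(totals), sim_sd = sd(totals)), 1)
```

Because the total is a sum of 120 independent fees, `hist(totals)` looks far less skewed than the per-passenger distribution; that is the case in which the standard deviation summarizes the spread reasonably well.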
Income and gender. (3.38, p. 128) The relative frequency table below displays the distribution of annual total personal income (in 2009 inflation-adjusted dollars) for a representative sample of 96,420,486 Americans. These data come from the American Community Survey for 2005-2009. This sample is comprised of 59% males and 41% females.
The distribution is right-skewed.
P(<50K) = 2.2% + 4.7% + 15.8% + 18.3% + 21.2% = 62.2%
Assuming income and gender are independent: P(<50K and female) = P(<50K) × P(female) = 62.2% × 41% = 25.5%
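The given value 0.718 is the proportion of females earning less than $50K, i.e. P(<50K | female) = 0.718. Comparing counts:
Number of females = 0.41 × 96,420,486 = 39,532,399
Actual number of females earning <50K = 0.718 × 39,532,399 = 28,384,263 (so P(<50K and female) = 0.718 × 0.41 ≈ 29.4%)
My estimate from part c is:
0.255 × 96,420,486 = 24,587,224
No, the independence assumption in part c is not valid, and it underestimates the number of low-income women. If income and gender were independent, P(<50K | female) would equal P(<50K) = 62.2%, but it is actually 71.8%: women are more likely than the population as a whole to earn under $50K.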
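The income and gender independence check as R arithmetic. The inputs are the figures used in the answers (sample size, 41% female, the five income bins summed above, and 71.8% of females under $50K); the names are illustrative:

```r
n_total   <- 96420486                                # people represented in the sample
p_female  <- 0.41
p_under50 <- 0.022 + 0.047 + 0.158 + 0.183 + 0.212   # P(< $50K), bins summed above
p_under50_given_female <- 0.718                      # given proportion for females

# Joint probability if income and gender were independent
p_joint_indep  <- p_under50 * p_female
# Joint probability using the given conditional instead
p_joint_actual <- p_under50_given_female * p_female

counts <- round(n_total * c(indep = p_joint_indep, actual = p_joint_actual))
counts

# Independence would need these two to match; 0.718 vs 0.622 shows they do not
c(p_under50 = p_under50, p_under50_given_female = p_under50_given_female)
```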