Probability Theory & Distribution HCD - 594
Kevin Linares
Spring, 2015
Book Definition:
“with a random sample or randomized experiment, the probability an observation has a particular outcome is the proportion of times that outcome would occur in a very long sequence of observations”
Informal definition
What is the probability that it will snow tomorrow? P(S) = .70 = 70%
What is the probability that it will not snow tomorrow? P(not S) = .30 = 30%
P(S) + P(not S) = .70 + .30 = 1.00 or 100%
Babies born in 1981
1,769,000 girls (48.7%): 1,860,000 boys (51.3%)
P(B, B) = .513^2 = .2632
P(G, G) = .487^2 = .2372
P(B, G) = .513 * .487 = .2498
P(G, B) = .487 * .513 = .2498
=.263 + .250 + .250 + .237 = ?
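The arithmetic above can be reproduced in R as a quick check. This is a minimal sketch that assumes the sexes of two births are independent and uses the rounded proportions from the slide; the names p_boy, p_girl, and joint are illustrative and not part of the original material.

```r
# Rounded 1981 proportions from the slide
p_boy  <- 0.513
p_girl <- 0.487

# Joint probabilities for two births, assuming the births are independent
joint <- c(BB = p_boy * p_boy,
           BG = p_boy * p_girl,
           GB = p_girl * p_boy,
           GG = p_girl * p_girl)

round(joint, 4)

# The four outcomes cover every possibility, so together they sum to 1
sum(joint)
```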
Binomial distribution Bin(n,p)
Out of 100 births, what would be an estimated proportion of boys given that 51.3% of all births in 1981 were males?
0 = females, 1 = males: Hmmm, 48% were boys in this sample, not exact but close to our 51.3%
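# Simulate 100 births: 1 = male, 0 = female
Boys <- rbinom(100, size = 1, prob = 0.513)
# Proportion of females (0) and males (1) in the sample
Boys100 <- table(Boys) / length(Boys)
barplot(Boys100)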
0 = females, 1 = males: 50.8% were boys, which is closer to our 51.3%
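# Simulate 1,000 births: 1 = male, 0 = female
Boys <- rbinom(1000, size = 1, prob = 0.513)
# Proportion of females (0) and males (1) in the sample
Boys1000 <- table(Boys) / length(Boys)
barplot(Boys1000)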
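The two simulations above differ only in sample size. A rough sketch of why the larger sample tends to land closer to 51.3%: the standard error of a sample proportion is sqrt(p(1 - p)/n), so it shrinks as n grows. The seed, the extra sample size of 10,000, and the variable names are assumptions added for illustration.

```r
set.seed(594)  # arbitrary seed for reproducibility (an assumption, not from the slides)

p <- 0.513                 # proportion of boys among 1981 births
n <- c(100, 1000, 10000)   # sample sizes to compare

# Simulate each sample as 0/1 draws and take the proportion of boys
sim_prop <- sapply(n, function(k) mean(rbinom(k, size = 1, prob = p)))

# Theoretical standard error of a sample proportion: sqrt(p * (1 - p) / n)
se <- sqrt(p * (1 - p) / n)

data.frame(n = n, simulated = sim_prop, std_error = round(se, 4))
```

Any single run can still miss by more than one standard error; the point is only that the typical miss gets smaller as n increases.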
Gaussian distribution
Empirical rule
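For reference, the empirical rule says that in a normal distribution roughly 68%, 95%, and 99.7% of observations fall within 1, 2, and 3 standard deviations of the mean. A short sketch using R's pnorm() recovers those figures; the variable names are illustrative.

```r
# Number of standard deviations from the mean
k <- 1:3

# Area under the standard normal curve between -k and +k
coverage <- pnorm(k) - pnorm(-k)
names(coverage) <- paste0("within_", k, "_sd")

round(coverage, 3)
```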
Let's test this assumption on our data using reading scores from the ECLS-K
Hmmm, only partially close to the empirical rule: N = 16,109
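One way to run this kind of check is to compute the share of scores within 1, 2, and 3 standard deviations of the sample mean and compare it with the normal-curve benchmarks. The ECLS-K scores are not reproduced here, so the sketch below uses a simulated right-skewed stand-in; scores and sd_coverage are hypothetical names, and this is not the original analysis.

```r
# Share of observations within k standard deviations of the sample mean
sd_coverage <- function(x, k = 1:3) {
  z <- abs(x - mean(x)) / sd(x)
  sapply(k, function(kk) mean(z <= kk))
}

set.seed(1)  # arbitrary seed for reproducibility
# Simulated stand-in data (NOT the ECLS-K reading scores): right-skewed
scores <- rgamma(16109, shape = 2, scale = 10)

observed <- sd_coverage(scores)
expected <- pnorm(1:3) - pnorm(-(1:3))  # normal-curve benchmarks

round(rbind(observed, expected), 3)
```

With skewed data the observed shares drift away from the benchmarks, which is the kind of partial agreement described above.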
What could explain the outliers that are skewing our data? Maybe school type?
Most schools fall under the public school type: 78%
Public=79%, Private=12%, Religious=6%, Catholic = 3%
Outliers might be within the private schools
Lots of variability in private schools
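A comparison of spread by school type could be sketched as follows. The data frame toy, its columns, and its values are made up purely to show the mechanics; they are not ECLS-K values and say nothing about the real scores.

```r
# Toy data for illustration only (NOT ECLS-K values)
toy <- data.frame(
  school_type = c("Public", "Public", "Public", "Private", "Private", "Private"),
  reading     = c(48, 50, 52, 35, 50, 65)
)

# Standard deviation of reading scores within each school type
spread <- tapply(toy$reading, toy$school_type, sd)
spread

# Side-by-side boxplots show differences in spread and flag potential outliers
boxplot(reading ~ school_type, data = toy)
```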