Probability Theory & Distributions HCD - 594

Kevin Linares
Spring, 2015

Book Definition:

“with a random sample or randomized experiment, the probability an observation has a particular outcome is the proportion of times that outcome would occur in a very long sequence of observations”
Informal definition:
- the probability of an event is its long-run relative frequency among all observed outcomes

What is the probability that it will snow tomorrow? P(S) = .70 = 70%

What is the probability that it will not snow tomorrow? P(not S) = .30 = 30%

P(S) + P(not S) = .70 + .30 = 1.00 or 100%

Example: for a family with 2 kids, what is the sample space? Sample space = {(B, B), (B, G), (G, B), (G, G)}
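
The sample space above can also be enumerated in R. This is only a small sketch: the names `kids`, `p_each`, and `at_least_one_girl` are made up for illustration, and it assumes the four outcomes are equally likely, which the slides do not state.

```r
# Enumerate the sample space for the sex of two children (B = boy, G = girl).
kids <- expand.grid(first = c("B", "G"), second = c("B", "G"),
                    stringsAsFactors = FALSE)
n_outcomes <- nrow(kids)

# Assumption for illustration: all four outcomes are equally likely.
p_each <- 1 / n_outcomes

# Event "at least one girl": flag the outcomes that contain a G.
at_least_one_girl <- kids$first == "G" | kids$second == "G"
p_at_least_one_girl <- sum(at_least_one_girl) * p_each

# Complement rule, as in the snow example: P(not A) = 1 - P(A)
p_no_girl <- 1 - p_at_least_one_girl
```

Under the equal-likelihood assumption, three of the four outcomes contain a girl, so the event and its complement again add up to 1.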

Babies born in 1981

Binomial distribution Bin(n,p)

Out of 100 births, what would be an estimated proportion of boys given that 51.3% of all births in 1981 were males?

0 = females, 1 = males: Hmmmm, 48% of our 100 simulated births were boys, not exact but close to our 51.3%

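# Simulate 100 births: 1 = male (probability 0.513), 0 = female
Boys <- rbinom(100, size = 1, prob = 0.513)
# Proportion of each outcome in the sample (varies from run to run)
Boys100 <- table(Boys) / length(Boys)
barplot(Boys100)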

[Figure: bar plot of the proportions of females (0) and males (1) in 100 simulated births]

0 = females, 1 = males: with 1,000 simulated births, 50.8% were boys, which is closer to our 51.3%

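# Simulate 1,000 births: 1 = male (probability 0.513), 0 = female
Boys <- rbinom(1000, size = 1, prob = 0.513)
# Proportion of each outcome in the larger sample
Boys1000 <- table(Boys) / length(Boys)
barplot(Boys1000)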

[Figure: bar plot of the proportions of females (0) and males (1) in 1,000 simulated births]
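
One way to see why the larger sample lands closer to 51.3% is the standard error of a sample proportion, sqrt(p(1 - p)/n). The sketch below is an added illustration, not part of the original slides; the variable names are arbitrary, and the observed values 0.48 and 0.508 are the ones reported above.

```r
p <- 0.513                      # proportion of male births, from the slides
n <- c(100, 1000)               # the two sample sizes simulated above

# Standard error of a sample proportion: sqrt(p * (1 - p) / n)
se <- sqrt(p * (1 - p) / n)

# Distance of each observed proportion from p, in standard-error units
observed <- c(0.48, 0.508)
z <- (observed - p) / se

round(se, 3)
round(z, 2)
```

The standard error is roughly 0.05 for n = 100 and roughly 0.016 for n = 1000, so both observed proportions sit within one standard error of 0.513, and the typical sampling error shrinks as n grows.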

Gaussian distribution

Empirical rule
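
The empirical rule says that for a normal distribution about 68%, 95%, and 99.7% of observations fall within 1, 2, and 3 standard deviations of the mean. As a quick sketch, these figures can be recovered from the standard normal CDF with `pnorm()`; the name `within_k_sd` is just an illustrative choice.

```r
# Probability that a normal variable falls within k standard deviations
# of its mean, for k = 1, 2, 3. pnorm() is the standard normal CDF.
k <- 1:3
within_k_sd <- pnorm(k) - pnorm(-k)
names(within_k_sd) <- paste0("within_", k, "_sd")
round(within_k_sd, 4)
```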

Let's test this assumption on our data using reading scores from the ECLS-K

[Figure: distribution of ECLS-K reading scores]

Hmmm, only partially close to the empirical rule: N = 16,109
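
A check like this could be sketched as follows. The slides do not show the code behind the comparison, so this is only one possible approach: `share_within` is a hypothetical helper, and `scores` is simulated normal data standing in for the reading scores (it is NOT the ECLS-K data, and its mean and SD are arbitrary placeholders).

```r
# Share of observations within k standard deviations of the sample mean.
share_within <- function(x, k) {
  x <- x[!is.na(x)]
  mean(abs(x - mean(x)) <= k * sd(x))
}

# Simulated stand-in for the reading scores (NOT the ECLS-K data);
# the mean and SD here are arbitrary placeholders.
set.seed(594)
scores <- rnorm(16109, mean = 50, sd = 10)

# Compare the observed shares with what the empirical rule predicts.
observed <- sapply(1:3, function(k) share_within(scores, k))
expected <- pnorm(1:3) - pnorm(-(1:3))
round(cbind(k = 1:3, observed, expected), 3)
```

With truly normal data the observed and expected columns agree closely; skewed data or outliers pull the observed shares away from 68%, 95%, and 99.7%.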

What can explain the outliers that are skewing our data? Maybe the type of school?

Most schools fall under the Public school type: 78%

Public = 79%, Private = 12%, Religious = 6%, Catholic = 3%

Outliers might be within the private-type schools

Lots of variability in private schools