Practice: 3.1 (see normalPlot), 3.3, 3.17 (use qqnormsim from lab 3), 3.21, 3.37, 3.41
3.1 Area under the curve, Part I. What percent of a standard normal distribution N(μ = 0, sd = 1) is found in each region? Be sure to draw a graph.
#(a) Z < -1.35
normalPlot(mean = 0, sd = 1, bounds = c(-5, -1.35))
#(b) Z > 1.48
normalPlot(mean = 0, sd = 1, bounds = c(1.48, 5))
#(c) -0.4 < Z < 1.5
normalPlot(mean = 0, sd = 1, bounds = c(-0.4, 1.5))
#(d) |Z| > 2 (upper tail, then lower tail)
normalPlot(mean = 0, sd = 1, bounds = c(2, 4))
normalPlot(mean = 0, sd = 1, bounds = c(-4, -2))
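The plots shade each region; the percentages themselves can be computed with base R's `pnorm`. The following is a minimal sketch rather than part of the original solution: the variable names are illustrative, and part (c) is assumed to be -0.4 < Z < 1.5, as in the textbook exercise.

```r
# Areas under the standard normal curve, expressed as percentages.
# Variable names are illustrative, not from the original solution.
area_a <- pnorm(-1.35)               # P(Z < -1.35)
area_b <- 1 - pnorm(1.48)            # P(Z > 1.48)
area_c <- pnorm(1.5) - pnorm(-0.4)   # P(-0.4 < Z < 1.5), assumed bounds
area_d <- 2 * pnorm(-2)              # P(|Z| > 2), using symmetry of the two tails

round(100 * c(a = area_a, b = area_b, c = area_c, d = area_d), 2)
```

Part (d) doubles the lower tail because the standard normal curve is symmetric about 0.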
Sophia, who took the Graduate Record Examination (GRE), scored 160 on the Verbal Reasoning section and 157 on the Quantitative Reasoning section. The mean score for the Verbal Reasoning section for all test takers was 151 with a standard deviation of 7, and the mean score for the Quantitative Reasoning section was 153 with a standard deviation of 7.67. Suppose that both distributions are nearly normal.
### (a) Write down the short-hand for these two normal distributions.
Verbal Reasoning normal distribution N(μ=151,σ=7)
Quantitative Reasoning normal distribution N(μ=153,σ=7.67)
Verbal Reasoning Z =(x−μ)/σ=(160−151)/7=1.286
Quantitative Reasoning Z =(x−μ)/σ=(157−153)/7.67=0.5215
normalPlot(mean = 151, sd = 7, bounds = c(160 , 161))
normalPlot(mean = 153, sd = 7.67, bounds = c(157 , 158))
The Z-score of an observation is the number of standard deviations it falls above or below the mean. Her Verbal Reasoning Z-score is 1.286, so that score is 1.286 standard deviations above the mean; her Quantitative Reasoning Z-score is 0.5215, so that score is 0.5215 standard deviations above the mean.
She did better on the Verbal Reasoning section relative to other test takers: her Z-score there (1.286) is higher than her Quantitative Reasoning Z-score (0.5215).
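The two Z-scores can be reproduced in R instead of by hand. This is a small sketch; `z_score` is a hypothetical helper introduced only for illustration.

```r
# Hypothetical helper: standardize a raw score x against mean mu and sd sigma.
z_score <- function(x, mu, sigma) (x - mu) / sigma

z_verbal <- z_score(160, mu = 151, sigma = 7)
z_quant  <- z_score(157, mu = 153, sigma = 7.67)

# TRUE when the Verbal Reasoning score is further above its mean, in sd units.
z_verbal > z_quant
```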
# percentile for Verbal Reasoning
pnorm(1.286) * 100
## [1] 90.07785
# percentile for Quantitative Reasoning
pnorm(0.5215) * 100
## [1] 69.89907
What percent of the test takers did better than her on the Verbal Reasoning section? On the Quantitative Reasoning section?
#percent of the test takers did better than her on the Verbal Reasoning
(1- pnorm(1.286)) * 100
## [1] 9.922153
#percent of the test takers did better than her on the Quantitative Reasoning section
(1- pnorm(0.5215)) * 100
## [1] 30.10093
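As an alternative sketch, `pnorm` also accepts the raw score together with the `mean` and `sd` arguments, which avoids rounding the Z-score first, and `lower.tail = FALSE` returns the upper tail directly. The variable names here are illustrative.

```r
# Percentiles (percent scoring at or below her), computed from the raw scores.
pct_verbal <- 100 * pnorm(160, mean = 151, sd = 7)
pct_quant  <- 100 * pnorm(157, mean = 153, sd = 7.67)

# Percent scoring above her: the upper tail of each distribution.
above_verbal <- 100 * pnorm(160, mean = 151, sd = 7, lower.tail = FALSE)
above_quant  <- 100 * pnorm(157, mean = 153, sd = 7.67, lower.tail = FALSE)

round(c(pct_verbal, pct_quant, above_verbal, above_quant), 2)
```

Because the Z-scores are not rounded here, the results may differ from the values above in the second decimal place.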
The two distributions have different means and standard deviations, so comparing raw scores would lead to an incorrect conclusion. Converting to Z-scores puts both scores on a common scale where the mean is always 0 and distance from the mean is measured in standard deviations, so scores from two different distributions can be compared directly.
Yes, they would change. A different distribution generally has a different mean and standard deviation, and when those parameters change, the Z-scores and percentiles calculated above change too. The percentiles also rely on the normal model through pnorm, so they would not be valid for a distribution that is not nearly normal.
Below are final exam scores of 20 Introductory Statistics students.
57, 66, 69, 71, 72, 73, 74, 77, 78, 78, 79, 79, 81, 81, 82, 83, 83, 88, 89, 94
By the rule, 68% of observations fall within 1 sd of the mean (i.e., between 69.26 and 86.14); the actual percentage is 14/20 * 100 = 70%.
By the rule, 95% of observations fall within 2 sd of the mean (i.e., between 60.82 and 94.58); the actual percentage is 19/20 * 100 = 95%, since only the score of 57 falls outside this range.
By the rule, 99.7% of observations fall within 3 sd of the mean (i.e., between 52.38 and 103.02); the actual percentage is 20/20 * 100 = 100%.
The scores closely follow the 68-95-99.7% rule.
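The counts within 1, 2, and 3 standard deviations can be checked directly in R. This is a minimal sketch; `pct_within` is a hypothetical helper, not part of the lab code.

```r
scores <- c(57, 66, 69, 71, 72, 73, 74, 77, 78, 78,
            79, 79, 81, 81, 82, 83, 83, 88, 89, 94)

# Hypothetical helper: percent of observations within k sd of the sample mean.
pct_within <- function(x, k) {
  100 * mean(abs(x - mean(x)) <= k * sd(x))
}

within_sd <- sapply(1:3, function(k) pct_within(scores, k))
names(within_sd) <- c("1 sd", "2 sd", "3 sd")
within_sd
```

Counting in code avoids miscounting scores that sit close to an interval boundary, such as 94 against the upper 2 sd limit of 94.58.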
The distribution is slightly left skewed, but it looks close enough to normal based on the Q-Q plot.
normalPlot(mean = 77.7, sd = 8.44)
scores <- c(57, 66, 69, 71, 72, 73, 74, 77, 78, 78, 79, 79, 81, 81, 82, 83, 83, 88, 89, 94)
hist(scores, prob = TRUE,
     xlab = "scores", ylim = c(0, .1),
     main = "normal curve over histogram")
curve(dnorm(x, mean = mean(scores), sd = sd(scores)),
      col = "darkblue", lwd = 2, add = TRUE, yaxt = "n")
qqnormsim(scores)
## 3.21 Married women
The 2010 American Community Survey estimates that 47.1% of women ages 15 years and over are married.
P(N, N, M) = (1 - 0.471) * (1 - 0.471) * 0.471 = (0.529)(0.529)(0.471) = 0.132
Using the geometric distribution, $P(\text{first success on trial } n) = (1-p)^{n-1} p$, where p is the probability of success on each trial:
$P(N, N, M) = (1 - 0.471)^2 \times 0.471 = (0.529)^2 (0.471) = 0.132 = 13.2\%$
P(M, M, M) = 0.471 * 0.471 * 0.471 = 0.104
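Both probabilities can be checked in R. The sketch below uses base R's `dgeom`, which counts failures before the first success rather than the trial number; the variable names are illustrative.

```r
p_married <- 0.471

# dgeom(x, prob) takes x = number of failures before the first success,
# so "the third woman is the first married one" corresponds to x = 2.
p_nnm <- dgeom(2, prob = p_married)

# All three women married: three independent successes in a row.
p_mmm <- p_married^3

round(c(p_nnm, p_mmm), 3)
```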
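mean = 1/p = 1/0.471 = 2.12, so on average just over 2 women must be sampled, i.e. a married woman is expected by the 3rd woman
sd = $\sqrt{(1-p)/p^2}$ = 1.54
mean = 1/p = 1/0.30 = 3.33, so on average just over 3 women must be sampled, i.e. a married woman is expected by the 4th woman
sd = $\sqrt{(1-p)/p^2}$ = 2.79
When the probability of success decreases, both the mean and the standard deviation of the waiting time increase.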
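The effect of p on the mean and standard deviation can be tabulated with a short function. This is a sketch; `geom_summary` is a hypothetical helper written for this comparison.

```r
# Hypothetical helper: mean and sd of the number of trials until the
# first success, for a geometric distribution with success probability p.
geom_summary <- function(p) {
  c(mean = 1 / p, sd = sqrt((1 - p) / p^2))
}

round(rbind("p = 0.471" = geom_summary(0.471),
            "p = 0.30"  = geom_summary(0.30)), 2)
```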
The formula for the number of ways to arrange n objects is $n! = n \times (n-1) \times \cdots \times 2 \times 1$. This exercise walks you through the derivation of this formula for a couple of special cases.
A small company has five employees: Anna, Ben, Carl, Damian, and Eddy. There are five parking spots in a row at the company, none of which are assigned, and each day the employees pull into a random parking spot. That is, all possible orderings of the cars in the row of spots are equally likely.
P(alphabetical order: Anna, Ben, Carl, Damian, Eddy) = 1/5! = 1/(5×4×3×2×1) = 1/120 ≈ 0.0083
n! = 5! = 120
n! = 8! =(8×7×6×5×4×3×2×1)=40,320
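These counts can be reproduced with base R's `factorial`. The variable names below are illustrative.

```r
n_orderings_5  <- factorial(5)        # number of ways to order 5 cars
p_alphabetical <- 1 / n_orderings_5   # exactly one ordering is alphabetical
n_orderings_8  <- factorial(8)        # number of ways to order 8 objects

c(n_orderings_5, p_alphabetical, n_orderings_8)
```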
For a sociology class project you are asked to conduct a survey on 20 students at your school. You decide to stand outside of your dorm’s cafeteria and conduct the survey on a random sample of 20 students leaving the cafeteria after dinner one evening. Your dorm is comprised of 45% males and 55% females.
The negative binomial distribution can be used here. It describes the probability of observing the kth success on the nth trial. Here the nth trial is the 4th person (n = 4) and the kth success is the 2nd female (k = 2).
The negative binomial distribution must meet these 4 conditions:
1. The trials are independent.
2. Each trial outcome can be classified as a success or failure.
3. The probability of a success (p) is the same for each trial.
4. The last trial must be a success.
$$P(\text{the } k\text{th success on the } n\text{th trial}) = \binom{n-1}{k-1} p^k (1-p)^{n-k}$$
$$= \frac{(n-1)!}{(k-1)!\,(n-k)!} \, p^k (1-p)^{n-k}$$
$$= \frac{3!}{1!\,2!} \times (0.55)^2 \times (0.45)^2$$
$$= 3(0.3025)(0.2025) = 0.184$$
The binomial coefficient is $\binom{n-1}{k-1} = \frac{(n-1)!}{(k-1)!\,(n-k)!} = \frac{(4-1)!}{(2-1)!\,(4-2)!} = \frac{3!}{1!\,2!} = 3$.
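The hand calculation can be cross-checked in R. This sketch evaluates the formula directly and compares it with base R's `dnbinom`, which is parameterized by the number of failures instead of the trial number; the variable names are illustrative.

```r
n <- 4      # trial on which the kth success occurs (the 4th person)
k <- 2      # number of successes (the 2nd female)
p <- 0.55   # probability that a surveyed student is female

# Direct evaluation of the negative binomial formula.
p_formula <- choose(n - 1, k - 1) * p^k * (1 - p)^(n - k)

# dnbinom(x, size, prob) takes x = number of failures before the
# size-th success, so x = n - k here.
p_dnbinom <- dnbinom(n - k, size = k, prob = p)

round(c(p_formula, p_dnbinom), 3)
```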
In the binomial formula we count the ways to place k successes among n trials, which gives the coefficient n!/(k!(n-k)!). In the negative binomial formula the last trial is fixed as a success, so we only count the ways to place the remaining k-1 successes among the first n-1 trials, which gives (n-1)!/((k-1)!(n-k)!).
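The difference between the two coefficients can be seen by enumerating the placements for n = 4 and k = 2. This is an illustrative sketch using base R's `combn`; the variable names are assumptions made for the example.

```r
n <- 4
k <- 2

# Each column of combn(n, k) is one way to place k successes among n trials.
placements <- combn(n, k)

# Keep only placements whose last trial (position n) is a success.
ends_in_success <- apply(placements, 2, function(pos) n %in% pos)

c(binomial          = ncol(placements),       # equals choose(n, k)
  negative_binomial = sum(ends_in_success))   # equals choose(n - 1, k - 1)
```

Fixing the last trial as a success removes the placements that end in a failure, which is why the negative binomial count is the smaller of the two.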