\[ \bbox[yellow,5px]
{
\color{black}{\text{Density at } z = \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{1}{2}z^2\right), \quad -\infty<z<+\infty}
}
\]
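As a minimal sketch (Python is an assumption; the notes themselves show no code), the density above and the cumulative probability \(P(Z \le z)\) used in the table lookups below can be written with the error function, since \(\Phi(z) = \frac{1}{2}\left(1 + \operatorname{erf}(z/\sqrt 2)\right)\). The helper names `std_normal_pdf` and `std_normal_cdf` are hypothetical; the check compares them with the standard library's `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

def std_normal_pdf(z: float) -> float:
    """Standard normal density: exp(-z^2 / 2) / sqrt(2*pi)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

def std_normal_cdf(z: float) -> float:
    """P(Z <= z), written in terms of the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# The hand-written versions agree with the standard library's NormalDist().
for z in (-1.0, 0.0, 0.43):
    assert math.isclose(std_normal_pdf(z), NormalDist().pdf(z), rel_tol=1e-12)
    assert math.isclose(std_normal_cdf(z), NormalDist().cdf(z), rel_tol=1e-9)

print(round(std_normal_pdf(0), 4))     # 0.3989, height of the curve at its peak
print(round(std_normal_cdf(0.43), 4))  # 0.6664, the table value for z = 0.43
```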
Cumulative SAT scores are approximated by a normal model with \(\mu = 1500 \text { and } \sigma = 300\).
What is the probability that a randomly selected SAT taker scores at least 1630 on the SAT?
\(z = \frac{x-\mu}{\sigma}=\frac{1630-1500}{300}=\frac{130}{300}=0.43\)
\(P(z\ge0.43)=0.3336\)
The probability that a randomly selected SAT taker scores at least 1630 is about 33%.
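The same probability can be sketched in Python with `statistics.NormalDist` (an assumed tool, not one the notes use). Because the code does not round \(z\) to 0.43 first, it gives a value slightly below the table answer, about 0.332.

```python
from statistics import NormalDist

sat = NormalDist(mu=1500, sigma=300)   # normal model for SAT scores from the notes

# P(X >= 1630) = 1 - P(X <= 1630); for a continuous model, >= and > are the same.
p_at_least_1630 = 1 - sat.cdf(1630)

# The table method rounds z to 0.43; here z keeps full precision (0.4333...).
z = (1630 - sat.mean) / sat.stdev
print(f"z = {z:.4f}, P(X >= 1630) = {p_at_least_1630:.4f}")
```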
Edward earned a 1400 on his SAT. What is his percentile?
\(z = \frac{x-\mu}{\sigma}=\frac{1400-1500}{300}=\frac{-100}{300}=-0.33\)
\(P(z\le-0.33)=0.3707\)
Edward is at the 37th percentile.
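A percentile is just the cumulative probability \(P(X \le x)\) written as a percent. A minimal Python sketch, again assuming `statistics.NormalDist`:

```python
from statistics import NormalDist

sat = NormalDist(mu=1500, sigma=300)

# Percentile = 100 * P(X <= 1400)
percentile = 100 * sat.cdf(1400)
print(f"Edward's percentile: {percentile:.1f}")
```

With the unrounded \(z = -1/3\) this is about 36.9, which still places Edward at the 37th percentile.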
Carlos believes he can get into his preferred college if he scores at or above the 80th percentile on the SAT. What score should he aim for?
At the 80th percentile, \(z = 0.84\) (from the standard normal table, \(P(Z \le 0.84) \approx 0.80\)).
\[ \begin{align} z & = \frac{x-\mu}{\sigma} \\ 0.84 & = \frac{x-1500}{300} \\ 0.84 \times 300 + 1500 & = x \\ x & = 1752 \end{align} \]
The 80th percentile on the SAT corresponds to a score of 1752.
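Working backward from a percentile to a score is the inverse of the CDF. In this sketch (assuming Python's `statistics.NormalDist`), `inv_cdf(0.80)` returns the score \(x\) with \(P(X \le x) = 0.80\); it uses \(z \approx 0.8416\) rather than the rounded 0.84, so it lands about half a point above 1752.

```python
from statistics import NormalDist

sat = NormalDist(mu=1500, sigma=300)

# inv_cdf(p) returns x such that P(X <= x) = p, i.e. the (100p)th percentile.
target = sat.inv_cdf(0.80)
z_80 = NormalDist().inv_cdf(0.80)   # about 0.8416; tables round this to 0.84
print(f"z = {z_80:.4f}, target score = {target:.1f}")
```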
The U.S. Air Force requires that pilots have heights between 64 in. and 77 in. Heights of women are normally distributed with a mean of \(63.7\) in. and a standard deviation of \(2.9\) in. What percentage of women meet that height requirement?
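One way to answer this, sketched in Python under the normal model stated in the problem (the use of `statistics.NormalDist` is an assumption), is to subtract the two cumulative probabilities, \(P(64 < X < 77) = P(X < 77) - P(X < 64)\):

```python
from statistics import NormalDist

heights = NormalDist(mu=63.7, sigma=2.9)   # women's heights in inches

# P(64 < X < 77) = P(X < 77) - P(X < 64)
p_eligible = heights.cdf(77) - heights.cdf(64)
print(f"Share of women meeting the requirement: {100 * p_eligible:.1f}%")
```

Here \(z = (64 - 63.7)/2.9 \approx 0.10\) and \(z = (77 - 63.7)/2.9 \approx 4.59\), so nearly all of the excluded women are below 64 in., and roughly 46% meet the requirement.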
To recruit more women pilots, suppose the Air Force relaxes the height requirements to admit the middle \(95\%\) of women under the height distribution \(X \sim N(\mu = 63.7, \sigma = 2.9)\). What would be the heights of the shortest and tallest women meeting the requirements?
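The middle 95% leaves 2.5% in each tail, so the cutoffs are the 2.5th and 97.5th percentiles, \(\mu \pm 1.96\sigma\). A short sketch assuming Python's `statistics.NormalDist`:

```python
from statistics import NormalDist

heights = NormalDist(mu=63.7, sigma=2.9)

# Middle 95%: 2.5% in each tail, i.e. the 2.5th and 97.5th percentiles
# (mean -/+ about 1.96 standard deviations).
shortest = heights.inv_cdf(0.025)
tallest = heights.inv_cdf(0.975)
print(f"Heights from {shortest:.1f} in. to {tallest:.1f} in.")
```

By hand: \(63.7 \pm 1.96 \times 2.9 = 63.7 \pm 5.68\), i.e. about 58.0 in. to 69.4 in.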
Dataset (first 20 observations)
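\[ \begin{array}{r|rrrrccrr}
 & \text{place} & \text{time} & \text{pace} & \text{age} & \text{gender} & \text{state} & \text{divPlace} & \text{divTot} \\ \hline
1 & 4494 & 92.25 & 9.225 & 38 & \text{M} & \text{MD} & 690 & 1093 \\
2 & 6298 & 106.35 & 10.635 & 33 & \text{M} & \text{DC} & 1322 & 1490 \\
3 & 2502 & 89.33 & 8.933 & 55 & \text{F} & \text{VA} & 37 & 236 \\
4 & 8176 & 113.50 & 11.350 & 24 & \text{F} & \text{VA} & 878 & 974 \\
5 & 3413 & 86.52 & 8.652 & 54 & \text{M} & \text{CA} & 213 & 483 \\
6 & 8008 & 112.30 & 11.230 & 42 & \text{F} & \text{MD} & 785 & 974 \\
7 & 8791 & 118.45 & 11.845 & 36 & \text{F} & \text{VA} & 1215 & 1367 \\
8 & 3987 & 95.17 & 9.517 & 25 & \text{F} & \text{VA} & 1230 & 2782 \\
9 & 3451 & 93.25 & 9.325 & 25 & \text{F} & \text{PA} & 1074 & 2782 \\
10 & 1046 & 72.37 & 7.237 & 43 & \text{M} & \text{MD} & 111 & 931 \\
11 & 3484 & 86.90 & 8.690 & 55 & \text{M} & \text{VA} & 138 & 375 \\
12 & 2987 & 84.47 & 8.447 & 30 & \text{M} & \text{MD} & 659 & 1490 \\
13 & 4427 & 96.65 & 9.665 & 39 & \text{F} & \text{CA} & 587 & 1367 \\
14 & 6496 & 108.93 & 10.893 & 40 & \text{M} & \text{VA} & 846 & 931 \\
15 & 5827 & 101.58 & 10.158 & 30 & \text{F} & \text{DC} & 1297 & 2228 \\
16 & 1224 & 82.78 & 8.278 & 24 & \text{F} & \text{VA} & 164 & 974 \\
17 & 4942 & 98.32 & 9.832 & 45 & \text{F} & \text{MD} & 255 & 554 \\
18 & 9579 & 134.18 & 13.418 & 33 & \text{F} & \text{VA} & 2189 & 2228 \\
19 & 6425 & 107.98 & 10.798 & 41 & \text{M} & \text{VA} & 831 & 931 \\
20 & 5951 & 102.08 & 10.208 & 36 & \text{F} & \text{VA} & 809 & 1367
\end{array} \]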
Total number of runners in the dataset: 16924
Histogram of time for a randomly selected sample of size \(100\).
A histogram of \(1000\) sample means, where the samples are of size \(n = 100\). This histogram approximates the true sampling distribution of the sample mean, with mean \(\mu_{\bar x}\) and standard deviation \(\sigma_{\bar x}\).
The sampling distribution of a statistic (e.g., the sample mean or the sample proportion) is the distribution of all values the statistic takes when all possible samples of the same size \(n\) are drawn from the same population.
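The histogram of 1000 sample means described above can be imitated with a small simulation. This is a sketch under a loud assumption: the full set of run times is not reproduced here, so the code draws from a hypothetical normal population whose mean and SD (94.52 and 8.97 minutes) are the values quoted later in these notes. The seed is arbitrary.

```python
import random
import statistics

# Hypothetical population: a normal model stands in for the real run times.
rng = random.Random(2012)          # arbitrary fixed seed so the run is reproducible
MU, SIGMA = 94.52, 8.97            # population mean and SD of run times (minutes)
N, REPS = 100, 1000                # sample size and number of repeated samples

sample_means = []
for _ in range(REPS):
    sample = [rng.gauss(MU, SIGMA) for _ in range(N)]
    sample_means.append(statistics.fmean(sample))

# The sample means cluster around mu with spread close to sigma / sqrt(n).
print(f"mean of sample means: {statistics.fmean(sample_means):.3f}")
print(f"SD of sample means:   {statistics.stdev(sample_means):.3f}")
print(f"sigma / sqrt(n):      {SIGMA / N ** 0.5:.3f}")
```

A histogram of `sample_means` is a simulated version of the sampling distribution: it is much narrower than the population, with SD near \(8.97/\sqrt{100} \approx 0.90\).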
Understanding the concept of a sampling distribution is central to understanding statistical inference.
Parameters and Statistics
A statistic is a value computed from our observed (sample) data.
A parameter is a value that describes the population.
\[ \begin{array} {l|c|c} \text{Name} & \text{Statistic} & \text{Parameter} \\ \hline \text {Mean} & \bar y & \mu \\ \text {Std. Deviation} & s & \sigma \\ \text {Correlation} & r & \rho \\ \text {Regression Coefficient} & b & \beta \\ \text {Proportion} & \hat p & p \end{array} \]
Let \(X_1, X_2,...,X_n\) be \(n\) independently drawn observations from a population distribution with mean \(\mu\) and variance \(\sigma^2\).
Let \(\bar X\) be the mean of these \(n\) independent observations:
\[ \begin{align} \bar X &= \frac{X_1 + X_2 + \cdots + X_n}{n} \\ \\ E(\bar X) &= E\left(\frac{X_1 + X_2 + \cdots + X_n}{n}\right) \\ &= \frac{1}{n}E(X_1 + X_2 + \cdots + X_n) \\ &= \frac{1}{n}[E(X_1) + E(X_2) + \cdots + E(X_n)] \\ &= \frac{1}{n}[\mu + \mu + \cdots + \mu] \\ &= \frac{1}{n}(n\mu) \\ &= \mu \\ \mu_{\bar x} &= \mu \end{align} \]
\[ \begin{align} \bar X &= \frac{X_1 + X_2 + \cdots + X_n}{n} \\ \\ Var(\bar X) &= Var\left(\frac{X_1 + X_2 + \cdots + X_n}{n}\right) \\ &= \frac{1}{n^2}Var(X_1 + X_2 + \cdots + X_n) && \text{(since } Var(aY) = a^2\,Var(Y)\text{)} \\ &= \frac{1}{n^2}[Var(X_1) + Var(X_2) + \cdots + Var(X_n)] && \text{(by independence)} \\ &= \frac{1}{n^2}[\sigma^2 + \sigma^2 + \cdots + \sigma^2] \\ &= \frac{1}{n^2}(n\sigma^2) \\ &= \frac{\sigma^2}{n} \\ \sigma^2_{\bar x} &= \frac{\sigma^2}{n} \\ SD_{\bar x} = \sigma_{\bar x} &= \frac{\sigma}{\sqrt n} \end{align} \]
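Both results can be checked exactly on a tiny population by listing every possible sample. This sketch uses a hypothetical population (one roll of a fair die) and samples of size \(n = 2\) drawn independently, so all \(6^2 = 36\) ordered samples are equally likely; `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction
from itertools import product

# Hypothetical population: one roll of a fair six-sided die.
population = [1, 2, 3, 4, 5, 6]
mu = Fraction(sum(population), len(population))
sigma2 = sum((Fraction(x) - mu) ** 2 for x in population) / len(population)

n = 2
# Every ordered sample of size n drawn independently (with replacement) is equally likely.
means = [Fraction(sum(s), n) for s in product(population, repeat=n)]
e_xbar = sum(means) / len(means)
var_xbar = sum((m - e_xbar) ** 2 for m in means) / len(means)

print(e_xbar, mu)            # 7/2 7/2   -> E(X-bar) equals mu
print(var_xbar, sigma2 / n)  # 35/24 35/24 -> Var(X-bar) equals sigma^2 / n
```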
The mean and standard deviation of the sample proportion describe the center and spread of the distribution of all possible sample proportions \(\hat p\) from random samples of size \(n\) drawn from a population with true proportion \(p\).
\[ \begin{align} \mu_{\hat p} &= p \\ \\ \sigma_{\hat p} &= \sqrt{\frac{p(1-p)}{n}} \end{align} \]
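These two formulas can be checked against the full distribution of \(\hat p\). The sketch assumes independent draws, so that the count \(X\) of "successes" is Binomial\((n, p)\) and \(\hat p = X/n\); the helper name `sample_proportion_moments` is hypothetical, and \(n = 400\), \(p = 0.20\) are borrowed from the smoking example.

```python
from math import comb, sqrt

def sample_proportion_moments(n: int, p: float) -> tuple[float, float]:
    """Mean and SD of p-hat = X / n, computed from X ~ Binomial(n, p)."""
    pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    mean = sum(w * k / n for k, w in enumerate(pmf))
    var = sum(w * (k / n - mean) ** 2 for k, w in enumerate(pmf))
    return mean, sqrt(var)

n, p = 400, 0.20
mean, sd = sample_proportion_moments(n, p)
print(f"{mean:.4f} {sd:.4f}")                  # from the full binomial distribution
print(f"{p:.4f} {sqrt(p * (1 - p) / n):.4f}")  # from mu = p and sqrt(p(1-p)/n)
```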
Three important facts about the distribution of a sample mean \(\bar x\)
Three important facts about the distribution of a sample proportion \(\hat p\)
Problem:
In the 2012 Cherry Blossom 10 mile run, the average time for all of the runners is \(94.52\) minutes with a standard deviation of \(8.97\) minutes. The distribution of run times is approximately normal. Find the probability that a randomly selected runner completes the run in less than \(90\) minutes.
Solution:
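Because the distribution of run times is approximately normal, we can use the normal model directly. A single runner is a sample of size \(n = 1\), so \(\sigma_{\bar x} = \sigma/\sqrt 1 = \sigma\).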
\[ \begin{align} Z &= \frac{\bar x-\mu_{\bar x}}{\sigma_{\bar x}} \\ &= \frac{90-94.52}{8.97/\sqrt 1} \\ &= -0.504 \\ \\ P(Z < -0.504) &= 0.3072 \end{align} \]
There is a \(30.72\%\) probability that a randomly selected runner will complete the run in less than \(90\) minutes.
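A sketch of the same calculation, assuming Python's `statistics.NormalDist`; for a single runner the standard deviation is just \(\sigma\):

```python
from statistics import NormalDist

run_times = NormalDist(mu=94.52, sigma=8.97)   # 2012 Cherry Blossom run times (minutes)

# One runner is a "sample" of size 1, so the SD is sigma itself.
p_under_90 = run_times.cdf(90)
print(f"P(X < 90) = {p_under_90:.4f}")
```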
Problem:
Find the probability that the average run time of 20 randomly selected runners is less than 90 minutes.
Solution:
Here, \(n = 20 < 30\), but the population distribution (the distribution of run times) is stated to be approximately normal. Because of this, the sampling distribution of \(\bar x\) is approximately normal for any sample size.
\[ \begin{align} \sigma_{\bar x} &= \frac{\sigma}{\sqrt n} = \frac{8.97}{\sqrt {20}} = 2.01 \\ Z &= \frac{\bar x-\mu_{\bar x}}{\sigma_{\bar x}} = \frac{90-94.52}{2.01}= - 2.25 \\ P(Z<-2.25) &= 0.0122 \end{align} \]
There is about a \(1.2\%\) probability that the average run time of 20 randomly selected runners will be less than 90 minutes.
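The only change from the single-runner case is the spread: the sampling distribution of \(\bar x\) has standard deviation \(\sigma/\sqrt{n}\). A sketch assuming Python's `statistics.NormalDist`; carrying full precision gives about 0.0121, which differs slightly from a table lookup because \(\sigma_{\bar x}\) and \(Z\) are not rounded.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 94.52, 8.97, 20
xbar_dist = NormalDist(mu=mu, sigma=sigma / sqrt(n))   # sampling distribution of the mean

p_mean_under_90 = xbar_dist.cdf(90)
print(f"SD of x-bar = {xbar_dist.stdev:.3f}, P(x-bar < 90) = {p_mean_under_90:.4f}")
```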
Problem:
Find the probability that fewer than \(15\%\) of a random sample of \(400\) people are smokers if the true proportion of smokers is \(20\%\).
Solution:
The mean of the sample proportion is the population proportion: \(\mu_{\hat p} = 0.20\).
The standard deviation of \(\hat p\) is:
\[\sigma_{\hat p}=\sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{0.2(0.8)}{400}} = 0.02\]
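\[ \begin{align} Z &= \frac{\hat p - \mu_{\hat p}}{\sigma_{\hat p}} = \frac{0.15 - 0.20}{0.02} = -2.5 \\ \\ P(Z<-2.5) &= 0.0062 \end{align} \]
There is about a \(0.62\%\) probability that fewer than \(15\%\) of the \(400\) sampled people are smokers.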
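A sketch of the same normal approximation for \(\hat p\), assuming Python's `statistics.NormalDist`:

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.20, 400
se = sqrt(p * (1 - p) / n)                   # sigma_p_hat = 0.02
p_hat_dist = NormalDist(mu=p, sigma=se)      # normal model for p-hat

prob = p_hat_dist.cdf(0.15)                  # P(p-hat < 0.15)
print(f"sigma_p_hat = {se:.3f}, P = {prob:.4f}")
```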
Problem:
About \(13\%\) of the U.S. population is left-handed. If an auditorium has \(15\) lefty seats, what is the probability that there will not be enough lefty seats for a class of \(90\) students (in other words, what is the probability that there will be more than \(15\) left-handed students in the group)?
Solution:
\[ \begin{align} \mu_{\hat p} &= 0.13 \\ \hat{p} &= 15/90 = 0.167 \\ \sigma_{\hat p}&=\sqrt{\frac{p(1- p)}{n}} = \sqrt{\frac{0.13(0.87)}{90}} = 0.035 \\ \\ Z &= \frac{\hat p - \mu_{\hat p}}{\sigma_{\hat p}} = \frac{0.167 - 0.13}{0.035} = 1.06 \\ P(\hat{p}>0.167) &= P(Z>1.06) = 0.1446 \end{align} \]
There is about a \(14\%\) probability that the \(15\) lefty seats will not be enough.
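A sketch of this calculation, assuming Python's `statistics.NormalDist`. Without rounding \(\hat p\) to 0.167 or \(\sigma_{\hat p}\) to 0.035, \(z \approx 1.03\) and the probability comes out closer to 0.15; the gap is purely an effect of rounding the intermediate values.

```python
from math import sqrt
from statistics import NormalDist

p, n, seats = 0.13, 90, 15
se = sqrt(p * (1 - p) / n)
cutoff = seats / n                     # more than 15 lefties means p-hat > 15/90

prob = 1 - NormalDist(mu=p, sigma=se).cdf(cutoff)
print(f"z = {(cutoff - p) / se:.3f}, P(p-hat > 15/90) = {prob:.4f}")
```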
The distribution is approximately normal if
(1) a normal curve fits the histogram well; or
(2) on a normal QQ plot, the data points fall close to a straight line (the \(45^\circ\) line when the data are standardized)
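The QQ-plot check can be given a rough numeric summary: pair each sorted observation with a standard normal quantile and measure how close the points are to a straight line with the correlation coefficient. This probability-plot correlation is a swapped-in technique (the notes judge the plot by eye), the plotting positions \((i - 0.5)/n\) are one common choice among several, and the helper names are hypothetical. The data are the 20 run times listed in the dataset above.

```python
from math import sqrt
from statistics import NormalDist, fmean

# Run times (minutes) of the first 20 observations in the dataset.
times = [92.25, 106.35, 89.33, 113.50, 86.52, 112.30, 118.45, 95.17, 93.25, 72.37,
         86.90, 84.47, 96.65, 108.93, 101.58, 82.78, 98.32, 134.18, 107.98, 102.08]

def normal_qq_points(data):
    """Pair each sorted observation with a standard normal quantile at (i - 0.5) / n."""
    n = len(data)
    theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return theoretical, sorted(data)

def correlation(xs, ys):
    """Pearson correlation; values near 1 mean the QQ points lie close to a line."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

q, obs = normal_qq_points(times)
print(f"QQ-plot correlation: {correlation(q, obs):.3f}")
```

Plotting `obs` against `q` gives the QQ plot itself; the single correlation value is only a summary and does not replace looking at the plot for curvature or outliers (such as the 134.18-minute time).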