November 3, 2023
“If all the statisticians in the world were laid head to toe, they wouldn’t be able to reach a conclusion.”
– Anonymous
Theorem: If a variable \(Y\) has a normal distribution in a population, then the distribution of sample means \(\bar{Y}\) is also normal.
Theorem: \(Y \sim N(\mu,\sigma^2) \Rightarrow \bar{Y} \sim N(\mu,\sigma_{\bar{Y}}^2)\), where \(\sigma_{\bar{Y}}\) is the standard error of the mean given by \[ \sigma_{\bar{Y}} = \frac{\sigma}{\sqrt{n}}. \]
We can create a standard normal deviate from the sampling distribution as follows:
\[ Z = \frac{\bar{Y}-\mu}{\sigma_{\bar{Y}}}, \] where \(\sigma_{\bar{Y}} = \frac{\sigma}{\sqrt{n}}\).
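For 95% of random samples, \(Z\) falls between \(-1.96\) and \(1.96\):
\[-1.96 < \frac{\bar{Y}-\mu}{\sigma_{\bar{Y}}} < 1.96\]
Rearranging for \(\mu\) gives a 95% confidence interval for the mean when \(\sigma\) is known:
\[\bar{Y} - 1.96\cdot\sigma_{\bar{Y}} < \mu < \bar{Y} + 1.96\cdot\sigma_{\bar{Y}}\]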
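The value 1.96 is the point of the standard normal distribution that leaves 2.5% in each tail. A minimal R sketch of the interval above, assuming made-up values for the sample mean, \(\sigma\), and \(n\) (these numbers are illustrative only and do not come from the lecture data):

```r
# Where 1.96 comes from: the 97.5th percentile of N(0, 1)
z_crit <- qnorm(0.975)

# Hypothetical values, for illustration only
y_bar <- 10    # sample mean
sigma <- 2     # population standard deviation, assumed known
n     <- 25    # sample size

# Standard error of the mean when sigma is known
sigma_ybar <- sigma / sqrt(n)

# 95% confidence interval for mu
ci_known_sigma <- c(lower = y_bar - z_crit * sigma_ybar,
                    upper = y_bar + z_crit * sigma_ybar)

print(round(z_crit, 2))
print(ci_known_sigma)
```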
Problem: We almost never know \(\sigma\), so in practice we use the estimate \(\mathrm{SE}_{\bar{Y}} = s/\sqrt{n}\) instead of \(\sigma_{\bar{Y}}\)!!
Definition: For \(Y \sim N(\mu,\sigma^2)\), the standard normal deviate \[ Z = \frac{\bar{Y}-\mu}{\sigma_{\bar{Y}}} \] is normally distributed with mean 0 and standard deviation 1.
Definition: For \(Y \sim N(\mu,\sigma^2)\), the statistic defined by \[ t = \frac{\bar{Y}-\mu}{\mathrm{SE}_{\bar{Y}}} \] has a Student’s \(t\)-distribution with \(n-1\) degrees of freedom.
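Because \(\mathrm{SE}_{\bar{Y}}\) is itself an estimate, the \(t\)-distribution has heavier tails than the standard normal, and the difference shrinks as the degrees of freedom grow. A small R sketch comparing two-tailed 5% critical values (the particular degrees of freedom listed are arbitrary choices for illustration):

```r
# Two-tailed 5% critical values: t-distribution vs. standard normal
df     <- c(2, 4, 10, 30, 100, 1000)   # arbitrary illustrative values
t_crit <- qt(0.975, df)                # 97.5th percentile of each t-distribution
z_crit <- qnorm(0.975)                 # 97.5th percentile of N(0, 1)

# t critical values are always larger, but approach z_crit as df increases
print(data.frame(df = df,
                 t_crit = round(t_crit, 3),
                 z_crit = round(z_crit, 3)))
```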
“Guinness brewer William S. Gosset’s work is responsible for inspiring the concept of statistical significance, industrial quality control, efficient design of experiments and, not least of all, consistently great tasting beer.”
– Dan Kopf
“Gosset used a pseudonym [Student] because Guinness prohibited its employees from publishing, following the unauthorized release of some brewing secrets a few years earlier by another employee.”
– Whitlock & Schluter
\[ 2\times \mathrm{Pr[}t_{4} > 2.78\mathrm{]} = 0.05 \]
\[ \mathrm{Crit.\ val.} = t_{0.05(2),4} = 2.78 \]
\[ \mathrm{Pr[}t_{4} > 2.78\mathrm{]} = 0.025 \]
\[ \mathrm{Crit.\ val.} = t_{0.025(1),4} = 2.78 \]
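The two critical values above are the same number because a two-tailed test at \(\alpha = 0.05\) places 2.5% in each tail. A short R check, offered as a sketch using the base functions qt and pt:

```r
df <- 4

# Two-tailed critical value t_{0.05(2),4}: alpha = 0.05 split across both tails
crit_two_tailed <- qt(0.05 / 2, df, lower.tail = FALSE)

# One-tailed critical value t_{0.025(1),4}: 2.5% in the upper tail only
crit_one_tailed <- qt(0.025, df, lower.tail = FALSE)

# Area above the critical value, and the corresponding two-tailed area
upper_tail <- pt(crit_two_tailed, df, lower.tail = FALSE)
two_tailed <- 2 * upper_tail

print(round(c(crit_two_tailed, crit_one_tailed), 2))
print(c(upper_tail, two_tailed))
```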
Name | R command | Uses |
---|---|---|
PDF | dt(x, df) | - |
CDF | pt(q, df, lower.tail=TRUE) | - |
CCDF | pt(q, df, lower.tail=FALSE) | Compute \(P\)-values |
QF | qt(p, df, lower.tail=TRUE) | - |
CQF | qt(p, df, lower.tail=FALSE) | Compute critical values |
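The functions in the table are related in pairs: the CDF and CCDF sum to 1, and each quantile function undoes the corresponding probability function. A brief R sketch, assuming \(df = 4\) and \(q = 2.78\) purely as example inputs:

```r
df <- 4
q  <- 2.78   # example quantile, chosen for illustration

# CDF and CCDF are complementary areas
cdf  <- pt(q, df, lower.tail = TRUE)    # Pr[t <= q]
ccdf <- pt(q, df, lower.tail = FALSE)   # Pr[t >  q]

# QF inverts the CDF; CQF inverts the CCDF
qf  <- qt(cdf,  df, lower.tail = TRUE)
cqf <- qt(ccdf, df, lower.tail = FALSE)

# The density is symmetric about zero
dens_sym <- c(dt(q, df), dt(-q, df))

print(c(cdf = cdf, ccdf = ccdf, total = cdf + ccdf))
print(c(qf = qf, cqf = cqf))
print(dens_sym)
```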
Revisiting estimation of the mean!!
Two sides of our statistical coin:
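Assumptions:
1. The variable is normally distributed in the population.
2. The data are a random sample from the population.
Remember, though, that the Central Limit Theorem can make our results somewhat robust to Assumption #1.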
Definition: The 95% confidence interval for the mean is given by \[ \overline{Y} - t_{0.05(2),df}\mathrm{SE}_{\overline{Y}} < \mu < \overline{Y} + t_{0.05(2),df}\mathrm{SE}_{\overline{Y}}. \]
\[ -t_{0.05(2),df} < t_{df} = \frac{\overline{Y}-\mu}{\mathrm{SE}_{\overline{Y}}} < t_{0.05(2),df} \]
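The inequality above can be rearranged into the confidence interval for \(\mu\). A short derivation, writing \(t^{*}\) for \(t_{0.05(2),df}\) (a shorthand introduced here only to keep the lines compact):

\[ -t^{*} < \frac{\overline{Y}-\mu}{\mathrm{SE}_{\overline{Y}}} < t^{*} \]

Multiplying through by \(\mathrm{SE}_{\overline{Y}} > 0\):

\[ -t^{*}\,\mathrm{SE}_{\overline{Y}} < \overline{Y}-\mu < t^{*}\,\mathrm{SE}_{\overline{Y}} \]

Subtracting \(\overline{Y}\) and multiplying by \(-1\), which reverses the inequalities:

\[ \overline{Y} - t^{*}\,\mathrm{SE}_{\overline{Y}} < \mu < \overline{Y} + t^{*}\,\mathrm{SE}_{\overline{Y}} \]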
Chapter 11, Practice Problem #1
Consider the changes in highest elevation for 31 taxa, in meters, over the late 1900s and early 2000s. Positive and negative numbers will indicate upward and downward shifts in elevation, respectively.
'data.frame': 31 obs. of 2 variables:
$ elevationalRangeShift: num 58.9 7.8 108.6 44.8 11.1 ...
$ taxonAndLocation : chr "moths_Malaysia" "butterflies_Czech" "butterflies_Spain" "butterflies_UK" ...
Some statistics - Computing standard error
Length Mean Sd Sd err
31 39.32903 30.66312 5.507259
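The standard error in the last column follows from the sample size and standard deviation reported in the same row. A minimal R sketch using those summary values (the helper std_err is a hypothetical name, shown for use with a full data vector, which is not reproduced here):

```r
# Summary values taken from the output above
n    <- 31          # number of taxa
y_sd <- 30.66312    # sample standard deviation

# Standard error of the mean: s / sqrt(n)
se <- y_sd / sqrt(n)
print(se)

# Hypothetical helper for a raw numeric vector y
std_err <- function(y) sd(y) / sqrt(length(y))
```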
95% confidence interval
[1] 2.042272
99% confidence interval
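Both intervals can be computed from the summary statistics reported earlier; only the critical value changes with the confidence level. A sketch in R, where ci_mean is a hypothetical helper name and the inputs are the rounded values from the output above:

```r
# Summary values from the output above
n     <- 31
y_bar <- 39.32903
se    <- 5.507259
df    <- n - 1

# Hypothetical helper: confidence interval for the mean at level conf
ci_mean <- function(conf) {
  t_crit <- qt((1 - conf) / 2, df, lower.tail = FALSE)  # two-tailed critical value
  c(lower = y_bar - t_crit * se, upper = y_bar + t_crit * se)
}

ci95 <- ci_mean(0.95)
ci99 <- ci_mean(0.99)   # wider, because the critical value is larger

print(ci95)
print(ci99)
```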
Definition: The one-sample \(t\)-test compares the mean of a random sample from a normal population with the population mean proposed in a null hypothesis.
Null hypothesis: \(H_{0}\): The true mean equals \(\mu_{0}\) (\(\mu = \mu_{0}\))
Alternate hypothesis: \(H_{A}\): The true mean does not equal \(\mu_{0}\) (\(\mu \neq \mu_{0}\))
Test statistic: \[ t = \frac{\overline{Y}-\mu_{0}}{\mathrm{SE}_{\overline{Y}}} \]
Sampling distribution of \(t\) under \(H_{0}\): \(t\)-distribution with \(df = n-1\)
One-sample \(t\)-test
\[ H_{0}: \mu = 0 \\ H_{A}: \mu \neq 0 \]
First, let’s calculate a \(P\)-value:
[1] 6.056689e-08
Conclusion: With significance level \(\alpha = 0.05\), since \(P < 0.05\), we reject the null hypothesis.
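The \(P\)-value above is the two-tailed area beyond the observed test statistic. A sketch of the calculation in R from the summary statistics reported earlier (rounded inputs, so the last digits may differ slightly from a calculation on the raw data):

```r
# Summary values from the earlier output
n     <- 31
y_bar <- 39.32903
se    <- 5.507259
mu_0  <- 0          # mean proposed by the null hypothesis

# Test statistic and two-tailed P-value
t_stat  <- (y_bar - mu_0) / se
p_value <- 2 * pt(abs(t_stat), df = n - 1, lower.tail = FALSE)

print(t_stat)
print(p_value)
```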
Second, using critical values:
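With critical values, the null hypothesis is rejected when the absolute value of the test statistic exceeds \(t_{0.05(2),df}\). A minimal R sketch, again assuming the summary statistics reported earlier for the 31 taxa:

```r
alpha  <- 0.05
df     <- 31 - 1
t_stat <- (39.32903 - 0) / 5.507259    # observed test statistic

# Two-tailed critical value t_{0.05(2),30}
t_crit <- qt(alpha / 2, df, lower.tail = FALSE)

# Reject H0 when the statistic falls beyond the critical value
reject_h0 <- abs(t_stat) > t_crit

print(c(t_stat = t_stat, t_crit = t_crit))
print(reject_h0)
```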
Third, using the t.test function:
One Sample t-test
data: elevation
t = 7.1413, df = 30, p-value = 6.057e-08
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
28.08171 50.57635
sample estimates:
mean of x
39.32903
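Output of this form comes from a call to t.test with the null value passed as mu. Since the full data vector is not reproduced here, the sketch below uses simulated stand-in values (elevation_sim is a hypothetical name, and its numbers are not the real range shifts) to show the call and to confirm that the reported statistic matches the formula for \(t\):

```r
set.seed(1)
# Simulated stand-in for the 31 range shifts; NOT the real data
elevation_sim <- rnorm(31, mean = 40, sd = 30)

# One-sample, two-sided t-test of H0: mu = 0
res <- t.test(elevation_sim, mu = 0,
              alternative = "two.sided", conf.level = 0.95)

# The same statistic computed by hand
manual_t <- (mean(elevation_sim) - 0) /
  (sd(elevation_sim) / sqrt(length(elevation_sim)))

print(res)
print(manual_t)
```

The returned object also exposes the pieces individually, for example res$statistic, res$parameter (the degrees of freedom), res$p.value, and res$conf.int.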
In many cases, it isn’t the mean that we are interested in estimating but the variability of a population measure.
Remember, variance is also a population parameter, so we should be able to estimate it.
Stalk-eyed flies have staring contests! Longer-stalked flies usually win.
Definition: If \(Y\) has a normal distribution, then the sampling distribution of the quantity \[ \chi^{2} = (n-1)s^2/\sigma^2 \] is the \(\chi^2\) distribution with \(n-1\) degrees of freedom.
\[ \frac{df s^2}{\chi^2_{\alpha/2,df}} < \sigma^2 < \frac{df s^2}{\chi^2_{1-\alpha/2,df}} \]
lower bound upper bound
0.07238029 0.58225336
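An interval of this kind can be computed with qchisq. In the formula above, \(\chi^2_{\alpha/2,df}\) is read as the value with area \(\alpha/2\) to its right, which corresponds to lower.tail = FALSE in R. The sketch below uses a hypothetical helper var_ci and made-up inputs (\(s^2 = 4\), \(n = 20\)), not the fly data:

```r
# Hypothetical helper: confidence interval for a population variance
var_ci <- function(s2, n, conf = 0.95) {
  alpha <- 1 - conf
  df    <- n - 1
  # Upper-tail critical values, matching the subscripts in the formula
  chi_upper <- qchisq(alpha / 2,     df, lower.tail = FALSE)  # larger value
  chi_lower <- qchisq(1 - alpha / 2, df, lower.tail = FALSE)  # smaller value
  c("lower bound" = df * s2 / chi_upper,
    "upper bound" = df * s2 / chi_lower)
}

# Made-up example values, for illustration only
ci_var <- var_ci(s2 = 4, n = 20)
print(ci_var)

# Interval for the standard deviation: take square roots of both bounds
print(sqrt(ci_var))
```

Note that the interval is not symmetric about \(s^2\), because the \(\chi^2\) distribution is right-skewed.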
Note: Same assumptions as confidence interval for mean, but much less robust to deviations from these assumptions!!!
Introduction to Biostatistics, Fall 2023