Alban Guillaumet, Troy University
Calculating confidence intervals for the mean using Student's t-distribution (review and critical values)
Describing a simple test of whether the measurements in a data sample are consistent with a hypothesized value for the population mean: the one-sample t-test
Definition: For \( Y \sim N(\mu,\sigma) \), the standard normal deviate
\[ Z = \frac{Y-\mu}{\sigma} \] is normally distributed with mean 0 and standard deviation 1.
Definition: For \( \bar{Y} \sim N(\mu,\sigma_{\bar{Y}}) \), the statistic defined by
\[ Z = \frac{\bar{Y}-\mu}{\sigma_{\bar{Y}}} \] is normally distributed with mean 0 and standard deviation 1.
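Definition: For a random sample of size \( n \) from a normal population, so that \( \bar{Y} \sim N(\mu,\sigma_{\bar{Y}}) \), the statistic defined by
\[ t = \frac{\bar{Y}-\mu}{\mathrm{SE}_{\bar{Y}}} \] where \( \mathrm{SE}_{\bar{Y}} = s/\sqrt{n} \) is the estimated standard error, has a Student’s \( t \)-distribution with \( n-1 \) degrees of freedom.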
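The \( t \)-distribution has more probability in its tails than the standard normal distribution, because replacing the true standard error \( \sigma_{\bar{Y}} \) (used in \( Z \)) with its estimate \( \mathrm{SE}_{\bar{Y}} \) (used in \( t \)) adds uncertainty.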
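To see the heavier tails numerically, the sketch below compares the upper-tail probability beyond the same cutoff for the standard normal (`pnorm`) and for Student's \( t \) (`pt`) at several degrees of freedom. The cutoff of 2 and the df values are arbitrary choices for illustration.

```r
cutoff <- 2                       # arbitrary cutoff for illustration
dfs <- c(1, 4, 10, 30, 100)       # arbitrary degrees of freedom

# Upper-tail probabilities Pr[Z > 2] and Pr[t_df > 2]
p_norm <- pnorm(cutoff, lower.tail = FALSE)
p_t <- pt(cutoff, df = dfs, lower.tail = FALSE)

# Each t tail exceeds the normal tail; the gap shrinks as df grows
round(data.frame(df = dfs, t_tail = p_t, normal_tail = p_norm), 4)
```

As the degrees of freedom increase, \( \mathrm{SE}_{\bar{Y}} \) becomes a better estimate of \( \sigma_{\bar{Y}} \), so the \( t \)-distribution approaches the standard normal.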
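For \( df = 4 \) (that is, \( n = 5 \)), the upper tail beyond 2.78 holds 2.5% of the distribution:
\[ \Pr[t_{4} > 2.78] = 0.025 \]
so the one-tailed critical value at \( \alpha = 0.025 \) equals the two-tailed critical value at \( \alpha = 0.05 \):
\[ t_{0.025(1),4} = t_{0.05(2),4} = 2.78 \]
while the one-tailed critical value at \( \alpha = 0.05 \) is smaller:
\[ t_{0.05(1),4} = 2.13 \]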
In 95% of repeated random samples of \( n = 5 \) measurements from a normal population, the sample mean will fall within 2.78 estimated standard errors of the true population mean.
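This statement can be checked by simulation. The sketch below assumes a hypothetical normal population with \( \mu = 10 \) and \( \sigma = 2 \) (any values would do), draws many samples of size 5, and records how often the sample mean lands within 2.78 estimated standard errors of \( \mu \).

```r
set.seed(1)                          # arbitrary seed, for reproducibility
size <- 5                            # n = 5 measurements per sample
mu <- 10; sigma <- 2                 # hypothetical normal population
n_sims <- 10000
tcrit5 <- qt(0.025, df = size - 1, lower.tail = FALSE)   # about 2.78

# For each simulated sample: is the sample mean within tcrit5
# estimated standard errors of the true mean mu?
within <- replicate(n_sims, {
  y <- rnorm(size, mean = mu, sd = sigma)
  abs(mean(y) - mu) <= tcrit5 * sd(y) / sqrt(size)
})
mean(within)                         # proportion; should be close to 0.95
```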
| Name | R command | Uses |
|---|---|---|
| PDF | dt(x, df) | Density |
| CDF | pt(q, df, lower.tail=TRUE) | Compute \( P \)-values |
| CCDF | pt(q, df, lower.tail=FALSE) | Compute \( P \)-values |
| QF | qt(p, df, lower.tail=TRUE) | Compute critical values |
| CQF | qt(p, df, lower.tail=FALSE) | Compute critical values |
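The sketch below exercises each function in the table for \( df = 4 \), showing that the CDF and CCDF are complementary and that the quantile function inverts the CDF. The value 1.5 used for the inverse check is arbitrary.

```r
nu <- 4                                  # degrees of freedom
q <- 2.776445                            # two-tailed 5% critical value for df = 4

pt(q, df = nu, lower.tail = FALSE)       # CCDF: Pr[t_4 > q], about 0.025
qt(0.025, df = nu, lower.tail = FALSE)   # CQF: about 2.776445

# CDF and CCDF are complementary ...
pt(q, df = nu) + pt(q, df = nu, lower.tail = FALSE)   # 1
# ... and qt() undoes pt()
qt(pt(1.5, df = nu), df = nu)            # 1.5, up to rounding
# dt() is the height of the density curve, e.g. at its peak t = 0
dt(0, df = nu)                           # 0.375
```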
In R, the two-tailed critical value \( t_{0.05(2),4} = 2.78 \), which cuts off an upper tail of \( \Pr[t_{4} > 2.78] = 0.025 \), can be computed from either tail:
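```r
# Upper-tail (CQF) version: area 0.025 to the right
qt(0.025, df = 4, lower.tail = FALSE)
#> [1] 2.776445
# Equivalent lower-tail (QF) version: area 0.975 to the left
qt(0.975, df = 4, lower.tail = TRUE)
#> [1] 2.776445
```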
Note: critical values can also be looked up in statistical tables of the \( t \)-distribution.
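A small table of critical values, like those printed in statistical tables, can be generated directly with `qt()`. The df values below are an arbitrary selection; `df = Inf` gives the standard normal limit.

```r
alpha <- 0.05
dfs <- c(1, 2, 4, 10, 30, 100, Inf)   # Inf gives the standard normal limit
crit <- data.frame(
  df = dfs,
  one_tailed = qt(alpha, df = dfs, lower.tail = FALSE),      # t_{0.05(1),df}
  two_tailed = qt(alpha / 2, df = dfs, lower.tail = FALSE)   # t_{0.05(2),df}
)
round(crit, 3)
```

The row for \( df = 4 \) reproduces the values 2.13 and 2.78 quoted above.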
Chapter 11, Practice Problem #1
Consider the change in the highest elevation reached by 31 taxa, in meters, between the late 1900s and the early 2000s. Positive and negative values indicate upward and downward shifts in elevation, respectively.
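```r
# myData: data frame with the elevational range-shift data (import step not shown)
str(myData)
#> 'data.frame': 31 obs. of 2 variables:
#> $ elevationalRangeShift: num 58.9 7.8 108.6 44.8 11.1 ...
#> $ taxonAndLocation : Factor w/ 30 levels "aquatic bugs_UK",..: 21 7 8 9 6 1 10 12 13 14 ...
```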
```r
elevation <- myData$elevationalRangeShift
summary(elevation)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
#>  -19.30   15.95   35.80   39.33   63.45  108.60
```
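Summary statistics (sample size, mean, standard deviation, and standard error):
```r
n  <- length(elevation)
m  <- mean(elevation)
s  <- sd(elevation)
SE <- s / sqrt(n)     # estimated standard error of the mean
matrix(c(n, m, s, SE), nrow = 1, byrow = TRUE,
       dimnames = list("", c("Length", "Mean", "Sd", "SE")))
#>  Length     Mean       Sd       SE
#>      31 39.32903 30.66312 5.507259
```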
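The standard error reported above follows from \( \mathrm{SE}_{\bar{Y}} = s/\sqrt{n} \):
\[ \mathrm{SE}_{\bar{Y}} = \frac{30.66312}{\sqrt{31}} \approx \frac{30.66312}{5.567764} \approx 5.507259 \]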
95% confidence interval
```r
(tcrit <- qt(0.025, df = n - 1, lower.tail = FALSE))
#> [1] 2.042272
CI <- c(m - tcrit * SE, m + tcrit * SE)
names(CI) <- c("lower bound", "upper bound")
CI
#> lower bound upper bound
#>    28.08171    50.57635
```
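The interval is \( \bar{Y} \pm t_{0.05(2),30}\,\mathrm{SE}_{\bar{Y}} \); plugging in the values computed above:
\[ 39.32903 \pm 2.042272 \times 5.507259 = 39.32903 \pm 11.24732 \]
\[ 28.08171 < \mu < 50.57635 \]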
99% confidence interval
```r
(tcrit <- qt(0.01/2, df = n - 1, lower.tail = FALSE))
#> [1] 2.749996
CI <- c(m - tcrit * SE, m + tcrit * SE)
names(CI) <- c("lower bound", "upper bound")
CI
#> lower bound upper bound
#>    24.18410    54.47397
```
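The two intervals differ only in the critical value, so the calculation can be wrapped in a function of the confidence level. The sketch below uses a hypothetical helper, `t_ci` (not part of base R), applied to made-up data, and checks it against the interval reported by R's built-in `t.test()`.

```r
# Hypothetical helper: t-based confidence interval for a population mean
t_ci <- function(y, conf.level = 0.95) {
  n <- length(y)
  SE <- sd(y) / sqrt(n)
  alpha <- 1 - conf.level
  tcrit <- qt(alpha / 2, df = n - 1, lower.tail = FALSE)
  c("lower bound" = mean(y) - tcrit * SE, "upper bound" = mean(y) + tcrit * SE)
}

# Made-up example data (not from the elevation dataset)
y <- c(12.1, 9.8, 11.4, 10.2, 13.0, 8.7, 10.9)
t_ci(y, conf.level = 0.95)
t_ci(y, conf.level = 0.99)   # wider, because the critical value is larger
```

Calling `t_ci(elevation, 0.99)` should reproduce the 99% interval above, since it applies the same formula.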
Definition: The one-sample \( t \)-test compares the mean of a random sample from a normal population with the population mean proposed in a null hypothesis.
\( H_{0} \): The true mean equals \( \mu_{0} \) (\( \mu = \mu_{0} \))
\( H_{A} \): The true mean does not equal \( \mu_{0} \) (\( \mu \neq \mu_{0} \))
Test statistic:
\[
t = \frac{\overline{Y}-\mu_{0}}{\mathrm{SE}_{\overline{Y}}}
\]
Sampling distribution of \( t \) under \( H_{0} \): \( t \)-distribution with \( df = n-1 \)
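The three ingredients above (hypotheses, test statistic, null distribution) can be bundled into a small function. The sketch below uses a hypothetical helper, `one_sample_t` (not part of base R), which handles only the two-sided alternative; the made-up data and \( \mu_0 = 10 \) are assumptions for illustration, and the result is checked against R's built-in `t.test()`.

```r
# Hypothetical helper: two-sided one-sample t-test computed from the formula
one_sample_t <- function(y, mu0) {
  n <- length(y)
  SE <- sd(y) / sqrt(n)
  tstat <- (mean(y) - mu0) / SE
  dof <- n - 1
  # Two-sided P-value: probability of a |t| at least this large under H0
  pval <- 2 * pt(abs(tstat), df = dof, lower.tail = FALSE)
  c(t = tstat, df = dof, p.value = pval)
}

y <- c(12.1, 9.8, 11.4, 10.2, 13.0, 8.7, 10.9)   # made-up data
one_sample_t(y, mu0 = 10)
```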
One-sample \( t \)-test
\[ H_{0}: \mu = 0 \\ H_{A}: \mu \neq 0 \]
First, let's calculate the test statistic and its \( P \)-value:
```r
(tstat <- (m - 0) / SE)
#> [1] 7.141309
```
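As a check on the arithmetic, plugging the summary statistics into the test statistic gives:
\[ t = \frac{\bar{Y}-\mu_{0}}{\mathrm{SE}_{\bar{Y}}} = \frac{39.32903 - 0}{5.507259} \approx 7.1413 \]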
```r
(pval <- 2 * pt(abs(tstat), df = n - 1, lower.tail = FALSE))
#> [1] 6.056689e-08
```
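With significance level \( \alpha = 0.05 \): since \( P = 6.06 \times 10^{-8} < 0.05 \), we reject the null hypothesis. The mean elevational range shift differs significantly from zero, and the positive sample mean (39.3 m) indicates an upward shift.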
Second, using critical values:
```r
data.frame(tstat, tcrit = qt(0.025, df = n - 1, lower.tail = FALSE))
#>      tstat    tcrit
#> 1 7.141309 2.042272
```
Conclusion: since \( |t| = 7.14 > t_{0.05(2),30} = 2.04 \), we reject \( H_{0} \).
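The \( P \)-value and critical-value approaches always reach the same decision for a two-sided test, because the two-sided \( P \)-value decreases as \( |t| \) grows and equals \( \alpha \) exactly at \( |t| = t_{\alpha(2),df} \). The sketch below checks this over an arbitrary grid of possible test statistics, using \( df = 30 \) as in the elevation example.

```r
alpha <- 0.05
dof <- 30                                   # df of the elevation example
tcrit30 <- qt(alpha / 2, df = dof, lower.tail = FALSE)
t_values <- seq(-5, 5, by = 0.01)           # arbitrary grid of test statistics
p_values <- 2 * pt(abs(t_values), df = dof, lower.tail = FALSE)

# TRUE if "P < alpha" and "|t| > tcrit" agree for every grid value
all((p_values < alpha) == (abs(t_values) > tcrit30))
```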
Third, using the built-in t.test function, which also reports the confidence interval:
```r
t.test(x = elevation, mu = 0, conf.level = 0.95)
#>     One Sample t-test
#>
#> data:  elevation
#> t = 7.1413, df = 30, p-value = 6.057e-08
#> alternative hypothesis: true mean is not equal to 0
#> 95 percent confidence interval:
#>  28.08171 50.57635
#> sample estimates:
#> mean of x
#>  39.32903
```
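Because `t.test()` returns an object of class `"htest"` (a list), its components can be extracted for further use rather than read off the printout. The sketch below uses made-up data and a hypothesized mean of 10 (both assumptions for illustration) and checks the extracted values against the hand calculations of the first approach.

```r
y <- c(12.1, 9.8, 11.4, 10.2, 13.0, 8.7, 10.9)   # made-up data
res <- t.test(x = y, mu = 10, conf.level = 0.95)

# Components of the returned "htest" list
res$statistic   # t statistic
res$parameter   # degrees of freedom
res$p.value     # two-sided P-value
res$conf.int    # confidence interval (with a conf.level attribute)
res$estimate    # sample mean

# The same quantities computed by hand
SE_y <- sd(y) / sqrt(length(y))
tstat_y <- (mean(y) - 10) / SE_y
pval_y <- 2 * pt(abs(tstat_y), df = length(y) - 1, lower.tail = FALSE)
```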