Inference for a normal population

Alban Guillaumet, Troy University

Objectives

  • Calculate confidence intervals for the mean using Student's t-distribution (recall + critical values)

  • Describe a simple test asking whether the measurements of a data sample are consistent with a hypothesized value for the population mean: the one-sample t-test

Normal distribution

Definition: For \( Y \sim N(\mu,\sigma) \), the standard normal deviate
\[ Z = \frac{Y-\mu}{\sigma} \] is normally distributed with mean 0 and standard deviation 1.

Definition: For \( \bar{Y} \sim N(\mu,\sigma_{\bar{Y}}) \), the statistic defined by
\[ Z = \frac{\bar{Y}-\mu}{\sigma_{\bar{Y}}} \] is normally distributed with mean 0 and standard deviation 1.
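
As a quick numerical check of the standardization above, here is a minimal R sketch. The values of `mu`, `sigma` and `y` are made-up illustration values, not taken from these slides.

```r
# Hypothetical illustration values (not from the slides)
mu <- 10
sigma <- 2
y <- 13

# Standard normal deviate
z <- (y - mu) / sigma

# The lower-tail probability on the original scale matches the one on the Z scale
p_original <- pnorm(y, mean = mu, sd = sigma)
p_standard <- pnorm(z)
c(z = z, p_original = p_original, p_standard = p_standard)
```

The same logic applies to the sample mean, with \( \sigma_{\bar{Y}} \) in place of \( \sigma \).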

The Student's t-distribution

Definition: For \( \bar{Y} \sim N(\mu,\sigma_{\bar{Y}}) \), the statistic defined by
\[ t = \frac{\bar{Y}-\mu}{\mathrm{SE}_{\bar{Y}}} \] has a Student’s \( t \)-distribution with \( n-1 \) degrees of freedom.
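
For reference, the two denominators differ only in whether the population standard deviation \( \sigma \) is known or replaced by the sample standard deviation \( s \). This matches the R code used in the practice problem, where the standard error is computed as `s/sqrt(n)`:

\[ \sigma_{\bar{Y}} = \frac{\sigma}{\sqrt{n}}, \qquad \mathrm{SE}_{\bar{Y}} = \frac{s}{\sqrt{n}} \]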

The \( t \)-distribution has more probability in its tails than the standard normal distribution: replacing the true \( \sigma_{\bar{Y}} \) (used in \( Z \)) with its estimate \( \mathrm{SE}_{\bar{Y}} \) (used in \( t \)) adds uncertainty.
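
The heavier tails can be seen directly by comparing upper-tail probabilities. This is a small sketch; the cutoff of 2 and the degrees of freedom shown are arbitrary choices for illustration.

```r
# Upper-tail probability beyond 2: standard normal versus t with 4 df
p_z <- pnorm(2, lower.tail = FALSE)
p_t <- pt(2, df = 4, lower.tail = FALSE)
c(normal = p_z, t_4df = p_t)

# The t tail area shrinks toward the normal tail area as df grows
tails <- sapply(c(4, 30, 1000), function(d) pt(2, df = d, lower.tail = FALSE))
tails
```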

Critical value of Student's t

\[ \mathrm{Pr[}t_{4} > 2.78\mathrm{]} = 0.025 \]

\[ t_{0.025(1),4} = 2.78 \]

\[ t_{0.05(1),4} = 2.13 \]

\[ \mathrm{Crit.\ val.} = t_{0.05(2),4} = 2.78 \]

In 95% of repeated random samples of \( n = 5 \) measurements from a normal population, the sample mean falls within 2.78 estimated standard errors of the true population mean.
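
This statement can be checked by simulation. The sketch below assumes a standard normal population; the seed and number of replicates are arbitrary, and all variable names are illustrative.

```r
set.seed(1)          # arbitrary seed, for reproducibility
n_rep <- 10000       # number of simulated samples (arbitrary)
n_obs <- 5           # sample size, as in the statement above
mu_true <- 0         # hypothetical population mean
sigma_true <- 1      # hypothetical population standard deviation

t_crit <- qt(0.025, df = n_obs - 1, lower.tail = FALSE)

# For each sample: is the sample mean within t_crit estimated SEs of the true mean?
covered <- replicate(n_rep, {
  y <- rnorm(n_obs, mean = mu_true, sd = sigma_true)
  se <- sd(y) / sqrt(n_obs)
  abs(mean(y) - mu_true) <= t_crit * se
})
coverage <- mean(covered)
coverage             # expected to be close to 0.95
```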

Summary of functions for the t-distribution

| Name | R command | Uses |
|------|-----------|------|
| PDF | dt(x, df) | Density |
| CDF | pt(q, df, lower.tail=TRUE) | Compute \( P \)-values |
| CCDF | pt(q, df, lower.tail=FALSE) | Compute \( P \)-values |
| QF | qt(p, df, lower.tail=TRUE) | Compute critical values |
| CQF | qt(p, df, lower.tail=FALSE) | Compute critical values |
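
The functions in this summary are tied together: `qt` inverts `pt`, and the two `lower.tail` settings give complementary areas. A short sketch with 4 degrees of freedom (the variable names are illustrative):

```r
deg_free <- 4

# Quantile function: the value with 97.5% of the probability below it
q <- qt(0.975, df = deg_free)

# CDF recovers the lower-tail area; CCDF gives the complementary upper-tail area
lower_area <- pt(q, df = deg_free, lower.tail = TRUE)
upper_area <- pt(q, df = deg_free, lower.tail = FALSE)
c(lower = lower_area, upper = upper_area, total = lower_area + upper_area)

# The density integrates to 1 over the real line
total_density <- integrate(dt, -Inf, Inf, df = deg_free)$value
total_density
```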

Critical values of Student's t - in R

\[ \mathrm{Pr[}t_{4} > 2.78\mathrm{]} = 0.025 \]

\[ t_{0.05(2),4} = 2.78 \]

qt(0.025, df=4, lower.tail=FALSE)
[1] 2.776445
qt(0.975, df=4, lower.tail=TRUE)
[1] 2.776445

Note: critical values can also be looked up in statistical tables.
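
The one-tailed and two-tailed critical values quoted earlier, \( t_{0.05(1),4} = 2.13 \) and \( t_{0.05(2),4} = 2.78 \), differ only in how much probability is placed in the upper tail. A minimal sketch (the variable names are illustrative):

```r
# One-tailed: all 5% of the probability in the upper tail
one_tailed <- qt(0.05, df = 4, lower.tail = FALSE)

# Two-tailed: 5% split evenly, so 2.5% in each tail
two_tailed <- qt(0.05 / 2, df = 4, lower.tail = FALSE)

round(c(one_tailed = one_tailed, two_tailed = two_tailed), 2)
```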

Estimation: Confidence interval for the mean

Chapter 11, Practice Problem #1

Consider the changes in highest elevation for 31 taxa, in meters, over the late 1900s and early 2000s. Positive values indicate upward shifts in elevation and negative values indicate downward shifts.

str(myData)
'data.frame':   31 obs. of  2 variables:
 $ elevationalRangeShift: num  58.9 7.8 108.6 44.8 11.1 ...
 $ taxonAndLocation     : Factor w/ 30 levels "aquatic bugs_UK",..: 21 7 8 9 6 1 10 12 13 14 ...
elevation <- myData$elevationalRangeShift
summary(elevation)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 -19.30   15.95   35.80   39.33   63.45  108.60 

Some statistics

n <- length(elevation)
m <- mean(elevation)
s <- sd(elevation)
SE <- s/sqrt(n)
matrix(c(n, m, s, SE), nrow=1, byrow=TRUE, dimnames=list("",c("Length","Mean","Sd","SE")))
 Length     Mean       Sd       SE
     31 39.32903 30.66312 5.507259

95% confidence interval

(tcrit <- qt(0.025, df=n-1, lower.tail=FALSE))
[1] 2.042272
CI <- c(m - tcrit * SE, m + tcrit * SE)
names(CI) <- c("lower bound", "upper bound"); CI
lower bound upper bound 
   28.08171    50.57635 
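
Written out with the rounded values reported above, the same calculation reads:

\[ \bar{Y} \pm t_{0.05(2),30}\,\mathrm{SE}_{\bar{Y}} = 39.329 \pm 2.042 \times 5.507 = 39.329 \pm 11.247 \]

which gives the interval from 28.08 to 50.58 meters.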

99% confidence interval

(tcrit <- qt(0.01/2, df=n-1, lower.tail=FALSE))
[1] 2.749996
CI <- c(m - tcrit * SE, m + tcrit * SE)
names(CI) <- c("lower bound", "upper bound"); CI
lower bound upper bound 
   24.18410    54.47397 
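
Because the 95% and 99% intervals differ only in the critical value, the calculation can be wrapped in a small function. The helper `t_ci` and the data vector `x` below are hypothetical, introduced only for illustration; they are not part of the original slides.

```r
# Hypothetical helper: t-based confidence interval for a mean
t_ci <- function(x, conf.level = 0.95) {
  n <- length(x)
  SE <- sd(x) / sqrt(n)
  alpha <- 1 - conf.level
  tcrit <- qt(alpha / 2, df = n - 1, lower.tail = FALSE)
  c("lower bound" = mean(x) - tcrit * SE,
    "upper bound" = mean(x) + tcrit * SE)
}

# Made-up data, used only to exercise the function
x <- c(12.1, 9.8, 11.4, 10.3, 13.0, 8.7)
ci95 <- t_ci(x, conf.level = 0.95)
ci99 <- t_ci(x, conf.level = 0.99)
rbind(ci95, ci99)

# The built-in t.test() should report the same bounds
builtin99 <- as.numeric(t.test(x, conf.level = 0.99)$conf.int)
builtin99
```

A higher confidence level uses a larger critical value, so the 99% interval is wider than the 95% interval.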

Hypothesis testing: One-sample t-test

Definition: The one-sample \( t \)-test compares the mean of a random sample from a normal population with the population mean proposed in a null hypothesis.

\( H_{0} \): The true mean equals \( \mu_{0} \) (\( \mu = \mu_{0} \))
\( H_{A} \): The true mean does not equal \( \mu_{0} \) (\( \mu \neq \mu_{0} \))
Test statistic: \[ t = \frac{\overline{Y}-\mu_{0}}{\mathrm{SE}_{\overline{Y}}} \]
Sampling distribution of \( t \) under \( H_{0} \): \( t \)-distribution with \( df = n-1 \)
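
The definition translates into a few lines of R. The helper `one_sample_t`, the data vector `x` and the null value 10 below are hypothetical, chosen only to illustrate the calculation.

```r
# Hypothetical helper: two-sided one-sample t-test computed by hand
one_sample_t <- function(x, mu0 = 0) {
  n <- length(x)
  SE <- sd(x) / sqrt(n)
  tstat <- (mean(x) - mu0) / SE
  # Two-sided P-value: probability in both tails beyond |tstat|
  pval <- 2 * pt(abs(tstat), df = n - 1, lower.tail = FALSE)
  c(t = tstat, df = n - 1, p.value = pval)
}

# Made-up data and null value, used only to exercise the function
x <- c(12.1, 9.8, 11.4, 10.3, 13.0, 8.7)
by_hand <- one_sample_t(x, mu0 = 10)
by_hand

# Comparison with the built-in test
builtin <- t.test(x, mu = 10)
c(t = unname(builtin$statistic), p.value = builtin$p.value)
```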

Practice Problem #1 (Elevation)

One-sample \( t \)-test

\[ H_{0}: \mu = 0 \\ H_{A}: \mu \neq 0 \]

First, let's calculate a \( P \)-value:

( tstat <- ( m - 0 ) / SE )
[1] 7.141309

( pval <- 2 * pt( abs(tstat), df = n-1, lower.tail=FALSE) )
[1] 6.056689e-08

At significance level \( \alpha = 0.05 \), \( P = 6.06 \times 10^{-8} < 0.05 \), so we reject the null hypothesis.

Second, using critical values:

( data.frame(tstat, tcrit = qt(0.025, df = n-1, lower.tail=FALSE)) )
     tstat    tcrit
1 7.141309 2.042272

Conclusion: \( |\mathrm{tstat}| > \mathrm{tcrit} \) (7.14 > 2.04), so we reject \( H_{0} \).

Third, using the t.test function:

t.test(x = elevation, mu = 0, conf.level = 0.95)


    One Sample t-test

data:  elevation
t = 7.1413, df = 30, p-value = 6.057e-08
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 28.08171 50.57635
sample estimates:
mean of x 
 39.32903 
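
  • Note: the confidence interval is a fourth way to test \( H_{0} \)! Here the 95% confidence interval (28.08 to 50.58) excludes 0, so \( H_{0}: \mu = 0 \) is rejected at \( \alpha = 0.05 \).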
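
The agreement between the confidence interval and the two-sided test can be checked numerically. The sketch below uses made-up data and several hypothetical null values; for each one it compares the decision based on the \( P \)-value with the decision based on whether the null value falls outside the interval.

```r
# Made-up data and null values, for illustration only
x <- c(12.1, 9.8, 11.4, 10.3, 13.0, 8.7)
alpha <- 0.05
mu0_values <- c(8, 10, 11, 14)

decisions <- sapply(mu0_values, function(mu0) {
  res <- t.test(x, mu = mu0, conf.level = 1 - alpha)
  reject_by_p <- res$p.value < alpha
  # Reject when the hypothesized mean lies outside the confidence interval
  reject_by_ci <- mu0 < res$conf.int[1] || mu0 > res$conf.int[2]
  c(by_p = reject_by_p, by_ci = reject_by_ci)
})
colnames(decisions) <- mu0_values
decisions
```

Both rows of `decisions` should match column by column, since the interval and the test are built from the same \( t \) statistic and critical value.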