Estimation and Hypothesis Testing with Confidence Intervals

Introduction

Confidence interval estimation is an alternative to standard hypothesis testing, such as the test explored in the last post, and is often preferred. The primary reason is that estimation provides a range of values (called the interval) that can be said, with reasonable confidence, to include the population parameter we are trying to estimate. Because of this property, instead of reporting only a p-value, an interval can be constructed that describes the difference in the true population means (if there is one). This interval can then be used to test a hypothesis.

A confidence interval is reported with a confidence coefficient, denoted by $1 - \alpha$, which represents the proportion of intervals that would contain the population parameter if samples were repeatedly redrawn and the same procedure applied. The confidence level is this coefficient reported as a percentage, $(1 - \alpha) * 100\%$. In the previous example, an $\alpha$ of 0.05 was used, which corresponds to a 95% confidence level. The width of the $(1 - \alpha) * 100\%$ interval depends on several factors:

The confidence level. As $1 - \alpha$ increases, so does the width of the interval.
The sample size. As $n$ increases, the standard error shrinks and the interval becomes narrower.
The standard deviation. The larger the standard deviation, the wider the interval.
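
These dependencies can be checked numerically. The following is a minimal sketch using the simpler one-sample t-interval rather than the two-sample interval developed later; the helper `ci_width()` and the numbers passed to it are hypothetical and chosen only for illustration.

```r
# Hypothetical helper: full width of a one-sample t-interval,
# 2 * t_{alpha/2, n-1} * s / sqrt(n)
ci_width <- function(s, n, conf_level = 0.95) {
  alpha <- 1 - conf_level
  2 * qt(1 - alpha / 2, df = n - 1) * s / sqrt(n)
}

base_width  <- ci_width(s = 10, n = 30,  conf_level = 0.95)
higher_conf <- ci_width(s = 10, n = 30,  conf_level = 0.99)  # higher confidence level
larger_n    <- ci_width(s = 10, n = 300, conf_level = 0.95)  # larger sample
larger_sd   <- ci_width(s = 20, n = 30,  conf_level = 0.95)  # larger standard deviation

c(base = base_width, higher_conf = higher_conf,
  larger_n = larger_n, larger_sd = larger_sd)
```

Relative to the baseline, the width grows with the confidence level and the standard deviation and shrinks as the sample size grows.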

At a 95% confidence level, roughly 95% of samples will produce a difference in sample means that lies within about two standard errors of the true difference in means.
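
The "two standard errors" rule of thumb comes from the critical value of a two-sided 95% interval. As a rough sketch, the critical values can be looked up with `qnorm()` and `qt()`; the degrees of freedom listed here are arbitrary illustrative choices.

```r
# Critical values for a two-sided 95% interval (alpha = 0.05)
z_crit <- qnorm(0.975)           # standard normal quantile
dfs <- c(10, 30, 100, 1000)      # arbitrary degrees of freedom for illustration
t_crit <- qt(0.975, df = dfs)    # t quantiles, which shrink toward z_crit as df grows
names(t_crit) <- paste0("df=", dfs)

round(z_crit, 3)
round(t_crit, 3)
```

For small samples the t critical value is noticeably larger than two, so the rule of thumb is best treated as an approximation for moderately large samples.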

Getting Started in R

This example again uses the Salaries data from the last post and shows how to calculate and interpret confidence intervals while testing the same hypothesis.

The hypothesis tested was that there is no difference in the population mean salaries earned by professors in departments categorized as ‘theoretical’ and ‘applied’ at a particular university. To state it again formally:

\[ H_0: \mu_1 = \mu_2 \] \[ H_A: \mu_1 \neq \mu_2 \]

Where $\mu_1$ is the mean salary of the applied departments and $\mu_2$ is the mean salary of the theoretical departments.

Begin by loading the R packages that will be needed.

library(ggplot2)
library(car)
library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

We can then split the data into the theoretical and applied departments as was done previously:

data(Salaries)
Salaries$discipline <- ifelse(Salaries$discipline == 'A', 'Theoretical', 'Applied')
sal <- Salaries[,c(2,6)]
theo <- filter(sal, discipline == 'Theoretical') 
appl <- filter(sal, discipline == 'Applied')

Welch’s t-interval

Welch’s t-interval, like Welch’s t-test, extends the two-sample pooled t-interval to populations with unequal variances and sample sizes. The confidence interval for Welch’s test is defined as:

\[ \large \left(\bar{X_1} - \bar{X_2}\right) \pm t_{\alpha/2, v} \sqrt{\frac{s_{x_1}^2}{n_1} + \frac{s_{x_2}^2}{n_2}} \]

Here $t_{\alpha/2, v}$ is the critical t-value with $v$ degrees of freedom. The degrees of freedom $v$ therefore need to be found in order to obtain the critical t-value and calculate the confidence interval.
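
The degrees of freedom are commonly approximated with the Welch-Satterthwaite equation, $v \approx \left(\frac{s_{x_1}^2}{n_1} + \frac{s_{x_2}^2}{n_2}\right)^2 / \left(\frac{(s_{x_1}^2/n_1)^2}{n_1 - 1} + \frac{(s_{x_2}^2/n_2)^2}{n_2 - 1}\right)$. A minimal sketch of that calculation follows, assuming simulated data rather than the Salaries data; the helper `welch_df()` and the vectors `x` and `y` are hypothetical names used only for illustration.

```r
# Hypothetical helper: Welch-Satterthwaite approximation of the degrees of freedom
welch_df <- function(x, y) {
  vx <- var(x) / length(x)  # s_1^2 / n_1
  vy <- var(y) / length(y)  # s_2^2 / n_2
  (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))
}

set.seed(42)  # simulated data, for illustration only
x <- rnorm(40, mean = 110, sd = 30)
y <- rnorm(60, mean = 100, sd = 20)

v <- welch_df(x, y)
v_ttest <- unname(t.test(x, y)$parameter)  # df reported by t.test()
all.equal(v, v_ttest)
```

The approximated value is generally not an integer, which is why the degrees of freedom reported for a Welch test usually have a fractional part.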

The t.test() function in R automatically provides confidence intervals.

welch.test <- t.test(appl$salary, theo$salary)
welch.test

## 
##  Welch Two Sample t-test
## 
## data:  appl$salary and theo$salary
## t = 3.1306, df = 377.83, p-value = 0.00188
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##   3525.978 15434.549
## sample estimates:
## mean of x mean of y 
##  118028.7  108548.4

The confidence interval calculated by the t.test() function is (3525.978, 15434.549). It can be interpreted as follows: we can be 95% confident that the true mean salary of professors in applied departments is between approximately $3,526 and $15,435 higher than the true mean salary of professors in theoretical departments. Since the interval does not contain 0, we can conclude that the true means of the two populations differ.
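
The link between the interval and the test can be illustrated directly: a two-sided test at level $\alpha$ rejects the null hypothesis exactly when the $(1 - \alpha) * 100\%$ interval excludes 0. The sketch below assumes simulated data (the vectors `x` and `y` are hypothetical) and uses the `conf.level` argument of t.test() to request a different confidence level.

```r
set.seed(1)  # simulated data, for illustration only
x <- rnorm(50, mean = 105, sd = 15)
y <- rnorm(50, mean = 100, sd = 15)

alpha <- 0.05
res <- t.test(x, y, conf.level = 1 - alpha)

# The interval excludes 0 exactly when the p-value falls below alpha
ci_excludes_zero <- res$conf.int[1] > 0 || res$conf.int[2] < 0
rejects_null <- res$p.value < alpha
ci_excludes_zero == rejects_null

# A higher confidence level gives a wider interval for the same data
res99 <- t.test(x, y, conf.level = 0.99)
diff(res99$conf.int) > diff(res$conf.int)
```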

Manually Calculating Confidence Intervals

Replicating the results from the t.test() function is done by writing out the confidence interval formula stated above for the lower and upper bounds. The qt() function finds the critical t-value, denoted by $t_{\alpha/2,v}$, where $v$ is the degrees of freedom.

# Difference in sample means and its standard error
diff_means <- mean(appl$salary) - mean(theo$salary)
std_err <- sqrt(var(appl$salary) / length(appl$salary) + var(theo$salary) / length(theo$salary))

# Critical t-value using the degrees of freedom reported by t.test()
t_crit <- qt(0.975, 377.83)

lower_conf <- diff_means - t_crit * std_err
upper_conf <- diff_means + t_crit * std_err

Then confirm that the manually calculated lower and upper bounds match the output of the t.test() function.

welch.test$conf.int[1]

## [1] 3525.978

lower_conf

## [1] 3525.978

welch.test$conf.int[2]

## [1] 15434.55

upper_conf

## [1] 15434.55
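
The manual steps can also be wrapped in a small function so that the degrees of freedom are computed rather than typed in. This is only a sketch under stated assumptions: `welch_ci()` is a hypothetical helper, not part of any package, and it is checked here against t.test() on simulated data rather than the Salaries data.

```r
# Hypothetical helper: Welch t-interval for the difference in means of x and y
welch_ci <- function(x, y, conf_level = 0.95) {
  vx <- var(x) / length(x)  # s_1^2 / n_1
  vy <- var(y) / length(y)  # s_2^2 / n_2
  std_err <- sqrt(vx + vy)
  # Welch-Satterthwaite degrees of freedom
  v <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))
  t_crit <- qt(1 - (1 - conf_level) / 2, df = v)
  (mean(x) - mean(y)) + c(lower = -1, upper = 1) * t_crit * std_err
}

set.seed(7)  # simulated data, for illustration only
x <- rnorm(35, mean = 120, sd = 25)
y <- rnorm(50, mean = 110, sd = 15)

manual <- welch_ci(x, y)
builtin <- t.test(x, y)$conf.int
all.equal(unname(manual), as.numeric(builtin))
```

With the data from this post, the same helper would be called as welch_ci(appl$salary, theo$salary).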

Conclusion

This post explored estimation with confidence intervals and how to calculate them, and demonstrated how a confidence interval can be used to test a hypothesis. Confidence intervals are a useful way of estimating population parameters and are often more informative than standard hypothesis testing with p-values, since an interval provides a range that one can say, with a stated level of confidence, contains the true population parameter.