Confidence Intervals

Let’s generate some fake data and create confidence interval estimates for the population mean.

set.seed(12345)
x <- rnorm(100)
(avg_x <- mean(x))
## [1] 0.2451972
(n_x  <- length(x))
## [1] 100
(sd_x <- sd(x))
## [1] 1.114731
(se_x <- sd_x/sqrt(n_x))
## [1] 0.1114731
alpha5 <- 0.05        # Significance level
(t5_x <- qt(1-(alpha5/2), n_x - 1))   # t value for 95 percent confidence interval
## [1] 1.984217
(CI5_x <- c( avg_x - t5_x*se_x, avg_x + t5_x*se_x ))   # 95 percent Confidence interval
## [1] 0.02401042 0.46638398
(CI5b_x <- avg_x + t5_x * c(-se_x,+se_x))    # Same as above; more concise
## [1] 0.02401042 0.46638398
alpha1 <- 0.01        # Significance level
(t1_x <- qt(1-(alpha1/2), n_x - 1))   # t value for 99 percent confidence interval
## [1] 2.626405
(CI1_x <- c( avg_x - t1_x*se_x, avg_x + t1_x*se_x ))   # 99 percent Confidence interval
## [1] -0.04757632  0.53797071

The same procedure yields a confidence interval estimate for \(\mu \equiv \mu_1-\mu_2\): take two equal-length samples, \(x_1\) and \(x_2\), from populations 1 and 2, and define \(x\equiv x_1-x_2\) in the example above.
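
As a minimal sketch of that idea: taking elementwise differences amounts to treating the two samples as paired, so the hand-built interval should agree with R's paired t-test. The names `x1`, `x2`, `d` and the simulated population means are illustrative assumptions, not part of the example above.

```r
set.seed(12345)
x1 <- rnorm(100, mean = 1)      # hypothetical sample from population 1
x2 <- rnorm(100, mean = 0.5)    # hypothetical sample from population 2
d  <- x1 - x2                   # elementwise differences

se_d  <- sd(d) / sqrt(length(d))               # standard error of the mean difference
t5_d  <- qt(1 - 0.05 / 2, length(d) - 1)       # t value for a 95 percent interval
CI5_d <- mean(d) + t5_d * c(-se_d, se_d)       # interval built by hand

# Built-in paired t-test on the same data
CI5_paired <- as.numeric(t.test(x1, x2, paired = TRUE, conf.level = 0.95)$conf.int)
all.equal(CI5_d, CI5_paired)
```

The comparison is wrapped in `as.numeric()` because `conf.int` carries a `conf.level` attribute that would otherwise make `all.equal()` report a mismatch.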

Hypothesis Testing

We want to test the null hypothesis \(H_0: \mu=\mu_0\) against three alternative hypotheses in turn: \(\mu \neq \mu_0\) (two-tailed), \(\mu > \mu_0\), and \(\mu < \mu_0\) (both one-tailed).

mu0 <- 0     # Hypothesized value of the population mean
(tstat_x <- (avg_x-mu0)/se_x)    # t-statistic obtained from the data
## [1] 2.199609
# degrees of freedom = n-1:
(df_x <- n_x - 1)
## [1] 99
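
The rejection rules for the three alternatives can be summarized in one small function. This is only a sketch: `reject_null` is a hypothetical helper name, and the t-statistic and degrees of freedom passed to it are the values printed above.

```r
# Hypothetical helper: returns TRUE when the null hypothesis is rejected
reject_null <- function(tstat, df, alpha = 0.05,
                        alternative = c("two.sided", "greater", "less")) {
  alternative <- match.arg(alternative)
  switch(alternative,
    two.sided = abs(tstat) > qt(1 - alpha / 2, df),  # both tails share alpha
    greater   = tstat >  qt(1 - alpha, df),          # all of alpha in the upper tail
    less      = tstat < -qt(1 - alpha, df)           # all of alpha in the lower tail
  )
}

r_two     <- reject_null(2.199609, 99, alternative = "two.sided")
r_greater <- reject_null(2.199609, 99, alternative = "greater")
r_less    <- reject_null(2.199609, 99, alternative = "less")
c(two.sided = r_two, greater = r_greater, less = r_less)
```

The sections below work through each of these three cases step by step.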

Two-tailed test

Keep in mind that the critical value of the t-statistic would be the same for:

  • a two-tailed test with ten percent significance, and
  • a one-tailed test with five percent significance
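
The equivalence in the list above can be checked directly, since both cases put five percent of the probability in a single tail. The variable names here are illustrative; the 99 degrees of freedom come from the example.

```r
df_check <- 99                                  # n - 1 for the sample of 100
crit_two_10 <- qt(1 - 0.10 / 2, df_check)       # two-tailed test, ten percent significance
crit_one_05 <- qt(1 - 0.05, df_check)           # one-tailed test, five percent significance
all.equal(crit_two_10, crit_one_05)
```

The two-tailed test at the five percent level proceeds as follows:
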
alpha <- 0.05
(critical_two <- qt(1 - (alpha/2), df_x))
## [1] 1.984217
abs(tstat_x) > critical_two  # Reject Null Hypothesis if TRUE
## [1] TRUE

Here’s the short-cut method:

t.test(x, mu = mu0, conf.level = 0.95)
## 
##  One Sample t-test
## 
## data:  x
## t = 2.1996, df = 99, p-value = 0.03016
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  0.02401042 0.46638398
## sample estimates:
## mean of x 
## 0.2451972

Note that t.test() prints the p-value for the data, which can be compared directly to the significance level: the null hypothesis is rejected if the p-value is less than the significance level. Here 0.03016 is less than 0.05, so the null hypothesis is rejected, in agreement with the critical-value comparison above.

Here’s a way to calculate the p-value:

(p <- 2 * (1 - pt(abs(tstat_x), n_x - 1)))
## [1] 0.03016145

Also, note the 95 percent confidence interval reported by the t.test() command. It is the same as the 95 percent confidence interval that was computed step by step earlier on this page.
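
A short sketch that verifies this agreement numerically, and also shows the link between the interval and the two-tailed test: the test rejects at the five percent level exactly when the hypothesized mean falls outside the 95 percent interval. The names `CI_manual`, `CI_ttest`, `outside`, and `rejects` are illustrative.

```r
set.seed(12345)
x <- rnorm(100)                                  # same fake data as above
mu0 <- 0

se_x <- sd(x) / sqrt(length(x))
t5_x <- qt(1 - 0.05 / 2, length(x) - 1)
CI_manual <- mean(x) + t5_x * c(-se_x, se_x)     # interval built by hand

# Interval taken from the object returned by t.test()
CI_ttest <- as.numeric(t.test(x, mu = mu0, conf.level = 0.95)$conf.int)
all.equal(CI_manual, CI_ttest)

# Two-tailed rejection and "mu0 outside the interval" are the same event
outside <- (mu0 < CI_manual[1]) || (mu0 > CI_manual[2])
rejects <- abs((mean(x) - mu0) / se_x) > t5_x
identical(outside, rejects)
```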

The individual components of the output produced by the t.test() command can be extracted one at a time as follows:

t2tail <- t.test(x, mu = mu0, conf.level = 0.95)
names(t2tail)
## [1] "statistic"   "parameter"   "p.value"     "conf.int"    "estimate"   
## [6] "null.value"  "alternative" "method"      "data.name"
t2tail$statistic
##        t 
## 2.199609
t2tail$p.value
## [1] 0.03016145
t2tail$conf.int
## [1] 0.02401042 0.46638398
## attr(,"conf.level")
## [1] 0.95
t2tail$null.value
## mean 
##    0

This could be useful in writing reports.
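
For instance, the extracted components can be pasted into a one-line summary. This is one possible layout, not a prescribed reporting format; the name `report` is illustrative.

```r
set.seed(12345)
x <- rnorm(100)                                  # same fake data as above
t2tail <- t.test(x, mu = 0, conf.level = 0.95)

# Assemble a summary string from the components of the test object
report <- sprintf(
  "t(%d) = %.2f, p = %.3f, 95%% CI [%.3f, %.3f]",
  as.integer(t2tail$parameter),                  # degrees of freedom
  t2tail$statistic,
  t2tail$p.value,
  t2tail$conf.int[1],
  t2tail$conf.int[2]
)
cat(report, "\n")
```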

One-tailed test; greater than

(critical_one <- qt(1 - alpha, df_x))
## [1] 1.660391
tstat_x > critical_one  # Reject Null Hypothesis if TRUE
## [1] TRUE

Here’s the short-cut method:

t.test(x, mu = mu0, alternative = "greater", conf.level = 0.95)
## 
##  One Sample t-test
## 
## data:  x
## t = 2.1996, df = 99, p-value = 0.01508
## alternative hypothesis: true mean is greater than 0
## 95 percent confidence interval:
##  0.06010828        Inf
## sample estimates:
## mean of x 
## 0.2451972

As before, t.test() prints the p-value, and the null hypothesis is rejected if the p-value is less than the significance level. Here 0.01508 is less than 0.05, so the null hypothesis is rejected in favor of the alternative that the mean is greater than 0.

Here’s a way to calculate the p-value:

(p <- 1 - pt(tstat_x, n_x - 1))   # upper-tail area; no abs() in a one-tailed test
## [1] 0.01508072
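
A brief sketch of two related checks, using illustrative variable names. First, when the t-statistic is positive, the "greater than" p-value is half the two-tailed p-value. Second, the lower confidence bound printed by t.test() for this alternative can be rebuilt from the one-tailed critical value.

```r
set.seed(12345)
x <- rnorm(100)                                  # same fake data as above
df_x <- length(x) - 1
se_x <- sd(x) / sqrt(length(x))
tstat_x <- (mean(x) - 0) / se_x

p_greater <- pt(tstat_x, df_x, lower.tail = FALSE)            # upper-tail area
p_two     <- 2 * pt(abs(tstat_x), df_x, lower.tail = FALSE)   # two-tailed p-value
all.equal(p_greater, p_two / 2)                  # holds when tstat_x is positive

# One-sided lower confidence bound; the upper bound is Inf
lower <- mean(x) - qt(1 - 0.05, df_x) * se_x
all.equal(lower, t.test(x, mu = 0, alternative = "greater")$conf.int[1])
```

Using `lower.tail = FALSE` is equivalent to `1 - pt(...)` but avoids the subtraction, which can matter numerically for very small p-values.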

One-tailed test; less than

tstat_x < -critical_one  # Reject Null Hypothesis if TRUE
## [1] FALSE

Here’s the short-cut method:

t.test(x, mu = mu0, alternative = "less", conf.level = 0.95)
## 
##  One Sample t-test
## 
## data:  x
## t = 2.1996, df = 99, p-value = 0.9849
## alternative hypothesis: true mean is less than 0
## 95 percent confidence interval:
##       -Inf 0.4302861
## sample estimates:
## mean of x 
## 0.2451972

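Again, t.test() prints the p-value for the data. Here 0.9849 is far greater than 0.05, so the null hypothesis is not rejected in favor of the alternative that the mean is less than 0.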

Here’s a way to calculate the p-value:

(p <- pt(tstat_x, n_x - 1))   # lower-tail area; no abs() in a one-tailed test
## [1] 0.9849193
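
As a final sketch, again with illustrative names: the two one-tailed p-values are complementary tail areas, so they sum to one, and the upper confidence bound printed by t.test() for the "less than" alternative can be rebuilt by hand.

```r
set.seed(12345)
x <- rnorm(100)                                  # same fake data as above
df_x <- length(x) - 1
se_x <- sd(x) / sqrt(length(x))
tstat_x <- (mean(x) - 0) / se_x

p_less    <- pt(tstat_x, df_x)                       # lower-tail area
p_greater <- pt(tstat_x, df_x, lower.tail = FALSE)   # upper-tail area
all.equal(p_less + p_greater, 1)

# One-sided upper confidence bound; the lower bound is -Inf
upper <- mean(x) + qt(1 - 0.05, df_x) * se_x
all.equal(upper, t.test(x, mu = 0, alternative = "less")$conf.int[2])
```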