Let’s generate some fake data and create confidence interval estimates for the population mean.
set.seed(12345)
x <- rnorm(100)
(avg_x <- mean(x))
## [1] 0.2451972
(n_x <- length(x))
## [1] 100
(sd_x <- sd(x))
## [1] 1.114731
(se_x <- sd_x/sqrt(n_x))
## [1] 0.1114731
alpha5 <- 0.05 # Significance level
(t5_x <- qt(1-(alpha5/2), n_x - 1)) # t value for 95 percent confidence interval
## [1] 1.984217
(CI5_x <- c( avg_x - t5_x*se_x, avg_x + t5_x*se_x )) # 95 percent Confidence interval
## [1] 0.02401042 0.46638398
(CI5b_x <- avg_x + t5_x * c(-se_x,+se_x)) # Same as above; more concise
## [1] 0.02401042 0.46638398
alpha1 <- 0.01 # Significance level
(t1_x <- qt(1-(alpha1/2), n_x - 1)) # t value for 99 percent confidence interval
## [1] 2.626405
(CI1_x <- c( avg_x - t1_x*se_x, avg_x + t1_x*se_x )) # 99 percent Confidence interval
## [1] -0.04757632 0.53797071
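The steps above can be wrapped into a small helper so that the confidence level becomes a single argument. This is only a sketch: the function name `t_ci` and its argument `conf_level` are hypothetical and not part of base R. The result is cross-checked against the interval that `t.test()` reports.

```r
# Hypothetical helper: t-based confidence interval for the population mean
t_ci <- function(x, conf_level = 0.95) {
  n <- length(x)
  se <- sd(x) / sqrt(n)                          # standard error of the mean
  t_crit <- qt(1 - (1 - conf_level) / 2, n - 1)  # two-sided critical value
  mean(x) + t_crit * c(-se, se)
}

set.seed(12345)
x <- rnorm(100)
ci95 <- t_ci(x, 0.95)
ci99 <- t_ci(x, 0.99)
ci95
ci99

# The 99 percent interval is wider than the 95 percent interval
diff(ci99) > diff(ci95)

# Cross-check the 95 percent interval against t.test()
all.equal(ci95, as.numeric(t.test(x)$conf.int))
```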
The same procedure yields a confidence interval estimate for \(\mu \equiv \mu_1-\mu_2\): take two equal-length samples, \(x_1\) and \(x_2\), from populations 1 and 2, and define \(x\equiv x_1-x_2\) in the example above.
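A minimal sketch of that idea follows. The seed, sample size, and population means are arbitrary assumptions chosen for illustration. Because the differences are taken element by element, the resulting interval coincides with the one from a paired t-test, which the last lines verify.

```r
set.seed(2024)                  # arbitrary seed, for reproducibility only
x1 <- rnorm(50, mean = 1)       # simulated sample from population 1 (assumed mean)
x2 <- rnorm(50, mean = 0.5)     # simulated sample from population 2 (assumed mean)
d  <- x1 - x2                   # element-wise differences

n_d  <- length(d)
se_d <- sd(d) / sqrt(n_d)       # standard error of the mean difference
t_d  <- qt(1 - 0.05 / 2, n_d - 1)
(ci_d <- mean(d) + t_d * c(-se_d, se_d))  # 95 percent CI for mu_1 - mu_2

# Cross-check against the built-in paired t-test
ci_builtin <- as.numeric(t.test(x1, x2, paired = TRUE)$conf.int)
all.equal(ci_d, ci_builtin)
```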
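We want to test the null hypothesis \(H_0: \mu=\mu_0\) against three alternative hypotheses: \(\mu \neq \mu_0\) (two-tailed), \(\mu > \mu_0\) (right-tailed), and \(\mu < \mu_0\) (left-tailed).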
mu0 <- 0 # Hypothesized value of the population mean
(tstat_x <- (avg_x-mu0)/se_x) # t-statistic obtained from the data
## [1] 2.199609
# degrees of freedom = n-1:
(df_x <- n_x - 1)
## [1] 99
Keep in mind that the critical value of the t-statistic would be the same for:
alpha <- 0.05
(critical_two <- qt(1 - (alpha/2), df_x))
## [1] 1.984217
abs(tstat_x) > critical_two # Reject Null Hypothesis if TRUE
## [1] TRUE
Here’s the short-cut method:
t.test(x, mu = mu0, conf.level = 0.95)
##
## One Sample t-test
##
## data: x
## t = 2.1996, df = 99, p-value = 0.03016
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## 0.02401042 0.46638398
## sample estimates:
## mean of x
## 0.2451972
Note that the automatic t-test command prints out the p-value for the data. The p-value can then be readily compared to the significance level. The null hypothesis is then rejected if the p-value is less than the significance level.
Here’s a way to calculate the p-value:
(p <- 2 * (1 - pt(abs(tstat_x), n_x - 1)))
## [1] 0.03016145
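The p-value rule and the critical-value rule always lead to the same decision. The sketch below checks this for the two-tailed test on the same simulated data. The variable names `reject_by_p` and `reject_by_crit` are introduced here for illustration only.

```r
set.seed(12345)
x <- rnorm(100)
alpha <- 0.05
res <- t.test(x, mu = 0)

t_obs <- unname(res$statistic)   # observed t-statistic
df    <- unname(res$parameter)   # degrees of freedom

reject_by_p    <- res$p.value < alpha                 # p-value rule
reject_by_crit <- abs(t_obs) > qt(1 - alpha / 2, df)  # critical-value rule
c(p_value_rule = reject_by_p, critical_value_rule = reject_by_crit)
```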
Also, note the 95 percent confidence interval generated by the t.test() command. It is the same as the 95 percent confidence interval that was generated laboriously earlier on this page.
The components of the output produced by the t.test() command can be extracted one at a time as follows:
t2tail <- t.test(x, mu = mu0, conf.level = 0.95)
names(t2tail)
## [1] "statistic" "parameter" "p.value" "conf.int" "estimate"
## [6] "null.value" "alternative" "method" "data.name"
t2tail$statistic
## t
## 2.199609
t2tail$p.value
## [1] 0.03016145
t2tail$conf.int
## [1] 0.02401042 0.46638398
## attr(,"conf.level")
## [1] 0.95
t2tail$null.value
## mean
## 0
This could be useful in writing reports.
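As one possible illustration of such reporting, the extracted components can be pasted into a single summary string with `sprintf()`. The layout of the string is just an assumption about what a report might want.

```r
set.seed(12345)
x <- rnorm(100)
res <- t.test(x, mu = 0, conf.level = 0.95)

# Build a one-line summary from the components of the test object
report <- sprintf(
  "t(%d) = %.2f, p = %.3f, 95%% CI [%.3f, %.3f]",
  as.integer(res$parameter),   # degrees of freedom
  res$statistic,               # t-statistic
  res$p.value,
  res$conf.int[1], res$conf.int[2]
)
report
```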
(critical_one <- qt(1 - alpha, df_x)) # Critical value for a one-tailed test
## [1] 1.660391
tstat_x > critical_one # Right-tailed test: reject Null Hypothesis if TRUE
## [1] TRUE
Here’s the short-cut method:
t.test(x, mu = mu0, alternative = "greater", conf.level = 0.95)
##
## One Sample t-test
##
## data: x
## t = 2.1996, df = 99, p-value = 0.01508
## alternative hypothesis: true mean is greater than 0
## 95 percent confidence interval:
## 0.06010828 Inf
## sample estimates:
## mean of x
## 0.2451972
As before, the automatic t-test command prints out the p-value for the data. The p-value can then be readily compared to the significance level. The null hypothesis is then rejected if the p-value is less than the significance level.
Here’s a way to calculate the p-value:
(p <- 1 - pt(tstat_x, n_x - 1)) # Upper-tail probability; uses the signed t-statistic
## [1] 0.01508072
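The one-sided confidence interval printed by the right-tailed test can also be reproduced by hand. As a sketch, the lower bound uses the one-tailed critical value `qt(1 - alpha, n - 1)` in place of the two-tailed one, and the upper bound is infinite.

```r
set.seed(12345)
x <- rnorm(100)
alpha <- 0.05
n  <- length(x)
se <- sd(x) / sqrt(n)   # standard error of the mean

# Lower bound of the one-sided interval
lower <- mean(x) - qt(1 - alpha, n - 1) * se
c(lower, Inf)

# Cross-check against t.test()
ci_builtin <- as.numeric(t.test(x, mu = 0, alternative = "greater")$conf.int)
all.equal(lower, ci_builtin[1])
```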
tstat_x < -critical_one # Left-tailed test: reject Null Hypothesis if TRUE
## [1] FALSE
Here’s the short-cut method:
t.test(x, mu = mu0, alternative = "less", conf.level = 0.95)
##
## One Sample t-test
##
## data: x
## t = 2.1996, df = 99, p-value = 0.9849
## alternative hypothesis: true mean is less than 0
## 95 percent confidence interval:
## -Inf 0.4302861
## sample estimates:
## mean of x
## 0.2451972
Again, the automatic t-test command prints out the p-value for the data.
Here’s a way to calculate the p-value:
(p <- pt(tstat_x, n_x - 1)) # Lower-tail probability; uses the signed t-statistic
## [1] 0.9849193
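To tie the three cases together, the sketch below computes all three p-values by hand and compares them with the values returned by `t.test()`. Note that the one-tailed p-values use the signed t-statistic, so the formulas also hold when the statistic is negative. The names `p_manual` and `p_builtin` are introduced here for illustration.

```r
set.seed(12345)
x <- rnorm(100)
mu0 <- 0
df  <- length(x) - 1
t_obs <- (mean(x) - mu0) / (sd(x) / sqrt(length(x)))

p_manual <- c(
  two.sided = 2 * (1 - pt(abs(t_obs), df)),  # both tails
  greater   = 1 - pt(t_obs, df),             # upper tail
  less      = pt(t_obs, df)                  # lower tail
)

# The same p-values extracted from t.test()
p_builtin <- sapply(names(p_manual), function(alt) {
  t.test(x, mu = mu0, alternative = alt)$p.value
})

round(cbind(p_manual, p_builtin), 5)
```

The two one-tailed p-values sum to 1, which is why a small p-value for one alternative implies a large p-value for the other.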