This page is part of my R training, which I publish on my site http://dataZ4s.com
The independent 2-sample t-test is a parametric method used for exploring the difference in means for two populations.
We will be using the Lung Capacity dataset, with 725 observations and 6 variables, to compare the lung capacity of smokers with that of non-smokers.
# Read in data via read_excel
library(readxl)
LungCapData <- read_excel("C:/Users/Usuario/Documents/dataZ4s/R/MarinLectures/LungCapData.xlsx",
col_types = c("numeric", "numeric", "numeric",
"text", "text", "text"))
# Attach the data so that columns such as LungCap and Smoke can be referenced by name
attach(LungCapData)
To get an initial idea of the spread of the data, we can visualize it with a boxplot:
boxplot(LungCap~Smoke)
It looks as if the spread, that is the variance, is greater for non-smokers than for smokers. Let’s check this with Levene’s test from the car package:
(I did not manage to knit “leveneTest” from RMarkdown to HTML. It runs fine in the console and as R code, and returns a p-value of 0.0003408.)
With a p-value of 0.0003408 we reject the null hypothesis that the two population variances are equal, as we have evidence to believe that they differ. We will therefore not assume equal variances when we run the t-test below.
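Since the Levene’s test output is not shown on this page, here is a minimal sketch of how such a test can be run. It uses simulated stand-in data: the names `lung_cap` and `smoke` and all the numbers are made up for illustration and are not the LungCapData values. With its default `center = median`, `car::leveneTest()` amounts to a one-way ANOVA on the absolute deviations from each group's median, so the sketch computes that in base R and only calls the car function if the package is installed.

```r
# Simulated stand-in data (hypothetical values, NOT the LungCapData set)
set.seed(1)
smoke    <- factor(rep(c("no", "yes"), times = c(200, 40)))
lung_cap <- c(rnorm(200, mean = 8, sd = 2.7), rnorm(40, mean = 8.5, sd = 1.9))

# Median-centered Levene's test by hand:
# one-way ANOVA on the absolute deviations from each group's median
abs_dev     <- abs(lung_cap - ave(lung_cap, smoke, FUN = median))
levene_base <- anova(lm(abs_dev ~ smoke))
levene_p    <- levene_base[["Pr(>F)"]][1]
levene_p

# Same test via the car package, if it is installed
if (requireNamespace("car", quietly = TRUE)) {
  print(car::leveneTest(lung_cap ~ smoke))
}
```

For the data on this page the corresponding call would presumably be `leveneTest(LungCap ~ factor(Smoke), data = LungCapData)` after `library(car)`. The `factor()` wrapper is an assumption on my part, added because Smoke is read in as text.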
The point estimates for the two population variances:
var(LungCap[Smoke=="yes"])
## [1] 3.545292
var(LungCap[Smoke=="no"])
## [1] 7.431694
We now test whether the mean lung capacity of non-smokers and smokers can be the same. This leads to a two-tailed test, and we will assume unequal variances, as indicated above by Levene’s test.
# H0: Mean lung capacity of non-smokers = lung cap of smokers
# Two-tailed hypothesis test
# Assuming non-equal variances
t.test(LungCap~Smoke, mu=0, alternative = "two.sided", conf.level = 0.95, var.equal = FALSE, paired = FALSE)
##
## Welch Two Sample t-test
##
## data: LungCap by Smoke
## t = -3.6498, df = 117.72, p-value = 0.0003927
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -1.3501778 -0.4003548
## sample estimates:
## mean in group no mean in group yes
## 7.770188 8.645455
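To see where the t value and the fractional degrees of freedom in a Welch test come from, the sketch below reproduces them by hand: the t statistic is the difference in means divided by the unpooled standard error sqrt(s1²/n1 + s2²/n2), and the degrees of freedom follow the Welch-Satterthwaite approximation. The data are simulated stand-ins (the vectors `x` and `y` and their sizes are hypothetical, not the LungCapData groups), so the numbers will differ from the output above.

```r
# Simulated stand-in data (hypothetical values, NOT the LungCapData set)
set.seed(1)
x <- rnorm(200, mean = 8.0, sd = 2.7)  # plays the role of the "no" group
y <- rnorm(40,  mean = 8.5, sd = 1.9)  # plays the role of the "yes" group

# Squared standard errors of the two group means
se2_x <- var(x) / length(x)
se2_y <- var(y) / length(y)

# Welch t statistic: difference in means over its unpooled standard error
t_stat <- (mean(x) - mean(y)) / sqrt(se2_x + se2_y)

# Welch-Satterthwaite approximation of the degrees of freedom
df_welch <- (se2_x + se2_y)^2 /
  (se2_x^2 / (length(x) - 1) + se2_y^2 / (length(y) - 1))

# Two-sided p-value and 95% confidence interval for the difference in means
p_value <- 2 * pt(-abs(t_stat), df = df_welch)
ci <- (mean(x) - mean(y)) + c(-1, 1) * qt(0.975, df_welch) * sqrt(se2_x + se2_y)

# Compare with the built-in test
res <- t.test(x, y, var.equal = FALSE)
c(by_hand = t_stat,   t_test = unname(res$statistic))
c(by_hand = df_welch, t_test = unname(res$parameter))
```

Because the standard error is not pooled, the degrees of freedom are generally not a whole number, which is why the output reports a value such as 117.72.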
# The arguments mu = 0, alternative = "two.sided", conf.level = 0.95, var.equal = FALSE and paired = FALSE are the defaults
# The test can therefore be written in short form:
t.test(LungCap~Smoke)
##
## Welch Two Sample t-test
##
## data: LungCap by Smoke
## t = -3.6498, df = 117.72, p-value = 0.0003927
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -1.3501778 -0.4003548
## sample estimates:
## mean in group no mean in group yes
## 7.770188 8.645455
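The printed output is convenient to read, but the individual numbers can also be pulled out of the object that `t.test()` returns. The sketch below shows this on a simulated stand-in data frame (the data frame `demo` and its values are hypothetical, not the LungCapData set). It also passes the data frame through the `data =` argument, which is an alternative to `attach()`.

```r
# Simulated stand-in data frame (hypothetical values, NOT the LungCapData set)
set.seed(1)
demo <- data.frame(
  LungCap = c(rnorm(200, mean = 8, sd = 2.7), rnorm(40, mean = 8.5, sd = 1.9)),
  Smoke   = rep(c("no", "yes"), times = c(200, 40))
)

# Passing the data frame through `data =` avoids the need for attach()
res <- t.test(LungCap ~ Smoke, data = demo)

res$statistic  # t value
res$parameter  # Welch degrees of freedom
res$p.value    # two-sided p-value
res$conf.int   # confidence interval for mean("no") - mean("yes")
res$estimate   # the two group means
```

On this page's data the same pattern would be `t.test(LungCap ~ Smoke, data = LungCapData)`.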