Independent two-sample t-test in R

This page is part of my R training, which I publish on my site http://dataZ4s.com

The independent two-sample t-test is a parametric method for testing whether the means of two populations differ.

Checking for equal variances

We will use the Lung Capacity dataset, with 725 observations and 6 variables, to compare the lung capacity of smokers with that of non-smokers.

Read in data:

# Read in data via read_excel
library(readxl)
LungCapData <- read_excel("C:/Users/Usuario/Documents/dataZ4s/R/MarinLectures/LungCapData.xlsx", 
                          col_types = c("numeric", "numeric", "numeric", 
                                        "text", "text", "text"))
# Attach the data frame so that columns such as LungCap and Smoke can be used by name
attach(LungCapData)

Start by visualizing

To get an initial idea of the spread of the data, we can draw a boxplot of lung capacity by smoking status:

boxplot(LungCap~Smoke)
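As a complement to the plot, here is a minimal sketch of how to read off the numbers behind a boxplot. It assumes a small simulated data frame, `sim`, that only mimics the `LungCap` and `Smoke` columns; the values are made up and are not the real dataset.

```r
# Simulated stand-in for LungCapData (hypothetical values, NOT the real data)
set.seed(1)
sim <- data.frame(
  LungCap = c(rnorm(60, mean = 7.8, sd = 2.7), rnorm(20, mean = 8.6, sd = 1.9)),
  Smoke   = rep(c("no", "yes"), times = c(60, 20))
)

# data = tells boxplot() where to find the columns, so attach() is not needed
bp <- boxplot(LungCap ~ Smoke, data = sim,
              xlab = "Smoke", ylab = "Lung capacity")

# boxplot() invisibly returns the numbers used to draw the plot
bp$names   # group labels, in plotting order
bp$n       # number of observations per group
bp$stats   # one column per group: lower whisker, Q1, median, Q3, upper whisker
```

Comparing the distance between Q1 and Q3 across the columns of `bp$stats` gives a quick numeric impression of which group is more spread out.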

Levene’s test

The boxplot suggests that the spread (the variance) is greater among non-smokers than among smokers. Let’s check this with Levene’s test from the car package:

(I did not manage to knit “leveneTest” from RMarkdown to HTML. It runs fine in the console and as R code, and returns a p-value of 0.0003408.)

With a p-value of 0.0003408 we reject the null hypothesis that the two population variances are equal: we have evidence to believe that they differ. We will therefore not assume equal variances when we run the t-test below.
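For readers who want to see what a Levene-type test does, here is a rough sketch in base R. It uses the median-centred variant (often called Brown-Forsythe), which to my knowledge is also the default of `leveneTest` in the car package. The data frame `sim` is simulated and hypothetical, so the p-value it gives is not the 0.0003408 reported above.

```r
# Simulated stand-in for LungCapData (hypothetical values, NOT the real data)
set.seed(1)
sim <- data.frame(
  LungCap = c(rnorm(60, mean = 7.8, sd = 2.7), rnorm(20, mean = 8.6, sd = 1.9)),
  Smoke   = rep(c("no", "yes"), times = c(60, 20))
)

# Absolute deviation of each observation from the median of its own group
group_median <- ave(sim$LungCap, sim$Smoke, FUN = median)
sim$abs_dev  <- abs(sim$LungCap - group_median)

# One-way ANOVA on those deviations; its F test is the Levene-type test
levene_fit <- anova(lm(abs_dev ~ Smoke, data = sim))
levene_fit

# A small p-value is evidence against equal variances
levene_p <- levene_fit[["Pr(>F)"]][1]
levene_p

# With the car package installed, this call should report the same F and p-value:
# car::leveneTest(LungCap ~ factor(Smoke), data = sim, center = median)
```

The idea is that groups with larger variance have, on average, larger absolute deviations from their centre, so a difference in spread shows up as a difference in the means of `abs_dev`.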

The point estimates for the two population variances:

var(LungCap[Smoke=="yes"])

## [1] 3.545292

var(LungCap[Smoke=="no"])

## [1] 7.431694
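Both variances can also be computed in one call with `tapply`, which avoids repeating the subsetting for each group. The sketch below again assumes a simulated, hypothetical data frame `sim`; only the last line uses the point estimates reported above.

```r
# Simulated stand-in for LungCapData (hypothetical values, NOT the real data)
set.seed(1)
sim <- data.frame(
  LungCap = c(rnorm(60, mean = 7.8, sd = 2.7), rnorm(20, mean = 8.6, sd = 1.9)),
  Smoke   = rep(c("no", "yes"), times = c(60, 20))
)

# Sample variance and sample size for every level of Smoke
group_var <- tapply(sim$LungCap, sim$Smoke, var)
group_n   <- tapply(sim$LungCap, sim$Smoke, length)
group_var
group_n

# Ratio of the larger to the smaller variance, as a rough descriptive measure
max(group_var) / min(group_var)

# The same ratio for the point estimates reported above (non-smokers / smokers)
7.431694 / 3.545292
```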

Test

We test whether the mean lung capacity of non-smokers and smokers can be the same. This is a two-tailed test, and we will not assume equal variances, in line with the result of Levene’s test above.

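# H0: mean lung capacity of non-smokers = mean lung capacity of smokers
# Two-tailed hypothesis test
# Not assuming equal variances (Welch's t-test)
# paired is left out: the test is unpaired by default, and recent R versions
# reject the paired argument in the formula interface

t.test(LungCap ~ Smoke, mu = 0, alternative = "two.sided", conf.level = 0.95, var.equal = FALSE)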

## 
##  Welch Two Sample t-test
## 
## data:  LungCap by Smoke
## t = -3.6498, df = 117.72, p-value = 0.0003927
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -1.3501778 -0.4003548
## sample estimates:
##  mean in group no mean in group yes 
##          7.770188          8.645455

With a p-value of 0.0003927, well below 0.05, we reject the null hypothesis that the two means are equal. The 95 percent confidence interval for the difference (non-smokers minus smokers) runs from about -1.35 to -0.40, so in these data the mean lung capacity of smokers is estimated to be between 0.40 and 1.35 units higher than that of non-smokers.

# mu = 0, alternative = "two.sided", conf.level = 0.95 and var.equal = FALSE are the defaults,
# and the test is unpaired by default. The test can therefore be written more briefly:

t.test(LungCap~Smoke)

## 
##  Welch Two Sample t-test
## 
## data:  LungCap by Smoke
## t = -3.6498, df = 117.72, p-value = 0.0003927
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -1.3501778 -0.4003548
## sample estimates:
##  mean in group no mean in group yes 
##          7.770188          8.645455
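
To make the output less of a black box, here is a sketch that recomputes the Welch t statistic, the degrees of freedom (Welch-Satterthwaite approximation) and the two-sided p-value by hand, and compares them with what `t.test` returns. The data frame `sim` is simulated and hypothetical, so the numbers will differ from the output above.

```r
# Simulated stand-in for LungCapData (hypothetical values, NOT the real data)
set.seed(1)
sim <- data.frame(
  LungCap = c(rnorm(60, mean = 7.8, sd = 2.7), rnorm(20, mean = 8.6, sd = 1.9)),
  Smoke   = rep(c("no", "yes"), times = c(60, 20))
)

x <- sim$LungCap[sim$Smoke == "no"]    # first group (non-smokers)
y <- sim$LungCap[sim$Smoke == "yes"]   # second group (smokers)

# Squared standard error contributed by each group
se2_x <- var(x) / length(x)
se2_y <- var(y) / length(y)

# Welch t statistic: difference in means over its estimated standard error
t_stat <- (mean(x) - mean(y)) / sqrt(se2_x + se2_y)

# Welch-Satterthwaite approximation of the degrees of freedom
df_welch <- (se2_x + se2_y)^2 /
  (se2_x^2 / (length(x) - 1) + se2_y^2 / (length(y) - 1))

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df = df_welch)

# Compare with the components stored in the object returned by t.test()
fit <- t.test(LungCap ~ Smoke, data = sim)
all.equal(unname(fit$statistic), t_stat)
all.equal(unname(fit$parameter), df_welch)
all.equal(fit$p.value, p_value)
```

The non-integer degrees of freedom in the output (df = 117.72) come from this approximation; with `var.equal = TRUE` the test would instead pool the variances and use the total sample size minus 2.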