One-sample t-test

This is part of my notes on R programming on my site https://dataz4s.com

Intro to one-sample t-test in R

The t-test is a widely applied statistical method. Here are a few examples of how the t-test can be run in R with the t.test() function.

Short examples

# Creating a vector
x <- c(5,1,2,3,3,7,8,6,3,8)
mean(x)

## [1] 4.6

The mean is 4.6.

Let’s see how the ‘equal to’, the ‘less than’ and the ‘greater than’ hypothesis tests can be run with the t.test() function.

# Two-tailed test (the 'equal to' null hypothesis)
# H0: mu = 3, Ha: mu != 3
t.test(x, alternative="two.sided", mu = 3, conf.level = 0.95)

## 
##  One Sample t-test
## 
## data:  x
## t = 1.9863, df = 9, p-value = 0.07827
## alternative hypothesis: true mean is not equal to 3
## 95 percent confidence interval:
##  2.77775 6.42225
## sample estimates:
## mean of x 
##       4.6

At a 0.05 significance level we would fail to reject H0. The test does not provide sufficient evidence that mu is different from 3.
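To see where the numbers in this output come from, the test can be rebuilt by hand. The following is a minimal sketch, assuming the usual one-sample formula t = (mean - mu0) / (s / sqrt(n)); the names mu0, se, t_stat, p_two and ci are chosen here for illustration and are not part of t.test().

```r
# Rebuilding the two-sided test by hand (illustrative sketch)
x   <- c(5, 1, 2, 3, 3, 7, 8, 6, 3, 8)
mu0 <- 3                                     # hypothesised mean under H0
n   <- length(x)

se     <- sd(x) / sqrt(n)                    # standard error of the mean
t_stat <- (mean(x) - mu0) / se               # test statistic
p_two  <- 2 * pt(-abs(t_stat), df = n - 1)   # two-sided p-value

# 95% confidence interval for the mean
ci <- mean(x) + c(-1, 1) * qt(0.975, df = n - 1) * se

round(c(t = t_stat, p = p_two), 4)
round(ci, 5)
```

These values should agree with the t, p-value and confidence interval printed by t.test() above.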

# Left-tailed test (the 'less than' alternative)
# H0: mu >= 3, Ha: mu < 3
t.test(x, alternative="less", mu = 3, conf.level = 0.95)

## 
##  One Sample t-test
## 
## data:  x
## t = 1.9863, df = 9, p-value = 0.9609
## alternative hypothesis: true mean is less than 3
## 95 percent confidence interval:
##      -Inf 6.076639
## sample estimates:
## mean of x 
##       4.6

At a 0.05 significance level we would fail to reject H0. The p-value of 0.96 means the data give no evidence that mu is less than 3. Note that failing to reject H0 does not, by itself, prove that mu is greater than or equal to 3.

# Right-tailed test (the 'greater than' alternative)
# H0: mu <= 3, Ha: mu > 3
t.test(x, alternative="greater", mu = 3, conf.level = 0.95)

## 
##  One Sample t-test
## 
## data:  x
## t = 1.9863, df = 9, p-value = 0.03913
## alternative hypothesis: true mean is greater than 3
## 95 percent confidence interval:
##  3.123361      Inf
## sample estimates:
## mean of x 
##       4.6

At a 0.05 significance level we would reject H0. The test provides sufficient evidence that mu is greater than 3.
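The three p-values above are linked, because all three tests share the same t statistic. A small sketch, reusing the vector x from above (the object names p_less, p_greater and p_two are chosen here for illustration):

```r
x <- c(5, 1, 2, 3, 3, 7, 8, 6, 3, 8)

p_less    <- t.test(x, alternative = "less",      mu = 3)$p.value
p_greater <- t.test(x, alternative = "greater",   mu = 3)$p.value
p_two     <- t.test(x, alternative = "two.sided", mu = 3)$p.value

# The two one-sided p-values are complementary
p_less + p_greater

# The two-sided p-value is twice the smaller one-sided p-value
2 * min(p_less, p_greater)
p_two
```

This is why the same data can be significant in a one-sided test at the 0.05 level but not in the two-sided test.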

One-sample test with a larger dataset

We will use the LungCap dataset. Read in data:

# Read in data via read_excel
library(readxl)
LungCapData <- read_excel("C:/Users/Usuario/Documents/dataZ4s/R/MarinLectures/LungCapData.xlsx", 
                          col_types = c("numeric", "numeric", "numeric", 
                                        "text", "text", "text"))
# Attach the data so that its columns can be referenced by name
attach(LungCapData)
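
attach() is convenient for short scripts, but the columns can also be reached without it. The sketch below assumes a small data frame, here called lung_head, typed in from the six rows that head(LungCapData) prints in the next section (values as displayed, so rounded). It is only meant to show the syntax, not to reproduce results for the full dataset.

```r
# Illustration only: the six rows shown by head(LungCapData), as printed
lung_head <- data.frame(
  LungCap = c(6.48, 10.1, 9.55, 11.1, 4.8, 6.22),
  Age     = c(6, 18, 16, 14, 5, 11)
)

# Option 1: refer to a column with $
mean(lung_head$LungCap)

# Option 2: evaluate an expression inside the data frame with with()
with(lung_head, median(Age))

# t.test() accepts the column directly, so attach() is not required
t.test(lung_head$LungCap, mu = 8)
```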

The dataset

The LungCap dataset has 725 observations of 6 variables: LungCap, Age, Height, Smoke, Gender and Caesarean.

# Viewing the first 6 lines of the dataset
head(LungCapData)

## # A tibble: 6 x 6
##   LungCap   Age Height Smoke Gender Caesarean
##     <dbl> <dbl>  <dbl> <chr> <chr>  <chr>    
## 1    6.48     6   62.1 no    male   no       
## 2   10.1     18   74.7 yes   female no       
## 3    9.55    16   69.7 no    female yes      
## 4   11.1     14   71   no    male   no       
## 5    4.8      5   56.9 no    male   no       
## 6    6.22    11   58.7 no    female no

# Summarized
summary(LungCapData)

##     LungCap            Age            Height         Smoke          
##  Min.   : 0.507   Min.   : 3.00   Min.   :45.30   Length:725        
##  1st Qu.: 6.150   1st Qu.: 9.00   1st Qu.:59.90   Class :character  
##  Median : 8.000   Median :13.00   Median :65.40   Mode  :character  
##  Mean   : 7.863   Mean   :12.33   Mean   :64.84                     
##  3rd Qu.: 9.800   3rd Qu.:15.00   3rd Qu.:70.30                     
##  Max.   :14.675   Max.   :19.00   Max.   :81.80                     
##     Gender           Caesarean        
##  Length:725         Length:725        
##  Class :character   Class :character  
##  Mode  :character   Mode  :character  
##                                       
##                                       
##

# Mean lung capacity of the tested persons
mean(LungCap)

## [1] 7.863148

# Mean and median age of the tested
mean(Age)

## [1] 12.3269

median(Age)

## [1] 13

One-tailed hypothesis test and confidence interval

# One-tailed hypothesis test for the mean of lung capacity
# H0: mu >= 8, Ha: mu < 8
# 95% one-tailed confidence interval for mean lung capacity
t.test(LungCap, mu = 8, alternative = "less", conf.level = 0.95)

## 
##  One Sample t-test
## 
## data:  LungCap
## t = -1.3842, df = 724, p-value = 0.08336
## alternative hypothesis: true mean is less than 8
## 95 percent confidence interval:
##      -Inf 8.025974
## sample estimates:
## mean of x 
##  7.863148

# Argument names can be abbreviated (R matches them partially)
t.test(LungCap, mu = 8, alt = "less", conf = 0.95)

## 
##  One Sample t-test
## 
## data:  LungCap
## t = -1.3842, df = 724, p-value = 0.08336
## alternative hypothesis: true mean is less than 8
## 95 percent confidence interval:
##      -Inf 8.025974
## sample estimates:
## mean of x 
##  7.863148

So, with a p-value of 0.083, we would fail to reject H0 at a 0.05 significance level: the test does not provide sufficient evidence that the mean lung capacity is less than 8.

Two-tailed test and confidence interval

# Two-tailed hypothesis test
# H0: mu = 8
# Two-tailed confidence interval
t.test(LungCap, mu = 8, alt = "two.sided", conf = 0.95)

## 
##  One Sample t-test
## 
## data:  LungCap
## t = -1.3842, df = 724, p-value = 0.1667
## alternative hypothesis: true mean is not equal to 8
## 95 percent confidence interval:
##  7.669052 8.057243
## sample estimates:
## mean of x 
##  7.863148

# The 'alt' argument defaults to "two.sided"
t.test(LungCap, mu = 8, conf = 0.95)

## 
##  One Sample t-test
## 
## data:  LungCap
## t = -1.3842, df = 724, p-value = 0.1667
## alternative hypothesis: true mean is not equal to 8
## 95 percent confidence interval:
##  7.669052 8.057243
## sample estimates:
## mean of x 
##  7.863148
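
The one- and two-tailed p-values reported above can be checked against each other using only the printed t statistic and degrees of freedom. A minimal sketch, assuming the rounded values t = -1.3842 and df = 724 from the output (so the results match only to the printed precision); the names t_stat, dof, p_less and p_two are illustrative:

```r
t_stat <- -1.3842   # t statistic as printed by t.test()
dof    <- 724       # degrees of freedom (n - 1)

p_less <- pt(t_stat, df = dof)              # Ha: mu < 8
p_two  <- 2 * pt(-abs(t_stat), df = dof)    # Ha: mu != 8

signif(c(less = p_less, two.sided = p_two), 4)
```

Because the observed t is negative, the two-tailed p-value is exactly twice the 'less' p-value.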

Extracting attributes from a test

For more advanced analysis and programming it can be useful to extract individual attributes from the test object:

# Saving the test to an object
TEST <- t.test(LungCap, mu = 8, conf = 0.95)

# Checking
TEST

## 
##  One Sample t-test
## 
## data:  LungCap
## t = -1.3842, df = 724, p-value = 0.1667
## alternative hypothesis: true mean is not equal to 8
## 95 percent confidence interval:
##  7.669052 8.057243
## sample estimates:
## mean of x 
##  7.863148

# View attributes of TEST
attributes(TEST)

## $names
##  [1] "statistic"   "parameter"   "p.value"     "conf.int"    "estimate"   
##  [6] "null.value"  "stderr"      "alternative" "method"      "data.name"  
## 
## $class
## [1] "htest"

# Extracting the test statistic
TEST$statistic

##         t 
## -1.384242

# Extracting the p-value
TEST$p.value

## [1] 0.1667108
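
Other elements can be pulled out in the same way. The sketch below uses the small vector x from the first section so that it runs without the Excel file; the object name res and the one-row data frame summary_row are illustrative choices.

```r
x   <- c(5, 1, 2, 3, 3, 7, 8, 6, 3, 8)
res <- t.test(x, mu = 3)

res$conf.int    # confidence interval (carries a "conf.level" attribute)
res$estimate    # sample mean
res$parameter   # degrees of freedom

# Collecting the pieces in a one-row data frame, e.g. for reporting
summary_row <- data.frame(
  mean     = unname(res$estimate),
  t        = unname(res$statistic),
  df       = unname(res$parameter),
  p.value  = res$p.value,
  ci.lower = res$conf.int[1],
  ci.upper = res$conf.int[2]
)
summary_row
```

unname() drops the labels ("t", "df", "mean of x") that t.test() attaches to these elements, which keeps the column names of the data frame clean.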

Sources for learning R

MarinStatsLectures by Mike Marin provides a short video on the one-sample t-test in R, which is the inspiration for the t-test on the Lung Capacity dataset.