t-Test

One Sample t-Test

To perform a one-sample t-test, let’s first load the data :

LungCapData = read.table(file = "../Dataset/LungCapData.txt", header = T, sep = "\t")

Let’s see the variables present in our dataset :

names(LungCapData)

## [1] "LungCap"   "Age"       "Height"    "Smoke"     "Gender"    "Caesarean"

class(LungCapData$LungCap)

## [1] "numeric"

To get additional help on the t.test function :

help("t.test")
#or,
?t.test

Let’s visualize the data first :

boxplot(LungCapData$LungCap)

Let’s say,

\(H_0: \mu = 8\)
\(H_A: \mu < 8\)

Let’s perform the hypothesis testing

t.test(LungCapData$LungCap, mu=8, alternative = "less", conf.level = 0.95)

## 
##  One Sample t-test
## 
## data:  LungCapData$LungCap
## t = -1.3842, df = 724, p-value = 0.08336
## alternative hypothesis: true mean is less than 8
## 95 percent confidence interval:
##      -Inf 8.025974
## sample estimates:
## mean of x 
##  7.863148

As the p-value (0.083) is greater than \(5\%\), we fail to reject \(H_0\): the data do not provide sufficient evidence that the population mean is less than \(8\), even though the sample mean (7.86) is below \(8\).
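
The statistic and p-value reported by t.test can be reproduced by hand, which may make the decision rule easier to follow. The sketch below uses simulated values as a stand-in (the vector x and its parameters are assumptions, not the LungCapData values) and computes \(t = (\bar{x} - \mu_0) / (s / \sqrt{n})\) with a lower-tail p-value.

```r
# Simulated stand-in sample (an assumption; NOT the LungCapData values)
set.seed(1)
x   <- rnorm(50, mean = 7.8, sd = 2.5)
mu0 <- 8

# t statistic: (sample mean - hypothesised mean) / standard error
n      <- length(x)
t_stat <- (mean(x) - mu0) / (sd(x) / sqrt(n))

# Lower-tail p-value, matching alternative = "less"
p_less <- pt(t_stat, df = n - 1)

# Compare with what t.test() reports
res <- t.test(x, mu = mu0, alternative = "less")
all.equal(unname(res$statistic), t_stat)
all.equal(res$p.value, p_less)

# The same lower-tail lookup applied to the rounded t and df printed above;
# it should land close to the reported p-value of 0.08336
pt(-1.3842, df = 724)
```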

By default, t.test in R performs a two-sided test, so if we do not include the argument alternative="less", R will test \(H_A: \mu \neq 8\) instead.
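
To illustrate how the alternative argument changes the result, here is a small sketch on simulated data (the vector x is an assumption made for illustration). It checks that the two-sided p-value is twice the smaller of the two one-sided p-values.

```r
# Simulated stand-in sample (an assumption, not the tutorial's data)
set.seed(2)
x <- rnorm(40, mean = 7.5, sd = 2)

two_sided <- t.test(x, mu = 8)                          # alternative defaults to "two.sided"
less      <- t.test(x, mu = 8, alternative = "less")
greater   <- t.test(x, mu = 8, alternative = "greater")

two_sided$alternative

# The two-sided p-value is twice the smaller one-sided p-value
all.equal(two_sided$p.value, 2 * min(less$p.value, greater$p.value))

# The two one-sided p-values are complementary
all.equal(less$p.value + greater$p.value, 1)
```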

To see the components of the t-test result, we can store the test in a variable and then pass it to the attributes function :

Test = t.test(LungCapData$LungCap, mu=8, alternative = "less", conf.level = 0.95)
attributes(Test)

## $names
##  [1] "statistic"   "parameter"   "p.value"     "conf.int"    "estimate"   
##  [6] "null.value"  "stderr"      "alternative" "method"      "data.name"  
## 
## $class
## [1] "htest"

To only see a specific attribute of the test :

Test$conf.int

## [1]     -Inf 8.025974
## attr(,"conf.level")
## [1] 0.95

Test$p.value

## [1] 0.08335542
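
Because the result is a list of class htest, its components can be collected for reporting or used in a decision rule. The following is a minimal sketch on simulated data; the names summary_row and reject_h0 are hypothetical and chosen only for illustration.

```r
# Simulated stand-in sample (an assumption, not the tutorial's data)
set.seed(3)
x    <- rnorm(30, mean = 7.9, sd = 2)
Test <- t.test(x, mu = 8, alternative = "less", conf.level = 0.95)

# Gather the main components into a one-row data frame
summary_row <- data.frame(
  t        = unname(Test$statistic),
  df       = unname(Test$parameter),
  p_value  = Test$p.value,
  ci_lower = Test$conf.int[1],   # -Inf for a "less" alternative
  ci_upper = Test$conf.int[2],
  mean     = unname(Test$estimate)
)
summary_row

# Decision at the 5% significance level
reject_h0 <- Test$p.value < 0.05
reject_h0
```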

Paired t-Test

Let’s first get the data :

BPData = read.table(file = "../Dataset/BloodPressure.txt",header = T, sep = "\t")

Let’s see the data:

names(BPData)

## [1] "Subject" "Before"  "After"

The dimensions of the dataset are :

dim(BPData)

## [1] 25  3

Let’s visualize the data :

attach(BPData)
boxplot(Before,After)

Let’s draw a scatter plot of “After” against “Before” to see how the two measurements are related; the reference line marks where both would be equal :

plot(Before, After)
abline(a=0, b=1) # intercept = 0 and slope = 1, i.e. the line After = Before

Let’s perform a two sided hypothesis test on our data :

\(H_0\): Mean difference is zero, i.e., no change in SBP (systolic blood pressure) before and after
\(H_A\): Mean difference is not zero, i.e., there is a change in SBP

t.test(Before, After, mu=0, alternative = "two.sided", paired = TRUE, conf.level = 0.99)

## 
##  Paired t-test
## 
## data:  Before and After
## t = 3.8882, df = 24, p-value = 0.0006986
## alternative hypothesis: true difference in means is not equal to 0
## 99 percent confidence interval:
##   2.245279 13.754721
## sample estimates:
## mean of the differences 
##                       8

As the p-value (0.0007) is well below \(1\%\), the level matching conf.level = 0.99, we reject the null hypothesis: there is evidence of a change in SBP.

If we swap the order of “Before” and “After”, only the signs change: the mean of the differences and the t-statistic are negated, and the confidence interval limits are negated and swapped. The p-value stays the same.
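
A paired t-test is equivalent to a one-sample t-test on the within-pair differences, which also explains the sign flip when the order is swapped. The sketch below checks both facts on R's built-in sleep dataset, used here as a stand-in because it ships with R (it is not the blood pressure data).

```r
# Built-in 'sleep' data: the same 10 subjects measured under two drugs
x <- sleep$extra[sleep$group == 1]
y <- sleep$extra[sleep$group == 2]

paired_xy  <- t.test(x, y, paired = TRUE)
one_sample <- t.test(x - y, mu = 0)        # one-sample test on the differences
paired_yx  <- t.test(y, x, paired = TRUE)  # order swapped

# Paired test == one-sample test on the differences
all.equal(unname(paired_xy$statistic), unname(one_sample$statistic))
all.equal(paired_xy$p.value, one_sample$p.value)

# Swapping the order negates t and the CI limits; the p-value is unchanged
all.equal(unname(paired_yx$statistic), -unname(paired_xy$statistic))
all.equal(as.numeric(paired_yx$conf.int), -rev(as.numeric(paired_xy$conf.int)))
all.equal(paired_yx$p.value, paired_xy$p.value)
```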

Independent Two Sample t-Test

This is a parametric method appropriate for examining the difference in means for 2 populations.

This is a way of examining the relationship between a numeric outcome variable (\(Y\)) and a categorical explanatory variable (\(X\), with 2 levels)

We will examine the relationship between smoking status (Smoke) and lung capacity (LungCap).

Let’s see the class of these two variables first.

class(LungCapData$LungCap)

## [1] "numeric"

class(LungCapData$Smoke)

## [1] "factor"

Let’s visualize the data :

attach(LungCapData)
boxplot(LungCap ~ Smoke)

Let’s perform a hypothesis test :

\(H_0\): The mean lung capacity of smokers and non-smokers is the same.
\(H_A\): The mean lung capacity of smokers and non-smokers is different.

Let’s perform a two-sided t-test :

t.test(LungCap ~ Smoke, mu=0, alternative = "two.sided", conf.level = 0.95, var.equal = FALSE)

## 
##  Welch Two Sample t-test
## 
## data:  LungCap by Smoke
## t = -3.6498, df = 117.72, p-value = 0.0003927
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -1.3501778 -0.4003548
## sample estimates:
##  mean in group no mean in group yes 
##          7.770188          8.645455

As the p-value is small (0.0004), we reject the null hypothesis and conclude that the mean lung capacities of smokers and non-smokers differ.

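By setting the var.equal argument to false, we told R not to assume equal variances for the two groups, which is why the output is labelled as a Welch test. Let’s check the sample variances :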

var(LungCap[Smoke=="yes"])

## [1] 3.545292

var(LungCap[Smoke=="no"])

## [1] 7.431694

We can see that the variance in lung capacity for non-smokers (7.43) is roughly double that for smokers (3.55).
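
To show what the unequal-variance option changes, the sketch below reproduces the Welch statistic and the Welch-Satterthwaite degrees of freedom by hand and compares them with the pooled (equal-variance) test. It uses R's built-in ToothGrowth dataset as a stand-in; the group vectors g1 and g2 are assumptions made for illustration, not the lung capacity data.

```r
# Built-in 'ToothGrowth' data: tooth length for two supplement types
g1 <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
g2 <- ToothGrowth$len[ToothGrowth$supp == "VC"]

# Per-group squared standard errors
v1 <- var(g1) / length(g1)
v2 <- var(g2) / length(g2)

# Welch statistic and Welch-Satterthwaite degrees of freedom
t_welch  <- (mean(g1) - mean(g2)) / sqrt(v1 + v2)
df_welch <- (v1 + v2)^2 /
  (v1^2 / (length(g1) - 1) + v2^2 / (length(g2) - 1))

welch  <- t.test(g1, g2, var.equal = FALSE)  # Welch test (the default)
pooled <- t.test(g1, g2, var.equal = TRUE)   # classical pooled-variance test

all.equal(unname(welch$statistic), t_welch)
all.equal(unname(welch$parameter), df_welch)

# The pooled test always uses n1 + n2 - 2 degrees of freedom
unname(pooled$parameter)
```

The non-integer degrees of freedom in the Welch output (such as 117.72 above) come from this approximation.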

Levene’s Test

This test is used to check whether the population variances are equal or not.

So, let’s perform a hypothesis test on this.

\(H_0\): Population variances are equal
\(H_A\): Population variances are not equal

library(car)

## Warning: package 'car' was built under R version 3.6.3

## Loading required package: carData

Let’s now perform Levene’s test :

leveneTest(LungCap~Smoke)

## Levene's Test for Homogeneity of Variance (center = median)
##        Df F value    Pr(>F)    
## group   1  12.955 0.0003408 ***
##       723                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

From the above test, we can see :

\(p\text{-value} = 0.00034 = 0.034\%\)

As the p-value is very small, we reject the null hypothesis and have evidence that the populations have unequal variances.
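
For intuition, the median-centred version of Levene's test (the "center = median" shown in the output header) can be sketched in base R as a one-way ANOVA on the absolute deviations from each group's median. This is an illustrative sketch on the built-in ToothGrowth data, not a replacement for leveneTest; to the best of my understanding it mirrors the default computation in car, but treat that as an assumption.

```r
# Built-in stand-in data (an assumption; not the lung capacity data)
y     <- ToothGrowth$len
group <- ToothGrowth$supp

# Absolute deviation of each observation from its own group's median
abs_dev <- abs(y - ave(y, group, FUN = median))

# One-way ANOVA on those deviations: a large F suggests unequal spread
lev     <- anova(lm(abs_dev ~ group))
f_value <- lev[["F value"]][1]
p_value <- lev[["Pr(>F)"]][1]

lev
```

A small p_value here would be read the same way as above: evidence against equal population variances.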