To perform a one-sample t-test, let’s first load the data:
LungCapData = read.table(file = "../Dataset/LungCapData.txt", header = TRUE, sep = "\t")
Let’s look at the variables in our dataset:
names(LungCapData)
## [1] "LungCap" "Age" "Height" "Smoke" "Gender" "Caesarean"
class(LungCapData$LungCap)
## [1] "numeric"
To get additional help on the t-test:
help("t.test")
#or,
?t.test
Let’s visualize the data first:
boxplot(LungCapData$LungCap)
Let’s set up the hypotheses:
\(H_0: \mu = 8\)
\(H_A: \mu < 8\)
Let’s perform the hypothesis test:
t.test(LungCapData$LungCap, mu=8, alternative = "less", conf.level = 0.95)
##
## One Sample t-test
##
## data: LungCapData$LungCap
## t = -1.3842, df = 724, p-value = 0.08336
## alternative hypothesis: true mean is less than 8
## 95 percent confidence interval:
## -Inf 8.025974
## sample estimates:
## mean of x
## 7.863148
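As the p-value (0.083) is greater than \(5\%\), we fail to reject \(H_0\): the data do not provide significant evidence that the mean lung capacity is less than \(8\), even though the sample mean (7.86) is slightly below it.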
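To see where the numbers in the output come from, here is a minimal sketch that computes the one-sample statistic \(t = (\bar{x} - \mu_0)/(s/\sqrt{n})\), its lower-tail p-value and the one-sided upper confidence bound by hand, then compares them with `t.test()`. The vector `x` is simulated, hypothetical data standing in for `LungCapData$LungCap`, since the data file isn’t reproduced here.

```r
# Hypothetical simulated sample (stand-in for LungCapData$LungCap)
set.seed(1)
x <- rnorm(50, mean = 7.9, sd = 2.7)
mu0 <- 8                            # hypothesised mean under H0

n   <- length(x)
se  <- sd(x) / sqrt(n)              # standard error of the mean
t_stat <- (mean(x) - mu0) / se      # one-sample t-statistic
dof <- n - 1                        # degrees of freedom
p_less <- pt(t_stat, dof)           # P(T <= t) for H_A: mu < mu0

# Upper bound of the one-sided 95% interval (-Inf, upper]
upper <- mean(x) + qt(0.95, dof) * se

res <- t.test(x, mu = mu0, alternative = "less", conf.level = 0.95)
c(manual = t_stat, t.test = unname(res$statistic))
```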
The default t-test in R is two-sided, so if we omit the argument alternative = "less", R performs a two-sided hypothesis test.
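The two-sided and one-sided p-values are directly related: the two-sided p-value is twice the smaller tail probability, so when \(t < 0\) it is simply twice the `alternative = "less"` p-value. A quick check on hypothetical simulated data:

```r
# Hypothetical simulated sample
set.seed(2)
x <- rnorm(40, mean = 7, sd = 2)

two_sided <- t.test(x, mu = 8)                       # default alternative = "two.sided"
one_sided <- t.test(x, mu = 8, alternative = "less")

# Two-sided p-value computed as 2 * P(T >= |t|)
t_stat   <- unname(two_sided$statistic)
p_manual <- 2 * pt(-abs(t_stat), df = unname(two_sided$parameter))

# Twice the smaller tail of the one-sided test
p_from_less <- 2 * min(one_sided$p.value, 1 - one_sided$p.value)

c(two.sided = two_sided$p.value, manual = p_manual, from.less = p_from_less)
```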
To see the components of the t-test result, we can store it in a variable and pass it to the attributes() function:
Test = t.test(LungCapData$LungCap, mu=8, alternative = "less", conf.level = 0.95)
attributes(Test)
## $names
## [1] "statistic" "parameter" "p.value" "conf.int" "estimate"
## [6] "null.value" "stderr" "alternative" "method" "data.name"
##
## $class
## [1] "htest"
To see only a specific component of the test result:
Test$conf.int
## [1] -Inf 8.025974
## attr(,"conf.level")
## [1] 0.95
Test$p.value
## [1] 0.08335542
For the paired t-test, let’s first load the blood pressure data:
BPData = read.table(file = "../Dataset/BloodPressure.txt", header = TRUE, sep = "\t")
Let’s look at its variables:
names(BPData)
## [1] "Subject" "Before" "After"
The dimensions of the dataset are:
dim(BPData)
## [1] 25 3
Let’s visualize the data:
attach(BPData)
boxplot(Before, After, names = c("Before", "After"))
Let’s draw a scatter plot of “Before” against “After” to check how the two measurements are related:
plot(Before, After)
abline(a = 0, b = 1)  # reference line After = Before (intercept = 0, slope = 1); points below it had lower SBP after
Let’s perform a two-sided hypothesis test on the paired data:
\(H_0\): The mean difference is zero, i.e., there is no change in SBP from before to after.
\(H_A\): The mean difference is not zero, i.e., there is a change in SBP.
t.test(Before, After, mu=0, alternative = "two.sided", paired = TRUE, conf.level = 0.99)
##
## Paired t-test
##
## data: Before and After
## t = 3.8882, df = 24, p-value = 0.0006986
## alternative hypothesis: true difference in means is not equal to 0
## 99 percent confidence interval:
## 2.245279 13.754721
## sample estimates:
## mean of the differences
## 8
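As the p-value (0.0007) is well below \(1\%\), we reject the null hypothesis: there is evidence that SBP changed, with an estimated mean decrease of 8 (99% confidence interval: 2.25 to 13.75).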
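A paired t-test is a one-sample t-test on the within-subject differences \(d_i = \text{Before}_i - \text{After}_i\). The sketch below checks this on hypothetical simulated before/after readings (stand-ins for `BPData`) and rebuilds the 99% interval as \(\bar{d} \pm t_{0.995,\,n-1}\, s_d/\sqrt{n}\):

```r
# Hypothetical simulated SBP readings for 25 subjects
set.seed(4)
before <- rnorm(25, mean = 140, sd = 12)
after  <- before - rnorm(25, mean = 8, sd = 10)

res_paired <- t.test(before, after, mu = 0, paired = TRUE, conf.level = 0.99)
res_diff   <- t.test(before - after, mu = 0, conf.level = 0.99)  # one-sample test on d

# Manual 99% confidence interval for the mean difference
d  <- before - after
n  <- length(d)
se <- sd(d) / sqrt(n)
ci_manual <- mean(d) + c(-1, 1) * qt(0.995, df = n - 1) * se

rbind(paired = as.numeric(res_paired$conf.int), manual = ci_manual)
```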
Swapping the order of “Before” and “After” only changes the sign of the mean of the differences, the confidence interval limits and the t-statistic; the p-value is unchanged.
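Swapping the arguments negates every difference, so the estimate and t-statistic flip sign and the confidence limits are negated and swap places. A quick check on hypothetical simulated data:

```r
# Hypothetical simulated before/after readings
set.seed(5)
before <- rnorm(25, mean = 140, sd = 12)
after  <- before - rnorm(25, mean = 8, sd = 10)

ab <- t.test(before, after, paired = TRUE)   # differences before - after
ba <- t.test(after, before, paired = TRUE)   # differences after - before

rbind(before_after = c(ab$estimate, ab$statistic, ab$conf.int, p = ab$p.value),
      after_before = c(ba$estimate, ba$statistic, ba$conf.int, p = ba$p.value))
```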
The two-sample t-test is a parametric method for comparing the means of two populations.
It examines the relationship between a numeric outcome variable (\(Y\)) and a categorical explanatory variable (\(X\), with 2 levels).
Here we examine the relationship between smoking habits and lung capacity.
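In R, a formula such as `LungCap ~ Smoke` splits the numeric outcome by the two levels of the factor and compares the first level against the second. A minimal sketch with a hypothetical data frame `df_lung` (simulated, not the real dataset) shows that the formula interface matches passing the two groups separately:

```r
# Hypothetical simulated data frame with a 2-level factor
set.seed(6)
df_lung <- data.frame(
  LungCap = c(rnorm(60, 7.8, 2.7), rnorm(20, 8.6, 1.9)),
  Smoke   = factor(rep(c("no", "yes"), times = c(60, 20)))
)

# Formula interface: groups taken in factor-level order ("no" then "yes")
by_formula <- t.test(LungCap ~ Smoke, data = df_lung)

# Same test with the two groups passed explicitly
by_groups <- t.test(df_lung$LungCap[df_lung$Smoke == "no"],
                    df_lung$LungCap[df_lung$Smoke == "yes"])

c(formula = unname(by_formula$statistic), groups = unname(by_groups$statistic))
```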
Let’s see the class of these two variables first.
class(LungCapData$LungCap)
## [1] "numeric"
class(LungCapData$Smoke)
## [1] "factor"
Let’s visualize the data:
attach(LungCapData)
boxplot(LungCap ~ Smoke)
Let’s perform a hypothesis test :
\(H_0\): The mean lung capacity of smokers and non-smokers is the same.
\(H_A\): The mean lung capacity of smokers and non-smokers is different.
Let’s perform a two-sided t-test:
t.test(LungCap ~ Smoke, mu = 0, alternative = "two.sided", conf.level = 0.95, var.equal = FALSE)
##
## Welch Two Sample t-test
##
## data: LungCap by Smoke
## t = -3.6498, df = 117.72, p-value = 0.0003927
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -1.3501778 -0.4003548
## sample estimates:
## mean in group no mean in group yes
## 7.770188 8.645455
As the p-value (0.0004) is small, we reject the null hypothesis: there is evidence that the mean lung capacity of smokers and non-smokers differs.
The argument var.equal = FALSE tells R not to assume equal variances in the two groups, so it performs Welch’s t-test (hence the non-integer degrees of freedom). We can compare the sample variances of the two groups:
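Welch’s test uses \(t = (\bar{x}_1 - \bar{x}_2)/\sqrt{s_1^2/n_1 + s_2^2/n_2}\) with the Welch–Satterthwaite degrees of freedom \(\nu = (s_1^2/n_1 + s_2^2/n_2)^2 / \big[(s_1^2/n_1)^2/(n_1-1) + (s_2^2/n_2)^2/(n_2-1)\big]\). A sketch on hypothetical simulated groups reproduces both and contrasts them with the pooled-variance (Student) test, whose df is \(n_1 + n_2 - 2\):

```r
# Hypothetical simulated groups with unequal variances
set.seed(7)
x1 <- rnorm(60, mean = 7.8, sd = 2.7)
x2 <- rnorm(20, mean = 8.6, sd = 1.9)

v1 <- var(x1) / length(x1)            # squared standard error, group 1
v2 <- var(x2) / length(x2)            # squared standard error, group 2
t_welch  <- (mean(x1) - mean(x2)) / sqrt(v1 + v2)
df_welch <- (v1 + v2)^2 / (v1^2 / (length(x1) - 1) + v2^2 / (length(x2) - 1))

welch  <- t.test(x1, x2, var.equal = FALSE)
pooled <- t.test(x1, x2, var.equal = TRUE)

rbind(welch  = c(t = unname(welch$statistic),  df = unname(welch$parameter)),
      pooled = c(t = unname(pooled$statistic), df = unname(pooled$parameter)))
```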
var(LungCap[Smoke=="yes"])
## [1] 3.545292
var(LungCap[Smoke=="no"])
## [1] 7.431694
The variance in lung capacity among non-smokers (7.43) is roughly twice that among smokers (3.55).
Levene’s test checks whether the population variances of the groups are equal.
Let’s set up the hypotheses:
\(H_0\): Population variances are equal
\(H_A\): Population variances are not equal
library(car)
## Warning: package 'car' was built under R version 3.6.3
## Loading required package: carData
Now let’s perform Levene’s test:
leveneTest(LungCap~Smoke)
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 1 12.955 0.0003408 ***
## 723
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
From the test above:
\(p\text{-value} = 0.00034 \approx 0.034\%\)
As the p-value is very small, we reject the null hypothesis: there is evidence that the two populations have unequal variances, which supports not assuming equal variances in the two-sample t-test.
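With `center = median` (the default shown in the output, also known as the Brown–Forsythe variant), Levene’s test is a one-way ANOVA on the absolute deviations of each observation from its own group median. A base-R sketch on hypothetical simulated data; the comparison with `car::leveneTest()` runs only if the `car` package is installed:

```r
# Hypothetical simulated outcome and 2-level grouping factor
set.seed(8)
y <- c(rnorm(60, 7.8, 2.7), rnorm(20, 8.6, 1.9))
g <- factor(rep(c("no", "yes"), times = c(60, 20)))

meds <- tapply(y, g, median)          # median of each group (in level order)
dev  <- abs(y - meds[g])              # factor index uses level codes
levene_tab <- anova(lm(dev ~ g))      # one-way ANOVA on the deviations
F_manual <- levene_tab[["F value"]][1]
p_manual <- levene_tab[["Pr(>F)"]][1]

if (requireNamespace("car", quietly = TRUE)) {
  car_res <- car::leveneTest(y ~ g)   # default center = median
  print(c(manual = F_manual, car = car_res[["F value"]][1]))
}
```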