Hypothesis Testing (T-Test) using R

Sakib Shahriar
18th March 2019

Raw Data File

Raw Data: the text file that contains all the data.

Reading Data

LungData <- read.table("C:/Users/skb67/Desktop/LungCapData.txt",
                       header = TRUE, sep = "\t")
dim(LungData)
[1] 725   6
head(LungData)
  LungCap Age Height Smoke Gender Caesarean
1   6.475   6   62.1    no   male        no
2  10.125  18   74.7   yes female        no
3   9.550  16   69.7    no female       yes
4  11.125  14   71.0    no   male        no
5   4.800   5   56.9    no   male        no
6   6.225  11   58.7    no female        no
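
If the data file is not at hand, the same calls can be tried on the rows printed above. The following is only a sketch: the names sample_text and LungSample are assumptions for illustration, and the six rows are just those shown by head(LungData), not the full 725-row data set.

```r
# Hypothetical six-row sample, copied from the head(LungData) output above.
sample_text <- "LungCap Age Height Smoke Gender Caesarean
6.475 6 62.1 no male no
10.125 18 74.7 yes female no
9.550 16 69.7 no female yes
11.125 14 71.0 no male no
4.800 5 56.9 no male no
6.225 11 58.7 no female no"

# read.table() can parse an inline string through its 'text' argument;
# stringsAsFactors = TRUE stores Smoke, Gender and Caesarean as factors.
LungSample <- read.table(text = sample_text, header = TRUE,
                         stringsAsFactors = TRUE)

dim(LungSample)   # rows and columns
str(LungSample)   # column types
```

With the real file, the same dim() and str() checks are a quick way to confirm that the header and separator were read as intended.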

Exploratory Data Analysis (Box plot)

boxplot(LungData$LungCap~LungData$Gender,xlab = "Gender", ylab = "Lung Capacity",
        main = "Lung Capacity by Gender", col = c("pink", "grey"))

[Plot: box plot of lung capacity by gender]

The box plot suggests that males in general have slightly greater lung capacity than females.

Exploratory Data Analysis (Bar Plot)

library(ggplot2)
g <- ggplot(LungData, aes(x = Gender, y = LungCap))
# stat = "identity" stacks every row, so each bar's height is the group total
g + labs(title = "Lung Capacity by Gender", x = "Gender", y = "Lung Capacity") +
  geom_bar(stat = "identity", aes(fill = Gender))

[Plot: bar plot of lung capacity by gender]

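Males again show slightly greater lung capacity. Note that with stat = "identity" each bar stacks all observations, so its height is the group total rather than the group mean.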
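
Because a bar built with stat = "identity" adds up every row in a group, a larger group can look as big as (or bigger than) a group with higher typical values. A minimal sketch of the difference, using a made-up data frame (toy, group and value are hypothetical names, not part of the lung data):

```r
# Hypothetical toy data: group "a" has more rows but smaller values.
toy <- data.frame(
  group = c("a", "a", "a", "b", "b"),
  value = c(2, 2, 2, 3, 3)
)

totals <- tapply(toy$value, toy$group, sum)   # what stacked bars display
means  <- tapply(toy$value, toy$group, mean)  # typical value per group

totals
means
```

Here both groups have the same total even though group "b" has the higher mean, so comparing means (or medians, as the box plot does) is the safer basis for a statement about typical lung capacity.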

One Sample T Test (One-Sided)

According to a reputable website, the mean human lung capacity is 7.5. We suggest that the mean lung capacity is in fact greater than 7.5.
Null hypothesis (H0): mean = 7.5
Alternative hypothesis (H1): mean > 7.5
Confidence level = 95% (0.95), so the level of significance (alpha) = 100% - 95% = 5%, or 0.05.

#Documentation
#help(t.test)

t.test(LungData$LungCap, mu =7.5 , alternative = "greater", conf.level = 0.95)

    One Sample t-test

data:  LungData$LungCap
t = 3.6732, df = 724, p-value = 0.0001286
alternative hypothesis: true mean is greater than 7.5
95 percent confidence interval:
 7.700322      Inf
sample estimates:
mean of x 
 7.863148 

p-value (0.0001) < alpha (0.05), so we reject the null hypothesis and conclude that the mean lung capacity is greater than 7.5.
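
To see where the t statistic and p-value come from, the one-sample test can be reproduced by hand: t = (sample mean - hypothesized mean) / (sample sd / sqrt(n)), with n - 1 degrees of freedom. The sketch below uses simulated numbers (x and mu0 are illustrative assumptions, not the lung data) and compares the manual result with t.test().

```r
set.seed(1)
x   <- rnorm(50, mean = 8, sd = 2.5)  # simulated values, not the lung data
mu0 <- 7.5                            # hypothesized mean under H0

n      <- length(x)
t_stat <- (mean(x) - mu0) / (sd(x) / sqrt(n))          # test statistic
p_val  <- pt(t_stat, df = n - 1, lower.tail = FALSE)   # upper tail for "greater"

res <- t.test(x, mu = mu0, alternative = "greater")

# Manual and built-in results side by side
c(manual_t = t_stat, builtin_t = unname(res$statistic))
c(manual_p = p_val,  builtin_p = res$p.value)
```

The two pairs of numbers should agree, which shows that alternative = "greater" simply takes the upper-tail area of the t distribution.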

One Sample T Test (Two-Sided)

According to a reputable website, the mean human lung capacity is 7.5. We suggest that the mean lung capacity is different from 7.5.
Null hypothesis (H0): mean = 7.5
Alternative hypothesis (H1): mean != 7.5
Confidence level = 95% (0.95), so the level of significance (alpha) = 100% - 95% = 5%, or 0.05.

t.test(LungData$LungCap, mu =7.5 , alternative = "two.sided", conf.level = 0.95)

    One Sample t-test

data:  LungData$LungCap
t = 3.6732, df = 724, p-value = 0.0002572
alternative hypothesis: true mean is not equal to 7.5
95 percent confidence interval:
 7.669052 8.057243
sample estimates:
mean of x 
 7.863148 

p-value (0.00026) < alpha (0.05), so we reject the null hypothesis and conclude that the mean lung capacity is different from 7.5.
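
The one-sided and two-sided outputs share the same t statistic; only the tail area differs. As a rough check, the reported p-values and the two-sided confidence interval can be recomputed from the printed t, df and sample mean. The variable names below are illustrative, and because the inputs are the rounded values from the output, the results should match the printed ones only up to rounding.

```r
t_obs <- 3.6732      # t statistic reported in both outputs
df    <- 724         # degrees of freedom (n - 1)
x_bar <- 7.863148    # reported sample mean
mu0   <- 7.5         # hypothesized mean

# One-sided p-value is the upper tail; two-sided doubles the tail area.
p_greater   <- pt(t_obs, df, lower.tail = FALSE)
p_two_sided <- 2 * pt(abs(t_obs), df, lower.tail = FALSE)

# Standard error recovered from t = (x_bar - mu0) / se
se    <- (x_bar - mu0) / t_obs
ci_95 <- x_bar + c(-1, 1) * qt(0.975, df) * se   # two-sided 95% interval

p_greater
p_two_sided
ci_95
```

This also explains why the two-sided interval excludes 7.5: rejecting H0 at alpha = 0.05 and the 95% interval not containing the hypothesized mean are the same statement.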

Smoker vs Non Smoker Lung Capacity

boxplot(LungData$LungCap~LungData$Smoke,xlab = "Smoker?", ylab = "Lung Capacity",
        main = "Lung Capacity by Smoker and Non Smoker", col = c("green", "red"))

[Plot: box plot of lung capacity by smoking status]

Smokers in general have slightly greater lung capacity than non-smokers!?

Can smokers have greater endurance?


Two Sample T Test

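Null hypothesis (H0): smoking has no effect on lung capacity (the mean lung capacity of smokers and non-smokers is equal).
Alternative hypothesis (H1): smoking does affect lung capacity (the two means differ).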

t.test(LungData$LungCap ~ LungData$Smoke, mu = 0, alternative = "two.sided",
       conf.level = 0.95, var.equal = FALSE)

    Welch Two Sample t-test

data:  LungData$LungCap by LungData$Smoke
t = -3.6498, df = 117.72, p-value = 0.0003927
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -1.3501778 -0.4003548
sample estimates:
 mean in group no mean in group yes 
         7.770188          8.645455 

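p-value (0.0004) < alpha (0.05), so we reject the null hypothesis: mean lung capacity differs between smokers and non-smokers. In this sample the smokers' mean (8.65) is higher than the non-smokers' (7.77); this is an association in the data, not evidence that smoking improves lung capacity.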
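
With var.equal = FALSE, t.test() runs the Welch version of the test, which is why the degrees of freedom (117.72) are not a whole number. A minimal sketch of that calculation on simulated groups (a and b are made-up samples, not the smoker and non-smoker data): the statistic divides the difference in means by sqrt(s1^2/n1 + s2^2/n2), and the degrees of freedom come from the Welch-Satterthwaite approximation.

```r
set.seed(2)
a <- rnorm(40, mean = 7.8, sd = 2.7)  # simulated "group 1"
b <- rnorm(15, mean = 8.6, sd = 1.9)  # simulated "group 2"

va <- var(a) / length(a)  # squared standard error of mean(a)
vb <- var(b) / length(b)  # squared standard error of mean(b)

t_stat <- (mean(a) - mean(b)) / sqrt(va + vb)

# Welch-Satterthwaite degrees of freedom
df_w <- (va + vb)^2 / (va^2 / (length(a) - 1) + vb^2 / (length(b) - 1))

p_val <- 2 * pt(abs(t_stat), df_w, lower.tail = FALSE)  # two-sided

res <- t.test(a, b, var.equal = FALSE)

c(manual_t  = t_stat, builtin_t  = unname(res$statistic))
c(manual_df = df_w,   builtin_df = unname(res$parameter))
c(manual_p  = p_val,  builtin_p  = res$p.value)
```

Setting var.equal = TRUE instead would pool the two variances and use n1 + n2 - 2 degrees of freedom, which is only appropriate when the group variances are similar.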

Thanks