Introduction

Loading Datasets

Loading the desired ToothGrowth dataset

library(datasets)
data(ToothGrowth)

## Looking at the data
summary(ToothGrowth)
##       len        supp         dose      
##  Min.   : 4.20   OJ:30   Min.   :0.500  
##  1st Qu.:13.07   VC:30   1st Qu.:0.500  
##  Median :19.25           Median :1.000  
##  Mean   :18.81           Mean   :1.167  
##  3rd Qu.:25.27           3rd Qu.:2.000  
##  Max.   :33.90           Max.   :2.000
boxplot(ToothGrowth$len~ToothGrowth$supp, main = "Distribution of Len with different levels of Supp", xlab = "levels of Supp", ylab = "length of tooth")

boxplot(ToothGrowth$len~ToothGrowth$dose, main = "Distribution of Len with different levels of Dose", xlab = "levels of Dose", ylab = "length of tooth")

Analysis of Effect on tooth growth due to supp and dose

Due to supp

Split the ToothGrowth data into a list of data frames, one for each level of the supp column

## Splitting the data
suppSplit <- split(ToothGrowth, ToothGrowth$supp)

## Visualizing the data where supp = VC
hist(suppSplit$VC$len, main = "Histogram for Tooth Growth len by supp = VC", xlab = "Tooth length")

qqnorm(suppSplit$VC$len); qqline(suppSplit$VC$len, col = 2)

## Calculating the sample statistics
mean(suppSplit$VC$len)
## [1] 16.96333
sd(suppSplit$VC$len)
## [1] 8.266029
## Visualizing the data where supp = OJ
hist(suppSplit$OJ$len, main = "Histogram for Tooth Growth len by supp = OJ", xlab = "Tooth length")

qqnorm(suppSplit$OJ$len); qqline(suppSplit$OJ$len, col = 2)

## Calculating the sample statistics
mean(suppSplit$OJ$len)
## [1] 20.66333
sd(suppSplit$OJ$len)
## [1] 6.605561
Hypothesis test

H0: difference in the mean of len for OJ and VC data is equal to 0

Ha: difference in the mean of len for OJ and VC data is not equal to 0

The hypothesis is tested at a significance level of 5%

## Welch's t-test, which does not assume equal variances in the two populations
t.test(suppSplit$VC$len, suppSplit$OJ$len, var.equal = FALSE)
## 
##  Welch Two Sample t-test
## 
## data:  suppSplit$VC$len and suppSplit$OJ$len
## t = -1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -7.5710156  0.1710156
## sample estimates:
## mean of x mean of y 
##  16.96333  20.66333

The t-statistic is -1.9153, with a p-value of 0.06063. Since this is greater than the significance level of 0.05, we fail to reject the null hypothesis.

The 95% confidence interval for the difference between the mean VC length and the mean OJ length is (-7.5710156, 0.1710156). The interval contains 0, which is consistent with the test result.
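To make the Welch output above easier to follow, the statistic, degrees of freedom, p-value, and confidence interval can be recomputed by hand. This is only a sketch under the usual Welch formulas; the variable names (vc, oj, se, v1, v2, t_stat, df_welch, p_value, ci) are illustrative and do not appear in the original analysis.

```r
library(datasets)
data(ToothGrowth)

# Tooth lengths for each supplement (illustrative names)
vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]
oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]

# Per-group variance of the sample mean
v1 <- var(vc) / length(vc)
v2 <- var(oj) / length(oj)

# Standard error of the difference in means (variances not pooled)
se <- sqrt(v1 + v2)

# Welch t statistic
t_stat <- (mean(vc) - mean(oj)) / se

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (v1 + v2)^2 / (v1^2 / (length(vc) - 1) + v2^2 / (length(oj) - 1))

# Two-sided p-value and 95% confidence interval
p_value <- 2 * pt(-abs(t_stat), df_welch)
ci <- (mean(vc) - mean(oj)) + c(-1, 1) * qt(0.975, df_welch) * se

c(t = t_stat, df = df_welch, p = p_value)
ci
```

The values should agree with those reported by t.test(..., var.equal = FALSE) up to rounding.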

A similar result is obtained if the variances of the two groups are assumed to be equal

t.test(suppSplit$VC$len, suppSplit$OJ$len, var.equal = TRUE)
## 
##  Two Sample t-test
## 
## data:  suppSplit$VC$len and suppSplit$OJ$len
## t = -1.9153, df = 58, p-value = 0.06039
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -7.5670064  0.1670064
## sample estimates:
## mean of x mean of y 
##  16.96333  20.66333
Conclusion

The t-test does not show a statistically significant difference between the means of the two groups at the 5% level, so we fail to reject the null hypothesis. The observed difference is consistent with chance variation.

Due to Dose

Split the ToothGrowth data into a list of data frames, one for each level of the dose column

DoseSplit<-split(ToothGrowth, ToothGrowth$dose)
names(DoseSplit)<-c("half", "one", "two")

We run a t-test comparing len where the dose is 0.5 (half) with len where the dose is 2 (two)

## Visualizing the data where dose = 0.5
hist(DoseSplit$half$len, main = "Histogram for Tooth Growth len by dose = 0.5", xlab = "Tooth length")

qqnorm(DoseSplit$half$len); qqline(DoseSplit$half$len, col = 2)

## Calculating the sample statistics
mean(DoseSplit$half$len)
## [1] 10.605
sd(DoseSplit$half$len)
## [1] 4.499763
## Visualizing the data where dose = 2
hist(DoseSplit$two$len, main = "Histogram for Tooth Growth len by Dose = 2", xlab = "Tooth length")

qqnorm(DoseSplit$two$len); qqline(DoseSplit$two$len, col = 2)

## Calculating the sample statistics
mean(DoseSplit$two$len)
## [1] 26.1
sd(DoseSplit$two$len)
## [1] 3.77415
Hypothesis test

H0: difference in the mean of len for dose = 0.5 and dose = 2 data is equal to 0

Ha: difference in the mean of len for dose = 0.5 and dose = 2 data is not equal to 0

The hypothesis is tested at a significance level of 5%

## Welch's t-test, which does not assume equal variances in the two populations
t.test(DoseSplit$half$len, DoseSplit$two$len, var.equal = FALSE)
## 
##  Welch Two Sample t-test
## 
## data:  DoseSplit$half$len and DoseSplit$two$len
## t = -11.799, df = 36.883, p-value = 4.398e-14
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -18.15617 -12.83383
## sample estimates:
## mean of x mean of y 
##    10.605    26.100

The t-statistic is -11.799, with a p-value of 4.398e-14. Since this is far below the significance level of 0.05, we reject the null hypothesis.

The 95% confidence interval for the difference between the mean length at dose 0.5 and the mean length at dose 2 is (-18.15617, -12.83383). The interval does not contain 0.

A similar result is obtained if the variances of the two groups are assumed to be equal

t.test(DoseSplit$half$len, DoseSplit$two$len, var.equal = TRUE)
## 
##  Two Sample t-test
## 
## data:  DoseSplit$half$len and DoseSplit$two$len
## t = -11.799, df = 38, p-value = 2.838e-14
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -18.15352 -12.83648
## sample estimates:
## mean of x mean of y 
##    10.605    26.100
Conclusion

Only two of the three doses are compared here. If multiple t-tests were conducted, the significance level would need to be reduced according to the Bonferroni correction, a* = a/K, where K = k(k-1)/2 is the number of pairwise comparisons and k is the number of levels. With k = 3 dose levels, K = 3.
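As a rough illustration of the Bonferroni correction described here, the sketch below computes the adjusted significance level for the three dose levels and runs all pairwise Welch t-tests with pairwise.t.test() from the stats package. This was not part of the original analysis; the names k, K, alpha, alpha_star, and pw are illustrative.

```r
library(datasets)
data(ToothGrowth)

k <- length(unique(ToothGrowth$dose))  # number of dose levels
K <- k * (k - 1) / 2                   # number of pairwise comparisons
alpha <- 0.05
alpha_star <- alpha / K                # Bonferroni-adjusted significance level

# All pairwise t-tests between dose levels; pool.sd = FALSE runs a separate
# Welch test for each pair, and the p-values are multiplied by K (capped at 1)
pw <- pairwise.t.test(ToothGrowth$len, ToothGrowth$dose,
                      p.adjust.method = "bonferroni", pool.sd = FALSE)
pw$p.value
```

Because the p-values returned by pairwise.t.test() are already adjusted, they are compared against the original level of 0.05. Equivalently, unadjusted p-values would be compared against alpha_star.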

Other techniques such as ANOVA could have been used to test the hypotheses

H0: mean len is the same across all doses

Ha: mean len differs for at least one pair of doses
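A minimal sketch of the one-way ANOVA mentioned here, assuming dose is treated as a categorical factor with three levels. This is an illustration rather than part of the original analysis, and the object name fit is hypothetical.

```r
library(datasets)
data(ToothGrowth)

# One-way ANOVA of tooth length on dose, with dose treated as a factor
fit <- aov(len ~ factor(dose), data = ToothGrowth)

# The F test in this table addresses H0: mean len is the same across all doses
summary(fit)
```

A small p-value in the F test would indicate that at least one dose level has a different mean, without saying which pair differs.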

In the t-test carried out here, the very low p-value of 4.398e-14 gives strong evidence against the null hypothesis, so we reject it in favor of the alternative hypothesis. With a p-value this small, a difference of this size would be very unlikely if the null hypothesis were true.

Assumptions and Conclusions

The tests rely on the following assumptions
  • Normality: len should be approximately normally distributed within each factor level. The QQ plots show that some groups deviate from normality, so approximate normality was assumed in order to conduct the tests
  • Independence: the observations should be independent of each other, both within each group and between groups
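The visual normality check can be complemented with a numerical one. The sketch below applies the Shapiro-Wilk test (shapiro.test() from the stats package) to len within each supp level and each dose level. This test was not used in the original analysis, and the names sw_supp and sw_dose are illustrative.

```r
library(datasets)
data(ToothGrowth)

# Shapiro-Wilk p-value for len within each supplement level
sw_supp <- tapply(ToothGrowth$len, ToothGrowth$supp,
                  function(x) shapiro.test(x)$p.value)

# Shapiro-Wilk p-value for len within each dose level
sw_dose <- tapply(ToothGrowth$len, ToothGrowth$dose,
                  function(x) shapiro.test(x)$p.value)

sw_supp
sw_dose
```

A p-value below 0.05 for a group suggests a departure from normality in that group. With groups of this size the test has limited power, so it is best read alongside the QQ plots rather than in place of them.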
Conclusions drawn from the tests
  • the mean of len does not differ significantly between the two supp levels: the p-value of 0.06063 is greater than the significance level of 0.05
  • the mean of len differs between at least the 0.5 and 2 dose levels, with a very small p-value of 4.398e-14