Tooth Growth

Introduction
Loading Datasets
Analysis of Effect on tooth growth due to supp and dose
- Due to supp
- Due to Dose
Assumptions and Conclusions

Introduction

Load the ToothGrowth data and perform some basic exploratory data analyses.
Provide a basic summary of the data.
Use confidence intervals and hypothesis tests to compare tooth growth by supp and dose. (Use the techniques from class even if there are other approaches worth considering.)
State your conclusions and the assumptions needed for your conclusions.

Loading Datasets

Load the ToothGrowth dataset, which is included in R's built-in datasets package.

library(datasets)
data(ToothGrowth)

## Looking at the data
summary(ToothGrowth)

##       len        supp         dose      
##  Min.   : 4.20   OJ:30   Min.   :0.500  
##  1st Qu.:13.07   VC:30   1st Qu.:0.500  
##  Median :19.25           Median :1.000  
##  Mean   :18.81           Mean   :1.167  
##  3rd Qu.:25.27           3rd Qu.:2.000  
##  Max.   :33.90           Max.   :2.000

boxplot(len ~ supp, data = ToothGrowth, main = "Distribution of tooth length by supp", xlab = "Supplement type (supp)", ylab = "Tooth length")

boxplot(len ~ dose, data = ToothGrowth, main = "Distribution of tooth length by dose", xlab = "Dose (mg/day)", ylab = "Tooth length")
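The two boxplots look at each factor separately. A compact way to see both factors together is a table of mean tooth length for every supp and dose combination. The sketch below is an addition to the original analysis, assuming only base R's `aggregate()` with a formula; the name `cell_means` is illustrative.

```r
## Mean tooth length for each supp x dose combination (base R only)
data(ToothGrowth)
cell_means <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
print(cell_means)
```

Reading down the table shows whether the gap between OJ and VC is similar at each dose, which the two separate boxplots cannot show.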

Analysis of Effect on tooth growth due to supp and dose

Due to supp

Split the ToothGrowth data frame into a list of data frames, one for each level of supp.

## Splitting the data
suppSplit <- split(ToothGrowth, ToothGrowth$supp)

## Visualizing the data where supp = VC
hist(suppSplit$VC$len, main = "Histogram for Tooth Growth len by supp = VC", xlab = "Tooth length")

qqnorm(suppSplit$VC$len); qqline(suppSplit$VC$len, col = 2)

## Calculating the sample statistics
mean(suppSplit$VC$len)

## [1] 16.96333

sd(suppSplit$VC$len)

## [1] 8.266029

## Visualizing the data where supp = OJ
hist(suppSplit$OJ$len, main = "Histogram for Tooth Growth len by supp = OJ", xlab = "Tooth length")

qqnorm(suppSplit$OJ$len); qqline(suppSplit$OJ$len, col = 2)

## Calculating the sample statistics
mean(suppSplit$OJ$len)

## [1] 20.66333

sd(suppSplit$OJ$len)

## [1] 6.605561
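The sample statistics above are computed one group at a time. As a sketch, `sapply()` over `split()` produces the size, mean and standard deviation for both supp levels in one call, avoiding repeated code per group; `supp_stats` is a hypothetical name used only here.

```r
## Per-group sample size, mean and standard deviation of len by supp
data(ToothGrowth)
supp_stats <- sapply(split(ToothGrowth$len, ToothGrowth$supp),
                     function(x) c(n = length(x), mean = mean(x), sd = sd(x)))
print(supp_stats)
```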

Hypothesis test

H0: the difference between the mean len of the OJ and VC groups is equal to 0

Ha: the difference between the mean len of the OJ and VC groups is not equal to 0

The test is carried out at a significance level of 5% (α = 0.05).

## Welch two-sample t-test: does not assume equal variances in the two populations
t.test(suppSplit$VC$len, suppSplit$OJ$len, var.equal = FALSE)

## 
##  Welch Two Sample t-test
## 
## data:  suppSplit$VC$len and suppSplit$OJ$len
## t = -1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -7.5710156  0.1710156
## sample estimates:
## mean of x mean of y 
##  16.96333  20.66333

The t-statistic is -1.9153 with a p-value of 0.06063. Since the p-value is greater than the significance level of 0.05, we fail to reject the null hypothesis.
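The p-value reported by `t.test()` is two-sided: the probability, under H0, of a t-statistic at least as extreme as the observed one in either direction. As a quick check, it can be recovered from the printed t and df with base R's `pt()`. The inputs are copied from the rounded Welch output, so this sketch agrees with the reported p-value only up to rounding.

```r
## Two-sided p-value from the printed Welch t-statistic and degrees of freedom
t_stat <- -1.9153
df_welch <- 55.309
p_two_sided <- 2 * pt(-abs(t_stat), df = df_welch)
print(p_two_sided)

## Reject H0 at the 5% level only if the p-value is below alpha
alpha <- 0.05
print(p_two_sided < alpha)
```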

The 95% confidence interval for the difference in means (VC minus OJ) is (-7.5710156, 0.1710156). Because this interval contains 0, it agrees with the decision not to reject the null hypothesis.
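The Welch interval can be reproduced by hand as the difference in sample means plus or minus a t quantile times the standard error, with degrees of freedom from the Welch-Satterthwaite approximation. The helper `welch_ci()` below is a hypothetical name used only for this sketch; under these assumptions it should match the interval and df reported by `t.test()`.

```r
## Hand-computed Welch confidence interval (hypothetical helper)
welch_ci <- function(x, y, conf.level = 0.95) {
  vx <- var(x) / length(x)            # squared standard error of mean(x)
  vy <- var(y) / length(y)            # squared standard error of mean(y)
  se <- sqrt(vx + vy)
  ## Welch-Satterthwaite degrees of freedom
  df <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))
  tq <- qt(1 - (1 - conf.level) / 2, df)
  d <- mean(x) - mean(y)
  list(estimate = d, df = df, t = d / se, conf.int = d + c(-1, 1) * tq * se)
}

data(ToothGrowth)
vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]
oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
res <- welch_ci(vc, oj)
print(res$conf.int)
```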

Assuming equal variances in the two groups (the pooled-variance Student t-test) gives similar results:

t.test(suppSplit$VC$len, suppSplit$OJ$len, var.equal = TRUE)

## 
##  Two Sample t-test
## 
## data:  suppSplit$VC$len and suppSplit$OJ$len
## t = -1.9153, df = 58, p-value = 0.06039
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -7.5670064  0.1670064
## sample estimates:
## mean of x mean of y 
##  16.96333  20.66333
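The t-statistic is -1.9153 in both outputs. This is expected here: with equal group sizes (30 per supp level), the pooled standard error equals the Welch standard error, so only the degrees of freedom (58 versus 55.309), and therefore the p-value and interval, change. The sketch below computes both statistics by hand to illustrate this; the variable names are illustrative.

```r
## Pooled-variance (Student) t-statistic computed by hand
data(ToothGrowth)
vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]
oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
nx <- length(vc); ny <- length(oj)

## Pooled variance: weighted average of the two sample variances
sp2 <- ((nx - 1) * var(vc) + (ny - 1) * var(oj)) / (nx + ny - 2)
t_pooled <- (mean(vc) - mean(oj)) / sqrt(sp2 * (1 / nx + 1 / ny))

## Welch statistic uses the unpooled standard error
t_welch <- (mean(vc) - mean(oj)) / sqrt(var(vc) / nx + var(oj) / ny)

print(c(pooled = t_pooled, welch = t_welch, df_pooled = nx + ny - 2))
```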

Conclusion

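At the 5% significance level, the t-test does not provide sufficient evidence of a difference in mean tooth length between the OJ and VC groups, so we fail to reject the null hypothesis. This does not prove that the means are equal: the observed difference of 3.7 could plausibly be due to chance, but the p-value (0.06063) is close to the threshold and the confidence interval extends as far as -7.57.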
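As a cross-check, the same Welch test can be run directly on the unsplit data with the formula interface of `t.test()`. The groups are then taken in factor-level order (OJ, then VC), so the interval is for OJ minus VC and its signs are flipped relative to the output above, while the p-value is unchanged. This is a sketch; `res_formula` is an illustrative name.

```r
## Welch t-test of len by supp without splitting the data first
data(ToothGrowth)
res_formula <- t.test(len ~ supp, data = ToothGrowth, var.equal = FALSE)
print(res_formula)
```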

Due to Dose

Split the ToothGrowth data frame into a list of data frames, one for each level of dose, and give the list elements readable names.

## split() orders the groups by dose (0.5, 1, 2); rename them accordingly
DoseSplit <- split(ToothGrowth, ToothGrowth$dose)
names(DoseSplit) <- c("half", "one", "two")
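Renaming with `names()` relies on `split()` ordering the groups as 0.5, 1, 2, which holds because the numeric dose is converted to a factor with sorted levels. A slightly more explicit alternative, sketched below, attaches the labels when the factor is created so the mapping from dose value to name is visible in the code. `dose_f` and `DoseSplit2` are hypothetical names used only for this illustration.

```r
## Split by dose with an explicit level-to-label mapping
data(ToothGrowth)
dose_f <- factor(ToothGrowth$dose, levels = c(0.5, 1, 2),
                 labels = c("half", "one", "two"))
DoseSplit2 <- split(ToothGrowth, dose_f)
print(sapply(DoseSplit2, nrow))   # rows per dose group
```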

We compare tooth length at the lowest dose (0.5) with tooth length at the highest dose (2) using a t-test.

## Visualizing the data where dose = 0.5
hist(DoseSplit$half$len, main = "Histogram for Tooth Growth len by dose = 0.5", xlab = "Tooth length")

qqnorm(DoseSplit$half$len); qqline(DoseSplit$half$len, col = 2)

## Calculating the sample statistics
mean(DoseSplit$half$len)

## [1] 10.605

sd(DoseSplit$half$len)

## [1] 4.499763

## Visualizing the data where dose = 2
hist(DoseSplit$two$len, main = "Histogram for Tooth Growth len by dose = 2", xlab = "Tooth length")

qqnorm(DoseSplit$two$len); qqline(DoseSplit$two$len, col = 2)

## Calculating the sample statistics
mean(DoseSplit$two$len)

## [1] 26.1

sd(DoseSplit$two$len)

## [1] 3.77415

Hypothesis test

H0: the difference between the mean len for dose = 0.5 and dose = 2 is equal to 0

Ha: the difference between the mean len for dose = 0.5 and dose = 2 is not equal to 0

The test is carried out at a significance level of 5% (α = 0.05).

## Welch two-sample t-test: does not assume equal variances in the two populations
t.test(DoseSplit$half$len, DoseSplit$two$len, var.equal = FALSE)

## 
##  Welch Two Sample t-test
## 
## data:  DoseSplit$half$len and DoseSplit$two$len
## t = -11.799, df = 36.883, p-value = 4.398e-14
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -18.15617 -12.83383
## sample estimates:
## mean of x mean of y 
##    10.605    26.100

The t-statistic is -11.799 with a p-value of 4.398e-14. Since the p-value is far below the significance level of 0.05, we reject the null hypothesis.

The 95% confidence interval for the difference in means (dose 0.5 minus dose 2) is (-18.15617, -12.83383). The interval lies entirely below 0, indicating that mean tooth length is higher at dose = 2.

Assuming equal variances in the two groups (the pooled-variance Student t-test) gives similar results:

t.test(DoseSplit$half$len, DoseSplit$two$len, var.equal = TRUE)

## 
##  Two Sample t-test
## 
## data:  DoseSplit$half$len and DoseSplit$two$len
## t = -11.799, df = 38, p-value = 2.838e-14
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -18.15352 -12.83648
## sample estimates:
## mean of x mean of y 
##    10.605    26.100

Conclusion

Only two of the three doses are compared here. Testing every pair of doses would require several t-tests, and the significance level would then have to be reduced with the Bonferroni correction, α* = α/K, where K = k(k - 1)/2 is the number of pairwise comparisons and k is the number of dose levels.
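For three dose levels, K = 3(3 - 1)/2 = 3 pairwise comparisons, so the Bonferroni-adjusted level would be 0.05/3 ≈ 0.0167. As a sketch of what testing all three pairs could look like, base R's `pairwise.t.test()` applies this correction by multiplying each p-value by K (capped at 1); `pool.sd = FALSE` requests a separate unequal-variance t-test for each pair, in line with the Welch tests used here.

```r
## Bonferroni-adjusted significance level for all pairwise comparisons
alpha <- 0.05
k <- 3                       # number of dose levels
K <- k * (k - 1) / 2         # number of pairwise comparisons
alpha_star <- alpha / K
print(alpha_star)

## All pairwise Welch t-tests across dose, with Bonferroni-adjusted p-values
data(ToothGrowth)
pw <- pairwise.t.test(ToothGrowth$len, ToothGrowth$dose,
                      p.adjust.method = "bonferroni", pool.sd = FALSE)
print(pw)
```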

Alternatively, a one-way ANOVA could test H0: the mean len is the same at all three dose levels, against Ha: at least one pair of dose levels has different mean len.
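A minimal one-way ANOVA sketch in base R is shown below. Because `dose` is stored as a number, it is wrapped in `factor()`; without that, `aov()` would fit dose as a single linear predictor instead of comparing the three group means. Classical ANOVA assumes equal variances across groups, so the sketch also runs `oneway.test()` with `var.equal = FALSE`, a Welch-type ANOVA that drops that assumption.

```r
## One-way ANOVA: is mean len the same at all three dose levels?
data(ToothGrowth)
fit <- aov(len ~ factor(dose), data = ToothGrowth)
print(summary(fit))

## Welch-type one-way test, not assuming equal variances
welch_anova <- oneway.test(len ~ factor(dose), data = ToothGrowth,
                           var.equal = FALSE)
print(welch_anova)
```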

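For the comparison made here, the very low p-value of 4.398e-14 means the null hypothesis is rejected in favor of the alternative hypothesis. A difference this large would be extremely unlikely if the true means were equal, and the p-value is also far below the Bonferroni-adjusted level of 0.05/3 ≈ 0.0167, so this conclusion would still hold if all three pairwise comparisons were made.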

Assumptions and Conclusions

The conclusions rely on the following assumptions:

Normality: len within each factor level should be approximately normally distributed. The Q-Q plots show that some groups deviate noticeably from normality; approximate normality was nevertheless assumed in order to carry out the t-tests.
Independence: observations should be independent of one another, both within each group and between groups.
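The Q-Q plots give a visual check of normality. A complementary numerical check, sketched below, is the Shapiro-Wilk test from base R's `shapiro.test()`, applied to each group separately. A small p-value suggests departure from normality; with groups of 20 or 30 the test has limited power, so it supplements rather than replaces the plots. `sw_supp` and `sw_dose` are illustrative names.

```r
## Shapiro-Wilk normality test p-value for len within each group
data(ToothGrowth)
sw_supp <- sapply(split(ToothGrowth$len, ToothGrowth$supp),
                  function(x) shapiro.test(x)$p.value)
sw_dose <- sapply(split(ToothGrowth$len, ToothGrowth$dose),
                  function(x) shapiro.test(x)$p.value)
print(round(sw_supp, 4))
print(round(sw_dose, 4))
```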

The tests support the following conclusions:

There is no statistically significant difference in mean len between the two supp levels at the 5% level (p-value = 0.06063 > 0.05). This is a failure to reject H0, not evidence that the means are equal.
Mean len differs significantly between dose = 0.5 and dose = 2 (p-value = 4.398e-14), with the higher dose associated with longer teeth (sample means 26.1 versus 10.605).

Ujjwal Shukla

Monday, October 27, 2014