Statistical Inference - Project Part 2

In this part of the project, we are going to analyze the ToothGrowth data in the R datasets package.

1. Load the ToothGrowth data and perform some basic exploratory data analyses

library(datasets)
library(ggplot2)
library(plyr)

## Warning: package 'plyr' was built under R version 3.1.1

str(ToothGrowth)

## 'data.frame':    60 obs. of  3 variables:
##  $ len : num  4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
##  $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
##  $ dose: num  0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...

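# dose on the x axis and tooth length on the y axis, so the lm fit models length as a function of dose
g <- ggplot(ToothGrowth,aes(dose,len))
plot(g+geom_point()+facet_grid(.~supp)+geom_smooth(method="lm")+ggtitle("Distribution of tooth lengths across supplement type"))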

[Plot: Distribution of tooth lengths across supplement type]

p <- ggplot(ToothGrowth,aes(supp,len))
plot(p+geom_point()+facet_grid(.~dose)+ggtitle("Distribution of tooth lengths across dosage levels"))

[Plot: Distribution of tooth lengths across dosage levels]

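# Per-group summary statistics for each supplement/dose combination.
# The data frame is named summary; calls such as summary(ToothGrowth) still resolve to the function.
summary <- ddply(ToothGrowth,.(supp,dose),summarize,
                 min_len=min(len),mean_len=mean(len),median_len=median(len),
                 max_len=max(len),sd_len=round(sd(len),2),count=length(len))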

g1 <- ggplot(summary,aes(dose,mean_len,color=supp))+geom_point(size=4)+geom_line()+ggtitle("Comparison of mean tooth lengths with supplement type and dosage")
plot(g1)

[Plot: Comparison of mean tooth lengths with supplement type and dosage]

p1 <- ggplot(summary,aes(supp,mean_len))+geom_boxplot(aes(fill=supp))+ggtitle("Mean tooth length based on supplement type")
plot(p1)

[Plot: Mean tooth length based on supplement type]

p2 <- ggplot(summary,aes(factor(dose),mean_len))+geom_boxplot(aes(fill=factor(dose)))+ggtitle("Mean Tooth length in relation to dosage")
plot(p2)

[Plot: Mean tooth length in relation to dosage]

2. Provide a basic summary of the data.

str(ToothGrowth)

## 'data.frame':    60 obs. of  3 variables:
##  $ len : num  4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
##  $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
##  $ dose: num  0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...

summary(ToothGrowth)

##       len        supp         dose      
##  Min.   : 4.20   OJ:30   Min.   :0.500  
##  1st Qu.:13.07   VC:30   1st Qu.:0.500  
##  Median :19.25           Median :1.000  
##  Mean   :18.81           Mean   :1.167  
##  3rd Qu.:25.27           3rd Qu.:2.000  
##  Max.   :33.90           Max.   :2.000

# Display the per-group summary data frame created earlier
summary

##   supp dose min_len mean_len median_len max_len sd_len count
## 1   OJ  0.5     8.2    13.23      12.25    21.5   4.46    10
## 2   OJ  1.0    14.5    22.70      23.45    27.3   3.91    10
## 3   OJ  2.0    22.4    26.06      25.95    30.9   2.66    10
## 4   VC  0.5     4.2     7.98       7.15    11.5   2.75    10
## 5   VC  1.0    13.6    16.77      16.50    22.5   2.52    10
## 6   VC  2.0    18.5    26.14      25.95    33.9   4.80    10

3. Use confidence intervals and/or hypothesis tests to compare tooth growth by supp and dose. (Only use the techniques from class, even if there are other approaches worth considering.)

Hypothesis test comparing tooth growth by supplement type

t.test(len~supp,paired=FALSE,var.equal=TRUE,data=ToothGrowth)

## 
##  Two Sample t-test
## 
## data:  len by supp
## t = 1.9153, df = 58, p-value = 0.06039
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.1670064  7.5670064
## sample estimates:
## mean in group OJ mean in group VC 
##         20.66333         16.96333

t.test(len~supp,paired=FALSE,var.equal=FALSE,data=ToothGrowth)

## 
##  Welch Two Sample t-test
## 
## data:  len by supp
## t = 1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.1710156  7.5710156
## sample estimates:
## mean in group OJ mean in group VC 
##         20.66333         16.96333

Confidence intervals and p-values are calculated with the t.test function, once with var.equal=TRUE (pooled variance) and once with var.equal=FALSE (Welch). In both cases the p-value is above 5% (0.06039 and 0.06063) and the 95% confidence interval for the difference in means (OJ minus VC) contains 0. We therefore fail to reject the null hypothesis of equal mean tooth length at the 5% level. Across all dosages combined, we cannot conclude that the supplement type affects tooth length.
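To make explicit what the pooled-variance test computes, here is a minimal sketch that rebuilds the t statistic, p-value and 95% confidence interval by hand. The variable names (oj, vc, sp2, manual_pooled) are illustrative and not part of the original analysis; the values should agree with the var.equal=TRUE output of t.test.

```r
library(datasets)

# Split tooth lengths by supplement type
oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]
n1 <- length(oj)
n2 <- length(vc)

# Pooled variance and standard error of the difference in means
sp2 <- ((n1 - 1) * var(oj) + (n2 - 1) * var(vc)) / (n1 + n2 - 2)
se  <- sqrt(sp2 * (1 / n1 + 1 / n2))

# t statistic, two-sided p-value and 95% confidence interval (OJ minus VC)
df      <- n1 + n2 - 2
t_stat  <- (mean(oj) - mean(vc)) / se
p_value <- 2 * pt(-abs(t_stat), df)
ci      <- (mean(oj) - mean(vc)) + c(-1, 1) * qt(0.975, df) * se

manual_pooled <- list(t = t_stat, df = df, p.value = p_value, conf.int = ci)
print(manual_pooled)
```

The Welch version differs only in using the unpooled standard error sqrt(var(oj)/n1 + var(vc)/n2) and the Satterthwaite approximation for the degrees of freedom.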

Hypothesis test comparing tooth growth by dosage

diffdose1 <- subset(ToothGrowth,dose %in% c(0.5,1.0))
diffdose2 <- subset(ToothGrowth,dose %in% c(1.0,2.0))
diffdose3 <- subset(ToothGrowth,dose %in% c(0.5,2.0))

t.test(len~dose,paired=FALSE,var.equal=TRUE,data=diffdose1)

## 
##  Two Sample t-test
## 
## data:  len by dose
## t = -6.4766, df = 38, p-value = 1.266e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -11.983748  -6.276252
## sample estimates:
## mean in group 0.5   mean in group 1 
##            10.605            19.735

t.test(len~dose,paired=FALSE,var.equal=TRUE,data=diffdose2)

## 
##  Two Sample t-test
## 
## data:  len by dose
## t = -4.9005, df = 38, p-value = 1.811e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -8.994387 -3.735613
## sample estimates:
## mean in group 1 mean in group 2 
##          19.735          26.100

t.test(len~dose,paired=FALSE,var.equal=TRUE,data=diffdose3)

## 
##  Two Sample t-test
## 
## data:  len by dose
## t = -11.799, df = 38, p-value = 2.838e-14
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -18.15352 -12.83648
## sample estimates:
## mean in group 0.5   mean in group 2 
##            10.605            26.100

Comparing dosage of 0.5 to 1.0: The 95% confidence interval for the difference in means (dose 0.5 minus dose 1.0), (-11.983748, -6.276252), lies entirely below 0, and the p-value of 1.266e-07 is far below 5%. The null hypothesis of equal means is rejected. Mean tooth length is higher at a dose of 1.0 than at 0.5.

Comparing dosage of 1.0 to 2.0: The 95% confidence interval for the difference in means (dose 1.0 minus dose 2.0), (-8.994387, -3.735613), lies entirely below 0, and the p-value of 1.811e-05 is far below 5%. The null hypothesis of equal means is rejected. Mean tooth length is higher at a dose of 2.0 than at 1.0.

Comparing dosage of 0.5 to 2.0: The 95% confidence interval for the difference in means (dose 0.5 minus dose 2.0), (-18.15352, -12.83648), lies entirely below 0, and the p-value of 2.838e-14 is far below 5%. The null hypothesis of equal means is rejected. Mean tooth length is higher at a dose of 2.0 than at 0.5.
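The three pairwise dose comparisons can also be collected in a single table. The sketch below is an illustrative alternative to the diffdose1, diffdose2 and diffdose3 subsets, not the original code; pairwise_dose and its column names are made up for this example. The difference is reported as mean(lower dose) minus mean(higher dose), which matches the sign convention of the t.test output.

```r
library(datasets)

# All unordered pairs of the dose levels: (0.5, 1), (0.5, 2), (1, 2)
dose_pairs <- combn(sort(unique(ToothGrowth$dose)), 2, simplify = FALSE)

pairwise_dose <- do.call(rbind, lapply(dose_pairs, function(pair) {
  # Keep only the rows belonging to the two doses being compared
  d  <- ToothGrowth[ToothGrowth$dose %in% pair, ]
  # Pooled-variance two-sample t-test, as in the report
  tt <- t.test(len ~ dose, var.equal = TRUE, data = d)
  data.frame(
    low_dose  = pair[1],
    high_dose = pair[2],
    diff      = unname(tt$estimate[1] - tt$estimate[2]),  # mean(low) - mean(high)
    ci_lower  = tt$conf.int[1],
    ci_upper  = tt$conf.int[2],
    p_value   = tt$p.value
  )
}))
print(pairwise_dose)
```

A negative interval in every row means the higher dose has the larger mean tooth length in each pair.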

Based on the summary of the dataset and the initial exploratory data analysis, it also seems worthwhile to explore the impact of the supplement type at each dosage level. At the lower dosages, mean tooth length appears to differ noticeably between the two supplements.

summary

##   supp dose min_len mean_len median_len max_len sd_len count
## 1   OJ  0.5     8.2    13.23      12.25    21.5   4.46    10
## 2   OJ  1.0    14.5    22.70      23.45    27.3   3.91    10
## 3   OJ  2.0    22.4    26.06      25.95    30.9   2.66    10
## 4   VC  0.5     4.2     7.98       7.15    11.5   2.75    10
## 5   VC  1.0    13.6    16.77      16.50    22.5   2.52    10
## 6   VC  2.0    18.5    26.14      25.95    33.9   4.80    10

dose05 <- ToothGrowth[ToothGrowth$dose==0.5,]
dose10 <- ToothGrowth[ToothGrowth$dose==1.0,]
dose20 <- ToothGrowth[ToothGrowth$dose==2.0,]

t.test(len~supp,paired=FALSE,var.equal=TRUE,data=dose05)

## 
##  Two Sample t-test
## 
## data:  len by supp
## t = 3.1697, df = 18, p-value = 0.005304
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  1.770262 8.729738
## sample estimates:
## mean in group OJ mean in group VC 
##            13.23             7.98

t.test(len~supp,paired=FALSE,var.equal=TRUE,data=dose10)

## 
##  Two Sample t-test
## 
## data:  len by supp
## t = 4.0328, df = 18, p-value = 0.0007807
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  2.840692 9.019308
## sample estimates:
## mean in group OJ mean in group VC 
##            22.70            16.77

t.test(len~supp,paired=FALSE,var.equal=TRUE,data=dose20)

## 
##  Two Sample t-test
## 
## data:  len by supp
## t = -0.0461, df = 18, p-value = 0.9637
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -3.722999  3.562999
## sample estimates:
## mean in group OJ mean in group VC 
##            26.06            26.14

At dose 0.5: The 95% confidence interval for the difference in means (OJ minus VC), (1.770262, 8.729738), lies entirely above 0, and the p-value of 0.005304 is well below 5%. The null hypothesis of equal means is rejected. At a dose of 0.5, the supplement type has an impact on tooth growth: OJ performs better than VC.

At dose 1.0: The 95% confidence interval for the difference in means (OJ minus VC), (2.840692, 9.019308), lies entirely above 0, and the p-value of 0.0007807 is well below 5%. The null hypothesis of equal means is rejected. At a dose of 1.0, the supplement type has an impact on tooth growth: OJ performs better than VC.

At dose 2.0: The 95% confidence interval for the difference in means (OJ minus VC), (-3.722999, 3.562999), contains 0, and the p-value of 0.9637 is far above 5%. We fail to reject the null hypothesis. At a dose of 2.0, there is no evidence that the supplement type affects tooth growth.
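The per-dose supplement comparisons can be run in one pass by splitting the data on dose. This is a sketch under the same pooled-variance setting as above; supp_by_dose and its column names are illustrative and do not appear in the original analysis.

```r
library(datasets)

# One data frame per dose level; the list names are "0.5", "1" and "2"
by_dose <- split(ToothGrowth, ToothGrowth$dose)

supp_by_dose <- do.call(rbind, lapply(names(by_dose), function(d) {
  # Pooled-variance two-sample t-test of OJ against VC within this dose
  tt <- t.test(len ~ supp, var.equal = TRUE, data = by_dose[[d]])
  data.frame(
    dose     = as.numeric(d),
    diff     = unname(tt$estimate[1] - tt$estimate[2]),  # mean(OJ) - mean(VC)
    ci_lower = tt$conf.int[1],
    ci_upper = tt$conf.int[2],
    p_value  = tt$p.value
  )
}))
print(supp_by_dose)
```

Rows whose interval excludes 0 correspond to the doses at which the two supplements differ at the 5% level.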

Conclusions & Assumptions

Assumptions

We assumed that the guinea pigs used in this experiment were randomly selected.
We assumed that there were no confounding factors affecting tooth growth.
The variance is treated as constant across groups in the dose-related hypothesis tests above. The tests were also run with unequal variances, but the differences between the two sets of results were negligible, so in the interest of report length only the constant-variance results are shown.
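The sensitivity to the equal-variance assumption can be checked by running the same comparison both ways and putting the results side by side. The helper below, compare_var_assumption, is a hypothetical name introduced for this sketch; it is shown for the 0.5 versus 1.0 dose comparison, but any of the subsets used above could be passed in.

```r
library(datasets)

# Run a two-sample t-test with pooled and with Welch variances and
# return the degrees of freedom, 95% interval and p-value of each.
compare_var_assumption <- function(d, f) {
  pooled <- t.test(f, var.equal = TRUE,  data = d)
  welch  <- t.test(f, var.equal = FALSE, data = d)
  data.frame(
    method   = c("pooled", "Welch"),
    df       = c(unname(pooled$parameter), unname(welch$parameter)),
    ci_lower = c(pooled$conf.int[1], welch$conf.int[1]),
    ci_upper = c(pooled$conf.int[2], welch$conf.int[2]),
    p_value  = c(pooled$p.value, welch$p.value)
  )
}

low_vs_mid <- ToothGrowth[ToothGrowth$dose %in% c(0.5, 1.0), ]
var_check  <- compare_var_assumption(low_vs_mid, len ~ dose)
print(var_check)
```

If the two rows lead to the same decision and nearly the same interval, reporting only the pooled results loses little.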

Conclusions

We are able to conclude that the dosage level has a significant effect on tooth length: each increase in dosage led to an increase in mean tooth length. Across all dosages combined, the supplement type did not have a clear impact on tooth length, but when each dosage level is considered separately, we can conclude that at the lower dosage levels of 0.5 and 1.0 the OJ supplement performs better than VC. For a dose of 2.0, there is no significant difference between the two types of supplement.