This report analyzes the ToothGrowth dataset.

Analysis

Load the ToothGrowth data and perform some basic exploratory data analyses.

The code below loads the ToothGrowth data from the datasets package.

library(datasets)
data(ToothGrowth)

The code below uses ggplot2 to draw boxplots of tooth length (len) for each dose, with one panel per supplement (supp).

library(ggplot2)
ggplot(ToothGrowth, aes(x=factor(dose), y=len, fill=factor(dose)))+geom_boxplot()+facet_grid(.~supp)+ggtitle("Analyzing ToothGrowth data")

[Figure: boxplots of len by dose, one panel per supp (OJ, VC)]

The plot shows that tooth length tends to increase with dose for both supplements.
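To put numbers behind the visual impression, the group means can be tabulated. This is a minimal sketch using base R's aggregate(); the name cell_means is introduced here for illustration and is not part of the original analysis.

```r
library(datasets)

# Mean tooth length for each supplement/dose combination
cell_means <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
print(cell_means)
```

If the means rise with dose within each supplement, that is consistent with the pattern seen in the boxplots.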

Provide a basic summary of the data.

The code below summarizes the data set, which contains two supplements (OJ and VC, 30 observations each) and three doses (0.5, 1, and 2).

summary(ToothGrowth)
##       len       supp         dose     
##  Min.   : 4.2   OJ:30   Min.   :0.50  
##  1st Qu.:13.1   VC:30   1st Qu.:0.50  
##  Median :19.2           Median :1.00  
##  Mean   :18.8           Mean   :1.17  
##  3rd Qu.:25.3           3rd Qu.:2.00  
##  Max.   :33.9           Max.   :2.00
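Because summary() treats dose as a continuous variable, it does not show how the observations are spread across the dose levels. A small sketch that cross-tabulates supplement against dose may help; cell_counts is a name chosen here for illustration.

```r
library(datasets)

# Count observations in each supplement-by-dose cell
cell_counts <- with(ToothGrowth, table(supp, dose))
print(cell_counts)
```

Equal counts in every cell would indicate a balanced design, which makes the per-dose comparisons later in the report directly comparable.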

Use confidence intervals and hypothesis tests to compare tooth growth by supp and dose.

Calculating confidence intervals

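The confidence interval for the difference in mean tooth length between the two supplements uses the formula yBar - xBar + c(-1, 1) * t * sqrt(xVar/30 + yVar/30). Here 30 is the number of observations in each group, xBar and xVar are the mean and variance of len for the first 30 rows (supplement VC), yBar and yVar are the mean and variance of len for the last 30 rows (supplement OJ), and t is the 0.975 quantile of the t distribution with q degrees of freedom, where q is the Welch-Satterthwaite approximation for unequal variances.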

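# Rows 1-30 are supplement VC; rows 31-60 are supplement OJ
xBar<-mean(ToothGrowth$len[1:30])
yBar<-mean(ToothGrowth$len[31:60])
xVar<-var(ToothGrowth$len[1:30])
yVar<-var(ToothGrowth$len[31:60])
# Welch-Satterthwaite degrees of freedom for unequal variances
q<-(((xVar+yVar)/30)^2)/((((xVar/30)^2)+((yVar/30)^2))/29)
# 0.975 quantile of the t distribution, giving a two-sided 95% interval
t<-qt(0.975, q)
yBar -xBar + c(-1,1)*t*sqrt(xVar/30 + yVar/30)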
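## [1] -0.171  7.571

Performing hypothesis tests

The code below runs a Welch two-sample t-test of len between the two supplements, pooling all doses.

t.test(len~supp, data=ToothGrowth, paired=FALSE)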
## 
##  Welch Two Sample t-test
## 
## data:  len by supp
## t = 1.915, df = 55.31, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.171  7.571
## sample estimates:
## mean in group OJ mean in group VC 
##            20.66            16.96
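The interval reported by t.test matches the one computed by hand earlier. The sketch below checks that agreement programmatically and selects the groups by supp rather than by row position, which avoids relying on the row order of the data. The helper welch_ci and the variable names are hypothetical and introduced here only for illustration.

```r
library(datasets)

# Hypothetical helper: Welch confidence interval for mean(y) - mean(x)
welch_ci <- function(x, y, level = 0.95) {
  nx <- length(x)
  ny <- length(y)
  se2x <- var(x) / nx  # squared standard error of mean(x)
  se2y <- var(y) / ny  # squared standard error of mean(y)
  # Welch-Satterthwaite degrees of freedom
  df <- (se2x + se2y)^2 / (se2x^2 / (nx - 1) + se2y^2 / (ny - 1))
  tq <- qt(1 - (1 - level) / 2, df)
  mean(y) - mean(x) + c(-1, 1) * tq * sqrt(se2x + se2y)
}

vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]
oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]

# Interval for mean(OJ) - mean(VC), computed two ways
manual <- welch_ci(vc, oj)
builtin <- as.numeric(t.test(len ~ supp, data = ToothGrowth)$conf.int)

print(manual)
print(builtin)
```

Both printed intervals should agree, since t.test uses the same Welch approximation when equal variances are not assumed.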

The code below splits the data set into three subsets, one for each dose (0.5, 1.0, and 2.0). The same hypothesis test is then performed on each subset.

a<-subset(ToothGrowth, dose==0.5)
b<-subset(ToothGrowth, dose==1.0)
c<-subset(ToothGrowth, dose==2.0)
t.test(len~supp, data=a, paired=FALSE)
## 
##  Welch Two Sample t-test
## 
## data:  len by supp
## t = 3.17, df = 14.97, p-value = 0.006359
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  1.719 8.781
## sample estimates:
## mean in group OJ mean in group VC 
##            13.23             7.98
t.test(len~supp, data=b, paired=FALSE)
## 
##  Welch Two Sample t-test
## 
## data:  len by supp
## t = 4.033, df = 15.36, p-value = 0.001038
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  2.802 9.058
## sample estimates:
## mean in group OJ mean in group VC 
##            22.70            16.77
t.test(len~supp, data=c, paired=FALSE)
## 
##  Welch Two Sample t-test
## 
## data:  len by supp
## t = -0.0461, df = 14.04, p-value = 0.9639
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -3.798  3.638
## sample estimates:
## mean in group OJ mean in group VC 
##            26.06            26.14
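The three per-dose tests above can also be run in one pass and collected into a single table, which makes the p-values and intervals easier to compare side by side. This is a sketch using split() and lapply(); the names by_dose, tests, and dose_summary are chosen here for illustration.

```r
library(datasets)

# Split the data by dose and run the same Welch test on each piece
by_dose <- split(ToothGrowth, ToothGrowth$dose)
tests <- lapply(by_dose, function(d) t.test(len ~ supp, data = d))

# Collect the p-value and 95% interval bounds into one table
dose_summary <- data.frame(
  dose = names(tests),
  p_value = sapply(tests, function(tt) tt$p.value),
  ci_lower = sapply(tests, function(tt) tt$conf.int[1]),
  ci_upper = sapply(tests, function(tt) tt$conf.int[2]),
  row.names = NULL
)
print(dose_summary)
```

Each row corresponds to one of the t.test calls shown above, so the values should match the printed outputs.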

State your conclusions and the assumptions needed for your conclusions.

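The boxplots suggest that tooth length increases with dose for both supplements. Pooling all doses, the 95% confidence interval for the difference in mean length (OJ minus VC) is (-0.171, 7.571). The interval contains 0 and the p-value is 0.06063, so the hypothesis of no difference between supplements cannot be rejected at the 5% level. Within individual doses, OJ is associated with greater tooth length than VC at dose 0.5 (p = 0.006359) and dose 1.0 (p = 0.001038), with both intervals excluding 0. At dose 2.0 there is no detectable difference (p = 0.9639, interval (-3.798, 3.638)).

These conclusions assume that the two supplement groups are independent samples (the tests were run with paired set to FALSE), that the group variances may differ (Welch's test), and that tooth lengths within each group are approximately normally distributed.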
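The dose effect described above rests on the boxplots alone. One way to test it, sketched here as an assumption rather than as part of the original analysis, is a Welch t-test between the lowest and highest dose with both supplements pooled. The names low_high and dose_test are illustrative.

```r
library(datasets)

# Keep only the lowest and highest dose, pooling both supplements
low_high <- subset(ToothGrowth, dose %in% c(0.5, 2))

# Welch two-sample t-test of tooth length between the two dose groups
dose_test <- t.test(len ~ dose, data = low_high)

# Interval is for mean(dose 0.5) - mean(dose 2)
print(dose_test$conf.int)
print(dose_test$p.value)
```

An interval lying entirely below 0 would support the reading that the higher dose is associated with longer teeth.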