ToothGrowth Data Analysis

author: Daria Alekseeva

In this report I present an analysis of the ToothGrowth dataset from the R datasets package.

About dataset

Description

The response is the length of odontoblasts (cells responsible for tooth growth) in 60 guinea pigs. Each animal received one of three dose levels of Vitamin C (0.5, 1, and 2 mg/day) by one of two delivery methods: orange juice (coded as OJ) or ascorbic acid (coded as VC).

Usage

ToothGrowth

Format

A data frame with 60 observations on 3 variables.

[,1] len     numeric     Tooth length
[,2] supp    factor      Supplement type (VC or OJ).
[,3] dose    numeric     Dose in milligrams.
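As a quick check of the format described above, the structure and the group sizes can be inspected directly. This is a minimal sketch using only base R; the variable name `cell_counts` is introduced here for illustration and is not part of the original analysis.

```r
library(datasets)

# Inspect column types: len and dose are numeric, supp is a factor
str(ToothGrowth)

# Count observations for every supplement/dose combination
cell_counts <- table(ToothGrowth$supp, ToothGrowth$dose)
print(cell_counts)
```

Each of the six supplement/dose cells should hold 10 observations, which accounts for the 60 rows.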

Load the ToothGrowth data and perform some basic exploratory data analyses

# load data
library(datasets)
data(ToothGrowth)
head(ToothGrowth)
##    len supp dose
## 1  4.2   VC  0.5
## 2 11.5   VC  0.5
## 3  7.3   VC  0.5
## 4  5.8   VC  0.5
## 5  6.4   VC  0.5
## 6 10.0   VC  0.5

Provide a basic summary of the data

summary(ToothGrowth)
##       len        supp         dose      
##  Min.   : 4.20   OJ:30   Min.   :0.500  
##  1st Qu.:13.07   VC:30   1st Qu.:0.500  
##  Median :19.25           Median :1.000  
##  Mean   :18.81           Mean   :1.167  
##  3rd Qu.:25.27           3rd Qu.:2.000  
##  Max.   :33.90           Max.   :2.000
library(ggplot2)
ggplot(ToothGrowth, aes(x = factor(dose), y = len)) +
  geom_boxplot() +
  facet_grid(~supp) +
  ggtitle("Analyzing ToothGrowth data")

The plot shows that tooth length increases with dose for both supplement types.
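The visual impression can be backed with numbers. The following is a minimal sketch that tabulates the mean tooth length for each supplement and dose; `group_means` is a name chosen here for illustration.

```r
library(datasets)

# Mean tooth length for every supplement/dose combination
group_means <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
print(group_means)
```

Within each supplement type the mean should rise from one dose level to the next, which is the pattern the boxplots suggest.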

Use confidence intervals and hypothesis tests to compare tooth growth by supp and dose

By supplement

t.test(len ~ supp, data = ToothGrowth)
## 
##  Welch Two Sample t-test
## 
## data:  len by supp
## t = 1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.1710156  7.5710156
## sample estimates:
## mean in group OJ mean in group VC 
##         20.66333         16.96333
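To make clear where the reported interval comes from, it can be reproduced by hand. This is a sketch of the standard Welch calculation (unequal variances, Welch-Satterthwaite degrees of freedom); the names `oj`, `vc`, `welch_df` and `manual_ci` are introduced here for illustration.

```r
library(datasets)

# Tooth lengths for each supplement group
oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]

# Variance of each group's sample mean
v_oj <- var(oj) / length(oj)
v_vc <- var(vc) / length(vc)

# Standard error of the difference and Welch-Satterthwaite degrees of freedom
se <- sqrt(v_oj + v_vc)
welch_df <- (v_oj + v_vc)^2 /
  (v_oj^2 / (length(oj) - 1) + v_vc^2 / (length(vc) - 1))

# 95% confidence interval for mean(OJ) - mean(VC)
manual_ci <- (mean(oj) - mean(vc)) + c(-1, 1) * qt(0.975, welch_df) * se
print(manual_ci)
```

The result should agree with the `conf.int` component returned by `t.test(len ~ supp, data = ToothGrowth)`.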

By dose

Let’s create three subsets, one for each dose.

a<-subset(ToothGrowth, dose==0.5)
b<-subset(ToothGrowth, dose==1.0)
c<-subset(ToothGrowth, dose==2.0)

Now let’s run a hypothesis test comparing the two supplement types within each subset.

t.test(len ~ supp, data=a, paired = FALSE)
## 
##  Welch Two Sample t-test
## 
## data:  len by supp
## t = 3.1697, df = 14.969, p-value = 0.006359
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  1.719057 8.780943
## sample estimates:
## mean in group OJ mean in group VC 
##            13.23             7.98
t.test(len ~ supp, data=b, paired = FALSE)
## 
##  Welch Two Sample t-test
## 
## data:  len by supp
## t = 4.0328, df = 15.358, p-value = 0.001038
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  2.802148 9.057852
## sample estimates:
## mean in group OJ mean in group VC 
##            22.70            16.77
t.test(len ~ supp, data=c, paired = FALSE)
## 
##  Welch Two Sample t-test
## 
## data:  len by supp
## t = -0.0461, df = 14.04, p-value = 0.9639
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -3.79807  3.63807
## sample estimates:
## mean in group OJ mean in group VC 
##            26.06            26.14
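The three tests above repeat the same call on different subsets, so they can also be run in one pass. This is a minimal sketch rather than the method used in this report; `by_dose` and `p_values` are names introduced here for illustration.

```r
library(datasets)

# Split the data by dose and run the same Welch test within each group
by_dose <- split(ToothGrowth, ToothGrowth$dose)
p_values <- sapply(by_dose, function(d) t.test(len ~ supp, data = d)$p.value)

# One p-value per dose level, named by dose
print(round(p_values, 4))
```

Using `split()` avoids creating separate objects per dose, and in particular avoids naming a data frame `c`, which shadows the name of a base R function.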

State your conclusions and the assumptions needed for your conclusions

Null hypothesis #1: there is no difference in mean tooth length between OJ and VC.

Null hypothesis #2: at a given dose, there is no difference in mean tooth length between OJ and VC.

For the pooled comparison, the 95% confidence interval for the difference in means (OJ minus VC) runs from -0.17 to 7.57. This interval, like the tests themselves, assumes that the observations are independent and that tooth length is approximately normally distributed within each group.

The t-value is 1.92, the p-value is 0.06, and the confidence interval contains zero, so we fail to reject null hypothesis #1 at the 5% level. In other words, with all doses pooled, the data do not show a significant difference between OJ and VC.

Turning to the individual doses, at 0.5 and 1.0 the difference in means between the OJ and VC groups is significant (p-values 0.006 and 0.001, and both confidence intervals exclude zero), so we reject null hypothesis #2 at these doses, with OJ giving longer teeth. At dose 2.0 the mean difference is very small (26.06 vs. 26.14, p-value 0.96), so we fail to reject null hypothesis #2 there.
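The tests in the "By dose" section compare OJ and VC within each dose level. As a complement, the dose levels themselves can be compared with the same Welch test. This is a hedged sketch, not part of the original analysis: it pools both supplement types, and the names `low_vs_mid` and `mid_vs_high` are introduced here for illustration.

```r
library(datasets)

# Compare two dose levels at a time, pooling both supplement types
low_vs_mid  <- t.test(len ~ dose, data = subset(ToothGrowth, dose %in% c(0.5, 1)))
mid_vs_high <- t.test(len ~ dose, data = subset(ToothGrowth, dose %in% c(1, 2)))

# Each interval is for (lower dose mean) - (higher dose mean)
print(low_vs_mid$conf.int)
print(mid_vs_high$conf.int)
```

An interval that lies entirely below zero indicates longer teeth at the higher dose, which would agree with the trend seen in the boxplots.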