Overview

The purpose of this analysis is to examine how tooth growth (len) varies with supplement type (supp) and dose in the ToothGrowth dataset. We first take a brief look at the data, then run a few hypothesis tests, and finally offer conclusions.

Section 1 - Load and Explore the data

#install.packages("gmodels")
library(ggplot2)
library(knitr)
library(gmodels)
library(dplyr)

data("ToothGrowth")
data <- as.data.frame(ToothGrowth)

str(data)
## 'data.frame':    60 obs. of  3 variables:
##  $ len : num  4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
##  $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
##  $ dose: num  0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
table(data$dose)
## 
## 0.5   1   2 
##  20  20  20
data$dose <- factor(data$dose)

So far we have seen what the data looks like. It is a small dataset, with only 60 observations and 3 variables. The supp and dose variables appear to be evenly distributed grouping variables, and the len variable is the measurement recorded for each combination of them. Because the dose column takes only three distinct values (0.5, 1 and 2), I changed it to a factor with 3 levels.
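As a small check of the conversion described above, the following sketch works on a copy of the data (the name `tg` is chosen here for illustration and is not part of the original analysis). It confirms the three levels and shows how the numeric doses could be recovered later if needed, since calling `as.numeric()` directly on a factor returns the internal level codes rather than the doses.

```r
# Work on a copy so the objects used in the analysis are left untouched.
# `tg` is an illustrative name, not part of the original analysis.
tg <- as.data.frame(datasets::ToothGrowth)
tg$dose <- factor(tg$dose)

# The three dose values become the factor levels
print(levels(tg$dose))

# Number of observations for each supp/dose combination
print(table(tg$supp, tg$dose))

# as.numeric() on a factor gives the level codes, not the doses
codes <- as.numeric(tg$dose)

# Going through character recovers the original dose values
doses <- as.numeric(as.character(tg$dose))
```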

Section 2 - Summary of Data

summary(data)
##       len        supp     dose   
##  Min.   : 4.20   OJ:30   0.5:20  
##  1st Qu.:13.07   VC:30   1  :20  
##  Median :19.25           2  :20  
##  Mean   :18.81                   
##  3rd Qu.:25.27                   
##  Max.   :33.90

The summary confirms that supp and dose can be treated as factors, and that len ranges from 4.20 to 33.90. The cross tabulation below shows the joint distribution of supp and dose.

CrossTable(data$supp, data$dose)
## 
##  
##    Cell Contents
## |-------------------------|
## |                       N |
## | Chi-square contribution |
## |           N / Row Total |
## |           N / Col Total |
## |         N / Table Total |
## |-------------------------|
## 
##  
## Total Observations in Table:  60 
## 
##  
##              | data$dose 
##    data$supp |       0.5 |         1 |         2 | Row Total | 
## -------------|-----------|-----------|-----------|-----------|
##           OJ |        10 |        10 |        10 |        30 | 
##              |     0.000 |     0.000 |     0.000 |           | 
##              |     0.333 |     0.333 |     0.333 |     0.500 | 
##              |     0.500 |     0.500 |     0.500 |           | 
##              |     0.167 |     0.167 |     0.167 |           | 
## -------------|-----------|-----------|-----------|-----------|
##           VC |        10 |        10 |        10 |        30 | 
##              |     0.000 |     0.000 |     0.000 |           | 
##              |     0.333 |     0.333 |     0.333 |     0.500 | 
##              |     0.500 |     0.500 |     0.500 |           | 
##              |     0.167 |     0.167 |     0.167 |           | 
## -------------|-----------|-----------|-----------|-----------|
## Column Total |        20 |        20 |        20 |        60 | 
##              |     0.333 |     0.333 |     0.333 |           | 
## -------------|-----------|-----------|-----------|-----------|
## 
## 
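The cross tabulation shows 10 observations in every supp/dose cell. As a complement, the sketch below summarises len within each of those cells using base R's `aggregate()`. The names `tg` and `cell_stats` are illustrative and not part of the original analysis.

```r
# Illustrative copy of the data with dose as a factor
tg <- as.data.frame(datasets::ToothGrowth)
tg$dose <- factor(tg$dose)

# Mean, standard deviation and count of len for each supp/dose cell
cell_stats <- aggregate(
  len ~ supp + dose,
  data = tg,
  FUN = function(x) c(mean = mean(x), sd = sd(x), n = length(x))
)

# aggregate() stores the three statistics in a matrix column;
# flatten it into ordinary columns len.mean, len.sd and len.n
cell_stats <- do.call(data.frame, cell_stats)
print(cell_stats)
```

Because the design is balanced, the average of the six cell means equals the overall mean of len reported by `summary(data)`.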

Section 3 - Tests and Confidence Intervals

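H0 - There is no difference in mean tooth length (len) between the two groups being compared (the two supplement types, or a given pair of doses). The alternative is that the two means differ.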

# Keep only the two dose levels compared in each test
doseA <- subset(data, dose %in% c(0.5, 1.0))
doseB <- subset(data, dose %in% c(0.5, 2.0))
doseC <- subset(data, dose %in% c(1.0, 2.0))

# Welch two-sample t-tests (unequal variances). The formula interface compares
# two independent groups, so no paired argument is passed.
test_supp <- t.test(len ~ supp, var.equal = FALSE, data = data)
test_doseA <- t.test(len ~ dose, var.equal = FALSE, data = doseA)
test_doseB <- t.test(len ~ dose, var.equal = FALSE, data = doseB)
test_doseC <- t.test(len ~ dose, var.equal = FALSE, data = doseC)

kable(data.frame("p-value" = c(test_supp$p.value, test_doseA$p.value, test_doseB$p.value, test_doseC$p.value),
  "Lower Limit" = c(test_supp$conf.int[1], test_doseA$conf.int[1], test_doseB$conf.int[1], test_doseC$conf.int[1]),
  "Upper Limit" = c(test_supp$conf.int[2], test_doseA$conf.int[2], test_doseB$conf.int[2], test_doseC$conf.int[2]),
  row.names = c("supp", "dose 0.5 & 1.0", "dose 0.5 & 2.0", "dose 1.0 & 2.0")
))
                  p.value Lower.Limit Upper.Limit
supp            0.0606345  -0.1710156    7.571016
dose 0.5 & 1.0  0.0000001 -11.9837813   -6.276219
dose 0.5 & 2.0  0.0000000 -18.1561665  -12.833834
dose 1.0 & 2.0  0.0000191  -8.9964805   -3.733519
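To make the supp row of the table less of a black box, the sketch below reproduces the Welch two-sample t-test by hand and compares it with `t.test()`. The variable names (`tg`, `oj`, `vc`, `ref` and so on) are illustrative, and a 95% confidence level is assumed because it is the `t.test()` default.

```r
tg <- as.data.frame(datasets::ToothGrowth)
oj <- tg$len[tg$supp == "OJ"]
vc <- tg$len[tg$supp == "VC"]

# Per-group variance of the sample mean (no pooling of variances)
v_oj <- var(oj) / length(oj)
v_vc <- var(vc) / length(vc)

# Standard error of the difference in means and the t statistic
se <- sqrt(v_oj + v_vc)
t_stat <- (mean(oj) - mean(vc)) / se

# Welch-Satterthwaite approximation to the degrees of freedom
df <- (v_oj + v_vc)^2 /
  (v_oj^2 / (length(oj) - 1) + v_vc^2 / (length(vc) - 1))

# Two-sided p-value and 95% confidence interval for mean(OJ) - mean(VC)
p_value <- 2 * pt(-abs(t_stat), df)
ci <- (mean(oj) - mean(vc)) + c(-1, 1) * qt(0.975, df) * se

# Reference result from the built-in test
ref <- t.test(len ~ supp, data = tg, var.equal = FALSE)
print(all.equal(p_value, ref$p.value))
print(all.equal(ci, as.numeric(ref$conf.int)))
```

The interval is for the first factor level minus the second (OJ minus VC for supp, lower dose minus higher dose for the dose comparisons), which is why the dose intervals in the table are negative.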

Section 4 - Conclusion

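In the tests in Section 3 we assumed unequal variances, and since there was no indication that the data were paired, the groups were treated as independent. For dose, all three 95% confidence intervals lie entirely below zero, so we reject the null hypothesis in each comparison: a higher dose is associated with greater tooth length. For supp, the p-value of 0.0606 is above 0.05 and the confidence interval (-0.17 to 7.57) contains zero, so we fail to reject the null hypothesis at the 5% level. There is therefore sufficient evidence that dose makes a difference in tooth length, but not that supplement type does when all doses are taken together.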
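As a sketch, the reject or fail-to-reject decision for each comparison can be tabulated directly from the p-values. The significance level of 0.05 is an assumption made here, and the helper `dose_p()` and the names `tg`, `p_values` and `decision` are illustrative rather than part of the original analysis.

```r
tg <- as.data.frame(datasets::ToothGrowth)
tg$dose <- factor(tg$dose)

# Illustrative helper: Welch t-test p-value for len between two dose levels
dose_p <- function(a, b) {
  pair <- droplevels(tg[tg$dose %in% c(a, b), ])
  t.test(len ~ dose, data = pair, var.equal = FALSE)$p.value
}

p_values <- c(
  "supp"           = t.test(len ~ supp, data = tg, var.equal = FALSE)$p.value,
  "dose 0.5 & 1.0" = dose_p("0.5", "1"),
  "dose 0.5 & 2.0" = dose_p("0.5", "2"),
  "dose 1.0 & 2.0" = dose_p("1", "2")
)

alpha <- 0.05  # assumed significance level

# One row per comparison, with the decision at the assumed level
decision <- data.frame(
  p.value   = signif(p_values, 3),
  reject.H0 = p_values < alpha
)
print(decision)
```

Note that four tests are run on the same data, so a stricter threshold or an adjustment such as `p.adjust()` could be considered; that choice is left open here.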