This analysis examines how tooth growth (len) varies with supplement type (supp) and dose in the ToothGrowth dataset. First we take a brief look at the data, then perform several t-tests, and finally offer conclusions.
#install.packages("gmodels")
library(ggplot2)
library(knitr)
library(gmodels)
library(dplyr)
data("ToothGrowth")
data <- as.data.frame(ToothGrowth)
str(data)
## 'data.frame': 60 obs. of 3 variables:
## $ len : num 4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
## $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
## $ dose: num 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
table(data$dose)
##
## 0.5 1 2
## 20 20 20
data$dose <- factor(data$dose)
So far we have seen what the data looks like. It is a small dataset, with only 60 observations and 3 variables. The supp and dose variables are evenly distributed across the dataset, and len is the measurement recorded for each combination of them. Because dose takes only three distinct values (0.5, 1 and 2), I converted it to a factor with 3 levels.
summary(data)
## len supp dose
## Min. : 4.20 OJ:30 0.5:20
## 1st Qu.:13.07 VC:30 1 :20
## Median :19.25 2 :20
## Mean :18.81
## 3rd Qu.:25.27
## Max. :33.90
The summary confirms that supp and dose can be treated as factors, and shows that len ranges from 4.20 to 33.90. The cross tabulation below shows how the observations are distributed across dose and supp.
CrossTable(data$supp, data$dose)
##
##
## Cell Contents
## |-------------------------|
## | N |
## | Chi-square contribution |
## | N / Row Total |
## | N / Col Total |
## | N / Table Total |
## |-------------------------|
##
##
## Total Observations in Table: 60
##
##
## | data$dose
## data$supp | 0.5 | 1 | 2 | Row Total |
## -------------|-----------|-----------|-----------|-----------|
## OJ | 10 | 10 | 10 | 30 |
## | 0.000 | 0.000 | 0.000 | |
## | 0.333 | 0.333 | 0.333 | 0.500 |
## | 0.500 | 0.500 | 0.500 | |
## | 0.167 | 0.167 | 0.167 | |
## -------------|-----------|-----------|-----------|-----------|
## VC | 10 | 10 | 10 | 30 |
## | 0.000 | 0.000 | 0.000 | |
## | 0.333 | 0.333 | 0.333 | 0.500 |
## | 0.500 | 0.500 | 0.500 | |
## | 0.167 | 0.167 | 0.167 | |
## -------------|-----------|-----------|-----------|-----------|
## Column Total | 20 | 20 | 20 | 60 |
## | 0.333 | 0.333 | 0.333 | |
## -------------|-----------|-----------|-----------|-----------|
##
##
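The cross tabulation only shows counts, so it may help to look at the mean of len in each supp and dose cell before testing. The following is a minimal sketch using base R's `aggregate()`; the names `tooth` and `group_means` are illustrative and not part of the original analysis.

```r
# Reload the data under a separate (hypothetical) name so `data` is left untouched.
data("ToothGrowth")
tooth <- ToothGrowth
tooth$dose <- factor(tooth$dose)

# Mean tooth length for each supp x dose combination (6 cells of 10 observations).
group_means <- aggregate(len ~ supp + dose, data = tooth, FUN = mean)
names(group_means)[3] <- "mean_len"

# Standard deviation per cell; aggregate() returns rows in the same order as above.
group_means$sd_len <- aggregate(len ~ supp + dose, data = tooth, FUN = sd)$len

print(group_means)
```

Because the design is balanced, the average of the six cell means equals the overall mean of len reported by `summary(data)`.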
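H0 - There is no difference in mean tooth length (len) between the two groups being compared (the two supp levels, or a given pair of dose levels).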
# dose is a factor here; %in% matches against its level labels ("0.5", "1", "2")
doseA <- subset(data, dose %in% c(0.5, 1.0))
doseB <- subset(data, dose %in% c(0.5, 2.0))
doseC <- subset(data, dose %in% c(1.0, 2.0))
# Welch two-sample t-tests (unequal variances); unpaired is the default, so `paired` is left out
test_supp <- t.test(len ~ supp, var.equal = FALSE, data = data)
test_doseA <- t.test(len ~ dose, var.equal = FALSE, data = doseA)
test_doseB <- t.test(len ~ dose, var.equal = FALSE, data = doseB)
test_doseC <- t.test(len ~ dose, var.equal = FALSE, data = doseC)
kable(data.frame("p-value" = c(test_supp$p.value, test_doseA$p.value, test_doseB$p.value, test_doseC$p.value),
                 "Lower Limit" = c(test_supp$conf.int[1], test_doseA$conf.int[1], test_doseB$conf.int[1], test_doseC$conf.int[1]),
                 "Upper Limit" = c(test_supp$conf.int[2], test_doseA$conf.int[2], test_doseB$conf.int[2], test_doseC$conf.int[2]),
                 row.names = c("supp", "dose 0.5 & 1.0", "dose 0.5 & 2.0", "dose 1.0 & 2.0")
))
| | p.value | Lower.Limit | Upper.Limit |
|---|---|---|---|
| supp | 0.0606345 | -0.1710156 | 7.571016 |
| dose 0.5 & 1.0 | 0.0000001 | -11.9837813 | -6.276219 |
| dose 0.5 & 2.0 | 0.0000000 | -18.1561665 | -12.833834 |
| dose 1.0 & 2.0 | 0.0000191 | -8.9964805 | -3.733519 |
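The tests above are Welch two-sample t-tests: we did not assume equal variances, and nothing in the data indicates that the observations are paired. For dose, all three pairwise confidence intervals exclude zero, so we reject the null hypothesis: a higher dose is associated with greater tooth length. For supp, the p-value (0.0606) is above 0.05 and the confidence interval (-0.171 to 7.571) includes zero, so at the 5% level we cannot reject the null hypothesis when all doses are pooled. There is therefore strong evidence that dose makes a difference in tooth length, but only weak evidence that supp does.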
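To make the unequal-variance assumption concrete, the sketch below recomputes the Welch statistic, the Welch-Satterthwaite degrees of freedom, the p-value and the 95% confidence interval by hand for the dose 1.0 versus 2.0 comparison, then checks them against `t.test()`. The variable names are illustrative and not part of the original analysis.

```r
data("ToothGrowth")
x <- ToothGrowth$len[ToothGrowth$dose == 1]   # dose 1.0 group
y <- ToothGrowth$len[ToothGrowth$dose == 2]   # dose 2.0 group

# Per-group variance of the sample mean; Welch's test does not pool these.
vx <- var(x) / length(x)
vy <- var(y) / length(y)
se <- sqrt(vx + vy)

# Welch t statistic and Welch-Satterthwaite degrees of freedom.
t_stat   <- (mean(x) - mean(y)) / se
df_welch <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))

# Two-sided p-value and 95% confidence interval for mean(x) - mean(y).
p_manual  <- 2 * pt(-abs(t_stat), df = df_welch)
ci_manual <- (mean(x) - mean(y)) + c(-1, 1) * qt(0.975, df_welch) * se

# Reference result from the built-in test.
welch <- t.test(x, y, var.equal = FALSE)

print(c(t = t_stat, df = df_welch, p = p_manual))
print(ci_manual)
print(all.equal(as.numeric(welch$conf.int), ci_manual))
```

The hand-computed interval should agree with the "dose 1.0 & 2.0" limits in the table above, since both come from the same Welch procedure.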