Tooth Growth Dataset Analysis

Introduction

The goal of this analysis is to explore the ToothGrowth dataset from the datasets package in R. We load the data and provide a basic summary, then use plots, t-tests, and confidence intervals to draw conclusions from the data. Depending on your knowledge of R, the task should be more of a routine dental cleaning than a root canal.

knitr::opts_chunk$set(echo = TRUE)
data("ToothGrowth")
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)
?ToothGrowth
## starting httpd help server ...
##  done
data <- ToothGrowth
head(data)
##    len supp dose
## 1  4.2   VC  0.5
## 2 11.5   VC  0.5
## 3  7.3   VC  0.5
## 4  5.8   VC  0.5
## 5  6.4   VC  0.5
## 6 10.0   VC  0.5

Description

From the help function, we can read some basic info about the data set. The response variable, len, is the length of odontoblasts (the cells responsible for tooth growth) in 60 guinea pigs, one observation per animal. According to the description, each animal received one of three dose levels of vitamin C (0.5, 1, or 2 mg/day) by one of two delivery methods: orange juice (coded as OJ) or ascorbic acid (a form of vitamin C, coded as VC). Let’s run some initial descriptive statistics to get a better sense of how these variables are recorded.

str(data)
## 'data.frame':    60 obs. of  3 variables:
##  $ len : num  4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
##  $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
##  $ dose: num  0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
summary(data)
##       len        supp         dose      
##  Min.   : 4.20   OJ:30   Min.   :0.500  
##  1st Qu.:13.07   VC:30   1st Qu.:0.500  
##  Median :19.25           Median :1.000  
##  Mean   :18.81           Mean   :1.167  
##  3rd Qu.:25.27           3rd Qu.:2.000  
##  Max.   :33.90           Max.   :2.000
unique(data$supp)
## [1] VC OJ
## Levels: OJ VC
unique(data$dose)
## [1] 0.5 1.0 2.0
any(is.na(data$len))
## [1] FALSE

From these results, we have a basic idea of how the data is organized. We also know that it does not contain any NA values.
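
The overall summary pools every dose and supplement together. As a complement, here is a minimal sketch of a per-group summary using dplyr, which is already loaded above. The object name cell_summary and its column names are our own choices for illustration, not part of the dataset.

```r
library(dplyr)
data("ToothGrowth")

# One row per supplement/dose combination:
# group size, mean length, and standard deviation of length
cell_summary <- ToothGrowth %>%
  group_by(supp, dose) %>%
  summarise(n = n(),
            mean_len = mean(len),
            sd_len = sd(len),
            .groups = "drop")

print(cell_summary)
```

Each of the six supplement/dose cells should contain the same number of animals, which matters later when we compare the groups dose by dose.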

Plotting

To better understand how dose and supplement type affect tooth length, we need to plot the data. First, we look at a smoothed plot of length against dose for both supplements.
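
The figure itself is not reproduced in this text. Below is a minimal sketch of how such a plot could be drawn with ggplot2. The smoothing method used for the original figure is not stated, so the linear smoother (method = "lm") and the object name p_smooth are assumptions made here for illustration.

```r
library(ggplot2)
data("ToothGrowth")

# Length against dose, one colour and one fitted line per supplement.
# A straight-line fit is assumed because dose takes only three distinct values.
p_smooth <- ggplot(ToothGrowth, aes(x = dose, y = len, colour = supp)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE) +
  labs(x = "Dose (mg/day)", y = "Tooth length", colour = "Supplement")

print(p_smooth)
```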

At first glance, we can see that OJ may drive some tooth length. A boxplot provides a clearer look at the effect on length.
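
As with the previous figure, the boxplot is not reproduced here. One way to sketch it, assuming supplement on the x-axis with the individual observations jittered on top (the name p_box and the jitter width are our own choices):

```r
library(ggplot2)
data("ToothGrowth")

# Boxplot of length by supplement, with the raw points overlaid.
# outlier.shape = NA avoids drawing outliers twice (once by each layer).
p_box <- ggplot(ToothGrowth, aes(x = supp, y = len, fill = supp)) +
  geom_boxplot(outlier.shape = NA) +
  geom_jitter(width = 0.15, height = 0) +
  labs(x = "Supplement", y = "Tooth length", fill = "Supplement")

print(p_box)
```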

Again OJ stands out as a clear winner: the VC observations cluster toward the bottom of the length scale in quadrant 4 (bottom right). To be sure, we must look at the dosage patterns, as they may be the cause of OJ’s apparent effect.
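
A sketch of a plot that separates the dosage pattern from the supplement, assuming one panel per supplement and dose treated as a category (the layout and the name p_dose are our assumptions, not necessarily those of the original figure):

```r
library(ggplot2)
data("ToothGrowth")

# Length by dose level, shown side by side for each supplement.
# factor(dose) makes ggplot2 draw one box per dose level.
p_dose <- ggplot(ToothGrowth, aes(x = factor(dose), y = len, fill = supp)) +
  geom_boxplot() +
  facet_wrap(~ supp) +
  labs(x = "Dose (mg/day)", y = "Tooth length", fill = "Supplement")

print(p_dose)
```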

Clearly, length increases with dosage. The doses also appear to be spread evenly across the two supplements. It is therefore reasonable to hypothesize that OJ drives tooth length more than ascorbic acid does.

Hypothesis Testing

Now that we have made a guess as to what’s going on, we have to test this hypothesis. A two-sample t-test comparing tooth length under OJ with tooth length under VC, pooled across all doses, allows us to do so. We can use the t.test function to perform this step.

vcdata <- data %>%
  filter(supp == "VC") %>%
  select(len, dose)

ojdata <- data %>%
  filter(supp == "OJ") %>%
  select(len, dose)

ojlen_mean <- mean(ojdata$len)
ojlen_sd <- sd(ojdata$len)
vclen_mean <- mean(vcdata$len)
vclen_sd <- sd(vcdata$len)

welch.test <- t.test(ojdata$len, vcdata$len, paired=FALSE, var.equal=FALSE, conf.level=.95)
welch.test
## 
##  Welch Two Sample t-test
## 
## data:  ojdata$len and vcdata$len
## t = 1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.1710156  7.5710156
## sample estimates:
## mean of x mean of y 
##  20.66333  16.96333

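The p-value of 0.061 is above the conventional 0.05 threshold, and the 95% confidence interval (-0.17 to 7.57) contains zero, so we cannot reject the null hypothesis. In other words, pooled across doses, there is no statistically significant difference between OJ and VC at the 5% level, despite what the graphs showed. To check these calculations manually, we can construct the formulae ourselves.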

# t-value
tval <- (ojlen_mean - vclen_mean) / sqrt(ojlen_sd^2/length(ojdata$len)+
        vclen_sd^2/length(vcdata$len))
tval
## [1] 1.915268
# degrees of freedom
dofnumer <- (vclen_sd^2/length(vcdata$len)+
        ojlen_sd^2/length(ojdata$len))^2
dofdenom <- ((vclen_sd^2/length(vcdata$len))^2 / (length(vcdata$len)-1)+
        (ojlen_sd^2/length(ojdata$len))^2 / (length(ojdata$len)-1))
dof <- dofnumer/dofdenom
dof
## [1] 55.30943
# p-value
2 * pt(-abs(tval), dof) # doubled because the test is two-sided; abs() keeps this valid when t is negative
## [1] 0.06063451

As we see, the t statistic, degrees of freedom, and p-value from the manual calculation match those reported by the t.test function.
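
The same steps can be gathered into one small function and checked against t.test programmatically. This is only a sketch: welch_manual is a hypothetical helper name of our own, and the confidence interval step goes slightly beyond the manual calculation above.

```r
# Welch two-sample t-test computed by hand (hypothetical helper, for illustration)
welch_manual <- function(x, y, conf.level = 0.95) {
  vx <- var(x) / length(x)   # squared standard error of mean(x)
  vy <- var(y) / length(y)   # squared standard error of mean(y)
  se <- sqrt(vx + vy)
  mean_diff <- mean(x) - mean(y)
  t_stat <- mean_diff / se
  # Welch-Satterthwaite approximation to the degrees of freedom
  df <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))
  p_value <- 2 * pt(-abs(t_stat), df)   # two-sided p-value
  half_width <- qt(1 - (1 - conf.level) / 2, df) * se
  list(t = t_stat, df = df, p.value = p_value,
       conf.int = c(mean_diff - half_width, mean_diff + half_width))
}

data("ToothGrowth")
oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]

manual  <- welch_manual(oj, vc)
builtin <- t.test(oj, vc, paired = FALSE, var.equal = FALSE)

# Each comparison should report TRUE if the manual formulae agree with t.test
all.equal(manual$t, unname(builtin$statistic))
all.equal(manual$df, unname(builtin$parameter))
all.equal(manual$p.value, builtin$p.value)
all.equal(manual$conf.int, as.numeric(builtin$conf.int))
```

Using all.equal rather than == avoids false mismatches caused by floating-point rounding.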

Results

In any case, we still must ask: what drives tooth length? To answer this question, we have to break the OJ and VC data down by dose and run a t-test on each subset.

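# Subset by dose value rather than by row position,
# so the split does not depend on the row order of the data
vc_0.5 <- vcdata$len[vcdata$dose == 0.5]
oj_0.5 <- ojdata$len[ojdata$dose == 0.5]
vc_1.0 <- vcdata$len[vcdata$dose == 1]
oj_1.0 <- ojdata$len[ojdata$dose == 1]
vc_2.0 <- vcdata$len[vcdata$dose == 2]
oj_2.0 <- ojdata$len[ojdata$dose == 2]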

t.test(oj_0.5,vc_0.5, paired=FALSE, var.equal = FALSE)
## 
##  Welch Two Sample t-test
## 
## data:  oj_0.5 and vc_0.5
## t = 3.1697, df = 14.969, p-value = 0.006359
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  1.719057 8.780943
## sample estimates:
## mean of x mean of y 
##     13.23      7.98
t.test(oj_1.0,vc_1.0, paired=FALSE, var.equal = FALSE)
## 
##  Welch Two Sample t-test
## 
## data:  oj_1.0 and vc_1.0
## t = 4.0328, df = 15.358, p-value = 0.001038
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  2.802148 9.057852
## sample estimates:
## mean of x mean of y 
##     22.70     16.77
t.test(oj_2.0,vc_2.0, paired=FALSE, var.equal = FALSE)
## 
##  Welch Two Sample t-test
## 
## data:  oj_2.0 and vc_2.0
## t = -0.046136, df = 14.04, p-value = 0.9639
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -3.79807  3.63807
## sample estimates:
## mean of x mean of y 
##     26.06     26.14
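
The three per-dose tests can also be run in a single pass with the formula interface of t.test. Because three tests are run on the same data, it can be worth checking whether the conclusions survive a multiple-comparison adjustment. The sketch below uses Holm's method via p.adjust; this adjustment is our addition and not part of the analysis above, and the names p_raw and p_holm are our own.

```r
data("ToothGrowth")
doses <- sort(unique(ToothGrowth$dose))

# Welch t-test of len by supp within each dose level (var.equal = FALSE is the default)
p_raw <- sapply(doses, function(d) {
  t.test(len ~ supp, data = ToothGrowth[ToothGrowth$dose == d, ])$p.value
})
names(p_raw) <- doses

# Holm adjustment for running three tests
p_holm <- p.adjust(p_raw, method = "holm")

round(cbind(raw = p_raw, holm = p_holm), 4)
```

If the adjusted p-values for 0.5 and 1 mg/day remain below 0.05, the per-dose conclusions do not depend on ignoring the number of tests.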

Finally we have an answer. At a dose of 0.5 mg/day, we can reject the null hypothesis (p = 0.006; the 95% confidence interval, 1.72 to 8.78, excludes zero). That means the two supplements do not have equal effects on tooth growth. We can say the same thing about a dose of 1 mg/day (p = 0.001). At 2 mg/day, though, we cannot reject the null hypothesis (p = 0.96). The data therefore suggest that doses of 0.5 and 1 mg/day produce greater tooth growth with OJ than with ascorbic acid, while a dose of 2 mg/day has a similar impact with either supplement. With only 10 animals in each group, more research is needed on whether a blend of the two supplements or other dose sizes would affect tooth length any differently. For dental experts and those concerned with tooth length, this is certainly an impactful dataset.