A Data Analysis of the ToothGrowth Data

Basic Inferential Data Analysis

data(ToothGrowth)
library(ggplot2)
str(ToothGrowth)
## 'data.frame':    60 obs. of  3 variables:
##  $ len : num  4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
##  $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
##  $ dose: num  0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
# Scatter plot of tooth length against dose, with a linear fit per supplement
ggplot(ToothGrowth, aes(x = dose, y = len, color = supp)) +
  geom_point() +
  geom_smooth(method = "lm") +
  labs(title = "ToothGrowth", x = "Dosage", y = "Growth")
## `geom_smooth()` using formula 'y ~ x'

Tooth length increases as the dose of the supplement increases. OJ appears to produce more tooth growth than VC at the lower doses, but at the highest dose the gains from VC are comparable.
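
To put numbers on the two fitted lines in the plot, the sketch below fits a separate straight line of len on dose for each supplement. It assumes that a simple linear fit per group is an adequate summary, and the names fits and coefs are illustrative only.

```r
data(ToothGrowth)

# Fit len ~ dose separately for each supplement (OJ and VC)
fits <- lapply(split(ToothGrowth, ToothGrowth$supp),
               function(d) coef(lm(len ~ dose, data = d)))

# Stack the coefficients: one row per supplement,
# columns "(Intercept)" and "dose" (the slope)
coefs <- do.call(rbind, fits)
print(coefs)
```

A steeper dose slope for VC together with a higher intercept for OJ would match the picture of OJ leading at low doses and VC catching up at the high dose.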

Summary

summary(ToothGrowth)
##       len        supp         dose      
##  Min.   : 4.20   OJ:30   Min.   :0.500  
##  1st Qu.:13.07   VC:30   1st Qu.:0.500  
##  Median :19.25           Median :1.000  
##  Mean   :18.81           Mean   :1.167  
##  3rd Qu.:25.27           3rd Qu.:2.000  
##  Max.   :33.90           Max.   :2.000

Plotting boxplots of our variables

ggplot(data = ToothGrowth, aes(x = supp, y = len))+
  geom_boxplot(aes(fill = supp)) +
  facet_wrap(~ dose) +
  labs(title = "Boxplots of ToothGrowth by Supplement and Dose", x = "Supplement", y = "ToothGrowth")

Again, the OJ supplement appears to have a greater effect on tooth length than VC at the lower doses, with VC catching up at the largest dose.
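
The pattern in the boxplots can be checked against the group means. The following is a minimal sketch using base R; the object names means and gap are assumptions made for illustration.

```r
data(ToothGrowth)

# Mean tooth length for every dose (rows) and supplement (columns)
means <- with(ToothGrowth, tapply(len, list(dose, supp), mean))

# Difference in means, OJ minus VC, at each dose
gap <- means[, "OJ"] - means[, "VC"]

print(round(cbind(means, gap), 2))
```

A positive gap means OJ has the higher mean at that dose; a gap near zero means the two supplements are about level.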

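# Split the data by dose level
smDose <- subset(ToothGrowth, dose == 0.5)
medDose <- subset(ToothGrowth, dose == 1)
lgDose <- subset(ToothGrowth, dose == 2)
# Welch two-sample t-tests (unequal variances); the formula interface is unpaired by default
t.test(len ~ supp, var.equal = FALSE, data = ToothGrowth)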
## 
##  Welch Two Sample t-test
## 
## data:  len by supp
## t = 1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means between group OJ and group VC is not equal to 0
## 95 percent confidence interval:
##  -0.1710156  7.5710156
## sample estimates:
## mean in group OJ mean in group VC 
##         20.66333         16.96333
t.test(len ~ supp, var.equal = FALSE, data = smDose)
## 
##  Welch Two Sample t-test
## 
## data:  len by supp
## t = 3.1697, df = 14.969, p-value = 0.006359
## alternative hypothesis: true difference in means between group OJ and group VC is not equal to 0
## 95 percent confidence interval:
##  1.719057 8.780943
## sample estimates:
## mean in group OJ mean in group VC 
##            13.23             7.98
t.test(len ~ supp, var.equal = FALSE, data = medDose)
## 
##  Welch Two Sample t-test
## 
## data:  len by supp
## t = 4.0328, df = 15.358, p-value = 0.001038
## alternative hypothesis: true difference in means between group OJ and group VC is not equal to 0
## 95 percent confidence interval:
##  2.802148 9.057852
## sample estimates:
## mean in group OJ mean in group VC 
##            22.70            16.77
t.test(len ~ supp, var.equal = FALSE, data = lgDose)
## 
##  Welch Two Sample t-test
## 
## data:  len by supp
## t = -0.046136, df = 14.04, p-value = 0.9639
## alternative hypothesis: true difference in means between group OJ and group VC is not equal to 0
## 95 percent confidence interval:
##  -3.79807  3.63807
## sample estimates:
## mean in group OJ mean in group VC 
##            26.06            26.14
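
For readers who want to see where the reported t and df values come from, here is a sketch that reproduces the Welch statistic for the 0.5 dose by hand, using the standard Welch-Satterthwaite formula for the degrees of freedom. The variable names (oj, vc, t_stat, welch_df, p_val) are illustrative, not part of the original analysis.

```r
data(ToothGrowth)
sm <- subset(ToothGrowth, dose == 0.5)
oj <- sm$len[sm$supp == "OJ"]
vc <- sm$len[sm$supp == "VC"]

# Squared standard error of each group mean
se2_oj <- var(oj) / length(oj)
se2_vc <- var(vc) / length(vc)

# Welch t statistic: difference in means over the combined standard error
t_stat <- (mean(oj) - mean(vc)) / sqrt(se2_oj + se2_vc)

# Welch-Satterthwaite approximation to the degrees of freedom
welch_df <- (se2_oj + se2_vc)^2 /
  (se2_oj^2 / (length(oj) - 1) + se2_vc^2 / (length(vc) - 1))

# Two-sided p-value
p_val <- 2 * pt(-abs(t_stat), welch_df)

print(c(t = t_stat, df = welch_df, p = p_val))
```

The three values should agree with the t.test() output for the 0.5 dose shown above.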

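The p-value for the overall comparison of the two supplements (0.061) is just above the 0.05 alpha threshold, and the 95% confidence interval includes zero, so we fail to reject the null hypothesis of equal mean tooth length. For OJ vs. VC at the small and medium doses, the p-values are 0.006 and 0.001, so we reject the null hypothesis at those doses. At the large dose the p-value is 0.964, so we fail to reject it.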
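
The per-dose comparisons can also be collected in one pass, which makes the decision at each dose easier to read side by side. This is a sketch that assumes an alpha of 0.05 (matching the 95 percent confidence intervals above); by_dose, p_values and reject are hypothetical names.

```r
data(ToothGrowth)

# One data frame per dose level: "0.5", "1", "2"
by_dose <- split(ToothGrowth, ToothGrowth$dose)

# Welch t-test of len by supp within each dose; keep only the p-value
p_values <- sapply(by_dose, function(d) t.test(len ~ supp, data = d)$p.value)

# Decision at the assumed alpha of 0.05
reject <- p_values < 0.05

print(data.frame(dose = names(p_values),
                 p_value = signif(p_values, 4),
                 reject_null = reject))
```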

Conclusions

We reject the null hypothesis that OJ and VC have the same effect on tooth length at the small and medium doses. At the largest dose, the difference between the two supplements is negligible.