Part 2: Basic Inferential Data Analysis Instructions

Now in the second portion of the project, we’re going to analyze the ToothGrowth data in the R datasets package.

  1. Load the ToothGrowth data and perform some basic exploratory data analyses.

  2. Provide a basic summary of the data.

  3. Use confidence intervals and/or hypothesis tests to compare tooth growth by supp and dose. (Only use the techniques from class, even if there are other approaches worth considering.)

  4. State your conclusions and the assumptions needed for your conclusions.

# Load the packages used in this analysis
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.6.1
library(RColorBrewer)
library(dplyr)
## Warning: package 'dplyr' was built under R version 3.6.1
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(gridExtra)
## Warning: package 'gridExtra' was built under R version 3.6.1
## 
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
## 
##     combine
library(lattice)
library(knitr)
library(datasets)


# Checking the summaries of each variable
str(ToothGrowth)
## 'data.frame':    60 obs. of  3 variables:
##  $ len : num  4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
##  $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
##  $ dose: num  0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
unique(ToothGrowth$dose)
## [1] 0.5 1.0 2.0
summary(ToothGrowth)
##       len        supp         dose      
##  Min.   : 4.20   OJ:30   Min.   :0.500  
##  1st Qu.:13.07   VC:30   1st Qu.:0.500  
##  Median :19.25           Median :1.000  
##  Mean   :18.81           Mean   :1.167  
##  3rd Qu.:25.27           3rd Qu.:2.000  
##  Max.   :33.90           Max.   :2.000
# Dosage takes only 3 unique values, so it is better treated as a factor variable

ToothGrowth$dose <- as.factor(ToothGrowth$dose)

colnames(ToothGrowth) <- c("Length", "Supplement", "Dosage")

ToothGrowth$Concat <- paste0(ToothGrowth$Supplement, " ", ToothGrowth$Dosage)

g <- ggplot(ToothGrowth) + geom_boxplot(aes(Concat, Length)) + ggtitle("Tooth Growth by Method vs. Dosage") + xlab("Method vs. Dosage") +
  ylab("Tooth Length") + stat_summary(aes(Concat, Length))
g
## No summary function supplied, defaulting to `mean_se()`

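From the plot, tooth length increases with dosage for both supplements. Both supplements deliver vitamin C: OJ is orange juice and VC is ascorbic acid. At the 0.5 mg and 1 mg dosages, orange juice appears to be associated with greater tooth length than ascorbic acid, while at 2 mg the two supplements look about the same. Note that the data come from a balanced experiment on guinea pigs (30 observations per supplement, as the summary above shows), so no group is observed more often than another.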
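To back up the reading of the plot with numbers, the group sizes and mean lengths can be tabulated. This is a minimal sketch in base R; it works on a fresh copy of the data (`tg`, a name introduced here for illustration) so it does not depend on the factor conversion or the Concat column above.

```r
# Fresh copy of the data, renamed as in the analysis above
tg <- datasets::ToothGrowth
names(tg) <- c("Length", "Supplement", "Dosage")

# Number of observations in each supplement/dosage cell
group_counts <- table(tg$Supplement, tg$Dosage)

# Mean tooth length in each supplement/dosage cell
group_means <- tapply(tg$Length, list(tg$Supplement, tg$Dosage), mean)

print(group_counts)
print(round(group_means, 2))
```

Equal counts in every cell would confirm that no supplement or dosage is over-represented in the data.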

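So, one thing we can hypothesize is that orange juice (OJ) promotes tooth growth more than ascorbic acid (VC), and we can run some t-tests to check this hypothesis. Each test below is a one-sided Welch t-test (unequal variances) at the 90% confidence level, i.e. a significance level of 0.1.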
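As a reminder of what a Welch t-test computes, the statistic, degrees of freedom, and one-sided p-value for the OJ versus VC comparison can be worked out by hand. This is only a sketch, assuming independent samples and approximately normal group means; all variable names are introduced here for illustration.

```r
tg <- datasets::ToothGrowth
oj <- tg$len[tg$supp == "OJ"]
vc <- tg$len[tg$supp == "VC"]

# Squared standard error of each group mean
se2_oj <- var(oj) / length(oj)
se2_vc <- var(vc) / length(vc)

# Welch t statistic: difference in means over its standard error
t_stat <- (mean(oj) - mean(vc)) / sqrt(se2_oj + se2_vc)

# Welch-Satterthwaite approximation to the degrees of freedom
welch_df <- (se2_oj + se2_vc)^2 /
  (se2_oj^2 / (length(oj) - 1) + se2_vc^2 / (length(vc) - 1))

# One-sided p-value for the alternative "mean(OJ) > mean(VC)"
p_value <- pt(t_stat, df = welch_df, lower.tail = FALSE)

c(t = t_stat, df = welch_df, p = p_value)
```

These values should agree with the t, df, and p-value that `t.test()` reports for the same comparison with `alternative = "greater"` and `var.equal = FALSE`.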

theTest1 <- t.test(ToothGrowth$Length[ToothGrowth$Supplement=="OJ"], ToothGrowth$Length[ToothGrowth$Supplement=="VC"], alternative="greater", var.equal=FALSE, conf.level=0.9)

theTest1
## 
##  Welch Two Sample t-test
## 
## data:  ToothGrowth$Length[ToothGrowth$Supplement == "OJ"] and ToothGrowth$Length[ToothGrowth$Supplement == "VC"]
## t = 1.9153, df = 55.309, p-value = 0.03032
## alternative hypothesis: true difference in means is greater than 0
## 90 percent confidence interval:
##  1.194309      Inf
## sample estimates:
## mean of x mean of y 
##  20.66333  16.96333
theTest2 <- t.test(ToothGrowth$Length[ToothGrowth$Dosage=="2"], ToothGrowth$Length[ToothGrowth$Dosage!="2"], alternative="greater", var.equal=FALSE, conf.level=0.9)

theTest2
## 
##  Welch Two Sample t-test
## 
## data:  ToothGrowth$Length[ToothGrowth$Dosage == "2"] and ToothGrowth$Length[ToothGrowth$Dosage != "2"]
## t = 8.3085, df = 56.202, p-value = 1.173e-11
## alternative hypothesis: true difference in means is greater than 0
## 90 percent confidence interval:
##  9.224029      Inf
## sample estimates:
## mean of x mean of y 
##     26.10     15.17
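
Test 2 pools the 0.5 mg and 1 mg groups into one sample. A finer-grained check, sketched here with the same one-sided Welch test, compares adjacent dosage levels separately. The helper `compare_doses` is a hypothetical name introduced for this illustration, and the code uses a fresh copy of the data with its original column names.

```r
tg <- datasets::ToothGrowth

# One-sided Welch t-test: is mean length at dose `high` greater than at dose `low`?
compare_doses <- function(high, low) {
  t.test(tg$len[tg$dose == high], tg$len[tg$dose == low],
         alternative = "greater", var.equal = FALSE, conf.level = 0.9)
}

dose_1_vs_half <- compare_doses(1, 0.5)
dose_2_vs_1 <- compare_doses(2, 1)

# p-values for the two adjacent-dose comparisons
c(p_1_vs_half = dose_1_vs_half$p.value, p_2_vs_1 = dose_2_vs_1$p.value)
```

Running several tests on the same data raises the chance of a false positive, so small p-values here are best read as supporting evidence rather than as separate proofs.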
theTest3 <- t.test(ToothGrowth$Length[ToothGrowth$Dosage=="2" & ToothGrowth$Supplement=="OJ"], ToothGrowth$Length[ToothGrowth$Dosage=="2" & ToothGrowth$Supplement=="VC"], alternative="greater", var.equal=FALSE, conf.level=0.9)

theTest3
## 
##  Welch Two Sample t-test
## 
## data:  ToothGrowth$Length[ToothGrowth$Dosage == "2" & ToothGrowth$Supplement == "OJ"] and ToothGrowth$Length[ToothGrowth$Dosage == "2" & ToothGrowth$Supplement == "VC"]
## t = -0.046136, df = 14.04, p-value = 0.5181
## alternative hypothesis: true difference in means is greater than 0
## 90 percent confidence interval:
##  -2.411955       Inf
## sample estimates:
## mean of x mean of y 
##     26.06     26.14

Results

  1. The p-value of test 1 (0.03) is smaller than our significance level of 0.1, so we reject the null hypothesis of equal means in favor of the alternative that OJ yields greater tooth length than VC. The result would not be significant at the 99% confidence level (significance level 0.01).

  2. The p-value of test 2 is very close to zero, so we reject the null hypothesis: the 2 mg dosage is associated with greater tooth length than the lower dosages (0.5 mg and 1 mg combined).

  3. The p-value of test 3 is 0.52, far above the significance level of 0.1, so we fail to reject the null hypothesis. At the 2 mg dosage there is no evidence that orange juice yields greater tooth length than ascorbic acid (VC).
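
Result 3 covers only the 2 mg dosage. The same supplement comparison can be repeated at every dosage level; the sketch below does so with a one-sided Welch test on a fresh copy of the data. `p_by_dose` is a name introduced here for illustration.

```r
tg <- datasets::ToothGrowth
dose_levels <- sort(unique(tg$dose))

# For each dosage, test whether OJ gives a greater mean length than VC
p_by_dose <- sapply(dose_levels, function(d) {
  at_dose <- tg[tg$dose == d, ]
  t.test(at_dose$len[at_dose$supp == "OJ"],
         at_dose$len[at_dose$supp == "VC"],
         alternative = "greater", var.equal = FALSE)$p.value
})
names(p_by_dose) <- dose_levels

round(p_by_dose, 4)
```

Small p-values at the lower dosages alongside a large one at 2 mg would suggest that the difference between supplements depends on the dosage.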