Joseph Bloomquist 05-20-2024
This is the second portion of the Johns Hopkins Statistical Inference Course project.
In this report we will:
Load the ToothGrowth data and perform some basic exploratory data analysis
Provide a basic summary of the data
Use confidence intervals and/or hypothesis tests to compare tooth growth by supp and dose
State conclusions and the assumptions needed for those conclusions
First, we will load the data. It is already clean and in a workable format, so we will skip the data cleaning step.
data("ToothGrowth")
str(ToothGrowth)
## 'data.frame': 60 obs. of 3 variables:
## $ len : num 4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
## $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
## $ dose: num 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
The structure shows 60 observations of three variables: tooth length (len), supplement type (supp, either “OJ” for orange juice or “VC” for ascorbic acid), and dose, which takes three values (0.5, 1.0, and 2.0 mg/day).
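The structure does not show how the 60 observations are spread across the supplement/dose combinations. A cross-tabulation with base R's table() is one quick way to check this; the sketch below assumes the standard ToothGrowth dataset, where each of the six combinations should hold 10 observations. The variable name counts is just illustrative.

```r
# Load the built-in dataset (ships with base R's datasets package)
data("ToothGrowth")

# Cross-tabulate supplement type (rows) against dose (columns)
# to check whether the design is balanced
counts <- table(ToothGrowth$supp, ToothGrowth$dose)
print(counts)
```

A balanced table means every later comparison pits two groups of equal size against each other.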
summary(ToothGrowth)
## len supp dose
## Min. : 4.20 OJ:30 Min. :0.500
## 1st Qu.:13.07 VC:30 1st Qu.:0.500
## Median :19.25 Median :1.000
## Mean :18.81 Mean :1.167
## 3rd Qu.:25.27 3rd Qu.:2.000
## Max. :33.90 Max. :2.000
The summary confirms the two supplement types with an even 30/30 split between “OJ” and “VC”, and shows doses ranging from 0.5 to 2.0 mg/day.
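The overall summary pools all six groups together. As a sketch of a more targeted summary, aggregate() with a formula gives the mean tooth length for each supplement/dose combination; these are the same group means the t-tests below report as "sample estimates". The name groupMeans is illustrative.

```r
data("ToothGrowth")

# Mean tooth length for every supp/dose combination (one row per group)
groupMeans <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
print(groupMeans)
```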
We will first break the data into subsets based on supplement type.
ojData <- subset(ToothGrowth, supp == "OJ")
vcData <- subset(ToothGrowth, supp == "VC")
Next, we break each subset down further by dose.
halfDoseOJ <- ojData[ojData$dose == 0.5,]
fullDoseOJ <- ojData[ojData$dose == 1,]
doubleDoseOJ <- ojData[ojData$dose == 2.0,]
halfDoseVC <- vcData[vcData$dose == 0.5,]
fullDoseVC <- vcData[vcData$dose == 1,]
doubleDoseVC <- vcData[vcData$dose == 2.0,]
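The six manual subsets above work, but an alternative sketch is to let split() group the lengths by every supplement/dose combination in a single call. The resulting list is named "<supp>.<dose>" (for example "OJ.0.5"); the variable name lenByGroup is illustrative.

```r
data("ToothGrowth")

# split() with a list of grouping variables uses their interaction,
# so each element holds the lengths for one supp/dose combination
lenByGroup <- split(ToothGrowth$len, list(ToothGrowth$supp, ToothGrowth$dose))
names(lenByGroup)

# Same values as halfDoseOJ$len from the manual subsetting
lenByGroup[["OJ.0.5"]]
```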
Let’s see what each group looks like visually, starting with both supplements compared across all doses.
boxplot(ojData$len, vcData$len, main = "OJ vs. VC", sub = "All Doses", names = c("OJ", "VC"), ylab = "Length")
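From the boxplot, “OJ” appears to produce longer teeth overall. We can test this with a two-sample t-test; by default, R's t.test() runs Welch's version, which does not assume equal variances in the two groups.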
t.test(ojData$len, vcData$len)
##
## Welch Two Sample t-test
##
## data: ojData$len and vcData$len
## t = 1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.1710156 7.5710156
## sample estimates:
## mean of x mean of y
## 20.66333 16.96333
Since the p-value (0.06063) is greater than the common alpha level of 0.05, we fail to reject the null hypothesis, and the 95% confidence interval (-0.17, 7.57) includes 0. Across all doses combined, there is not enough evidence to conclude that the two supplements differ.
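To show what t.test() computes here, the sketch below recomputes the Welch statistic by hand: the difference in means divided by the standard error sqrt(s1^2/n1 + s2^2/n2), with degrees of freedom from the Welch-Satterthwaite approximation. The variable names are illustrative; the result should match the t, df, and p-value printed above.

```r
data("ToothGrowth")
oj <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
vc <- ToothGrowth$len[ToothGrowth$supp == "VC"]

# Squared standard errors of each group mean (no equal-variance assumption)
varMeanOJ <- var(oj) / length(oj)
varMeanVC <- var(vc) / length(vc)

# Welch t statistic for H0: mean(OJ) - mean(VC) = 0
tStat <- (mean(oj) - mean(vc)) / sqrt(varMeanOJ + varMeanVC)

# Welch-Satterthwaite approximation to the degrees of freedom
dfWelch <- (varMeanOJ + varMeanVC)^2 /
  (varMeanOJ^2 / (length(oj) - 1) + varMeanVC^2 / (length(vc) - 1))

# Two-sided p-value from the t distribution
pValue <- 2 * pt(-abs(tStat), df = dfWelch)

c(t = tStat, df = dfWelch, p = pValue)
```

The non-integer degrees of freedom (55.309 above) come from this approximation rather than from n1 + n2 - 2.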
boxplot(halfDoseOJ$len, halfDoseVC$len, names = c("OJ", "VC"), ylab = "Length", main = "OJ vs. VC", sub = "0.5 mg/day Dose")
At the 0.5 mg/day dose, the boxplot suggests “OJ” is clearly ahead. Let’s test that hypothesis.
t.test(halfDoseOJ$len, halfDoseVC$len)
##
## Welch Two Sample t-test
##
## data: halfDoseOJ$len and halfDoseVC$len
## t = 3.1697, df = 14.969, p-value = 0.006359
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 1.719057 8.780943
## sample estimates:
## mean of x mean of y
## 13.23 7.98
Since the p-value (0.006359) is well below the common alpha level of 0.05, we reject the null hypothesis, and the 95% confidence interval (1.72, 8.78) lies entirely above 0. There is a significant difference in favor of “OJ” at the 0.5 mg/day dose.
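Rather than reading the decision off the printed output, the fields of the returned test object can be checked directly. This sketch uses the p.value and conf.int components of the object t.test() returns; at the default 95% level, the interval excludes 0 exactly when p < 0.05, so both checks should agree. The helper variable names are illustrative.

```r
data("ToothGrowth")
halfOJ <- ToothGrowth$len[ToothGrowth$supp == "OJ" & ToothGrowth$dose == 0.5]
halfVC <- ToothGrowth$len[ToothGrowth$supp == "VC" & ToothGrowth$dose == 0.5]
res <- t.test(halfOJ, halfVC)  # default conf.level = 0.95

alpha <- 0.05
# Decision from the p-value
rejectByP <- res$p.value < alpha
# Decision from the confidence interval: reject if 0 lies outside it
rejectByCI <- res$conf.int[1] > 0 || res$conf.int[2] < 0

c(p = res$p.value, lower = res$conf.int[1], upper = res$conf.int[2])
```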
boxplot(fullDoseOJ$len, fullDoseVC$len, names = c("OJ", "VC"), ylab = "Length", main = "OJ vs. VC", sub = "1.0 mg/day Dose")
It would appear that “OJ” is the winner again. Let’s test!
t.test(fullDoseOJ$len, fullDoseVC$len)
##
## Welch Two Sample t-test
##
## data: fullDoseOJ$len and fullDoseVC$len
## t = 4.0328, df = 15.358, p-value = 0.001038
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 2.802148 9.057852
## sample estimates:
## mean of x mean of y
## 22.70 16.77
Since the p-value (0.001038) is much smaller than the common alpha level of 0.05, we reject the null hypothesis, and the 95% confidence interval (2.80, 9.06) lies entirely above 0. There is a significant difference in favor of “OJ” at the 1.0 mg/day dose.
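The tests so far are two-sided, even though the hypothesis stated before each one is directional (“OJ” is better). As a sketch, t.test() accepts alternative = "greater" for the one-sided alternative mean(OJ) > mean(VC); when the t statistic is positive, the one-sided p-value is half the two-sided one. A one-sided test is only appropriate if the direction is chosen before looking at the data, which is not strictly the case when the direction comes from the boxplot.

```r
data("ToothGrowth")
fullOJ <- ToothGrowth$len[ToothGrowth$supp == "OJ" & ToothGrowth$dose == 1]
fullVC <- ToothGrowth$len[ToothGrowth$supp == "VC" & ToothGrowth$dose == 1]

# H1: mean(OJ) != mean(VC)
twoSided <- t.test(fullOJ, fullVC)
# H1: mean(OJ) > mean(VC)
oneSided <- t.test(fullOJ, fullVC, alternative = "greater")

c(twoSided = twoSided$p.value, oneSided = oneSided$p.value)
```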
boxplot(doubleDoseOJ$len, doubleDoseVC$len, names = c("OJ", "VC"), ylab = "Length", main = "OJ vs. VC", sub = "2.0 mg/day Dose")
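At the 2.0 mg/day dose, the two groups look about even. Let’s test again. Note that the arguments are in the opposite order here (“VC” first), which flips the sign of the t statistic and confidence interval but leaves the p-value unchanged.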
t.test(doubleDoseVC$len, doubleDoseOJ$len)
##
## Welch Two Sample t-test
##
## data: doubleDoseVC$len and doubleDoseOJ$len
## t = 0.046136, df = 14.04, p-value = 0.9639
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -3.63807 3.79807
## sample estimates:
## mean of x mean of y
## 26.14 26.06
Since the p-value (0.9639) is far greater than the common alpha level of 0.05, we fail to reject the null hypothesis, and the 95% confidence interval (-3.64, 3.80) comfortably contains 0. There is no evidence of a difference between the supplements at this dose. A large p-value does not prove the means are equal, but the sample means (26.14 for “VC”, 26.06 for “OJ”) are nearly identical, and both supplements show longer teeth as the dose increases.
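One way to avoid the argument-order issue is R's formula interface, t.test(len ~ supp, data = ...). It orders the groups by factor level (“OJ”, then “VC”), so the estimated difference is always mean(OJ) - mean(VC). A sketch for the 2.0 mg/day dose; the name res2 is illustrative:

```r
data("ToothGrowth")

# Groups follow factor-level order, so the difference is OJ - VC
res2 <- t.test(len ~ supp, data = subset(ToothGrowth, dose == 2))
res2$estimate
res2$conf.int
res2$p.value
```

The p-value matches the test above; only the signs of the statistic and interval change.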
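Based on this data, “OJ” produces significantly longer teeth than “VC” at the 0.5 and 1.0 mg/day doses, while at 2.0 mg/day the two supplements perform about the same. The longest mean tooth lengths occur at 2.0 mg/day for both supplements (26.06 for “OJ”, 26.14 for “VC”, versus 22.70 for “OJ” at 1.0 mg/day), so the advantage of “OJ” lies at the lower doses. These conclusions assume that the guinea pigs were randomly assigned to the six groups, that the groups are independent, and that tooth length within each group is roughly normally distributed (the Welch tests do not require equal variances). The p-values are also not adjusted for running four tests on the same data.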
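Because several tests are run on the same data, the chance of at least one false positive is higher than the nominal 5%. One option, not used in the analysis above, is Holm's step-down adjustment via base R's p.adjust(). The sketch below reruns the three dose-level Welch tests and adjusts their p-values; variable names are illustrative.

```r
data("ToothGrowth")
doses <- c(0.5, 1, 2)

# Welch t-test of OJ vs. VC at each dose; keep only the p-values
pRaw <- sapply(doses, function(d) {
  t.test(len ~ supp, data = subset(ToothGrowth, dose == d))$p.value
})
names(pRaw) <- paste0(doses, " mg/day")

# Holm's method controls the family-wise error rate across the three tests
pHolm <- p.adjust(pRaw, method = "holm")
round(cbind(raw = pRaw, holm = pHolm), 4)
```

With these data, the 0.5 and 1.0 mg/day differences should remain below 0.05 after adjustment, so the conclusions above do not hinge on the number of tests.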