Analysing the ToothGrowth dataset
summary(ToothGrowth)
##       len        supp         dose
##  Min.   : 4.20   OJ:30   Min.   :0.500
##  1st Qu.:13.07   VC:30   1st Qu.:0.500
##  Median :19.25           Median :1.000
##  Mean   :18.81           Mean   :1.167
##  3rd Qu.:25.27           3rd Qu.:2.000
##  Max.   :33.90           Max.   :2.000
dim(ToothGrowth)
## [1] 60 3
head(ToothGrowth)
##    len supp dose
## 1  4.2   VC  0.5
## 2 11.5   VC  0.5
## 3  7.3   VC  0.5
## 4  5.8   VC  0.5
## 5  6.4   VC  0.5
## 6 10.0   VC  0.5
Inspecting the dataset shows 60 observations of three variables: len (the measured length), supp (the delivery method, OJ or VC), and dose (0.5, 1, or 2).
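library(tidyverse)  # attaches ggplot2 (plots) and dplyr (filter)
library(ggplot2)    # already attached by tidyverse; harmless to repeat
# Scatter plot of length against dose, one panel per delivery method, with a smoothed trend
g <- ggplot(data = ToothGrowth) + geom_point(mapping = aes(x = dose, y = len))
g <- g + facet_grid(. ~ supp)
g <- g + geom_smooth(mapping = aes(x = dose, y = len))
g
# Box plot of length at each dose, one panel per delivery method
g2 <- ggplot(ToothGrowth, aes(dose, len))
g2 <- g2 + geom_boxplot(mapping = aes(group = dose), col = "black", fill = "red")
g2 <- g2 + facet_grid(. ~ supp)
g2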
Now we want to compare the two vitamin C delivery methods, orange juice (OJ) and ascorbic acid (VC), in terms of their effect on the cell length. Assuming tooth growth is a good thing, we want to maximize the effect of delivering a particular dose. The first question is: which is the more effective delivery method? Since there are three dosages, we effectively have three comparisons to make, one per dose.
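Before testing, it can help to look at the group means that the three comparisons rest on. The following is a minimal sketch using only base R; the names group_means and group_sizes are introduced here for illustration and are not part of the original analysis.

```r
# Mean length for each delivery method / dose combination
group_means <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
# Number of observations in each combination
group_sizes <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = length)
print(group_means)
print(group_sizes)
```

Each of the six combinations should contain the same number of animals, which is what makes the dose-by-dose comparison straightforward.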
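Let’s start with the lowest dosage: 0.5 mg/day. The null hypothesis is that the mean length is the same for both delivery methods; the alternative is that the means differ.
Since the sample size is relatively small (10 observations per delivery method), we will apply a two-sided t-test. The observations refer to different subjects, so an unpaired test is the appropriate design. Note, however, that the call below passes a single vector of differences to t.test, so R runs a one-sample t-test on those differences (equivalent to a paired test) and the paired argument has no effect. The output should be read with that in mind.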
low_vc <- filter(ToothGrowth, supp == "VC", dose == 0.5)$len
low_oj <- filter(ToothGrowth, supp == "OJ", dose == 0.5)$len
t.test(low_oj - low_vc, alternative = "two.sided", paired = FALSE, conf.level = 0.95)
##
## One Sample t-test
##
## data: low_oj - low_vc
## t = 2.9791, df = 9, p-value = 0.01547
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## 1.263458 9.236542
## sample estimates:
## mean of x
## 5.25
The results show a mean difference in length of 5.25 units in favor of the OJ method. The 95% confidence interval runs from 1.26 to 9.24 units and excludes zero, so we reject the null hypothesis of no difference at the 5% level.
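Because the two groups consist of different animals, an unpaired comparison is worth running as a check. The following is a minimal sketch, assuming base R subsetting in place of filter so that it runs on its own; the name welch_low is introduced here for illustration. Passing the two vectors separately makes t.test perform Welch's two-sample test, which does not assume equal variances.

```r
# Lengths at the 0.5 dose for each delivery method (base R subsetting)
low_oj <- ToothGrowth$len[ToothGrowth$supp == "OJ" & ToothGrowth$dose == 0.5]
low_vc <- ToothGrowth$len[ToothGrowth$supp == "VC" & ToothGrowth$dose == 0.5]
# Two separate vectors with paired = FALSE give an unpaired (Welch) t-test
welch_low <- t.test(low_oj, low_vc, alternative = "two.sided",
                    paired = FALSE, conf.level = 0.95)
print(welch_low)
```

The estimated difference in means is the same as before, since both groups have equal size; only the standard error, degrees of freedom, and therefore the interval and p-value change.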
For the lowest dosage, OJ is better than VC. Next, we check the middle dose (1 mg/day).
mid_oj <- filter(ToothGrowth, supp == "OJ", dose == 1)$len
mid_vc <- filter(ToothGrowth, supp == "VC", dose == 1)$len
t.test(mid_oj - mid_vc, alternative = "two.sided", paired = FALSE, conf.level = 0.95)
##
## One Sample t-test
##
## data: mid_oj - mid_vc
## t = 3.3721, df = 9, p-value = 0.008229
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## 1.951911 9.908089
## sample estimates:
## mean of x
## 5.93
The results show a mean difference in length of 5.93 units in favor of the OJ method. The 95% confidence interval runs from 1.95 to 9.91 units and excludes zero, so we again reject the null hypothesis.
For the middle dosage, OJ is better than VC.
Next, we check the highest dose (2 mg/day).
hig_oj <- filter(ToothGrowth, supp == "OJ", dose == 2)$len
hig_vc <- filter(ToothGrowth, supp == "VC", dose == 2)$len
t.test(hig_oj - hig_vc, alternative = "two.sided", paired = FALSE, conf.level = 0.95)
##
## One Sample t-test
##
## data: hig_oj - hig_vc
## t = -0.042592, df = 9, p-value = 0.967
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## -4.328976 4.168976
## sample estimates:
## mean of x
## -0.08
In this case, the mean difference is close to zero. The CI also includes zero, meaning we fail to reject the null hypothesis. This means there is no clear winner in this particular case.
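The three dose-wise comparisons can also be collected into one table. The following is a minimal sketch, assuming the formula interface of t.test (which runs an unpaired Welch test by default) and base R only; the names results and summary_tbl are introduced here for illustration.

```r
doses <- sort(unique(ToothGrowth$dose))
# One unpaired Welch t-test of len by delivery method at each dose
results <- lapply(doses, function(d) {
  t.test(len ~ supp, data = subset(ToothGrowth, dose == d))
})
# Collect the OJ minus VC difference, its 95% interval, and the p-value
summary_tbl <- data.frame(
  dose = doses,
  diff_oj_minus_vc = sapply(results, function(r) unname(r$estimate[1] - r$estimate[2])),
  ci_lower = sapply(results, function(r) r$conf.int[1]),
  ci_upper = sapply(results, function(r) r$conf.int[2]),
  p_value = sapply(results, function(r) r$p.value)
)
print(summary_tbl)
```

Because OJ is the first level of supp, the difference is reported as OJ minus VC, matching the sign convention used in the text.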
Another important observation is the variance at the high dose:
c(var(hig_oj), var(hig_vc))
## [1] 7.049333 23.018222
This shows that the VC method yields results of greater variance at the high dose (about 23.0 versus 7.0 for OJ).
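Whether a variance gap of this size could arise by chance with only 10 observations per group can be examined with an F test. The following is a minimal sketch, assuming base R's var.test and approximately normal data in each group. It is not part of the original analysis, and the name f_res is introduced here for illustration.

```r
# Lengths at the 2.0 dose for each delivery method (base R subsetting)
hig_oj <- ToothGrowth$len[ToothGrowth$supp == "OJ" & ToothGrowth$dose == 2]
hig_vc <- ToothGrowth$len[ToothGrowth$supp == "VC" & ToothGrowth$dose == 2]
# F test of the null hypothesis that the two variances are equal
f_res <- var.test(hig_vc, hig_oj)
print(f_res)
```

The reported estimate is the ratio of the VC variance to the OJ variance; a confidence interval for that ratio that includes 1 would mean the difference in spread is not clearly established by these data.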
The OJ delivery method was more effective at the low and middle dosages. At the high dosage, both methods performed approximately the same. However, in the scatter plot above, the OJ trend appears to plateau while the VC trend is still increasing. This suggests that at dosages above 2.0 mg/day the VC delivery method might be more effective, although that is an extrapolation beyond the observed data.
However, the VC delivery method also yielded results of greater variance at the high dosage. This may be a relevant factor to consider when choosing between the two methods.
This conclusion depends on the assumption that the samples were obtained from independent populations.